Approximating the Minimum Chain Completion problem (2009)
Tomás Feder, Heikki Mannila, Evimaria Terzi
A bipartite graph G = (U, V, E) is a chain graph [9] if there is a bijection π: {1,..., |U|} → U such that Γ (π (1)) ⊇ Γ (π (2)) ⊇... ⊇ Γ (π (|U|)), where Γ is a function that maps a...
Assessing data mining results via swap randomization (2009)
Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...
Query Significance in Databases via Randomizations (2009)
Ojala, Markus, Garriga, Gemma C., Gionis, Aristides, Mannila, Heikki
Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries....
Finding Subgroups having Several Descriptions: Algorithms for Redescription Mining (2009)
Arianna Gallo, Pauli Miettinen, Heikki Mannila
Given a 0-1 dataset, we consider the redescription mining task introduced by Ramakrishnan, Parida, and Zaki. The problem is to find subsets of the rows that can be (approximately) defined by at least...
A randomized approximation algorithm for computing bucket orders (2009)
Ukkonen, Antti, Puolamäki, Kai, Gionis, Aristides, Mannila, Heikki
We show that a simple randomized algorithm has an expected constant factor approximation guarantee for fitting bucket orders to a set of pairwise preferences.
What is the dimension of your binary data? (2008)
Nikolaj Tatti, Taneli Mielikäinen, Aristides Gionis, Heikki Mannila
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...
Finding Partial Orders from Unordered 0-1 Data (2008)
Antti Ukkonen, Mikael Fortelius, Heikki Mannila
In applications such as paleontology and medical genetics the 0-1 data has an underlying unknown order (the ages of the fossil sites, the locations of markers in the genome). The order might be total...
Standing Out in a Crowd: Selecting Attributes for Maximum Visibility (2008)
Muhammed Miah, Gautam Das, Vagelis Hristidis, Heikki Mannila
In recent years, there has been significant interest in development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in...
Finding links and initiators: a graph reconstruction problem (2008)
Mannila, Heikki, Terzi, Evimaria
Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i,j) of M indicates that signal j has been observed (or not...
Constrained hidden Markov models for population-based haplotyping (2008)
Niels Landwehr, Taneli Mielikäinen, Lauri Eronen, Hannu Toivonen, Heikki Mannila
BMC Bioinformatics. This is an open access article distributed under the terms of the Creative Commons Attribution License
Determining significance of pairwise co-occurrences of events in bursty sequences (2008)
Haiminen, Niina, Mannila, Heikki, Terzi, Evimaria
Background: Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain...
Macadamia: Master's Programme in Machine Learning and Data Mining (2008)
Raiko, Tapani, Puolamäki, Kai, Karhunen, Juha, Hollmen, Jaakko, Honkela, Antti, Kaski, Samuel, ...
Macadamia is a two-year master’s programme for machine learning and data mining given in the Department of Information and Computer Science at Helsinki University of Technology. This paper...
15.2 Discovering orderings (2008)
Heikki Mannila, Jaakko Hollmén, Kai Puolamäki, Ella Bingham, Johan Himberg, Robert Gwadera, ...
15.1 Data mining at the Pattern Discovery group. The Pattern Discovery group in Otaniemi concentrates on combinations of pattern discovery and probabilistic modeling in data mining: pattern discovery...
Nestedness and Segmented Nestedness (2008)
Consider each row of a 0-1 dataset as the subset of the columns for which the row has a 1. Then a dataset is nested if, for all pairs of rows, one row is either a superset or a subset of the other. The...
From Data to Knowledge Research Unit (2008)
Heikki Mannila, Jaakko Hollmén, Ella Bingham, Johan Himberg, Anne Patrikainen, Salla Ruosaari, ...
17.1 Data mining. The work concentrates on combinations of pattern discovery and probabilistic modeling in data mining: pattern discovery aims at finding local phenomena, while modeling often aims at...
An Efficient Method for Association Mapping in Phase-unknown Genotype Data (2008)
Petteri Sevon, Päivi Onkamo, Hannu Toivonen, Heikki Mannila
In genetic association analysis a researcher tries to find shared alleles or haplotypes in a group of patients, which would be much rarer in controls. The analysis is based on availability of...
ordering relationships between the events in a collection of (2008)
of different event types. Thus the method scales to handle large data sets...
Constrained Hidden Markov Models for Population-based Haplotyping Extended abstract (2008)
Niels Landwehr, Taneli Mielikäinen, Lauri Eronen, Hannu Toivonen, Heikki Mannila
Analysis of genetic variation in human populations is critical to the understanding of the genetic basis for complex diseases. Although genomes of several species have been sequenced, it is still too...
Optimal Segmentation Using Tree Models (2008)
Robert Gwadera, Aristides Gionis, Heikki Mannila
Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunication. Many real-life sequences have a strong segmental structure, with segments...
Gene Mapping by Haplotype Pattern Mining (2008)
Hannu T.T. Toivonen, Kari Vasko, Vesa Ollikainen, Petteri Sevon, Heikki Mannila, Juha Kere
Genetic markers are being increasingly utilized in gene mapping. Discovery of associations between markers and patient phenotypes, such as a disease status, enables the identification of...
gmopuloQalmaden.ibm.com (2008)
Dimitrios Gunopulos, Heikki Mannila
Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with...
Random projection in dimensionality reduction: Applications to image and text data
Finding Trees From Unordered 0--1 Data (2008)
Hannes Heikinheimo, Heikki Mannila
Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0--1 data and consider the problem of...
Feature Selection in Taxonomies with Applications to Paleontology. (2008)
Garriga, Gemma, Ukkonen, Antti, Mannila, Heikki
Taxonomies for a set of features occur in many real-world domains. An example is provided by paleontology, where the task is to determine the age of a fossil site on the basis of the taxa that have...
Banded structure in binary matrices (2008)
Garriga, Gemma, Junttila, Esa, Mannila, Heikki
A 0--1 matrix has a banded structure if both rows and columns can be permuted so that the non-zero entries exhibit a staircase pattern of overlapping rows. The concept of banded matrices has its...
Macadamia: Master's programme in machine learning and data mining (2008)
Raiko, Tapani, Puolamäki, Kai, Karhunen, Juha, Hollmen, Jaakko, Honkela, Antti, Mannila, Heikki, ...
Macadamia is a two-year master’s programme for machine learning and data mining given in the Department of Information and Computer Science at Helsinki University of Technology. This paper...
A Perspective on Databases and Data Mining (2007)
M. Holsheimer, M. Kersten, H. Mannila, H. Toivonen, ...
Similarity of Event Sequences (Extended Abstract) (2007)
Heikki Mannila, Pirjo Ronkainen
Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for...
User Interactivity in Very Large Scale Data Mining (2007)
Stefan Wrobel, Dietrich Wettschereck, Inkeri Verkamo, Arno Siebes, Heikki Mannila, Fred Kwakkel, ...
Knowledge discovery is widely considered to be an interactive and iterative process. The data mining phase of KDD is, on the other hand, often assumed to be an indivisible step. We argue that user...
Learning, mining, or modeling? - A case study from paleoecology (2007)
Heikki Mannila, Hannu Toivonen, Atte Korhola, Heikki Olander
Exploratory data mining, machine learning, and statistical modeling all have a role in discovery science. We describe a paleoecological reconstruction problem where Bayesian methods are useful and...
An Algorithm for Learning Hierarchical Classifiers (2007)
Jyrki Kivinen, Heikki Mannila, Esko Ukkonen, Jaak Vilo
1 Introduction. In [4] an Occam algorithm ([1]) was introduced for PAC learning certain kind of decision lists from classified examples. Such decision lists, or hierarchical rules as we call them,...
Päivi Onkamo, Kari Vasko, Petteri Sevon, Heikki Mannila, Juha Kere
Jean-francois Boulicaut, Mika Klemettinen, Heikki Mannila
Abstract. One of the most challenging problems in data manipulation in the future is to be able to efficiently handle very large databases but also multiple induced properties or generalizations in...
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...
S. Zilberstein, David Heckerman, Heikki Mannila, Daryl Pregibon, Ramasamy Uthurusamy
[18] Padhraic Smyth and David Wolpert. Anytime exploratory data analysis for massive data sets. In
Predictive Profiles for Transaction Data using Finite Mixture Models (2007)
Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila
Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...
Data Mining and Knowledge Discovery KL2159-03/5253512 October 24, 2003 19:30 (2007)
Heikki Mannila
It has been claimed that the discovery of association rules is well suited for applications of market basket analysis to reveal regularities in the purchase behaviour of customers. However today, one...
Hinneburg, Alexander, Mannila, Heikki, Kaislaniemi, Samuli, Nevalainen, Terttu, Raumolin-Brunberg, Helena
Estimating the relative frequencies of linguistic features is a fundamental task in linguistic computation. As the amount of text or speech that is available from a given user of the language...
Comparing segmentations by applying randomization techniques (2007)
Haiminen, Niina, Mannila, Heikki, Terzi, Evimaria
Background: There exist many segmentation techniques for genomic sequences, and the segmentations can also be based on many different biological features. We show how to evaluate and compare...
Constrained hidden Markov models for population-based haplotyping (2007)
Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki
Background: Haplotype Reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important...
Constrained hidden Markov models for population-based haplotyping (2007)
Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki
Background. Haplotype Reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important intermediate...
Finding Low-Entropy Sets and Trees from Binary Data (2007)
Heikinheimo, Hannes, Hinkkanen, Eino, Mannila, Heikki, Mielikäinen, Taneli, Seppänen, Jouni
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value...
Assessing Data Mining Results via Swap Randomization (2007)
Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis
The problem of assessing the significance of data mining results on high-dimensional 0–1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and...
Optimal Segmentation Using Tree Models (2007)
Robert Gwadera, Aristides Gionis, Heikki Mannila
Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunications. Many real-life sequences have a strong segmental structure, with...
Niina Haiminen, Heikki Mannila, Evimaria Terzi
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
Analysis of Linux Evolution Using Aligned Source Code Segments (2006)
Rasinen, Antti, Hollmen, Jaakko, Mannila, Heikki
The Linux operating system embodies a development history of 15 years and community effort of hundreds of voluntary developers. We examine the structure and evolution of the Linux kernel by...
The Discrete Basis Problem (2006)
Miettinen, Pauli, Mielikäinen, Taneli, Gionis, Aristides, Das, Gautam, Mannila, Heikki
Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the...
Assessing data mining results via swap randomization (2006)
Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis
The problem of assessing the significance of data mining results on high-dimensional 0--1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...
Algorithms for Discovering Bucket Orders from Data (2006)
Gionis, Aristides, Mannila, Heikki, Puolamäki, Kai, Ukkonen, Antti
Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since...
Constrained Hidden Markov Models for Population-based Haplotyping (Extended Abstract) (2006)
Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki
We propose a simple haplotype reconstruction method that is based on iterative refinement and regularization of constrained Hidden Markov Models. We show that it gives results comparable to the...
Spectral ordering and biochronology of European fossil mammals (2006)
Mikael Fortelius, Aristides Gionis, Jukka Jernvall, Heikki Mannila
Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods (2006)
Puolamäki, Kai, Fortelius, Mikael, Mannila, Heikki
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...
Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods (2006)
Kai Puolamäki, Mikael Fortelius, Heikki Mannila
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...
What is the dimension of your binary data? (2006)
Tatti, Nikolaj, Mielikäinen, Taneli, Gionis, Aristides, Mannila, Heikki
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...
Segmentation and dimensionality reduction (2006)
Ella Bingham, Aristides Gionis, Niina Haiminen, Heli Hiisilä, Heikki Mannila, Evimaria Terzi
Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences — they both reduce the complexity of the representation of the original data. In...
Assessing data mining results via swap randomization (2006)
Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...
The discrete basis problem (2006)
Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila
Abstract. Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another...
The discrete basis problem (2006)
Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila
Abstract—Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data and another describing...
Constrained hidden Markov models for population-based haplotyping (Extended Abstract) (2006)
Mielikäinen, Taneli, Eronen, Lauri, Toivonen, H, Mannila, Heikki
Mining chains of relations (2005)
Afrati, Foto, Das, Gautam, Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis
Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...
Methods for Comparing Subspace Clusterings (2005)
Anne Patrikainen, Prof. Heikki Mannila
Subspace clustering methods search for clusters of similar data points in different subspaces of the data space. These methods combine and generalize clustering and...
Parameter-free spatial data mining using MDL (2005)
Spiros Papadimitriou, Aristides Gionis, Panayiotis Tsaparas, Risto A. Väisänen, Heikki Mannila, Christos Faloutsos
Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and...
Boolean Formulas and Frequent Sets (2005)
Jouni Seppänen, Heikki Mannila
We consider the problem of how one can estimate the support of Boolean queries given a collection of frequent itemsets. We describe an algorithm that truncates the inclusion-exclusion sum to include...
A Hidden Markov Technique for Haplotype Reconstruction (2005)
Pasi Rastas, Mikko Koivisto, Heikki Mannila, Esko Ukkonen
We give a new algorithm for the genotype phasing problem.
Mining Chains of Relations (2005)
Foto Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas
Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...
Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in...
A Hidden Markov Technique for Haplotype Reconstruction (2005)
Pasi Rastas, Mikko Koivisto, Heikki Mannila, Esko Ukkonen
Abstract. We give a new algorithm for the genotype phasing problem. Our solution is based on a hidden Markov model for haplotypes. The model has a uniform structure, unlike most solutions proposed so...
Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with them. This problem, clustering aggregation, appears naturally in various...
Segmentation algorithms for time series and sequence data (2005)
Aristides Gionis, Heikki Mannila
Data Mining: The Next Generation (2005)
Ramakrishnan, Raghu, Agrawal, Rakesh, Freytag, Johann-Christoph, Bollinger, Toni, Clifton, Christopher W., Dzeroski, Saso, ...
Data Mining (DM) has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of DM research and development has yielded several...
Relational link-based ranking (2004)
Mannila, Heikki, Terzi, Evimaria
Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...
Fast and Robust General Purpose Clustering Algorithms (2004)
Estivill-Castro, Vladimir, Yang, Jianhua, Heikki Mannila
General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative...
Relational link-based ranking (2004)
Geerts, Floris, Mannila, Heikki, Terzi, Evimaria
Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...
Clustered segmentations (2004)
Aristides Gionis, Heikki Mannila, Evimaria Terzi
The problem of sequence and time-series segmentation has been discussed widely and it has been applied successfully in a variety of areas, including computational genomics, data analysis for...
Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that...
Relational link-based ranking (2004)
Floris Geerts, Heikki Mannila, Evimaria Terzi
Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...
Geometric and Combinatorial Tiles in 0-1 Data (2004)
Aristides Gionis, Heikki Mannila, Jouni K. Seppänen
In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0-1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a...
Relational Link-Based Ranking (2004)
Floris Geerts, Heikki Mannila, Evimaria Terzi
Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...
Hidden Markov Modelling Techniques for Haplotype Analysis (2004)
Mikko Koivisto, Teemu Kivioja, Heikki Mannila, Pasi Rastas, Esko Ukkonen
A hidden Markov model is introduced for descriptive modelling of the mosaic-like structures of haplotypes, due to iterated recombinations within a population. Methods using the minimum description...
Subspace clustering of high-dimensional binary data - a probabilistic approach (2004)
Anne Patrikainen, Heikki Mannila
Subspace clustering refers to clustering where only some of the attributes are relevant for a given cluster. We describe a probabilistic model for subspace clustering, together with an...
A simple algorithm for topic identification in 0–1 data (2003)
Abstract. Topics in 0–1 datasets are sets of variables whose occurrences are positively connected together. Earlier, we described a simple generative topic model. In this paper we show that, given...
Rule discovery and probabilistic modeling for onomastic data (2003)
Antti Leino, Heikki Mannila, Ritva Liisa Pitkänen
Abstract. The naming of natural features, such as hills, lakes, springs, meadows etc., provides a wealth of linguistic information; the study of the names and naming systems is called onomastics. We...
Finding recurrent sources in sequences (2003)
Aristides Gionis, Heikki Mannila
Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or...
The pattern ordering problem (2003)
Taneli Mielikäinen, Heikki Mannila
Abstract. Many pattern discovery methods provide fast tools for finding the frequently occurring patterns in large data sets. Such pattern collections can also be used to approximate the underlying...
A Simple Algorithm for Topic Identification in 0-1 Data (2003)
Jouni K. Seppänen, Ella Bingham, Heikki Mannila
Topics in 0-1 datasets are sets of variables whose occurrences are positively connected together. Earlier, we described a simple generative topic model. In this paper we show that, given data...
Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data. (2003)
Jaakko Hollmén, Jouni K. Seppänen, Heikki Mannila
We study the interaction between global and local techniques in data mining. Specifically, we study the collections of frequent sets in clusters produced by a probabilistic clustering using mixtures...
Discovering All Most Specific Sentences (2003)
Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, Ram Sewak Sharma
Data mining can be viewed, in many cases, as the task of computing a representation of a theory...
• Local patterns: episodes, sequential patterns, etc. • Clustering (2003)
Heikki Mannila, Aris Gionis, Marko Salmenkivi, Teija Kujala
• Possibly high-d alphabet • Time series (continuous [high-d] values) Basic questions on sequences • Prediction: what is the next event in the sequence? • Sequential supervised learning...
A Theory of Inductive Query Answering (2002)
De Raedt, Luc, Jaeger, Manfred, Lee, Sau Dan, Mannila, Heikki
We introduce the boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary boolean expressions over monotonic and anti-monotonic predicates....
A Theory of Inductive Query Answering (2002)
De Raedt, Luc, Jaeger, Manfred, Lee, Sau Dan, Mannila, Heikki
We introduce the boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary boolean expressions over monotonic and anti-monotonic predicates....
Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain...
Genome segmentation using piecewise constant intensity models and reversible jump MCMC (2002)
Marko Salmenkivi, Juha Kere, Heikki Mannila
The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points...
A Theory of Inductive Query Answering (2002)
Luc De Raedt, Manfred Jaeger, Sau Dan Lee, Heikki Mannila
We introduce the boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary boolean expressions over monotonic and anti-monotonic predicates....
Genome segmentation using piecewise constant intensity models and reversible jump MCMC (2002)
Salmenkivi, Marko, Kere, Juha, Mannila, Heikki
The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points...
Principles of Data Mining / D. Hand, H. Mannila, P. Smyth; foreword by Thomas Dietterich (2001)
Hand, David, Mannila, Heikki, Smyth, Padhraic
Includes bibliography
Extracting the context of a mobile device user (2001)
Mäntyjärvi, Jani, Himberg, Johan, Korpipää, Panu, Mannila, Heikki
8th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems, 445 - 450
Time Series Segmentation for Context Recognition in Mobile Devices (2001)
Johan Himberg, Kalle Korpiaho, Heikki Mannila, Johanna Tikanmaki
Recognizing the context of use is important in making mobile devices as simple to use as possible. Finding out what the user's situation is can help the device and underlying service in...
Random projection in dimensionality reduction: applications to image and text data (2001)
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results...
Igor V. Cadez, Padhraic Smyth, Heikki Mannila
Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual...
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2001)
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
We investigate the problem of generating fast approximate answers to queries posed to large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...
Predictive Profiles for Transaction Data using Finite Mixture Models (2001)
Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila
Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...
Decomposition of Event Sequences into Independent Components (2001)
Heikki Mannila, Dmitry Rusakov
Monitoring a large telecommunication network results in an extensive log of alarms of different types that occurred in the system. Similar log files are also produced in mobile commerce, in web...
Decomposition of Event Sequences into Independent Components (2001)
Heikki Mannila, Dmitry Rusakov
In this paper we describe a theoretical framework and practical methods for finding event sequence decompositions. These methods use the probabilistic modeling of the event generating process. The...
Decomposition of event sequences into independent components (2001)
Heikki Mannila, Dmitry Rusakov
Many real-world processes result in extensive logs of sequences of events, i.e., events coupled with time of occurrence. Examples of such process logs include alarms produced by a large...
Gyllenberg, Helge G., Gyllenberg, Mats, Koski, Timo, Lund, Tatu, Mannila, Heikki, Meek, Christopher
http://www.tucs.fi/Publications/techreports/TR338.php
Data mining applied to linkage disequilibrium mapping (2000)
Päivi Onkamo, Kari Vasko, Vesa Ollikainen, Petteri Sevon, Heikki Mannila, ...
We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a...
Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets (2000)
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents...
Rule Discovery in Telecommunication Alarm Data (1999)
Mika Klemettinen, Heikki Mannila, Hannu Toivonen
Fault management is an important but difficult area of telecommunication...
Specifying and simulating complex models using Bassist (1999)
Mannila, Heikki, Toivonen, Hannu T. T., Salmenkivi, Marko, Laakso, Karri-Pekka
Querying inductive databases: A case study on the MINE RULE operator (1998)
Jean-François Boulicaut, Mika Klemettinen, Heikki Mannila
Knowledge discovery in databases (KDD) is a process that can include steps like forming the data set, data transformations, discovery of patterns, searching for exceptions to a pattern,...
Time-Series Similarity Problems and Well-Separated Geometric Sets (1998)
Béla Bollobás, Gautam Das, Dimitrios Gunopulos, Heikki Mannila
Given a pair of nonidentical complex objects, defining (and determining) how similar they are to each other is a nontrivial problem. In data mining applications, one frequently needs to determine the...
Rule Discovery From Time Series (1998)
Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such...
Modeling KDD Processes within the Inductive Database Framework (1998)
Jean-François Boulicaut, Mika Klemettinen, Heikki Mannila
One of the most challenging problems in data manipulation in the future is to be able to efficiently handle very large databases but also multiple induced properties or generalizations in that...
Similarity of Event Sequences (1998)
Heikki Mannila, Pirjo Ronkainen
Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for...
Bassist - a tool for MCMC simulation of statistical models (1998)
Hannu Toivonen, Heikki Mannila, Marko Salmenkivi, Karri-Pekka Laakso
In this paper we give a short overview of MCMC simulation and the Bassist system, and describe some of the applications in which Bassist has been used.
Reasoning with Examples: Propositional Formulae and Database Dependencies (1998)
Roni Khardon, Heikki Mannila, Dan Roth
For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback with this method is that it does not necessarily give the correct results. However, under...
Mervi Eerola, Heikki Mannila, Marko Salmenkivi
In this paper we describe the use of the BASSIST system when modelling the occurrences of middle ear infection (acute otitis media, AOM). Previous modelling on AOM by nonparametric Bayesian methods has...
Similarity of Attributes by External Probes (1997)
Gautam Das, Heikki Mannila, Pirjo Ronkainen
In data mining, similarity or distance between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be user-defined, but an...
Similarity of Attributes by External Probes (1997)
Gautam Das, Heikki Mannila, Pirjo Ronkainen
In data mining, similarity between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be user-defined, but an important...
Time-Series Similarity Problems and Well-Separated Geometric Sets (1997)
Béla Bollobás, Gautam Das, Dimitrios Gunopulos, Heikki Mannila
Given a pair of nonidentical complex objects, defining (and determining) how similar they are to each other is a nontrivial problem. In data mining applications, one frequently needs to determine the...
Data mining, hypergraph transversals, and machine learning (1997)
Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Hannu Toivonen
Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with...
Distance Measures for Point Sets and Their Computation (1997)
We consider the problem of measuring the similarity or distance between two finite sets of points in a metric space, and computing the measure. This problem has applications in, e.g., computational...
Data mining, Machine Learning (1997)
Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Hannu Toivonen
Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with...
Levelwise Search and Borders of Theories in Knowledge Discovery (1997)
Heikki Mannila, Hannu Toivonen
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all...
Discovery of Frequent Episodes in Event Sequences (1997)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. We consider the problem of discovering frequently occurring episodes in such...
Discovery of Frequent Episodes in Event Sequences (1997)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a...
Multiple uses of frequent sets and condensed representations (Extended Abstract) (1996)
Heikki Mannila, Hannu Toivonen
In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used...
On an algorithm for finding all interesting sentences (Extended Abstract) (1996)
Heikki Mannila, Hannu Toivonen
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a...
Attribute-Oriented Induction and Conceptual Clustering (1996)
Oskari Heinonen, Heikki Mannila
Attribute-oriented induction is a method for knowledge discovery in databases that has recently been described and widely applied by Han et al. In this note we point out the close connection between...
Discovering Generalized Episodes Using Minimal Occurrences (1996)
Heikki Mannila, Hannu Toivonen
Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible...
Constructing tailored SGML documents (1996)
Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, ...
In this paper we analyse the assembly process and some examples of documents that contain supplementary information for intelligent document assembly.
Rule Discovery in Alarm Databases (1996)
Kimmo Hätönen, Mika Klemettinen, Heikki Mannila, ...
Telecommunication networks produce large amounts of alarm information daily. This data contains potentially valuable knowledge about the network. We present a methodology for the analysis of large...
Intelligent Assembly of Structured Documents (1996)
Helena Ahonen, Barbara Heikkinen, Pekka Kilpeläinen, Oskari Heinonen, ...
An intelligent document contains information about its structure, its contents and its environment. This information supports intelligent document assembly, i.e., the effective reuse of existing...
Data Mining as Selective Theory Extraction in Probabilistic Logic (1996)
Manfred Jaeger, Heikki Mannila, Emil Weydert
ing from this specific example we are led to the following formulation of data mining problems in general: given a database r, a language L for expressing statements about the data, and a criterion...
Mika Klemettinen, Heikki Mannila, Hannu Toivonen
We introduce a methodology for knowledge discovery in databases (KDD) where one first discovers large collections of patterns at once, and then performs interactive retrievals from the collection of...
TASA: Telecommunication Alarm Sequence Analyzer or: How to enjoy faults in your network (1996)
Kimmo Hätönen, Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen
Today's large and complex telecommunication networks produce large amounts of alarms daily. The sequence of alarms contains valuable knowledge about the behavior of the network, but much of the...
Discovering Generalized Episodes Using Minimal Occurrences (1996)
Heikki Mannila, Hannu Toivonen
Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible...
Recognizing renamable generalized propositional Horn formulas is NP-complete (1995)
Yamasaki and Doshita have defined an extension of the class of propositional Horn formulas; later, Gallo and Scutellà generalized this class to a hierarchy Γ0 ⊆ Γ1 ⊆ ... ⊆ Γk ⊆ ..., where Γ0 is the set of Horn...
A Perspective on Databases and Data Mining (1995)
Marcel Holsheimer, Martin Kersten, Heikki Mannila, Hannu Toivonen
We discuss the use of database methods for data mining. Recently impressive results have been achieved for some data mining problems using highly specialized and clever data structures. We study how...
Discovering frequent episodes in sequences (Extended Abstract) (1995)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. In this paper we consider the problem of recognizing frequent episodes in such...
Reasoning with Examples: Propositional Formulae and Database Dependencies (1995)
Roni Khardon, Heikki Mannila, Dan Roth
For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback with this method is that it does not necessarily give the correct results. However, under...
Recognizing renamable generalized propositional Horn formulas is NP-complete (1995)
Thomas Eiter, Pekka Kilpeläinen, Heikki Mannila
Yamasaki and Doshita have defined an extension of the class of propositional Horn formulas; later, Gallo and Scutellà generalized this class to a hierarchy Γ0 ⊆ Γ1 ⊆... ⊆ Γk ⊆..., where...
Adding Disjunction to Datalog (Extended Abstract) (1994)
Thomas Eiter, Georg Gottlob, Heikki Mannila
Thomas Eiter, Georg Gottlob, Heikki Mannila. Christian Doppler Laboratory for Expert Systems, Information Systems Department, Technical University of Vienna, Paniglgasse 16, A-1040 Wien, Austria...
Computing Discrete Fréchet Distance (1994)
Thomas Eiter, Heikki Mannila
The Fréchet distance between two curves in a metric space is a measure of the similarity between the curves. We present a discrete variation of this measure.
Efficient Algorithms for Discovering Association Rules (1994)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo
Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski,...
Generating grammars for SGML tagged texts lacking DTD (1994)
Helena Ahonen, Heikki Mannila, Erja Nikunen
We describe a technique for forming a context free grammar for a document that has some kind of tagging --- structural or typographical --- but no concise description of the structure is available....
Finding Interesting Rules from Large Sets of Discovered Association Rules (1994)
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo
Association rules, introduced by Agrawal, Imielinski, and Swami, are rules of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also...
Improved Methods for Finding Association Rules (1994)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo
Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski,...
The Power of Sampling in Knowledge Discovery (1993)
Jyrki Kivinen, Heikki Mannila
We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different...
Learning Hierarchical Rule Sets (1993)
Jyrki Kivinen, Heikki Mannila, Esko Ukkonen
We present an algorithm for learning sets of rules that are organized into up to k levels. Each level can contain an arbitrary number of rules "if c then l" where l is the class associated...
On approximation preserving reductions: Complete problems and robust measures (1987)
We investigate the well-known anomalous differences in the approximability properties of NP-complete optimization problems. We define a notion of polynomial time reduction between optimization...
Design-By-Example: A Design Tool for Relational Databases (1985)
Bitton, Dina, Mannila, Heikki, Raiha, Kari-Jouko
In recent years, research in relational design theory and in query optimization has established a firm ground for designing well-structured logical and physical database schemes. However, the design...
Design by Example: An Application of Armstrong Relations (1985)
Mannila, Heikki, Raiha, Kari-Jouko
Example relations, and especially Armstrong relations, can be used as user-friendly representations of dependency sets. In this paper we analyze the use of Armstrong relations in database design with...
Data Mining Applied to Linkage Disequilibrium Mapping
Toivonen, Hannu T. T., Onkamo, Päivi, Vasko, Kari, Ollikainen, Vesa, Sevon, Petteri, Mannila, Heikki, ...
We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a...
Hovatta, Iiris, Varilo, Teppo, Suvisaari, Jaana, Terwilliger, Joseph D., Ollikainen, Vesa, Arajärvi, Ritva, ...
Schizophrenia is a severe mental disorder affecting ∼1% of the world's population. Here, we report the results from a three-stage genomewide screen performed in a study sample from an internal...
Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods
Puolamäki, Kai, Fortelius, Mikael, Mannila, Heikki
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...
Constrained hidden Markov models for population-based haplotyping
Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki
Higher origination and extinction rates in larger mammals
Liow, Lee Hsiang, Fortelius, Mikael, Bingham, Ella, Lintulaakso, Kari, Mannila, Heikki, Flynn, Larry, ...
Do large mammals evolve faster than small mammals or vice versa? Because the answer to this question contributes to our understanding of how life-history affects long-term and large-scale...
Reply to Vilar et al.: Sleep or hide, better for survival anytime
Liow, Lee Hsiang, Fortelius, Mikael, Bingham, Ella, Lintulaakso, Kari, Mannila, Heikki, Flynn, Larry, ...
Reasoning with Examples: Propositional Formulae and Database Dependencies
Roni Khardon, Heikki Mannila, Dan Roth
For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback with this method is that it does not necessarily give the correct results. However, under...
Approximating a Collection of Frequent Sets
Foto Afrati, Aristides Gionis, Heikki Mannila
One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is...
Complexity control in a mixture model by the Hardy-Weinberg equilibrium
Bingham, Ella, Mannila, Heikki
A method of complexity control in multinomial mixture modeling of multiple-marker genotype data, imposing the Hardy-Weinberg equilibrium (HWE) between the genotype values, is studied. This is a very...
David J. Hand, Heikki Mannila, Padhraic Smyth
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically,...