Heikki Mannila

Approximating the Minimum Chain Completion problem (2009)

Tomás Feder, Heikki Mannila, Evimaria Terzi

A bipartite graph G = (U, V, E) is a chain graph [9] if there is a bijection π: {1, ..., |U|} → U such that Γ(π(1)) ⊇ Γ(π(2)) ⊇ ... ⊇ Γ(π(|U|)), where Γ is a function that maps a...
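The nested-neighbourhood condition can be tested without trying every bijection π. The Python sketch below makes two assumptions. First, Γ maps a vertex u ∈ U to its set of neighbours in V, since the snippet is cut off before saying so. Second, the graph is given as a dict from each u to that set. The name `is_chain_graph` is hypothetical.

```python
def is_chain_graph(neighbors):
    """Check whether the U-side neighbourhoods admit an ordering
    Gamma(pi(1)) ⊇ Gamma(pi(2)) ⊇ ... ⊇ Gamma(pi(|U|)).

    `neighbors` maps each vertex u in U to Gamma(u), assumed here to be
    the set of neighbours of u in V.
    """
    # If a nested ordering exists, sorting by neighbourhood size (largest
    # first) yields one, since equal-sized sets in a chain must be equal.
    ordered = sorted(neighbors.values(), key=len, reverse=True)
    # For Python sets, a >= b means "a is a superset of b".
    return all(a >= b for a, b in zip(ordered, ordered[1:]))


# Usage: Gamma(u1) ⊇ Gamma(u2) ⊇ Gamma(u3), so this is a chain graph.
print(is_chain_graph({"u1": {1, 2, 3}, "u2": {1, 2}, "u3": {2}}))  # True
print(is_chain_graph({"u1": {1, 2}, "u2": {2, 3}}))                # False
```

Sorting by size replaces the search over all |U|! orderings. Only |U| - 1 subset tests remain after the sort.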

Assessing data mining results via swap randomization (2009)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...
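Swap randomization produces random 0–1 datasets that keep every row sum and column sum of the original. A mining result can then be compared against results on such randomized data. The Python sketch below is a minimal illustration, not the authors' implementation. It repeatedly picks two 1-cells (i1, j1) and (i2, j2). When (i1, j2) and (i2, j1) are both 0, it moves the 1s to those cells. The function name, the `num_swaps` parameter and the fixed seed are illustrative assumptions.

```python
import random


def swap_randomize(matrix, num_swaps, seed=0):
    """Return a randomized copy of a 0-1 matrix with the same row and column sums.

    Each step attempts one 2x2 swap on rows i1, i2 and columns j1, j2:
        1 0          0 1
        0 1   -->    1 0
    Attempts that are not valid swaps leave the matrix unchanged.
    Requires at least two 1s in the matrix.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy, keep the input intact
    ones = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v == 1]
    for _ in range(num_swaps):
        a, b = rng.sample(range(len(ones)), 2)
        (i1, j1), (i2, j2) = ones[a], ones[b]
        # Valid only if the opposite corners are 0; this also rules out
        # i1 == i2 or j1 == j2, because then one corner is a 1-cell itself.
        if m[i1][j2] == 0 and m[i2][j1] == 0:
            m[i1][j1] = m[i2][j2] = 0
            m[i1][j2] = m[i2][j1] = 1
            ones[a], ones[b] = (i1, j2), (i2, j1)
    return m


data = [[1, 0, 1, 0],
        [0, 1, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1]]
rand = swap_randomize(data, num_swaps=1000)
print([sum(r) for r in rand] == [sum(r) for r in data])              # True
print([sum(c) for c in zip(*rand)] == [sum(c) for c in zip(*data)])  # True
```

The sketch does not address how many attempts are needed before the output is close to a uniform sample over matrices with the given margins.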

Query Significance in Databases via Randomizations (2009)

Ojala, Markus, Garriga, Gemma C., Gionis, Aristides, Mannila, Heikki

Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries....

Finding Subgroups having Several Descriptions: Algorithms for Redescription Mining (2009)

Arianna Gallo, Pauli Miettinen, Heikki Mannila

Given a 0-1 dataset, we consider the redescription mining task introduced by Ramakrishnan, Parida, and Zaki. The problem is to find subsets of the rows that can be (approximately) defined by at least...

A randomized approximation algorithm for computing bucket orders (2009)

Ukkonen, Antti, Puolamäki, Kai, Gionis, Aristides, Mannila, Heikki

We show that a simple randomized algorithm has an expected constant factor approximation guarantee for fitting bucket orders to a set of pairwise preferences.

What is the dimension of your binary data? (2008)

Nikolaj Tatti, Taneli Mielikäinen, Aristides Gionis, Heikki Mannila

Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...

Finding Partial Orders from Unordered 0-1 Data (2008)

Antti Ukkonen, Mikael Fortelius, Heikki Mannila

In applications such as paleontology and medical genetics the 0-1 data has an underlying unknown order (the ages of the fossil sites, the locations of markers in the genome). The order might be total...

Standing Out in a Crowd: Selecting Attributes for Maximum Visibility (2008)

Muhammed Miah, Gautam Das, Vagelis Hristidis, Heikki Mannila

Abstract — In recent years, there has been significant interest in development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in...

Finding links and initiators: a graph reconstruction problem (2008)

Mannila, Heikki, Terzi, Evimaria

Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i,j) of M indicates that signal j has been observed (or not...

Constrained hidden Markov models for population-based haplotyping (2008)

Niels Landwehr, Taneli Mielikäinen, Lauri Eronen, Hannu Toivonen, Heikki Mannila

BMC Bioinformatics. This is an open access article distributed under the terms of the Creative Commons Attribution License.

Determining significance of pairwise co-occurrences of events in bursty sequences (2008)

Haiminen, Niina, Mannila, Heikki, Terzi, Evimaria

Abstract Background Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain...

Macadamia: Master's Programme in Machine Learning and Data Mining (2008)

Raiko, Tapani, Puolamäki, Kai, Karhunen, Juha, Hollmen, Jaakko, Honkela, Antti, Kaski, Samuel, ...

Macadamia is a two-year master’s programme for machine learning and data mining given in the Department of Information and Computer Science at Helsinki University of Technology. This paper...

15.2 Discovering orderings (2008)

Heikki Mannila, Jaakko Hollmén, Kai Puolamäki, Ella Bingham, Johan Himberg, Robert Gwadera, ...

15.1 Data mining at the Pattern Discovery group. The Pattern Discovery group in Otaniemi concentrates on combinations of pattern discovery and probabilistic modeling in data mining: pattern discovery...

Nestedness and Segmented Nestedness (2008)

Heikki Mannila

Consider each row of a 0-1 dataset as the subset of the columns for which the row has a 1. Then a dataset is nested if, for all pairs of rows, one row is either a superset or a subset of the other. The...
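The definition of nestedness translates directly into a pairwise test. In this Python sketch, whose function names are illustrative, each row becomes the set of columns where it has a 1. The dataset is nested exactly when every pair of these sets is comparable under ⊆. For n rows the check makes O(n²) set comparisons.

```python
from itertools import combinations


def row_sets(matrix):
    """Map each row of a 0-1 matrix to the set of columns where it has a 1."""
    return [frozenset(j for j, v in enumerate(row) if v == 1) for row in matrix]


def is_nested(matrix):
    """True if, for every pair of rows, one row's set contains the other's."""
    rows = row_sets(matrix)
    return all(a <= b or b <= a for a, b in combinations(rows, 2))


print(is_nested([[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 0]]))  # True: {0} ⊆ {0, 1} ⊆ {0, 1, 2}
print(is_nested([[1, 1, 0],
                 [0, 1, 1]]))  # False: {0, 1} and {1, 2} are incomparable
```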

From Data to Knowledge Research Unit (2008)

Heikki Mannila, Jaakko Hollmén, Ella Bingham, Johan Himberg, Anne Patrikainen, Salla Ruosaari, ...

17.1 Data mining. The work concentrates on combinations of pattern discovery and probabilistic modeling in data mining: pattern discovery aims at finding local phenomena, while modeling often aims at...

An Efficient Method for Association Mapping in Phase-unknown Genotype Data (2008)

Petteri Sevon, Päivi Onkamo, Hannu Toivonen, Heikki Mannila

In genetic association analysis a researcher tries to find shared alleles or haplotypes in a group of patients, which would be much rarer in controls. The analysis is based on availability of...

Constrained Hidden Markov Models for Population-based Haplotyping (Extended Abstract) (2008)

Niels Landwehr, Taneli Mielikäinen, Lauri Eronen, Hannu Toivonen, Heikki Mannila

Analysis of genetic variation in human populations is critical to the understanding of the genetic basis for complex diseases. Although genomes of several species have been sequenced, it is still too...

Optimal Segmentation Using Tree Models (2008)

Robert Gwadera, Aristides Gionis, Heikki Mannila

Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunication. Many real-life sequences have a strong segmental structure, with segments...

Gene Mapping by Haplotype Pattern Mining (2008)

Hannu T.T. Toivonen, Kari Vasko, Vesa Ollikainen, Petteri Sevon, Heikki Mannila, Juha Kere

Abstract. Genetic markers are being increasingly utilized in gene mapping. Discovery of associations between markers and patient phenotypes, such as a disease status, enables the identification of...

gmopuloQalmaden.ibm.com (2008)

Dimitrios Gunopulos, Heikki Mannila

Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with...

Random projection in dimensionality reduction: Applications to image and text data (2008)

Ella Bingham, Heikki Mannila

Finding Trees From Unordered 0–1 Data (2008)

Hannes Heikinheimo, Heikki Mannila

Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0–1 data and consider the problem of...

Feature Selection in Taxonomies with Applications to Paleontology (2008)

Garriga, Gemma, Ukkonen, Antti, Mannila, Heikki

Taxonomies for a set of features occur in many real-world domains. An example is provided by paleontology, where the task is to determine the age of a fossil site on the basis of the taxa that have...

Banded structure in binary matrices (2008)

Garriga, Gemma, Junttila, Esa, Mannila, Heikki

A 0–1 matrix has a banded structure if both rows and columns can be permuted so that the non-zero entries exhibit a staircase pattern of overlapping rows. The concept of banded matrices has its...
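For a matrix whose rows and columns are already in the intended order, the staircase pattern can be checked row by row. This sketch assumes a common formalization: the 1s of each row form one consecutive run, and the start and end columns of these runs never decrease from one row to the next. It does not search for the permutations. It also skips all-zero rows, which is a further simplifying assumption. The function name is hypothetical.

```python
def is_banded_in_given_order(matrix):
    """Check the staircase pattern for the rows and columns as given."""
    prev = None  # (first, last) column of the previous non-empty row
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v == 1]
        if not ones:
            continue  # assumption: all-zero rows are ignored
        first, last = ones[0], ones[-1]
        if last - first + 1 != len(ones):
            return False  # the 1s in this row are not consecutive
        if prev is not None and (first < prev[0] or last < prev[1]):
            return False  # the staircase would step backwards
        prev = (first, last)
    return True


print(is_banded_in_given_order([[1, 1, 0, 0],
                                [0, 1, 1, 0],
                                [0, 0, 1, 1]]))  # True
print(is_banded_in_given_order([[0, 1, 1],
                                [1, 1, 0]]))     # False: second row starts earlier
```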

Macadamia: Master's programme in machine learning and data mining (2008)

Raiko, Tapani, Puolamäki, Kai, Karhunen, Juha, Hollmen, Jaakko, Honkela, Antti, Mannila, Heikki, ...

Macadamia is a two-year master’s programme for machine learning and data mining given in the Department of Information and Computer Science at Helsinki University of Technology. This paper...

A Perspective on Databases and Data Mining (2007)

M. Holsheimer, M. Kersten, H. Mannila, H. Toivonen, ...

and their applications. SMC is sponsored by the Netherlands Organization for Scientific Research (NWO). CWI is a member of

Similarity of Event Sequences (Extended Abstract) (2007)

Heikki Mannila, Pirjo Ronkainen

Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for...

User Interactivity in Very Large Scale Data Mining (2007)

Stefan Wrobel, Dietrich Wettschereck, Inkeri Verkamo, Arno Siebes, Heikki Mannila, Fred Kwakkel, ...

Knowledge discovery is widely considered to be an interactive and iterative process. The data mining phase of KDD is, on the other hand, often assumed to be an indivisible step. We argue that user...

Learning, mining, or modeling? - A case study from paleoecology (2007)

Heikki Mannila, Hannu Toivonen, Atte Korhola, Heikki Olander

Exploratory data mining, machine learning, and statistical modeling all have a role in discovery science. We describe a paleoecological reconstruction problem where Bayesian methods are useful and...

An Algorithm for Learning Hierarchical Classifiers (2007)

Jyrki Kivinen, Heikki Mannila, Esko Ukkonen, Jaak Vilo

Introduction. In [4] an Occam algorithm ([1]) was introduced for PAC learning a certain kind of decision lists from classified examples. Such decision lists, or hierarchical rules as we call them,...

y (2007)

Jean-francois Boulicaut, Mika Klemettinen, Heikki Mannila

Abstract. One of the most challenging problems in data manipulation in the future is to be able to efficiently handle very large databases but also multiple induced properties or generalizations in...

Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2007)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...

Predictive Profiles for Transaction Data using Finite Mixture Models (2007)

Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila

Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...

Data Mining and Knowledge Discovery KL2159-03/5253512 October 24, 2003 19:30 (2007)

Heikki Mannila

It has been claimed that the discovery of association rules is well suited for applications of market basket analysis to reveal regularities in the purchase behaviour of customers. However today, one...

How to Handle Small Samples: Bootstrap and Bayesian Methods in the Analysis of Linguistic Change (2007)

Hinneburg, Alexander, Mannila, Heikki, Kaislaniemi, Samuli, Nevalainen, Terttu, Raumolin-Brunberg, Helena

Estimating the relative frequencies of linguistic features is a fundamental task in linguistic computation. As the amount of text or speech that is available from a given user of the language...

Comparing segmentations by applying randomization techniques (2007)

Haiminen, Niina, Mannila, Heikki, Terzi, Evimaria

Abstract Background There exist many segmentation techniques for genomic sequences, and the segmentations can also be based on many different biological features. We show how to evaluate and compare...

Constrained hidden Markov models for population-based haplotyping (2007)

Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki

Abstract Background Haplotype Reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important...

Constrained hidden Markov models for population-based haplotyping (2007)

Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki

Background. Haplotype Reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important intermediate...

Finding Low-Entropy Sets and Trees from Binary Data (2007)

Heikinheimo, Hannes, Hinkkanen, Eino, Mannila, Heikki, Mielikäinen, Taneli, Seppänen, Jouni

The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value...

Assessing Data Mining Results via Swap Randomization (2007)

Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis

The problem of assessing the significance of data mining results on high-dimensional 0–1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and...

Optimal Segmentation Using Tree Models (2007)

Robert Gwadera, Aristides Gionis, Heikki Mannila

Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunications. Many real-life sequences have a strong segmental structure, with...

Analysis of Linux Evolution Using Aligned Source Code Segments (2006)

Rasinen, Antti, Hollmen, Jaakko, Mannila, Heikki

The Linux operating system embodies a development history of 15 years and community effort of hundreds of voluntary developers. We examine the structure and evolution of the Linux kernel by...

The Discrete Basis Problem (2006)

Miettinen, Pauli, Mielikäinen, Taneli, Gionis, Aristides, Das, Gautam, Mannila, Heikki

Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the...
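In the discrete basis problem the two factor matrices are binary and are multiplied with Boolean arithmetic. Overlapping basis vectors therefore combine by OR rather than by addition. This Python sketch computes such a product and counts the cells in which it differs from a data matrix C. The names S (usage) and B (basis vectors as rows) and the helper functions are illustrative assumptions. The sketch only evaluates a given decomposition and does not search for one.

```python
def boolean_product(S, B):
    """Boolean matrix product: (S ∘ B)[i][j] = OR over t of (S[i][t] AND B[t][j])."""
    n, k, m = len(S), len(B), len(B[0])
    return [[int(any(S[i][t] and B[t][j] for t in range(k))) for j in range(m)]
            for i in range(n)]


def reconstruction_error(C, S, B):
    """Number of cells in which C differs from S ∘ B."""
    P = boolean_product(S, B)
    return sum(c != p for row_c, row_p in zip(C, P) for c, p in zip(row_c, row_p))


B = [[1, 1, 0],   # basis vector 1
     [0, 1, 1]]   # basis vector 2
S = [[1, 0],      # row 1 uses basis vector 1
     [0, 1],      # row 2 uses basis vector 2
     [1, 1]]      # row 3 uses both; its middle cell is 1 OR 1 = 1, not 2
print(boolean_product(S, B))  # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
print(reconstruction_error([[1, 1, 0],
                            [0, 1, 1],
                            [1, 0, 1]], S, B))  # 1
```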

Assessing data mining results via swap randomization (2006)

Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...

Algorithms for Discovering Bucket Orders from Data (2006)

Gionis, Aristides, Mannila, Heikki, Puolamäki, Kai, Ukkonen, Antti

Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since...

Constrained Hidden Markov Models for Population-based Haplotyping (Extended Abstract) (2006)

Landwehr, Niels, Mielikäinen, Taneli, Eronen, Lauri, Toivonen, Hannu, Mannila, Heikki

We propose a simple haplotype reconstruction method that is based on iterative refinement and regularization of constrained Hidden Markov Models. We show that it gives results comparable to the...

Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods (2006)

Puolamäki, Kai, Fortelius, Mikael, Mannila, Heikki

Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...

Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods (2006)

Kai Puolamäki, Mikael Fortelius, Heikki Mannila

Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...

What is the dimension of your binary data? (2006)

Tatti, Nikolaj, Mielikäinen, Taneli, Gionis, Aristides, Mannila, Heikki

Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...

Segmentation and dimensionality reduction (2006)

Ella Bingham, Aristides Gionis, Niina Haiminen, Heli Hiisilä, Heikki Mannila, Evimaria Terzi

Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences — they both reduce the complexity of the representation of the original data. In...

Assessing data mining results via swap randomization (2006)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...

The discrete basis problem (2006)

Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila

Abstract. Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another...

The discrete basis problem (2006)

Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila

Abstract—Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data and another describing...

Mining chains of relations (2005)

Afrati, Foto, Das, Gautam, Gionis, Aristides, Mannila, Heikki, Mielikäinen, Taneli, Tsaparas, Panayiotis

Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...

Methods for Comparing Subspace Clusterings (2005)

Anne Patrikainen, Prof. Heikki Mannila

Subspace clustering methods search for clusters of similar data points in different subspaces of the data space. These methods combine and generalize clustering and...

Parameter-free spatial data mining using MDL (2005)

Spiros Papadimitriou, Aristides Gionis, Panayiotis Tsaparas, Risto A. Väisänen, Heikki Mannila, Christos Faloutsos

Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and...

Boolean Formulas and Frequent Sets (2005)

Jouni Seppänen, Heikki Mannila

We consider the problem of how one can estimate the support of Boolean queries given a collection of frequent itemsets. We describe an algorithm that truncates the inclusion-exclusion sum to include...
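The sketch below is a simplified illustration of the idea, not the algorithm described in the paper. By inclusion-exclusion, supp(a ∨ b ∨ c) is the sum over non-empty S ⊆ {a, b, c} of (−1)^(|S|+1) supp(S). Sometimes only some itemset supports are known, for example only those of the frequent itemsets. The missing terms can then be dropped, and the sum can also be cut off at a maximum itemset size. The function name, the `support` dictionary and the `max_size` parameter are assumptions.

```python
from itertools import combinations


def disjunction_support(items, support, max_size=None):
    """Inclusion-exclusion estimate of the fraction of rows containing at least
    one of `items`. `support` maps frozenset itemsets to their supports; terms
    whose itemset is missing (e.g. infrequent) are skipped, and `max_size`
    optionally truncates the sum by itemset size."""
    items = list(items)
    limit = len(items) if max_size is None else min(max_size, len(items))
    total = 0.0
    for size in range(1, limit + 1):
        sign = 1 if size % 2 == 1 else -1  # +, -, +, ... by itemset size
        for subset in combinations(items, size):
            s = support.get(frozenset(subset))
            if s is not None:
                total += sign * s
    return total


# Toy data: rows {a, b}, {a}, {b, c}, {c}; supports are fractions of the 4 rows.
support = {
    frozenset("a"): 0.5, frozenset("b"): 0.5, frozenset("c"): 0.5,
    frozenset("ab"): 0.25, frozenset("bc"): 0.25, frozenset("ac"): 0.0,
    frozenset("abc"): 0.0,
}
print(disjunction_support("abc", support))              # 1.0 (every row matches)
print(disjunction_support("abc", support, max_size=1))  # 1.5 (truncation overestimates here)
```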

Mining Chains of Relations (2005)

Foto Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...

Clustering aggregation (2005)

Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in...
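One natural way to measure agreement is pairwise. For each input clustering, count the object pairs that one clustering places together and the other separates, then sum these counts over the inputs. This Python sketch computes that cost for a candidate clustering, with clusterings given as lists of cluster labels. The objective's form is an assumption made for illustration. The sketch only scores a candidate and leaves out any algorithm for finding a good one. Function names are hypothetical.

```python
from itertools import combinations


def disagreements(c1, c2):
    """Count object pairs that one clustering puts together and the other separates."""
    return sum(
        (c1[u] == c1[v]) != (c2[u] == c2[v])
        for u, v in combinations(range(len(c1)), 2)
    )


def aggregation_cost(candidate, clusterings):
    """Total number of pairwise disagreements between `candidate` and the inputs."""
    return sum(disagreements(candidate, c) for c in clusterings)


inputs = [[0, 0, 1], [0, 0, 0], [0, 1, 1]]   # three clusterings of 3 objects
print(aggregation_cost([0, 0, 1], inputs))   # 4
print(aggregation_cost([0, 1, 2], inputs))   # 5: all singletons agree less well
```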

A Hidden Markov Technique for Haplotype Reconstruction (2005)

Pasi Rastas, Mikko Koivisto, Heikki Mannila, Esko Ukkonen

Abstract. We give a new algorithm for the genotype phasing problem. Our solution is based on a hidden Markov model for haplotypes. The model has a uniform structure, unlike most solutions proposed so...

Clustering aggregation (2005)

Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with them. This problem, clustering aggregation, appears naturally in various...

Data Mining: The Next Generation (2005)

Ramakrishnan, Raghu, Agrawal, Rakesh, Freytag, Johann-Christoph, Bollinger, Toni, Clifton, Christopher W., Dzeroski, Saso, ...

Data Mining (DM) has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of DM research and development has yielded several...

Relational link-based ranking (2004)

Mannila, Heikki, Terzi, Evimaria

Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...

Fast and Robust General Purpose Clustering Algorithms (2004)

Estivill-Castro, Vladimir, Yang, Jianhua, Heikki Mannila

General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative...

Relational link-based ranking (2004)

Geerts, Floris, Mannila, Heikki, Terzi, Evimaria

Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...

Clustered segmentations (2004)

Aristides Gionis, Heikki Mannila, Evimaria Terzi

The problem of sequence and time-series segmentation has been discussed widely and it has been applied successfully in a variety of areas, including computational genomics, data analysis for...

Dense itemsets (2004)

Heikki Mannila

Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that...

Relational link-based ranking (2004)

Floris Geerts, Heikki Mannila, Evimaria Terzi

Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...

Geometric and Combinatorial Tiles in 0-1 Data (2004)

Aristides Gionis, Heikki Mannila, Jouni K. Seppänen

In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0-1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a...
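The snippet breaks off before defining p. Assume, purely for illustration, that p is the probability of a 1 inside the rectangle X × Y and that cells are independent. Under that assumption, the maximum-likelihood estimate of p is the density of 1s in the rectangle. The Python sketch below computes this estimate and the corresponding Bernoulli log-likelihood. The names and the independence assumption are not taken from the paper.

```python
import math


def tile_density(matrix, rows, cols):
    """Fraction of 1s in the submatrix with row indices `rows` (X) and column indices `cols` (Y)."""
    cells = [matrix[i][j] for i in rows for j in cols]
    return sum(cells) / len(cells)


def tile_log_likelihood(matrix, rows, cols, p):
    """Log-likelihood of the tile's cells as independent Bernoulli(p) draws
    (an assumed model; requires 0 < p < 1)."""
    ones = sum(matrix[i][j] for i in rows for j in cols)
    zeros = len(rows) * len(cols) - ones
    return ones * math.log(p) + zeros * math.log(1 - p)


M = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
X, Y = [0, 1], [0, 1]
p_hat = tile_density(M, X, Y)
print(p_hat)                                # 0.75
print(tile_log_likelihood(M, X, Y, p_hat))  # 3*log(0.75) + log(0.25)
```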

Relational Link-Based Ranking (2004)

Floris Geerts, Heikki Mannila, Evimaria Terzi

Link analysis methods show that the interconnections between web pages have lots of valuable information. The link analysis methods are, however, inherently oriented towards analyzing binary...

Hidden Markov Modelling Techniques for Haplotype Analysis (2004)

Mikko Koivisto, Teemu Kivioja, Heikki Mannila, Pasi Rastas, Esko Ukkonen

A hidden Markov model is introduced for descriptive modelling of the mosaic-like structures of haplotypes, due to iterated recombinations within a population. Methods using the minimum description...

Subspace clustering of high-dimensional binary data - a probabilistic approach (2004)

Anne Patrikainen, Heikki Mannila

Abstract. Subspace clustering refers to clustering where only some of the attributes are relevant for a given cluster. We describe a probabilistic model for subspace clustering, together with an...

A simple algorithm for topic identification in 0–1 data (2003)

Ella Bingham, Heikki Mannila

Abstract. Topics in 0–1 datasets are sets of variables whose occurrences are positively connected together. Earlier, we described a simple generative topic model. In this paper we show that, given...

Rule discovery and probabilistic modeling for onomastic data (2003)

Antti Leino, Heikki Mannila, Ritva Liisa Pitkänen

Abstract. The naming of natural features, such as hills, lakes, springs, meadows etc., provides a wealth of linguistic information; the study of the names and naming systems is called onomastics. We...

Finding recurrent sources in sequences (2003)

Aristides Gionis, Heikki Mannila

Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or...

The pattern ordering problem (2003)

Taneli Mielikäinen, Heikki Mannila

Abstract. Many pattern discovery methods provide fast tools for finding the frequently occurring patterns in large data sets. Such pattern collections can also be used to approximate the underlying...

A Simple Algorithm for Topic Identification in 0-1 Data (2003)

Jouni K. Seppänen, Ella Bingham, Heikki Mannila

Topics in 0–1 datasets are sets of variables whose occurrences are positively connected together. Earlier, we described a simple generative topic model. In this paper we show that, given data...

Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data. (2003)

Jaakko Hollmén, Jouni K. Seppänen, Heikki Mannila

We study the interaction between global and local techniques in data mining. Specifically, we study the collections of frequent sets in clusters produced by a probabilistic clustering using mixtures...

Discovering All Most Specific Sentences (2003)

Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, Ram Sewak Sharma

Data mining can be viewed, in many cases, as the task of computing a representation of a theory...

• Local patterns: episodes, sequential patterns, etc. • Clustering (2003)

Heikki Mannila, Aris Gionis, Marko Salmenkivi, Teija Kujala

• Possibly high-d alphabet • Time series (continuous [high-d] values) Basic questions on sequences • Prediction: what is the next event in the sequence? • Sequential supervised learning...

A Theory of Inductive Query Answering (2002)

De Raedt, Luc, Jaeger, Manfred, Lee, Sau Dan, Mannila, Heikki

We introduce the boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary boolean expressions over monotonic and anti-monotonic predicates....

Topics in 0-1 Data (2002)

Ella Bingham, Heikki Mannila

Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain...

Genome segmentation using piecewise constant intensity models and reversible jump MCMC (2002)

Marko Salmenkivi, Juha Kere, Heikki Mannila

The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points...

A Theory of Inductive Query Answering (2002)

Luc De Raedt, Manfred Jaeger, Sau Dan Lee, Heikki Mannila

We introduce the boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary boolean expressions over monotonic and anti-monotonic predicates....

Genome segmentation using piecewise constant intensity models and reversible jump MCMC (2002)

Salmenkivi, Marko, Kere, Juha, Mannila, Heikki

The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points...

Extracting the context of a mobile device user (2001)

Mäntyjärvi, Jani, Himberg, Johan, Korpipää, Panu, Mannila, Heikki

8th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems, pp. 445–450

Time Series Segmentation for Context Recognition in Mobile Devices (2001)

Johan Himberg, Kalle Korpiaho, Heikki Mannila, Johanna Tikanmaki

Recognizing the context of use is important in making mobile devices as simple to use as possible. Finding out what the user's situation is can help the device and underlying service in...

Random projection in dimensionality reduction: applications to image and text data (2001)

Ella Bingham, Heikki Mannila

Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results...
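A random projection maps a d-dimensional vector x to Rx, where R is a random k × d matrix and k is much smaller than d. The pure-Python sketch below fills R with independent Gaussian entries scaled by 1/√k. This is one common choice and is assumed here, not taken from the paper. The sketch then checks how well the distance between two vectors is preserved. The function names and the sizes d = 500, k = 200 are illustrative.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """k x d matrix with i.i.d. Gaussian entries of variance 1/k (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Compute R x for a matrix R (list of rows) and a vector x."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


d, k = 500, 200
rng = random.Random(1)
x = [rng.random() for _ in range(d)]
y = [rng.random() for _ in range(d)]
R = random_projection_matrix(d, k)
ratio = dist(project(R, x), project(R, y)) / dist(x, y)
print(ratio)  # should be close to 1; the exact value depends on the seeds
```

The variance 1/k makes the squared length of Rv equal, in expectation, to the squared length of v. The ratio therefore concentrates around 1 as k grows.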

Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction (2001)

Igor V. Cadez, Padhraic Smyth, Heikki Mannila

Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual...

Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2001)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

We investigate the problem of generating fast approximate answers to queries posed to large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...

Predictive Profiles for Transaction Data using Finite Mixture Models (2001)

Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila

Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...

Decomposition of Event Sequences into Independent Components (2001)

Heikki Mannila, Dmitry Rusakov

Monitoring a large telecommunication network results in an extensive log of alarms of different types that occurred in the system. Similar log files are also produced in mobile commerce, in web...

Decomposition of Event Sequences into Independent Components (2001)

Heikki Mannila, Dmitry Rusakov

In this paper we describe a theoretical framework and practical methods for finding event sequence decompositions. These methods use probabilistic modeling of the event-generating process. The...

Decomposition of event sequences into independent components (2001)

Heikki Mannila, Dmitry Rusakov

Many real-world processes result in extensive logs of sequences of events, i.e., events coupled with their time of occurrence. Examples of such process logs include alarms produced by a large...

Data mining applied to linkage disequilibrium mapping (2000)

Päivi Onkamo, Kari Vasko, Vesa Ollikainen, Petteri Sevon, Heikki Mannila, ...

We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a...

Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets (2000)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents...

Rule Discovery in Telecommunication Alarm Data (1999)

Mika Klemettinen, Heikki Mannila, Hannu Toivonen

Fault management is an important but difficult area of telecommunication...

Querying inductive databases: A case study on the MINE RULE operator (1998)

Jean-François Boulicaut, Mika Klemettinen, Heikki Mannila

Knowledge discovery in databases (KDD) is a process that can include steps like forming the data set, data transformations, discovery of patterns, searching for exceptions to a pattern,...

Time-Series Similarity Problems and Well-Separated Geometric Sets (1998)

Béla Bollobás, Gautam Das, Dimitrios Gunopulos, Heikki Mannila

Given a pair of nonidentical complex objects, defining (and determining) how similar they are to each other is a nontrivial problem. In data mining applications, one frequently needs to determine the...

Rule Discovery From Time Series (1998)

Gautam Das, King-ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth

We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such...

Modeling KDD Processes within the Inductive Database Framework (1998)

Jean-François Boulicaut, Mika Klemettinen, Heikki Mannila

One of the most challenging problems in data manipulation in the future is to be able to efficiently handle very large databases but also multiple induced properties or generalizations in that...

Similarity of Event Sequences (1998)

Heikki Mannila, Pirjo Ronkainen

Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for...

Bassist - a tool for MCMC simulation of statistical models (1998)

Hannu Toivonen, Heikki Mannila, Marko Salmenkivi, Karri-pekka Laakso

In this paper we give a short overview of MCMC simulation and the Bassist system, and describe some of the applications in which Bassist has been used.

Reasoning with Examples: Propositional Formulae and Database Dependencies (1998)

Roni Khardon, Heikki Mannila, Dan Roth

For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback with this method is that it does not necessarily give the correct results. However, under...

Frailty Factors and Time-dependent Hazards in Modelling Ear Infections in Children Using BASSIST (1998)

Mervi Eerola, Heikki Mannila, Marko Salmenkivi

In this paper we describe the use of the BASSIST system when modelling the occurrences of middle ear infection (acute otitis media, AOM). Previous modelling on AOM by nonparametric Bayesian methods has...

Similarity of Attributes by External Probes (1997)

Gautam Das, Heikki Mannila, Pirjo Ronkainen

In data mining, similarity or distance between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be user-defined, but an...

Similarity of Attributes by External Probes (1997)

Gautam Das, Heikki Mannila, Pirjo Ronkainen

In data mining, similarity between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be user-defined, but an important...

Time-Series Similarity Problems and Well-Separated Geometric Sets (1997)

Béla Bollobás, Gautam Das, Dimitrios Gunopulos, Heikki Mannila

Given a pair of nonidentical complex objects, defining (and determining) how similar they are to each other is a nontrivial problem. In data mining applications, one frequently needs to determine the...

Data mining, hypergraph transversals, and machine learning (1997)

Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Hannu Toivonen

Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with...

Distance Measures for Point Sets and Their Computation (1997)

Thomas Eiter, Heikki Mannila

We consider the problem of measuring the similarity or distance between two finite sets of points in a metric space, and computing the measure. This problem has applications in, e.g., computational...

Levelwise Search and Borders of Theories in Knowledge Discovery (1997)

Heikki Mannila, Hannu Toivonen

One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all...

Discovery of Frequent Episodes in Event Sequences (1997)

Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo

Sequences of events describing the behavior and actions of users or systems can be collected in several domains. We consider the problem of discovering frequently occurring episodes in such...

Discovery of Frequent Episodes in Event Sequences (1997)

Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo

Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a...

Multiple uses of frequent sets and condensed representations (Extended Abstract) (1996)

Heikki Mannila, Hannu Toivonen

In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used...

On an algorithm for finding all interesting sentences (Extended Abstract) (1996)

Heikki Mannila, Hannu Toivonen

Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a...

Attribute-Oriented Induction and Conceptual Clustering (1996)

Oskari Heinonen, Heikki Mannila

Attribute-oriented induction is a method for knowledge discovery in databases that has recently been described and widely applied by Han et al. In this note we point out the close connection between...

Discovering Generalized Episodes Using Minimal Occurrences (1996)

Heikki Mannila, Hannu Toivonen

Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible...

Constructing tailored SGML documents (1996)

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, ...

In this paper we analyse the assembly process and some examples of documents that contain supplementary information for intelligent document assembly.

Rule Discovery in Alarm Databases (1996)

Kimmo Hätönen, Mika Klemettinen, Heikki Mannila, ...

Telecommunication networks produce large amounts of alarm information daily. This data contains potentially valuable knowledge about the network. We present a methodology for the analysis of large...

Intelligent Assembly of Structured Documents (1996)

Helena Ahonen, Barbara Heikkinen, Pekka Kilpeläinen, Oskari Heinonen, ...

An intelligent document contains information about its structure, its contents and its environment. This information supports intelligent document assembly, i.e., the effective reuse of existing...

Data Mining as Selective Theory Extraction in Probabilistic Logic (1996)

Manfred Jaeger, Heikki Mannila, Emil Weydert

ing from this specific example we are led to the following formulation of data mining problems in general: given a database r, a language L for expressing statements about the data, and a criterion...

Interactive Exploration of Discovered Knowledge: A Methodology for Interaction, and Usability Studies (1996)

Mika Klemettinen, Heikki Mannila, Hannu Toivonen

We introduce a methodology for knowledge discovery in databases (KDD) where one first discovers large collections of patterns at once, and then performs interactive retrievals from the collection of...

TASA: Telecommunication Alarm Sequence Analyzer or: How to enjoy faults in your network (1996)

Kimmo Hätönen, Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen

Today's large and complex telecommunication networks produce large amounts of alarms daily. The sequence of alarms contains valuable knowledge about the behavior of the network, but much of the...

Recognizing renamable generalized propositional Horn formulas is NP-complete (1995)

Thomas Eiter, Heikki Mannila

Yamasaki and Doshita have defined an extension of the class of propositional Horn formulas; later, Gallo and Scutellà generalized this class to a hierarchy Γ0 ⊆ Γ1 ⊆ ... ⊆ Γk ⊆ ..., where Γ0 is the set of Horn...

A Perspective on Databases and Data Mining (1995)

Marcel Holsheimer, Martin Kersten, Heikki Mannila, Hannu Toivonen

We discuss the use of database methods for data mining. Recently impressive results have been achieved for some data mining problems using highly specialized and clever data structures. We study how...

Discovering frequent episodes in sequences (Extended Abstract) (1995)

Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo

Sequences of events describing the behavior and actions of users or systems can be collected in several domains. In this paper we consider the problem of recognizing frequent episodes in such...

Reasoning with Examples: Propositional Formulae and Database Dependencies (1995)

Roni Khardon, Heikki Mannila, Dan Roth

For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback with this method is that it does not necessarily give the correct results. However, under...

Recognizing renamable generalized propositional Horn formulas is NP-complete (1995)

Thomas Eiter, Pekka Kilpeläinen, Heikki Mannila

Yamasaki and Doshita have defined an extension of the class of propositional Horn formulas; later, Gallo and Scutellà generalized this class to a hierarchy Γ0 ⊆ Γ1 ⊆... ⊆ Γk ⊆..., where...

Adding Disjunction to Datalog (Extended Abstract) (1994)

Thomas Eiter, Georg Gottlob, Heikki Mannila

Thomas Eiter, Georg Gottlob, Heikki Mannila, Christian Doppler Laboratory for Expert Systems, Information Systems Department, Technical University of Vienna, Paniglgasse 16, A-1040 Wien, Austria...

Computing Discrete Fréchet Distance (1994)

Thomas Eiter, Heikki Mannila

The Fréchet distance between two curves in a metric space is a measure of the similarity between the curves. We present a discrete variation of this measure.
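The discrete variant can be computed with a simple O(pq) dynamic program over two point sequences of lengths p and q. The following is a minimal sketch of that recurrence, assuming curves are given as lists of coordinate tuples and that the underlying metric is Euclidean distance; the function name `discrete_frechet` is hypothetical.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between point sequences P and Q (sketch).

    ca[i][j] is the smallest possible maximum leash length over all monotone
    couplings of P[0..i] with Q[0..j]; Euclidean distance is assumed.
    """
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                # Only Q can advance along the first row
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                # Only P can advance along the first column
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance P, Q, or both, whichever keeps the leash shortest
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

For two parallel three-point segments one unit apart, `discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)])` returns `1.0`. Because both endpoints must be coupled, the result is never smaller than the distance between the two final points.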

Efficient Algorithms for Discovering Association Rules (1994)

Heikki Mannila, Hannu Toivonen, Inkeri Verkamo

Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski,...
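The rule form quoted above corresponds to a confidence threshold: among the rows that have 1 in every column of W, at least 90% must also have 1 in column B. The following minimal sketch assumes a 0/1 relation stored as a list of sets, where each set holds the columns in which that row has value 1. The helper names `support` and `confidence` are hypothetical.

```python
def support(rows, itemset):
    """Fraction of rows that have value 1 in every column of `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in rows if itemset <= r) / len(rows)

def confidence(rows, W, B):
    """Fraction of rows with 1 in all columns of W that also have 1 in column B.

    Returns None when no row covers W, because the rule is then undefined.
    """
    W = set(W)
    covered = [r for r in rows if W <= r]
    if not covered:
        return None
    return sum(1 for r in covered if B in r) / len(covered)
```

With `rows = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B"}]`, the rule {A} ⇒ B has confidence 2/3 and support `support(rows, {"A", "B"}) == 0.5`. It would therefore not qualify as a "90%" rule.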

Generating grammars for SGML tagged texts lacking DTD (1994)

Helena Ahonen, Heikki Mannila, Erja Nikunen

We describe a technique for forming a context-free grammar for a document that has some kind of tagging (structural or typographical) but for which no concise description of the structure is available....

Finding Interesting Rules from Large Sets of Discovered Association Rules (1994)

Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo

Association rules, introduced by Agrawal, Imielinski, and Swami, are rules of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also...

Improved Methods for Finding Association Rules (1994)

Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo

Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski,...

The Power of Sampling in Knowledge Discovery (1993)

Jyrki Kivinen, Heikki Mannila

We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different...

Learning Hierarchical Rule Sets (1993)

Jyrki Kivinen, Heikki Mannila, Esko Ukkonen

We present an algorithm for learning sets of rules that are organized into up to k levels. Each level can contain an arbitrary number of rules "if c then l" where l is the class associated...

On Approximation Preserving Reductions: Complete Problems and Robust Measures (Revised Version) (1990)

Pekka Orponen, Heikki Mannila

We investigate the well-known anomalous differences in the approximability properties of NP-complete optimization problems. We define a notion of polynomial time reduction between optimization...

On approximation preserving reductions: Complete problems and robust measures (1987)

Pekka Orponen, Heikki Mannila

We investigate the well-known anomalous differences in the approximability properties of NP-complete optimization problems. We define a notion of polynomial time reduction between optimization...

Design-By-Example: A Design Tool for Relational Databases (1985)

Bitton, Dina, Mannila, Heikki, Raiha, Kari-Jouko

In recent years, research in relational design theory and in query optimization has established a firm ground for designing well-structured logical and physical database schemes. However, the design...

Design by Example: An Application of Armstrong Relations (1985)

Mannila, Heikki, Raiha, Kari-Jouko

Example relations, and especially Armstrong relations, can be used as user-friendly representations of dependency sets. In this paper we analyze the use of Armstrong relations in database design with...

Data Mining Applied to Linkage Disequilibrium Mapping

Toivonen, Hannu T. T., Onkamo, Päivi, Vasko, Kari, Ollikainen, Vesa, Sevon, Petteri, Mannila, Heikki, ...

We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a...

A Genomewide Screen for Schizophrenia Genes in an Isolated Finnish Subpopulation, Suggesting Multiple Susceptibility Loci

Hovatta, Iiris, Varilo, Teppo, Suvisaari, Jaana, Terwilliger, Joseph D., Ollikainen, Vesa, Arajärvi, Ritva, ...

Schizophrenia is a severe mental disorder affecting ∼1% of the world's population. Here, we report the results from a three-stage genomewide screen performed in a study sample from an internal...

Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods

Puolamäki, Kai, Fortelius, Mikael, Mannila, Heikki

Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full...

Higher origination and extinction rates in larger mammals

Liow, Lee Hsiang, Fortelius, Mikael, Bingham, Ella, Lintulaakso, Kari, Mannila, Heikki, Flynn, Larry, ...

Do large mammals evolve faster than small mammals or vice versa? Because the answer to this question contributes to our understanding of how life-history affects long-term and large-scale...

Approximating a Collection of Frequent Sets

Foto Afrati, Aristides Gionis, Heikki Mannila

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is...

Complexity control in a mixture model by the Hardy-Weinberg equilibrium

Bingham, Ella, Mannila, Heikki

A method of complexity control in multinomial mixture modeling of multiple-marker genotype data, imposing the Hardy-Weinberg equilibrium (HWE) between the genotype values, is studied. This is a very...

Principles of Data Mining

David J. Hand, Heikki Mannila, Padhraic Smyth

The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically,...