Aristides Gionis

Searching the Wikipedia with Contextual Information (2009)

Antti Ukkonen, Carlos Castillo, Aristides Gionis, Debora Donato

We propose a framework for searching the Wikipedia with contextual information. Our framework extends the typical keyword search, by considering queries of the type 〈q, p〉, where q is a set of...

Topical Query Decomposition (2009)

Francesco Bonchi, Debora Donato, Carlos Castillo, Aristides Gionis

We introduce the problem of query decomposition, where we are given a query and a document retrieval system, and we want to produce a small set of queries whose union of resulting documents...

The query-flow graph: model and applications (2009)

Paolo Boldi, Debora Donato, Francesco Bonchi, Aristides Gionis, Carlos Castillo, Sebastiano Vigna

Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as...

The Query-flow Graph: Model and Applications (2009)

Paolo Boldi, Debora Donato, Francesco Bonchi, Aristides Gionis, Carlos Castillo, Sebastiano Vigna

Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as...

Mining graph evolution rules (2009)

Michele Berlingerio, Francesco Bonchi, Aristides Gionis

US patent "Network Graph Evolution Rule Generation" filed by Yahoo! (The four authors listed as inventors)

Dr. Searcher and Mr. Browser: A unified hyperlink-click graph (2009)

Barbara Poblete, Carlos Castillo, Aristides Gionis

We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web’s hyperlink and click graphs. The...

Assessing data mining results via swap randomization (2009)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...

Query Significance in Databases via Randomizations (2009)

Markus Ojala, Gemma C. Garriga, Aristides Gionis, Heikki Mannila

Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries....

A randomized approximation algorithm for computing bucket orders (2009)

Antti Ukkonen, Kai Puolamäki, Aristides Gionis, Heikki Mannila

We show that a simple randomized algorithm has an expected constant factor approximation guarantee for fitting bucket orders to a set of pairwise preferences.

What is the dimension of your binary data? (2008)

Nikolaj Tatti, Taneli Mielikäinen, Aristides Gionis, Heikki Mannila

Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...

Abstract (2008)

Minos Garofalakis, Aristides Gionis, S. Seshadri, Rajeev Rastogi, Kyuseok Shim

XML is rapidly emerging as the new standard for data representation and exchange on the Web. Unlike HTML, tags in XML documents describe the semantics of the data and not how it is to be displayed....

Finding recurrent sources in sequences (2008)

Aristides Gionis

Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or...

DTD Inference from XML Documents: The XTRACT Approach (2008)

Aristides Gionis, S. Seshadri, Rajeev Rastogi, Kyuseok Shim

XML is rapidly emerging as the new standard for data representation and exchange on the Web. Document Type Descriptors (DTDs) contain valuable information on the structure of XML documents and thus...

XTRACT: A System for Extracting Document Type Descriptors from XML Documents (2008)

Minos Garofalakis, Aristides Gionis

XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema...

Optimal Segmentation Using Tree Models (2008)

Robert Gwadera, Aristides Gionis, Heikki Mannila

Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunication. Many real-life sequences have a strong segmental structure, with segments...

Algorithms for Discovering Bucket Orders from Data (2008)

Aristides Gionis, Kai Puolamäki

Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since...

Mining the graph structures of the web (2008)

Aristides Gionis

Graph structures are a general way of modeling entities and their relationships, and they are used to describe a wide variety of data including the Internet, the Web, social networks, metabolic...

ABSTRACT (2008)

Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis

We present two algorithms for the approximate computation of the number of triangles incident to every node in a large graph. Both algorithms are based on the idea of min-wise independent...

Dimension Induced Clustering (2008)

Aristides Gionis, Spiros Papadimitriou

It is commonly assumed that high-dimensional datasets contain points most of which are located in low-dimensional manifolds. Detection of low-dimensional clusters is an extremely useful task for...

Finding high-quality content in social media with an application to community-based question answering (2008)

Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne, ...

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in...

Query-log mining for detecting spam (2008)

Carlos Castillo, Claudio Corsi, Debora Donato, Paolo Ferragina, Aristides Gionis

Every day millions of users search for information on the web via search engines, and provide implicit feedback to the results shown for their queries by clicking or not onto them. This feedback is...

Evaluating Strategies for Similarity Search on the Web (2007)

Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

ABSTRACT (2007)

Aristides Gionis

High-dimensional collections of 0-1 data occur in many applications. The attributes in such data sets are typically considered to be unordered. However, in many cases there is a natural total or...

Finding Interesting Associations without Support Pruning (2007)

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, ...

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of...

Maintaining Stream Statistics over Sliding Windows (2007)

Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani

We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We...

DTD Inference from XML Documents: The XTRACT Approach (2007)

Minos Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, K. Shim, ...

XML is rapidly emerging as the new standard for data representation and exchange on the Web. Document Type Descriptors (DTDs) contain valuable information on the structure of XML documents and thus...

Evaluating Strategies for Similarity Search on the Web (2007)

Taher H. Haveliwala, Aristides Gionis

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

Assessing Data Mining Results via Swap Randomization (2007)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and...

Estimating the number of citations using author reputation (2007)

Carlos Castillo, Debora Donato, Aristides Gionis

We study the problem of predicting the popularity of items in a dynamic environment in which authors post continuously new items and provide feedback on existing items. This problem can be...

Optimal Segmentation Using Tree Models (2007)

Robert Gwadera, Aristides Gionis, Heikki Mannila

Sequence data are abundant in application areas such as computational biology, environmental sciences, and telecommunications. Many real-life sequences have a strong segmental structure, with...

The Discrete Basis Problem (2006)

Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila

Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the...

Assessing data mining results via swap randomization (2006)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...

Algorithms for Discovering Bucket Orders from Data (2006)

Aristides Gionis, Heikki Mannila, Kai Puolamäki, Antti Ukkonen

Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since...

What is the dimension of your binary data? (2006)

Nikolaj Tatti, Taneli Mielikäinen, Aristides Gionis, Heikki Mannila

Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the...

Segmentation and dimensionality reduction (2006)

Ella Bingham, Aristides Gionis, Niina Haiminen, Heli Hiisilä, Heikki Mannila, Evimaria Terzi

Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences — they both reduce the complexity of the representation of the original data. In...

Assessing data mining results via swap randomization (2006)

Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and...

The discrete basis problem (2006)

Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila

Matrix decomposition methods represent a data matrix as a product of two smaller matrices: one containing basis vectors that represent meaningful concepts in the data, and another...

k-Anonymization with Minimal Loss of Information (2006)

Aristides Gionis, Tamir Tassa

The technique of k-anonymization allows the releasing of databases that contain personal information while ensuring some degree of individual privacy. Given a database D that needs to be released,...

The discrete basis problem (2006)

Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, Heikki Mannila

Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data and another describing...

Mining chains of relations (2005)

Foto Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...

Parameter-free spatial data mining using MDL (2005)

Spiros Papadimitriou, Aristides Gionis, Panayiotis Tsaparas, Risto A. Väisänen, Heikki Mannila, Christos Faloutsos

Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatial correlation and...

Mining Chains of Relations (2005)

Foto Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas

Traditional data mining applications consider the problem of mining a single relation between two attributes. For example, in a scientific bibliography database, authors are related to papers, and we...

Clustering aggregation (2005)

Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in...

Clustering aggregation (2005)

Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with them. This problem, clustering aggregation, appears naturally in various...

Clustered segmentations (2004)

Aristides Gionis, Heikki Mannila, Evimaria Terzi

The problem of sequence and time-series segmentation has been discussed widely and it has been applied successfully in a variety of areas, including computational genomics, data analysis for...

The Price of Validity in Dynamic Networks (2004)

Mayank Bawa, Aristides Gionis, Hector Garcia-Molina, Rajeev Motwani

Massive-scale self-administered networks like Peer-to-Peer and Sensor Networks have data distributed across thousands of participant hosts. These networks are highly dynamic with short-lived hosts...

Geometric and Combinatorial Tiles in 0-1 Data (2004)

Aristides Gionis, Heikki Mannila, Jouni K. Seppänen

In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0-1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a...

Finding recurrent sources in sequences (2003)

Aristides Gionis, Heikki Mannila

Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or...

Estimating aggregates on a peer-to-peer network (2003)

Mayank Bawa, Hector Garcia-Molina, Aristides Gionis, Rajeev Motwani

As Peer-to-Peer (P2P) networks become popular, there is an emerging need to collect a variety of statistical summary information about the participating nodes. The P2P networks of today lack...

2 (2002)

Mayur Datar, Aristides Gionis, Rajeev Motwani, Rina Panigrahy

We consider the problem MAX CSP over multi-valued domains with variables ranging over sets of size s_i ≤ s and constraints involving k_j ≤ k variables. We study two algorithms with...

Maintaining Stream Statistics over Sliding Windows (2002)

Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani

We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model....

Evaluating Strategies for Similarity Search on the Web (2002)

Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

Efficient and tunable similar set retrieval (2001)

Aristides Gionis, Dimitrios Gunopulos, Nick Koudas

Set value attributes are a concise and natural way to model complex data sets. Modern Object Relational systems support set value attributes and allow various query capabilities on them. In this...

Scalable Techniques for Clustering the Web (Extended Abstract) (2000)

Taher H. Haveliwala, Aristides Gionis, Piotr Indyk

Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries,...

Finding Interesting Associations without Support Pruning (2000)

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, ...

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of...

Finding interesting associations without support pruning (2000)

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, ...

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only...

Similarity search in high dimensions via hashing (1999)

Aristides Gionis, Piotr Indyk, Rajeev Motwani

The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building...

Similarity Search in High Dimensions via Hashing (1999)

Aristides Gionis, Piotr Indyk, Rajeev Motwani

The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building...

Finding Interesting Associations without Support Pruning (1999)

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Jeffrey D. Ullman, ...

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of...

Similarity Search in High Dimensions via Hashing (1997)

Aristides Gionis, Piotr Indyk, Rajeev Motwani

The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building...

Automated 3D Phenotype Analysis Using Data Mining

Ilya Plyusnin, Alistair R. Evans, Aleksis Karme, Aristides Gionis, Jukka Jernvall

The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the...

Finding Interesting Associations without Support Pruning

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, ...

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of...

Approximating a Collection of Frequent Sets

Foto Afrati, Aristides Gionis, Heikki Mannila

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is...