Guaranteeing Correctness and Availability in P2P Range Indices (2009)
Prakash Linga, Adina Crainiceanu, Johannes Gehrke, Jayavel Shanmugasundaram
New and emerging P2P applications require sophisticated range query capability and also have strict requirements on query correctness, system availability and item availability. While there has been...
A vision for petabyte data management and analysis services for the Arecibo telescope (2009)
Manuel Calimlim, Jim Cordes, Alan Demers, Julia Deneva, Johannes Gehrke, Dan Kifer, ...
We survey the initial steps of a project to build a data management and data mining system for astronomy data generated by the Arecibo Telescope. The total amount of data that our project will have...
Nitin Gupta, Advisor Prof, Johannes Gehrke, Minor Advisor, Prof Robert Bloomfield, Advisor Prof, ...
Honors Database Management; Personalization and extensibility of web applications; Infrastructure for virtual worlds and online collaboration; Financial and quantitative
Indexing For Function Approximation (2009)
Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke
Simulation is one of the most powerful tools that scientists have at their disposal for studying and understanding real-world physical phenomena. In order to be realistic, the mathematical models...
Robert Albright, Alan Demers, Johannes Gehrke, Nitin Gupta, Hooyeon Lee
We propose to demonstrate SGL, a language and system for writing computer games using data management techniques. We will demonstrate a complete game built using the system, and show how complex game...
Hilda: A high-level language for data-driven web applications (2009)
Fan Yang, Jayavel Shanmugasundaram, Mirek Riedewald, Johannes Gehrke, Alan Demers
We propose Hilda, a high-level language for developing data-driven web applications. The primary benefits of Hilda over existing development platforms are: (a) it uses a unified data model for all...
Nitin Gupta, Alan Demers, Johannes Gehrke
We propose to demonstrate SEMMO, a consistency server for MMOs. The key features of SEMMO are its novel distributed consistency protocol and system architecture. The distributed nature of the engine...
Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, Jayavel Shanmugasundaram
We present a modularized storage and indexing framework that cleanly separates the functional components of a P2P system, enabling us to tailor the P2P infrastructure to the specific needs of various...
Differential Privacy via Wavelet Transforms (2009)
Xiao, Xiaokui, Wang, Guozhang, Gehrke, Johannes
Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ε-differential privacy provides one of the strongest...
From Declarative Languages to Declarative Processing in Computer Games (2009)
Sowell, Benjamin, Demers, Alan, Gehrke, Johannes, Gupta, Nitin, Li, Haoyuan, White, Walker
Recent work has shown that we can dramatically improve the performance of computer games and simulations through declarative processing: Character AI can be written in an imperative scripting...
Namit Jain, Shailendra Mishra, Anand Srinivasan, Johannes Gehrke, Jennifer Widom, Hari Balakrishnan, ...
This paper describes a unification of two different SQL extensions for streams and its associated semantics. We use the data models from Oracle and StreamBase as our examples. Oracle uses a...
Large-Scale Collaborative Analysis and Extraction of Web Data (2009)
Felix Weigel, Biswanath Panda, Mirek Riedewald, Johannes Gehrke, Manuel Calimlim
Archived web data is a great resource for scientific research, but poses serious challenges in data processing and management. We demonstrate the Web Lab Collaboration Server, a platform and service...
Goetz, Michaela, Machanavajjhala, Ashwin, Wang, Guozhang, Xiao, Xiaokui, Gehrke, Johannes
Search engine companies collect the "database of intentions", the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are...
Database Research Opportunities in Computer Games (2009)
Walker White, Christoph Koch, Nitin Gupta, Johannes Gehrke, Alan Demers
In this paper, we outline several ways in which the database community can contribute to the development of technology for computer games. We outline the architecture of different types of computer...
Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher
Categories and Subject Descriptors
David J. Martin, Johannes Gehrke, Joseph Y. Halpern
Internet search results are a growing and highly profitable advertising platform. Search providers auction advertising slots to advertisers on their search result pages. Due to the high volume of...
Database Research Opportunities in Computer Games (2008)
Walker White, Christoph Koch, Nitin Gupta, Johannes Gehrke, Alan Demers
In this paper, we outline several ways in which the database community can contribute to the development of technology for computer games. We outline the architecture of different types of computer...
Indexing for Function Approximation (2008)
Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke
Simulation is one of the most powerful tools that scientists have at their disposal for studying and understanding real-world physical phenomena. In order to be realistic, the mathematical models...
A. Balmin, V. Hristidis, Y. Papakonstantinou, Objectrank Authority-based, Lars Brenna, Alan J. Demers, ...
OK, don’t panic that the preliminary reading list is so long! Avi and I wanted to include a smorgasbord of papers in a subset of areas. We don’t expect you to read or be familiar with all these...
Privacy: Theory meets Practice on the Map (2008)
Ashwin Machanavajjhala, Daniel Kifer, John Abowd, Johannes Gehrke, Lars Vilhuber
In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics...
A vision for petabyte data management and analysis services for the Arecibo telescope (2008)
Manuel Calimlim, Jim Cordes, Alan Demers, Julia Deneva, Johannes Gehrke, Dan Kifer, ...
We survey the initial steps of a project to build a data management and data mining system for astronomy data generated by the Arecibo Telescope. The total amount of data that our project will have...
Toward Expressive and Scalable Sponsored Search Auctions (2008)
Martin, David J., Gehrke, Johannes, Halpern, Joseph Y.
Internet search results are a growing and highly profitable advertising platform. Search providers auction advertising slots to advertisers on their search result pages. Due to the high volume of...
Muthuramakrishnan Venkitasubramaniam, Ashwin Machanavajjhala, David Martin, Johannes Gehrke
The CVS (Concurrent Versions System) software is a popular method for recording modifications to data objects, in addition to concurrent access to data in a multi-user environment. In current...
Thesis: Approximate Query Answering over Data Streams (2008)
Upson Hall, Abhinandan Das, Advisor Prof, Johannes Gehrke
This work introduces novel sketch based methods that permit high quality selectivity estimation for spatial joins and range queries. Our synopses can be constructed in a single scan over the input,...
We introduce scalability for computer games as the next frontier for techniques from data management. A very important aspect of computer games is the artificial intelligence (AI) of non-player...
Amdb A Design, Tool Access, Methods Marcel Kornacker, Mehul Shah, Joseph M. Hellerstein, Michael Cammert, ...
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology. Letters,...
Dimitris Papadias, Yufei Tao, Jun Zhang, Nikos Mamoulis, Qiongmao Shen, Jimeng Sun, ...
VLDB Conference
Vassilis Christophides, Gregory Karvounarakis, Aimilia Magkanaraki, Dimitris Plexousakis, Val Tannen, Information Integration, ...
Jaguar: Java in Next-Generation Database Systems (2008)
This project explores fundamental systems issues in query processing performance. We investigate this problem from three different directions: client-server processing, heterogeneous environments,...
• Database Privacy, Distributed algorithms, and Data Mining. (2008)
Abhinandan Das, Advisor Prof, Johannes Gehrke
REPRESENTATIVE RESEARCH • Online Approximation Techniques for Spatial Databases This work introduces novel sketch based methods that permit high quality selectivity estimation for spatial joins and...
Towards Expressive Publish/Subscribe Systems (2008)
Alan Demers, Johannes Gehrke, Mingsheng Hong, Mirek Riedewald
Traditional content based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring...
Dissertation: Building Compressed Database Systems (2008)
Database Compression, Database Security, Sensor Networks, Adviser Prof, Johannes Gehrke, Adviser Prof, ...
Manager: Dr. Surajit Chaudhuri Designed and implemented a tool to automatically map XML data to relational data for storage and query. The tool takes into account both the logical design of mapping...
Guaranteeing Correctness and Availability in P2P Range Indices (2008)
Prakash Linga, Adina Crainiceanu, Johannes Gehrke, Jayavel Shanmugasundaram
New and emerging P2P applications require sophisticated range query capability and also have strict requirements on query correctness, system availability and item availability. While there has been...
High-Speed Function Approximation (2008)
Biswanath Panda, Mirek Riedewald, Johannes Gehrke
We address a new learning problem where the goal is to build a predictive model that minimizes prediction time (the time taken to make a prediction) subject to a constraint on model accuracy. Our...
Conference on Knowledge Discovery and Data Mining (KDD) in August 2003. The competition focused on mining the complex real-life social network inherent in the e-print arXiv (arXiv.org). We describe...
Johannes Gehrke, Rajmohan Rajaraman
In this project, we adopted a database approach to unite the seemingly conflicting requirements of scalability and flexibility in monitoring the physical world. The objective of this research was to...
What is “Next” in Event Processing? (2008)
Walker White, Mirek Riedewald, Johannes Gehrke, Alan Demers
Event processing systems have wide applications ranging from managing events from RFID readers to monitoring RSS feeds. Consequently, there exists much work on them in the literature. The prevalent...
A Storage and Indexing Framework for P2P Systems (2008)
Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, Jayavel Shanmugasundaram
We present a modularized storage and indexing framework that cleanly separates the functional components of a P2P system. This framework enables us to tailor the P2P infrastructure to the specific...
Three Case Studies of Large-Scale Data Flows (2008)
William Y. Arms, Selcuk Aya, Manuel Calimlim, Johannes Gehrke, Dave Lifka, Mirek Riedewald, ...
We survey three examples of large-scale scientific workflows that we are working with at Cornell: the Arecibo sky survey, the CLEO high-energy particle physics experiment, and the Web Lab project for...
Semantic Approximation of Data Stream Joins (2008)
Abhinandan Das, Johannes Gehrke, Mirek Riedewald
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding...
Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher
We propose a demonstration of Cayuga, a complex event monitoring system for high speed data streams. Our demonstration will show Cayuga applied to monitoring Web feeds; the demo will illustrate the...
Data Integration, Where Does, Time Go, Len Seligman, Arnon Rosenthal, Paul Lehner, ...
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...
Design Languages Performance (2008)
Fan Yang, Nitin Gupta, Nicholas Gerner, Xin Qi, Alan Demers, Johannes Gehrke, ...
Data-driven web applications are usually structured in three tiers with different programming models at each tier. This division forces developers to manually partition application functionality...
Adina Crainiceanu, Johannes Gehrke
Peer-to-peer systems have emerged as a robust, scalable and decentralized way to share and publish data. In this paper, we propose P-Ring, a new P2P index structure that supports both equality and...
BOAT-Optimistic Decision Tree Construction (2008)
Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-yin Loh
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to...
Michael Stonebraker, Mitch Cherniack, Magdalena Balazinska, Hari Balakrishnan, Michael J. Franklin, Joseph M. Hellerstein, ...
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC
The Claremont report on database research (2008)
Agrawal, Rakesh, Ailamaki, Anastasia, Bernstein, Philip A., Brewer, Eric A., Carey, Michael J., Chaudhuri, Surajit, ...
Declarative processing for computer games (2008)
Walker White, Benjamin Sowell, Johannes Gehrke, Alan Demers
Most game developers think of databases as nothing more than a persistence solution. However, database research is concerned with the wider problem of declarative processing. In this paper we...
We present a performance study of the MAFIA algorithm for mining maximal frequent itemsets from a transactional database. In a thorough experimental analysis, we isolate the effects of individual...
How to Quickly Find a Witness (2007)
Daniel Kifer, Johannes Gehrke, Cristian Bucila
The subfield of itemset mining is essentially a collection of algorithms. Whenever a new type of constraint is discovered, a specialized algorithm is proposed to handle it. All of these algorithms...
Rohit Ananthakrishna, Abhinandan Das, S. Muthukrishnan, Johannes Gehrke, Divesh Srivastava
In many applications such as IP network management, data arrives in streams, and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a...
CACTUS–Clustering Categorical Data Using Summaries (2007)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Clustering is an important data mining problem. Most of the earlier work on clustering focussed on numeric attributes which have a natural ordering on their attribute values. Recently, clustering...
Query Processing with Heterogeneous Resources (Technical Report) (2007)
Tobias Mayr, Johannes Gehrke, Praveen Seshadri
In emerging systems, CPUs and memory are integrated into active disks, controllers, and network interconnects. Query processing on these new multiprocessor systems must consider the heterogeneity...
Software—distributed systems General Terms Algorithms (2007)
Adina Crainiceanu, Prakash Linga, Johannes Gehrke, Jayavel Shanmugasundaram
We propose a new distributed, fault-tolerant Peer-to-Peer index structure for resource discovery applications called the P-tree. P-trees efficiently support range queries in addition to equality...
Scalable Winner Determination in Advertising Auctions (2007)
Martin, David, Gehrke, Johannes, Halpern, Joseph
Internet search results are a growing and highly profitable advertising platform. Search providers auction advertising slots to advertisers on their search result pages. Due to the high volume of...
High-Speed Function Approximation (2007)
Panda, Biswanath, Riedewald, Mirek, Gehrke, Johannes, Pope, Stephen
Learning methods for predictive models have traditionally focused on prediction quality and model building time, while prediction time (the time taken to make a prediction) is often ignored. However,...
Worst-Case Background Knowledge for Privacy-Preserving Data Publishing (2007)
Martin, David J., Kifer, Daniel, Machanavajjhala, Ashwin, Gehrke, Johannes, Halpern, Joseph Y.
Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what...
Plagiarism Detection in arXiv (2007)
Sorokina, Daria, Gehrke, Johannes, Warner, Simeon, Ginsparg, Paul
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14...
Worst-case background knowledge in privacy (2007)
David J. Martin, Daniel Kifer, Ashwin Machanavajjhala, Johannes Gehrke, Joseph Y. Halpern
Recent work has shown the necessity of considering an attacker’s background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what...
A unified platform for data driven web applications with automatic client-server partitioning (2007)
Fan Yang, Nitin Gupta, Nicholas Gerner, Xin Qi, Alan Demers, Johannes Gehrke, ...
Data-driven web applications are usually structured in three tiers with different programming models at each tier. This division forces developers to manually partition application functionality...
Cayuga: A general purpose event monitoring system (2007)
Alan Demers, Johannes Gehrke, Biswanath Panda
System for scalable event processing. We present a query language based on Cayuga Algebra for naturally expressing complex event patterns. We also describe several novel system design and...
Worst-Case Background Knowledge in Privacy (2006)
Martin, David, Kifer, Daniel, Machanavajjhala, Ashwin, Gehrke, Johannes, Halpern, Joseph
Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what...
Plagiarism Detection in arXiv (2006)
Sorokina, Daria, Gehrke, Johannes, Warner, Simeon, Ginsparg, Paul
We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by...
White, Walker, Riedewald, Mirek, Gehrke, Johannes, Demers, Alan
Event processing systems have wide applications ranging from monitoring RSS feeds to managing events from RFID readers, and there exists much work on them in the literature. Many competing temporal...
Plagiarism detection in arxiv (2006)
Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14...
Towards Expressive Publish/Subscribe Systems (2006)
Alan Demers, Johannes Gehrke, Mingsheng Hong, Mirek Riedewald, Walker White
Traditional content based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring...
Automatic client-server partitioning of data driven web applications (2006)
Nicholas Gerner, Fan Yang, Alan Demers, Johannes Gehrke, Mirek Riedewald, Jayavel Shanmugasundaram
An important class of applications are data-driven web applications, i.e., web applications that run on top of a back-end database system. Examples of such applications include online shopping sites,...
Hilda: A high-level language for data-driven web applications (2006)
Fan Yang, Jayavel Shanmugasundaram, Mirek Riedewald, Johannes Gehrke, Alan Demers
We propose Hilda, a high-level language for developing data-driven web applications. The primary benefits of Hilda over existing development platforms are: (a) it uses a unified data model for all...
Automatic client-server partitioning of data driven web applications (2006)
Nicholas Gerner, Fan Yang, Alan Demers, Johannes Gehrke, Mirek Riedewald, Jayavel Shanmugasundaram
Current application development tools provide completely different programming models for the application server (e.g., Java and J2EE) and the client web browser (e.g., JavaScript and HTML)....
l-Diversity: Privacy Beyond k-Anonymity (2006)
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In...
ℓ-diversity: Privacy beyond k-anonymity (2006)
Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, Muthuramakrishnan Venkitasubramaniam
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In...
A General Algebra and Implementation for Monitoring Event Streams (2005)
Demers, Alan, Gehrke, Johannes, Hong, Mingsheng, Riedewald, Mirek, White, Walker
Recently there has been considerable research on Data Stream Management Systems (DSMS) to support analysis of data that arrives rapidly in high-speed streams. Most of these systems have very...
Guaranteeing Correctness and Availability in P2P Range Indices (2005)
Linga, Prakash, Crainiceanu, Adina, Gehrke, Johannes, Shanmugasundaram, Jayavel
New and emerging P2P applications require sophisticated range query capability and also have strict requirements on query correctness, system availability and item availability. While there has been...
Guaranteeing correctness and availability in P2P range indices (2005)
Prakash Linga, Adina Crainiceanu, Johannes Gehrke, Jayavel Shanmugasundaram
New and emerging P2P applications require sophisticated range query capability and also have strict requirements on query correctness, system availability and item availability. While there has been...
Automatic Subspace Clustering of High Dimensional Data (2005)
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user...
Multi-query optimization for sensor networks (2005)
Niki Trigoni, Yong Yao, Alan Demers, Johannes Gehrke
The widespread dissemination of small-scale sensor nodes has sparked interest in a powerful new database abstraction for sensor networks: Clients “program” the sensors through queries...
Index Structures for Matching XML Twigs Using Relational Query Processors (2004)
Chen, Zhiyuan, Gehrke, Johannes, Korn, Flip, Koudas, Nick, Shanmugasundaram, Jayavel, Srivastava, Divesh
Various index structures have been proposed to speed up the evaluation of XML path expressions. However, existing XML path indices suffer from at least one of three limitations: they focus only on...
Approximation Techniques for Spatial Data (2004)
Das, Abhinandan, Gehrke, Johannes, Riedewald, Mirek
Spatial Database Management Systems (SDBMS), e.g., Geographical Information Systems, that manage spatial objects such as points, lines, and hyper-rectangles, often have very high query processing...
P-Ring: An Index Structure for Peer-to-Peer Systems (2004)
Crainiceanu, Adina, Linga, Prakash, Machanavajjhala, Ashwin, Gehrke, Johannes, Shanmugasundaram, Jayavel
Current peer-to-peer (P2P) index structures only support a subset of the desired functionality for P2P database systems. For instance, some P2P index structures support equality queries but not range...
Semantic Approximation of Data Stream Joins (2004)
Das, Abhinandan, Gehrke, Johannes, Riedewald, Mirek
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding...
Querying Peer-to-Peer Networks Using P-Trees (2004)
Crainiceanu, Adina, Linga, Prakash, Gehrke, Johannes, Shanmugasundaram, Jayavel
Peer-to-peer (P2P) systems provide a robust, scalable and decentralized way to share and publish data. However, most existing P2P systems only provide a very rudimentary query facility; they only...
Hybrid Push-Pull Query Processing for Sensor Networks (2004)
Niki Trigoni, Yong Yao, Alan Demers, Johannes Gehrke, Rajmohan Rajaraman
A powerful database abstraction for sensor networks has recently emerged in which clients program the sensors using a declarative query language. Existing work assumes that data is either...
Wavescheduling: energy-efficient data dissemination for sensor networks (2004)
Niki Trigoni, Yong Yao, Alan Demers, Johannes Gehrke, Rajmohan Rajaraman
Sensor networks are being increasingly deployed for diverse monitoring applications. Event data are collected at various sensors and sent to selected storage nodes for further in-network...
An indexing framework for peer-to-peer systems (2004)
Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, Jayavel Shanmugasundaram
Current peer-to-peer (P2P) indices are monolithic pieces of software that address only a subset of the desired functionality for P2P databases. For instance, Chord [6] provides reliability and...
P-tree: a p2p index for resource discovery applications (2004)
Adina Crainiceanu, Prakash Linga, Johannes Gehrke, Jayavel Shanmugasundaram
We propose a new distributed, fault-tolerant Peer-to-Peer index structure for resource discovery applications called the P-tree. P-trees can efficiently support range queries in addition to equality...
P-ring: An index structure for peer-to-peer systems (2004)
Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, Jayavel Shanmugasundaram
Current peer-to-peer (P2P) index structures only support a subset of the desired functionality for P2P database systems. For instance, some P2P index structures support equality queries but not range...
Semantic approximation of data stream joins (2004)
Abhinandan Das, Johannes Gehrke, Mirek Riedewald
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints...
Querying peer-to-peer networks using p-trees (2004)
Adina Crainiceanu, Prakash Linga, Johannes Gehrke, Jayavel Shanmugasundaram
Peer-to-peer (P2P) systems provide a robust, scalable and decentralized way to share and publish data. However, most existing P2P systems only provide a very rudimentary query facility; they only...
Approximation Techniques for Spatial Data (2004)
Abhinandan Das, Johannes Gehrke, Mirek Riedewald
Spatial Database Management Systems (SDBMS), e.g., Geographical Information Systems, that manage spatial objects such as points, lines, and hyper-rectangles, often have very high query processing...
Querying Peer-to-Peer Networks Using P-Trees (2004)
Adina Crainiceanu, Prakash Linga, Johannes Gehrke, Jayavel Shanmugasundaram
We propose a new distributed, fault-tolerant peer-to-peer index structure called the P-tree. P-trees efficiently evaluate range queries in addition to equality queries.
Sketch-based multi-query processing over data streams (2004)
Alin Dobra, Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited...
Sketch-based multi-query processing over data streams (2004)
Alin Dobra, Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory....
Approximation techniques for spatial data (2004)
Abhinandan Das, Johannes Gehrke, Mirek Riedewald
Spatial Database Management Systems (SDBMS), e.g., Geographical Information Systems, that manage spatial objects such as points, lines, and hyper-rectangles, often have very high query processing...
The Cougar project: A work-in-progress report (2003)
Alan Demers, Johannes Gehrke, Rajmohan Rajaraman, Niki Trigoni, Yong Yao
We present an update on the status of the Cougar Sensor Database Project in which we are investigating a database approach to sensor networks: Clients “program” the sensors through...
Query Processing in Sensor Networks (2003)
Johannes Gehrke, Samuel Madden
sensors are small wireless computing devices that sense information such as light and humidity at extremely high resolutions. A smart sensor query-processing architecture using database technology...
Gossip-Based Computation of Aggregate Information (2003)
David Kempe, Alin Dobra, Johannes Gehrke
between computers, and a resulting paradigm shift from centralized to highly distributed systems. With massive scale also comes massive instability, as node and link failures become the norm rather...
Limiting Privacy Breaches in Privacy Preserving Data Mining (2003)
Alexandre Evfimievski, Johannes Gehrke, Ramakrishnan Srikant
There has been increasing interest in the problem of building accurate data mining models over aggregate data, while protecting privacy at the level of individual records. One approach for this...
Energy-Efficient Data Management For Sensor (2003)
Alan Demers, Johannes Gehrke, Rajmohan Rajaraman, Niki Trigoni, Yong Yao
We give a status update of the Cougar Project, in which we investigate a database approach to sensor networks: Clients "program" the sensors through queries in a high-level declarative...
Sequential PAttern Mining using a Bitmap Representation (2002)
Jay Ayres, Jason Flannick, Johannes Gehrke, Tomi Yiu
We introduce a new algorithm for mining sequential patterns. Our algorithm is especially efficient when the sequential patterns in the database are very long. We introduce a novel depth-first search...
DualMiner: A dual-pruning algorithm for itemsets with constraints (2002)
Cristian Bucilă, Johannes Gehrke, Daniel Kifer
Recently, constraint-based mining of itemsets for questions like “find all frequent itemsets whose total price is at least $50” has attracted much attention. Two classes of...
A theoretical framework for learning from a pool of disparate data sources (2002)
Shai Ben-David, Johannes Gehrke, Reba Schuller
Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, aiming towards the design of an automated diagnostic...
Privacy preserving mining of association rules (2002)
Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke
We present a framework for mining association rules from transactions consisting of categorical items where the data has been randomized to preserve privacy of individual transactions. While it is...
DualMiner: A dual-pruning algorithm for itemsets with constraints (2002)
Cristian Bucila, Johannes Gehrke, Daniel Kifer
Constraint-based mining of itemsets for questions such as “find all frequent itemsets where the total price is at least $50” has received much attention recently. Two classes of constraints,...
Sequential PAttern Mining using a Bitmap Representation (2002)
Jay Ayres, Johannes Gehrke, Tomi Yiu, Jason Flannick
We introduce a new algorithm for mining sequential patterns. Our algorithm is especially efficient when the sequential patterns in the database are very long. We introduce a novel depth-first search...
GADT: A probability space ADT for representing and querying the physical world (2002)
Anton Faradjian, Johannes Gehrke
Large sensor networks are being widely deployed for measurement, detection, and monitoring applications. Many of these applications involve database systems to store and process data from the...
The Cougar approach to in-network query processing in sensor networks (2002)
The widespread distribution and availability of small-scale sensors, actuators, and embedded processors is transforming the physical world into a computing platform. One such example is a sensor...
Least Expected Cost Query Optimization: What Can We Expect? (2002)
Francis Chu, Joseph Halpern, Johannes Gehrke
A standard assumption in the database query optimization literature is that it is adequate to optimize for the "typical" case, that is, the case in which various parameters (e.g., the...
The Cougar approach to in-network query processing in sensor networks (2002)
The widespread distribution and availability of small-scale sensors, actuators, and embedded processors is transforming the physical world into a computing platform. One such example is a sensor...
Database Management Systems, Volume 1 [Συστήματα διαχείρισης βάσεων δεδομένων: Α΄ τόμος] (2002)
Ramakrishnan, Raghu, Gehrke, Johannes
This item is in Portable Document Format (PDF).
Database Management Systems, Volume 2 [Συστήματα διαχείρισης βάσεων δεδομένων: Β΄ τόμος] (2002)
Ramakrishnan, Raghu, Gehrke, Johannes
This item is in Portable Document Format (PDF).
MAFIA: A maximal frequent itemset algorithm for transactional databases (2001)
Doug Burdick, Manuel Calimlim, Johannes Gehrke
We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search...
Privacy-enabled Management, Customer Data, Günter Karjoth, Matthias Schunter, Michael Waidner, Dan Boneh, ...
is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology. Letters,...
Towards Sensor Database Systems (2001)
Philippe Bonnet, Johannes Gehrke, Praveen Seshadri
Sensor networks are being widely deployed for measurement, detection and surveillance applications. In these new applications, users issue long-running queries over a combination of stored data and...
On Computing Correlated Aggregates Over Continual Data Streams (2001)
Johannes Gehrke
Query Processing with Heterogeneous Resources (2000)
Mayr, Tobias, Bonnet, Philippe, Gehrke, Johannes, Seshadri, Praveen
In emerging systems, CPUs and memory are integrated into active disks, controllers, and network interconnects. Query processing on these new multiprocessor systems must consider the heterogeneity of...
DEMON: Mining and monitoring evolving data (2000)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Data mining algorithms have been the focus of much research recently. In practice, the input data to a data mining process resides in a large data warehouse whose data is kept up-to-date through...
Querying the physical world (2000)
Johannes Gehrke, Praveen Seshadri
In the next decade, millions of sensors and small-scale mobile devices will integrate processors, memory and communication capabilities. Networks of devices will be widely deployed for monitoring...
Querying the physical world (2000)
Johannes Gehrke, Praveen Seshadri
In the next decade, millions of sensors and small-scale mobile devices will integrate processors, memory, and communication capabilities. Networks of devices will be widely deployed for monitoring...
Querying the physical world (2000)
Johannes Gehrke, Praveen Seshadri
In the next decade, millions of sensors and small-scale mobile devices will integrate processors, memory and communication capabilities. Networks of devices will be widely deployed for measurement,...
Querying the Physical World (2000)
Philippe Bonnet, Johannes Gehrke, Praveen Seshadri
ions for Representing Devices In the warehousing approach, discussed in Section 1, devices are not part of the database system. They are accessed using a predefined extraction procedure that...
Querying the Physical World (2000)
Philippe Bonnet, Johannes Gehrke, Praveen Seshadri
ly, a device supports a set of functions and allows a certain amount of processing to be done directly at the device. A function either (a) acquires, stores and processes data (e.g., measure rainfall...
DEMON: Mining and monitoring evolving data (2000)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Data mining algorithms have been the focus of much research recently. In practice, the input data to a data mining process resides in a large data warehouse whose data is kept up-to-date...
RainForest - a Framework for Fast Decision Tree Construction of Large Datasets (2000)
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly...
Research Interests Systems and Networking Education (2000)
Curriculum Vitæ, Advisers Jayavel Shanmugasundaram, Johannes Gehrke, Ken Birman, Adviser Jayavel Shanmugasundaram
Involved in building a database layer for p2p systems as part of the PEPPER project at Cornell. Working on indexing data items in a p2p system. In particular, – Developed a new distributed hash...
Query Processing in a Device Database System (1999)
Bonnet, Philippe, Gehrke, Johannes, Mayr, Tobias, Seshadri, Praveen
In the next decade, networks of devices will be widely deployed for measurement, detection and surveillance applications. Millions of sensors and small-scale mobile devices will integrate processors,...
A framework for measuring changes in data characteristics (1999)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
A data mining algorithm builds a model that captures interesting aspects of the underlying data. We develop a framework for quantifying the difference, called the deviation, between two datasets in...
Clustering Large Datasets in Arbitrary Metric Spaces (1999)
Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison Powell, James French
Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying...
Mining Very Large Databases (1999)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
this article was supported by Grant 2053 from the IBM Corp.
CACTUS - Clustering Categorical Data Using Summaries (1999)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Clustering is an important data mining problem. Most of the earlier work on clustering focussed on numeric attributes which have a natural ordering on their attribute values. Recently, clustering...
A Framework for Measuring Differences in Data Characteristics (1999)
Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Wei-Yin Loh
A data mining algorithm builds a model that captures interesting aspects of the underlying data. We develop a framework for quantifying the difference, called the deviation, between two datasets in...
DEMON: Mining and Monitoring Evolving Data (1999)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Data mining algorithms have been the focus of much research recently. In practice, the input data to a data mining process resides in a large data warehouse whose data is kept up-to-date through...
Clustering Large Datasets in Arbitrary Metric Spaces (1999)
Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison Powell, James French
Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying...
Jaguar: Extending the Predator Database System with JAVA (1998)
Bonnet, Philippe, Gehrke, Johannes
The Jaguar project is aimed at breaking down the traditional barriers that require SQL query processing to reside on the database server. Indeed, database applications will soon be accessed by large...
Flexible Decision Support in Device-Saturated Environments (1998)
The widespread distribution of small-scale sensors, actuators, and embedded processors is transforming the physical world into a computing platform. Sensor networks with nodes that combine physical...
Clustering Large Datasets in Arbitrary Metric Spaces (1998)
Ganti, Venkatesh, Ramakrishnan, Raghu, Gehrke, Johannes, Powell, Allison, French, James
Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying...
RainForest - a Framework for Fast Decision Tree Construction of Large Datasets (1998)
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly...
Automatic subspace clustering of high dimensional data for data mining applications (1998)
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user...
Clustering Large Datasets in Arbitrary Metric Spaces (1998)
Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison Powell, James French
Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying...
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets (1998)
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly...
A Framework for Measuring Changes in Data Characteristics (1998)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Wei-Yin Loh
A data mining algorithm builds a model that captures interesting aspects of the underlying data. We develop a framework for quantifying the difference, called the deviation, between two datasets in...
Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications (1998)
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user...
RainForest - a Framework for Fast Decision Tree Construction of Large Datasets (1998)
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly...
Rapid Convergence of a Local Load Balancing Algorithm for Asynchronous Rings (1997)
Johannes Gehrke, C. Greg Plaxton
We consider the problem of load balancing in a ring network. We present an analysis of the following local algorithm. In each step, each node of the ring examines the number of tokens at its...
Identifying Temporal Patterns and Key Players in Document Collections (1995)
Benyah Shaparenko, Rich Caruana, Johannes Gehrke, Thorsten Joachims
This paper considers the problem of analyzing the development of a document collection over time without requiring meaningful citation data. Given a collection of timestamped documents, we formulate...
Conway, and Gerry Salton, on the occasion of Juris Hartmanis’s Turing Award celebration. (1978)
Juris Hartmanis, David Gries, John Hopcroft, Bob Constable, ...
[Chart: PhD degrees granted]