Hector Garcia-Molina

Social Systems: Can we Do More Than Just Poke Friends? (2009)

Georgia Koutrika, Benjamin Bercovitz, Robert Ikeda, Filip Kaliszan, Henry Liou, Zahra Mohammadi Zadeh, ...

Social sites have become extremely popular among users but have they attracted equal attention from the research community? Are they good only for simple tasks, such as tagging and poking friends? Do...

Incremental Updates of Inverted Lists for Text Document Retrieval (2009)

Hector Garcia-Molina, Kurt Shoens

With the proliferation of the world’s “information highways”, a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of...

The Demarcation Protocol: A Technique for Maintaining Constraints in Distributed Database Systems (2009)

Daniel Barbará-Millá, Hector Garcia-Molina

Traditional protocols for distributed database management have a high message overhead; restrain or lock access to resources during protocol execution; and may become impractical for some...

Chapter 19: An Overview of Real-Time Database Systems (2008)

Ben Kao, Hector Garcia-molina

A real-time database system provides database features such as data independence and concurrency control, while at the same time enforcing real-time constraints that applications may have. In this...

Building a Distributed Full-Text Index for the Web (2008)

Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-molina

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building...

Assignment-Based Partitioning in a Condition Monitoring System (2008)

Yongqiang Huang, Hector Garcia-molina

A condition monitoring system tracks real-world variables and alerts users when a predefined condition becomes true, e.g., when enemy planes take off, or when suspicious terrorist...

Simrank++: Query rewriting through link analysis of the click graph (2008)

Ioannis Antonellis, Hector Garcia-Molina, Chi-Chao Chang

Computing Capabilities of Mediators (2008)

Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey Ullman

Existing data-integration systems based on the mediation architecture employ a variety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute...

Digital Equipment Corp. and (2008)

Robert K. Abbott, Hector Garcia-molina

Managing transactions with real-time requirements presents many new problems. In this paper we address several: How can we schedule transactions with deadlines? How do the real-time constraints...

Building the InfoBus: A Review of Technical Choices in the Stanford Digital Library Project (2008)

Andreas Paepcke, Michelle Baldonado, Steve Cousins, Hector Garcia-molina

We review selected technical challenges addressed in our digital library project. Our InfoBus, a CORBA-based distributed object infrastructure, unifies access to heterogeneous document collections...

Computing Capabilities of Mediators (2008)

Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey Ullman

Existing data-integration systems based on the mediation architecture employ a variety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute the...

Copy Detection Mechanisms for Digital Documents (2008)

Sergey Brin, James Davis, Hector Garcia-molina

In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it...

List of Supported Students and Staff (2008)

Hector Garcia-Molina, Brian Cooper, Arturo Crespo

The goal of this project is to design and implement a modern, scalable digital library repository (DLR). This repository will permanently store the digital objects that make up a library. The DLR...

Associate Editors (2008)

Jaideep Srivastava, Thomas M. Niccum, Bhaskar Himatsingka, Leana Golubchik, Richard R. Muntz, Gerhard Weikum, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology. Letters,...

Synthetic Workload Performance Analysis of Incremental Updates (2008)

Kurt Shoens, Anthony Tomasic, Hector Garcia-Molina

Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques. In this paper, the problem of incremental updates of inverted lists is addressed using a...

and (2008)

Sergey Melnik, Hector Garcia-molina

A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (⊆) operator. Set containment joins are deployed in many database...

Pong-Cache Poisoning in GUESS (2008)

Neil Daswani, Hector Garcia-molina

This paper studies the problem of resource discovery in unstructured peer-to-peer (P2P) systems. We propose simple policies that make the discovery of resources resilient to coordinated attacks by...

Shrinking the Warehouse Update Window (2008)

Wilburt Juan Labio, Ramana Yerneni, Hector Garcia-molina

Warehouse views need to be updated when source data changes. Due to the constantly increasing size of warehouses and the rapid rates of change, there is increasing pressure to reduce the time taken...

Meaningful Change Detection in Structured Data (2008)

Sudarshan S. Chawathe, Hector Garcia-Molina

Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting...

Proximity Search in Databases (2008)

Roy Goldman, Narayanan Shivakumar, Suresh Venkatasubramanian, Hector Garcia-Molina

An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire database for...

Taxonomy of Trust: Categorizing P2P Reputation Systems (2008)

Sergio Marti, Hector Garcia-molina

The field of peer-to-peer reputation systems has exploded in the last few years. Our goal is to organize existing ideas and work to facilitate system design. We present a taxonomy of reputation...

Overview Details (2008)

Hector Garcia-molina, Tobias Dönz

at the Database and Artificial Intelligence Group, Institute of Information Systems,

Associate Editors (2008)

Ashish Gupta, Inderpal Singh Mumick, Joachim Hammer, Hector Garcia-molina, Jennifer Widom, Wilburt Labio, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

Non-Cooperation in Competitive P2P Networks (2008)

Beverly Yang, Tyson Condie, Sepandar Kamvar, Hector Garcia-Molina

Large-scale competitive P2P networks are threatened by the non-cooperation problem, where peers do not forward queries to potential competitors. While non-cooperation is not a problem in current P2P...

Abstract (2008)

Anthony Tomasic, Hector Garcia-molina

The proliferation of the world's “information highways” has renewed interest in efficient document indexing techniques. In this article, we provide an overview of the issues in parallel...

Abstract (2008)

Bob Mungamuru, Hector Garcia-molina, Subhasish Mitra

In order to safeguard a sensitive database, we must ensure both its privacy and its longevity. However, privacy and longevity tend to be competing objectives. We show how to design a system that...

Combating Spam in Tagging Systems (2008)

Georgia Koutrika, Frans Adjie Effendi, Zoltán Gyöngyi, Paul Heymann, Hector Garcia-molina

Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam:...

The TSIMMIS Project: Integration of Heterogeneous Information Sources (2008)

Sudarshan Chawathe, Hector Garcia-Molina, Joachim Hammer, Kelly Ireland, Yannis Papakonstantinou, Jeffrey Ullman, ...

The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives...

Chapter 1: Managing Parallel Disks for Continuous Media Data (2008)

Edward Chang, Chen Li, Hector Garcia-Molina

In this study we present a scheme called two-dimensional BubbleUp (2DB) to manage parallel disks for continuous media data. Its goal is to reduce initial latency for interactive multimedia...

Power Browser: Efficient Web Browsing for PDAs (2008)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke, Terry Winograd

We have designed and implemented new Web browsing facilities to support effective navigation on Personal Digital Assistants (PDAs) with limited capabilities: low bandwidth, small display, and slow...

Copy Detection Mechanisms for Digital Documents (2008)

Sergey Brin, James Davis, Hector Garcia-Molina

In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it...

Focused Web Searching with PDAs (2008)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

The Stanford Power Browser project addresses the problems of interacting with the World-Wide Web through wirelessly connected Personal Digital Assistants (PDAs). These problems include bandwidth...

Stanford WebBase Components and Applications (2008)

Junghoo Cho, Hector Garcia-molina, Taher Haveliwala, Wang Lam, Andreas Paepcke, Sriram Raghavan, ...

We describe the design and performance of WebBase, a tool for Web research. The system includes a highly customizable crawler, a repository for collected Web pages, an indexer for both text and...

Abstract (2008)

Orkut Buyukkokten, Luis Gravano, Junghoo Cho, Hector Garcia-molina, Narayanan Shivakumar

Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are...

Replicated Data Management in Mobile Environments: Anything New Under the Sun? (2008)

Daniel Barbará-Millá, Hector Garcia-molina

In this paper we show that such dynamic algorithms can be obtained simply by letting transactions update the directory

Simrank++: Query rewriting through link analysis of the click graph (2007)

Ioannis Antonellis, Hector Garcia-Molina, Chi-Chao Chang

We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a...

y (2007)

Brad Adelberg, Hector Garcia-molina, Ben Kao

Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general purpose operating systems, like Unix. For some soft real-time...

Interoperability for Digital Libraries: Problems and Directions (2007)

Andreas Paepcke, Hector Garcia-molina, Terry Winograd

this paper is to present a broad introduction to the issues of interoperability, suggesting factors that may be used in evaluating interoperability solutions, and providing an overview of solution...

Evolving Source Interfaces over the Web (2007)

Ramana Yerneni, Hector Garcia-molina

Data sources over the Web publish their query interfaces through forms or templates. Over time, the set of templates supported by a source changes due to new requirements or enhanced query-processing...

Competitive Sourcing for Internet Commerce (2007)

Steven Ketchpel, Hector Garcia-molina

In electronic commerce on the Internet, a customer can choose among several competitive suppliers, but because of the nature of the Internet, the reliability and trustworthiness of suppliers may...

An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers (2007)

Michelle Baldonado, Seth Katz, Andreas Paepcke, Hector Garcia-molina, Terry Winograd

We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer...

Project Synopsis: Evaluating STRIP (2007)

Brad Adelberg, Hector Garcia-molina

This paper describes preliminary efforts at evaluating the performance of the Stanford real-time information processor (STRIP v2.0). We describe a benchmark for active real-time databases based on a...

Distributed Commerce Transactions (2007)

Steven Ketchpel, Hector Garcia-molina

In situations where self-interested agents are interacting in an environment of distrust, commercial exchanges may be blocked due to a lack of trust. We propose a fully distributed algorithm that...

Semistructured Data: The Tsimmis Experience (2007)

Joachim Hammer, Jason McHugh, Hector Garcia-Molina

In this paper we discuss the management of semi-structured data, i.e., data that has irregular or dynamically changing structure. We describe components of the Stanford Tsimmis Project that help...

Template-Based Wrappers in the TSIMMIS System (2007)

Joachim Hammer, Hector Garcia-molina, Svetlozar Nestorov, Ramana Yerneni, Marcus Breunig, Vasilis Vassalos

In order to access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided...

The TSIMMIS Project: Integration of Heterogeneous Information Sources (2007)

Sudarshan Chawathe, Hector Garcia-Molina, Joachim Hammer, Kelly Ireland, Yannis Papakonstantinou, Jeffrey Ullman, ...

The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives...

Crawler-Friendly Web Servers (2007)

Onn Brandman, Junghoo Cho, Hector Garcia-Molina, Narayanan Shivakumar

In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers...

Finding near-replicas of documents on the web (2007)

Narayanan Shivakumar, Hector Garcia-Molina

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers, web archivers and in the presentation of search results,...

Abstract (2007)

Orkut Buyukkokten, Luis Gravano, Junghoo Cho, Hector Garcia-molina, Narayanan Shivakumar

Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are...

Crawling the Hidden Web (Extended Abstract) (2007)

Sriram Raghavan, Hector Garcia-molina

Current-day crawlers retrieve content from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require...

Declarative Security (2007)

Neil Daswani, Dan Boneh, Hector Garcia-Molina, Andreas Paepcke

In this paper, we introduce the novel concept of a secure interface definition compiler (a "security" compiler, for short). We show how interface designers can declare an...

General Terms (2007)

Mor Naaman, Yee Jiun Song, Andreas Paepcke, Hector Garcia-molina

Given location information on digital photographs, we can automatically generate an abundance of photo-related metadata using off-the-shelf and web-based data sources. These metadata can serve as...

Chapter 1: Managing Parallel Disks for Continuous Media Data (2007)

Edward Chang, Chen Li, Hector Garcia-Molina

In this study we present a scheme called two-dimensional BubbleUp (2DB) to manage parallel disks for continuous media data. Its goal is to reduce initial latency for interactive multimedia...

Taxonomy of trust: Categorizing p2p reputation systems (2006)

Sergio Marti, Hector Garcia-molina

The field of peer-to-peer reputation systems has exploded in the last few years. Our goal is to organize existing ideas and work to facilitate system design. We present a taxonomy of reputation...

Link spam detection based on mass estimation (2006)

Zoltán Gyöngyi, Hector Garcia-Molina

Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the...

Generic Entity Resolution in the SERF Project (2006)

Omar Benjelloun, Hector Garcia-molina, Hideki Kawai, Tait Eliott Larson, David Menestrina, Qi Su, ...

The SERF project at Stanford deals with the Entity Resolution (ER) problem, in which records determined to represent the same real-life “entities” (such as people or products) are successively...

Assigning Textual Names to Sets of Geographic Coordinates. Computers, Environment, and Urban Systems (2006)

Mor Naaman, Yee Jiun Song, Andreas Paepcke, Hector Garcia-molina

In many situations, it is necessary for a set of geographic coordinates to be described with textual place names that are familiar to humans. One reason to do so is to convert to text a list of...

Link spam alliances (2005)

Zoltán Gyöngyi, Hector Garcia-molina

Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pages can be...

Adlib: A self-tuning index for dynamic peer-to-peer systems (2005)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-molina

Peer-to-peer (P2P) systems enable queries over a large database horizontally partitioned across a dynamic set of nodes. We devise a self-tuning index for such systems that can trade off index...

Adlib: A self-tuning index for dynamic peer-to-peer systems (2005)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-Molina

Peer-to-peer (P2P) systems enable queries over a large database horizontally partitioned across a dynamic set of nodes. We devise a self-tuning index for such systems that can trade off index...

Peer-to-peer data preservation through storage auctions (2005)

Brian F. Cooper, Hector Garcia-molina

Digital archives protect important data collections from failures by making multiple copies at other archives, so that there are always several good copies of a collection. In a cooperative...

Infomonitor: Unobtrusively archiving a World Wide Web server (2005)

Brian Cooper, Hector Garcia-molina

It may be important to provide long-term preservation of digital data even when that data is stored in an unreliable system, such as a filesystem, a legacy database, or even the World Wide Web. In...

Web Spam Taxonomy (2005)

Zoltán Gyöngyi, Hector Garcia-Molina

Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a...

Web spam taxonomy (2005)

Zoltán Gyöngyi, Hector Garcia-molina

Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Recently, the amount of web spam has increased dramatically, leading to a...

The Lowell Database Research Self-Assessment (2005)

Serge Abiteboul, Rakesh Agrawal, Philip A. Bernstein, Michael J. Carey, Stefano Ceri, W. Bruce Croft, ...

Database needs are changing, driven by the Internet and increasing amounts of scientific and sensor data. In this article, the authors propose research into several important new directions for...

Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems (2004)

Prasanna Ganesan, Mayank Bawa, Hector Garcia-molina

We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in large-scale...

SLIC: A Selfish Link-based Incentive Mechanism for Unstructured Peer-to-Peer Networks (2004)

Qixiang Sun, Hector Garcia-molina

Most Peer-to-Peer (P2P) systems assume that all peers are cooperating for the benefit of the community. However, in practice, there is a significant portion of peers who leech resources from the...

Combating web spam with trustrank (2004)

Zoltán Gyöngyi, Hector Garcia-molina, Jan Pedersen

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large...

Context data in geo-referenced digital photo collections (2004)

Mor Naaman, Susumu Harada, Qianying Wang, Hector Garcia-molina, Andreas Paepcke

Given time and location information about digital photographs we can automatically generate an abundance of related contextual metadata, using off-the-shelf and Web-based data sources. Among these...

One torus to rule them all: Multi-dimensional queries in p2p systems (2004)

Prasanna Ganesan, Beverly Yang, Hector Garcia-molina

Peer-to-peer systems enable access to data spread over an extremely large number of machines. Most P2P systems support only simple lookup queries. However, many new applications, such as P2P photo...

Evaluating GUESS and Non-Forwarding Peer-to-Peer Search (2004)

Beverly Yang, Patrick Vinograd, Hector Garcia-molina

Current search techniques over unstructured peer-to-peer networks rely on intelligent forwarding-based techniques to propagate queries to other peers in the network. Forwarding techniques are...

DHT Routing using Social Links (2004)

Sergio Marti, Prasanna Ganesan, Hector Garcia-molina

The equality and anonymity of peer-to-peer networks make them vulnerable to routing denial-of-service attacks from misbehaving nodes. In this paper, we investigate how existing social...

The Price of Validity in Dynamic Networks (2004)

Mayank Bawa, Aristides Gionis, Hector Garcia-molina, Rajeev Motwani

Massive-scale self-administered networks like Peer-to-Peer and Sensor Networks have data distributed across thousands of participant hosts. These networks are highly dynamic with short-lived hosts...

Adaptive Peer-To-Peer Topologies (2004)

Tyson Condie, Sepandar D. Kamvar, Hector Garcia-Molina

We present a peer-level protocol for forming adaptive, self-organizing topologies for data-sharing P2P networks. This protocol is based on the idea that a peer should directly connect to those peers...

DHT Routing Using Social Links (2004)

Sergio Marti, Prasanna Ganesan, Hector Garcia-Molina

The equality and anonymity of peer-to-peer networks make them vulnerable to routing denial-of-service attacks from misbehaving nodes. In this paper, we investigate how existing social networks can...

Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems (2004)

Prasanna Ganesan, Mayank Bawa, Hector Garcia-molina

We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in...

DIPSEA: A MODULAR DISTRIBUTED HASH TABLE (2004)

Rajeev Motwani, Hector Garcia-molina, Hari Balakrishnan

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Vision Paper: Enabling Privacy for the Paranoids (2004)

Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-molina, Krishnaram Kenthapadi, Nina Mishra, ...

P3P [27, 32] is a set of standards that allow corporations to declare their privacy policies. Hippocratic Databases [4] have been proposed to implement such policies within a corporation’s...

Example: Managing Credit Card (2004)

Krishnaram Kenthapadi, Hector Garcia-molina, Rajeev Motwani, G. Aggarwal, M. Bawa, C. Dwork, ...

• Individual centric privacy • Search over access-controlled data • Aggregates on vertically-partitioned databases • Approximations for k-anonymity

The EigenTrust Algorithm for Reputation Management in P2P Networks (2003)

Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-Molina

Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience shows, the anonymous, open nature of these...

Evaluation of delivery techniques for dynamic web content (2003)

Mor Naaman, Hector Garcia-molina, Andreas Paepcke

The portion of web traffic attributed to dynamic web content is substantial and continues to grow as users expect more personalization and tailored information. Unfortunately, dynamic content is...

Revolutionizing Science and Engineering Through Cyberinfrastructure (2003)

Daniel E. Atkins, Kelvin K. Droegemeier, Stuart I. Feldman, Hector Garcia-molina, Michael L. Klein, David G. Messerschmitt, ...

This report was prepared by an officially appointed advisory panel to the National Science Foundation; however, any opinions, findings, and conclusions or recommendations expressed in this material...

Eigenrep: Reputation management in p2p networks (2003)

Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-molina

Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience with P2P networks such as Gnutella shows,...

Apocrypha: Making p2p overlays network-aware (2003)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-molina

Many distributed systems built on peer-to-peer principles organize nodes in an overlay network, in order to enable communication between nodes. In general, this overlay network may have...

YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology (2003)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-molina

Existing peer-to-peer search networks generally fall into two categories: Gnutella-style systems that use arbitrary topology and rely on controlled flooding for search, and systems that explicitly...

YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology (2003)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-molina

Existing peer-to-peer search networks generally fall into two categories: Gnutella-style systems that use arbitrary topology and rely on controlled flooding for search, and systems that...

SIL: Modeling and measuring scalable peer-to-peer search networks (2003)

Brian F. Cooper, Hector Garcia-molina

The popularity of peer-to-peer search networks continues to grow, even as the limitations to the scalability of existing systems become apparent. We propose a simple model for search...

The EigenTrust Algorithm for Reputation Management in P2P Networks (2003)

Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-molina

Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience shows, the anonymous, open nature of these...

Studying search networks with SIL (2003)

Brian F. Cooper, Hector Garcia-molina

We present a general model, called the Search/Index Link (SIL) model, for studying peer-to-peer search networks. This model allows us to analyze and visualize existing network architectures. It also...

Maximizing remote work in flooding-based peer-to-peer systems (2003)

Qixiang Sun, Neil Daswani, Hector Garcia-molina

Abstract. In peer-to-peer (P2P) systems where individual peers must cooperate to process each other's requests, a useful metric for evaluating the system is how many remote requests are serviced...

Revolutionizing Science and Engineering Through Cyberinfrastructure (2003)

Daniel E. Atkins, Kelvin K. Droegemeier, Stuart I. Feldman, Hector Garcia-molina, Michael L. Klein, David G. Messerschmitt, ...

This report was prepared by an officially appointed advisory panel to the National Science Foundation, however, any opinions, findings, and conclusions or recommendations expressed in this material...

Designing a Super-peer Network (2003)

Beverly Yang, Hector Garcia-molina

A super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients, and as an equal in a network of super-peers. Super-peer networks strike a balance between...

Estimating aggregates on a peer-to-peer network (2003)

Mayank Bawa, Hector Garcia-molina, Aristides Gionis, Rajeev Motwani

As Peer-to-Peer (P2P) networks become popular, there is an emerging need to collect a variety of statistical summary information about the participating nodes. The P2P networks of today lack...

Addressing the non-cooperation problem in competitive P2P systems, presented at the 1st Workshop on Economics of Peer-to-Peer Systems (2003)

Sepandar Kamvar, Beverly Yang, Hector Garcia-molina

Large-scale competitive P2P systems are threatened by the non-cooperation problem, where peers do not forward queries to potential competitors. While non-cooperation is not a problem in current P2P...

Complex queries over web repositories (2003)

Sriram Raghavan, Hector Garcia-molina

Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must...

Query Merging: Improving Query Subscription Processing in a Multicast Environment (2003)

Arturo Crespo, Orkut Buyukkokten, Hector Garcia-molina

This paper introduces techniques for reducing data dissemination costs of query subscriptions in a multicast environment. The reduction is achieved by merging queries with overlapping, but not...

Effective page refresh policies for web crawlers (2003)

Junghoo Cho, Hector Garcia-molina

In this paper we study how we can maintain local copies of remote data sources “fresh,” when the source data is updated autonomously and independently. In particular, we study the problem of Web...

Evaluation of ESI and Class-Based Delta Encoding (2003)

Mor Naaman, Hector Garcia-molina, Andreas Paepcke

The portion of web traffic attributed to dynamic web content is substantial and continues to grow as users expect more personalization and tailored information. Unfortunately, dynamic content is...

Peer-to-Peer Research at Stanford (2003)

Mayank Bawa, Brian F. Cooper, Arturo Crespo, Neil Daswani, Prasanna Ganesan, Hector Garcia-molina, ...

In this paper we present recent and ongoing research projects of the Peers research group at Stanford University. Section 2 studies the problems relating to locating resources in P2P systems. Section 3...

Addressing the Non-Cooperation Problem in Competitive P2P Systems (2003)

Beverly Yang, Sepandar Kamvar, Hector Garcia-molina

Large-scale competitive P2P systems are threatened by the non-cooperation problem, where peers do not forward queries to potential competitors. While non-cooperation is not a problem in current P2P...

to Compute Similarity (2003)

Prasanna Ganesan, Hector Garcia-molina, Jennifer Widom

In this article, we develop measures that take this hierarchy into account, leading to similarity scores that are closer to human intuition than previous measures.

Open Problems in Data-Sharing Peer-to-Peer Systems (2003)

Neil Daswani, Hector Garcia-molina, Beverly Yang

In a Peer-To-Peer (P2P) system, autonomous computers pool their resources (e.g., files, storage, compute cycles) in order to inexpensively handle tasks that would normally require large costly servers....

Incentives for Combatting Freeriding on P2P Networks (2003)

Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-molina

Abstract. We address the freerider problem on P2P networks. We first propose a specific participation metric, which we call a peer’s EigenTrust score. We show that EigenTrust scores accurately...

Query Merging: Improving Query Subscription Processing in a Multicast Environment (2003)

Arturo Crespo, Orkut Buyukkokten, Hector Garcia-molina

This paper introduces techniques for reducing data dissemination costs of query subscriptions in a multicast environment. The reduction is achieved by merging queries with overlapping, but not...

Incentives for Combatting Freeriding on P2P Networks (2003)

Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-molina

We address the freerider problem on P2P networks. We first propose a specific participation metric, which we call a peer's EigenTrust score. We show that EigenTrust scores accurately capture...

Maximizing remote work in flooding-based peer-to-peer systems (2003)

Qixiang Sun, Neil Daswani, Hector Garcia-molina

In peer-to-peer (P2P) systems where individual peers must cooperate to process each other’s requests, a useful metric for evaluating the system is how many remote requests are serviced by each...

Publish/subscribe tree construction in wireless ad-hoc networks (2003)

Yongqiang Huang, Hector Garcia-molina

Abstract. Wireless ad-hoc publish/subscribe systems combine a publish/subscribe mechanism with wireless ad-hoc networking. The combination, although very attractive, has not been studied extensively...

Designing a Super-peer Network (2003)

Beverly Yang, Hector Garcia-molina

A super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients, and as an equal in a network of super-peers. Super-peer networks strike a balance between the...

Publish/Subscribe Tree Construction in Wireless Ad-Hoc networks (2003)

Yongqiang Huang, Hector Garcia-molina

Abstract. Wireless ad-hoc publish/subscribe systems combine a publish/subscribe mechanism with wireless ad-hoc networking. The combination, although very attractive, has not been studied extensively...

Parallel Crawlers (2002)

Junghoo Cho, Hector Garcia-Molina

In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish downloading pages in...

Read-Only Transactions in a Distributed Database (2002)

Hector Garcia-Molina, Gio Wiederhold

A read-only transaction or query is a transaction which does not modify any data. Read-only transactions could be processed with general transaction processing algorithms, but in many cases it is...

Transience of peers and streaming media (2002)

Mayank Bawa, Hrishikesh Deshp, Hector Garcia-molina

Application level multicast schemes have traditionally been evaluated with respect to the efficiency penalties incurred in migrating the multicast functionality from the network layer to the...

Peer-to-peer resource trading in a reliable distributed system (2002)

Brian F. Cooper, Hector Garcia-molina

Peer-to-peer systems form a useful architecture for a wide range of important applications. Although the term "peer-to-peer" is often associated in the public...

Clustering for approximate similarity search in high-dimensional spaces (2002)

Chen Li, Edward Chang, Hector Garcia-molina, Gio Wiederhold

Abstract. In this paper, we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one would...

Time as essence for photo browsing through personal digital libraries (2002)

Adrian Graham, Hector Garcia-molina, Andreas Paepcke, Terry Winograd

We developed two photo browsers for collections with thousands of time-stamped digital images. Modern digital cameras record photo shoot times, and semantically related photos tend to occur in...

Transience of peers and streaming media (2002)

Mayank Bawa, Hrishikesh Deshp, Hector Garcia-molina

Application level multicast schemes have traditionally been evaluated with respect to the efficiency penalties incurred in migrating the multicast functionality from the network layer to the...

Protecting the pipe from malicious peers (2002)

Brian F. Cooper, Mayank Bawa, Neil Daswani, Hector Garcia-molina

Digital materials can be protected from failures by replicating them at multiple autonomous, distributed sites. A Peer-to-peer Information Preservation and Exchange (PIPE) network is a good way to...

Routing indices for peer-to-peer systems (2002)

Arturo Crespo, Hector Garcia-molina

Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. In this paper we introduce the concept of Routing...

A case for locally-organized peer-to-peer lookup service (2002)

Prasanna Ganesan, Qixiang Sun, Hector Garcia-molina

Distributed lookup services have predominantly fallen into one of two categories: Gnutella-based systems and DHTs. In this paper, we identify a set of applications for P2P lookup services, and...

Improving search in peer-to-peer networks (2002)

Beverly Yang, Hector Garcia-molina

Peer-to-peer systems have emerged as a popular way to share huge volumes of data. The usability of these systems depends on effective techniques to find and retrieve data; however, current techniques...

Similarity flooding: A versatile graph matching algorithm (2002)

Sergey Melnik, Hector Garcia-molina, Erhard Rahm

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on...

The stanford archival repository project: Preserving our digital past (2002)

Brian F. Cooper, Arturo Crespo, Hector Garcia-molina

The Stanford Archival Repository Project aims to build a robust archiving system that can protect digital objects from failures over very long time spans. Objects are replicated among cooperating...

Peer-to-peer data trading to preserve information (2002)

Brian F. Cooper, Hector Garcia-molina

Data archiving systems rely on replication to preserve information. This paper discusses how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series...

Bidding for storage space in a peer-to-peer data preservation system (2002)

Brian F. Cooper, Hector Garcia-molina

Digital archives protect important data collections from failures by making multiple copies at other archives, so that there are always several good copies of a collection. In a cooperative...

Clustering for Approximate Similarity Search in High-Dimensional Spaces (2002)

Chen Li, Edward Y. Chang, Hector Garcia-molina, Gio Wiederhold

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one wishes to find...

Routing Indices for Peer-to-Peer Systems (2002)

Arturo Crespo, Hector Garcia-molina

Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. In this paper we introduce the concept of Routing...

Similarity Flooding: A Versatile Graph Matching Algorithm (2002)

Sergey Melnik, Hector Garcia-molina, Erhard Rahm

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on...

The Stanford Archival Repository Project: Preserving our digital past (2002)

Brian F. Cooper, Arturo Crespo, Hector Garcia-molina

The Stanford Archival Repository Project aims to build a robust archiving system that can protect digital objects from failures over very long time spans. Objects are replicated among cooperating...

Query-Flood DoS Attacks in Gnutella (2002)

Neil Daswani, Hector Garcia-molina

We describe a simple but effective traffic model that can be used to understand the effects of denial-of-service (DoS) attacks based on query floods in Gnutella networks. We run simulations based on the...

Semantic overlay networks for p2p systems (2002)

Arturo Crespo, Hector Garcia-molina

Abstract. In a peer-to-peer (P2P) system, nodes typically connect to a small set of random nodes (their neighbors), and queries are propagated along these connections. Such query flooding tends to be...

Efficient search in peer-to-peer networks (2002)

Beverly Yang, Hector Garcia-molina

Peer-to-peer systems have emerged as a popular way to share huge volumes of data. The usability of these systems depends on effective techniques to find and retrieve data; however, current techniques...

Building a Distributed Full-Text Index for the Web (2001)

Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building...

Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices (2001)

Orkut Buyukkokten, Hector Garcia-Molina, Andreas Paepcke

We introduce five methods for summarizing parts of Web pages on handheld devices, such as personal digital assistants (PDAs), or cellular phones. Each Web page is broken into text units that can each...

Efficient Web Form Entry on PDAs (2001)

Oliver Kaljuvee, Orkut Buyukkokten, Hector Garcia-Molina, Andreas Paepcke

We propose a design for displaying and manipulating HTML forms on small PDA screens. The form input widgets are not shown until the user is ready to fill them in. At that point, only one widget is...

Integrating diverse information management systems: a brief survey (2001)

Sriram Raghavan, Hector Garcia-molina

Most current information management systems can be classified into text retrieval systems, relational/object database systems, or semistructured/XML database systems. However, in practice, many...

Building a distributed full-text index for the web (2001)

Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-molina

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building...

Crawling the hidden web (2001)

Sriram Raghavan, Hector Garcia-molina

Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, ignoring search forms and pages that require...

Searching the web (2001)

Arvind Arasu, Junghoo Cho, Hector Garcia-molina, Andreas Paepcke, Sriram Raghavan

We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage,...

Crawling the hidden web (2001)

Sriram Raghavan, Hector Garcia-molina

Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pages that require...

Building a distributed full-text index for the web (2001)

Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-molina

We identify crucial design issues in building a distributed inverted index for a large collection of web pages. We introduce a novel pipelining technique for structuring the core index-building...

Creating Trading Networks of Digital Archives (2001)

Brian Cooper, Hector Garcia-molina

Digital archives can best survive failures if they have made several copies of their collections at remote sites. In this paper, we discuss how autonomous sites can cooperate to provide preservation...

Searching the web (2001)

Arvind Arasu, Junghoo Cho, Hector Garcia-molina, Andreas Paepcke, Sriram Raghavan

We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage,...

Efficient Web Form Entry on PDAs (2001)

Oliver Kaljuvee, Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

We propose a design for displaying and manipulating HTML forms on small PDA screens. The form input widgets are not shown until the user is ready to fill them in. At that point, only one widget is...

Seeing the whole in parts: text summarization for web browsing on handheld devices (2001)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

We introduce five methods for summarizing parts of Web pages on handheld devices, such as personal digital assistants (PDAs), or cellular phones. Each Web page is broken into text units that can each...

Cost-Driven Design for Archival Repositories (2001)

Arturo Crespo, Hector Garcia-molina

Designing an archival repository is a complex task because there are many alternative configurations, each with different reliability levels and costs. In this paper we study the costs involved in an...

Text summarization of web pages on handheld devices (2001)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

We present a design for displaying and manipulating HTML pages on small handheld devices such as personal digital assistants (PDAs), or cellular phones. We introduce methods for summarizing parts of...

Accordion Summarization for End-Game Browsing on PDAs and Cellular Phones (2001)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

We demonstrate a new browsing technique for devices with small displays such as PDAs or cellular phones. We concentrate on end-game browsing, where the user is close to or on the target page. We make...

Comparing Hybrid Peer-to-Peer Systems (2001)

Beverly Yang, Hector Garcia-molina

“Peer-to-peer” systems like Napster and Gnutella have recently become popular for sharing information. In this paper, we study the relevant issues and tradeoffs in designing a scalable P2P...

Replicated condition monitoring (2001)

Yongqiang Huang, Hector Garcia-molina

A condition monitoring system tracks real-world variables and alerts users when a predefined condition becomes true, e.g., when stock price drops, or when a nuclear reactor overheats. Replication of...

Implementing a reliable digital object archive (2000)

Brian Cooper, Arturo Crespo, Hector Garcia-molina

Extended version. An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing,...

Implementing a reliable digital object archive (2000)

Brian Cooper, Arturo Crespo, Hector Garcia-molina

Abstract. An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing, indexing, and...

The web as a graph (2000)

Sriram Raghavan, Hector Garcia-molina

A Web repository is a large special-purpose collection of Web pages and associated indexes. Many useful queries and computations over such repositories involve traversal and navigation of the Web...

The evolution of the web and implications for an incremental crawler (2000)

Junghoo Cho, Hector Garcia-molina

In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically...

Finding replicated web collections (2000)

Junghoo Cho, Narayanan Shivakumar, Hector Garcia-molina

Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we...

Beyond document similarity: Understanding value-based search and browsing technologies (2000)

Andreas Paepcke, Hector Garcia-molina, Gerard Rodriguez-mula, Junghoo Cho

In the face of small, one or two word queries, high volumes of diverse documents on the Web are overwhelming search and ranking technologies that are based on document similarity measures. The...

Webbase: A repository of web pages (2000)

Jun Hirai, Sriram Raghavan, Hector Garcia-molina, Andreas Paepcke

In this paper, we study the problem of constructing and maintaining a large shared repository of web pages. We discuss the unique characteristics of such a repository, propose an architecture, and...

The evolution of the web and implications for an incremental crawler (2000)

Junghoo Cho, Hector Garcia-molina

In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically...

Performance issues in incremental warehouse maintenance (2000)

Wilburt Juan Labio, Jun Yang, Yingwei Cui, Hector Garcia-molina, Jennifer Widom

A well-known challenge in data warehousing is the efficient incremental maintenance of warehouse data in the presence of source data updates. In this paper, we identify several critical data...

Finding replicated web collections (2000)

Junghoo Cho, Narayanan Shivakumar, Hector Garcia-molina

Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In...

Approximate query translation across heterogeneous information sources (extended version) (2000)

Hector Garcia-molina

In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources...

Focused web searching with PDAs (2000)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke

The Stanford Power Browser project addresses the problems of interacting with the World-Wide Web through wirelessly connected Personal Digital Assistants (PDAs). These problems include bandwidth...

A Mediation Infrastructure for Digital Library Services (2000)

Sergey Melnik, Hector Garcia-molina, Andreas Paepcke

Digital library mediators allow interoperation between diverse information services. In this paper we describe a flexible and dynamic mediator infrastructure that allows mediators to be composed from...

Maximizing Coverage of Mediated Web Queries (2000)

Ramana Yerneni, Felix Naumann, Hector Garcia-molina

Over the Web, mediators are built on large collections of sources to provide integrated access to Web content (e.g., meta-search engines). In order to minimize the expense of visiting a large number...

Crawler-Friendly Web Servers (2000)

Onn Brandman, Junghoo Cho, Hector Garcia-molina, Narayanan Shivakumar

In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers...

Synchronizing a database to Improve Freshness (2000)

Junghoo Cho, Hector Garcia-molina

In this paper we study how to refresh a local copy of an autonomous data source to maintain the copy up-to-date. As the size of the data grows, it becomes more difficult to maintain the copy...

Estimating Frequency of Change (2000)

Junghoo Cho, Hector Garcia-molina

Many online data sources are updated autonomously and independently. In this paper, we make the case for estimating the change frequency of the data, to improve web crawlers, web caches and to help...

A Mediation Infrastructure for Digital Library Services (2000)

Sergey Melnik, Hector Garcia-molina, Andreas Paepcke

Digital library mediators allow interoperation between diverse information services. In this paper we describe a flexible and dynamic mediator infrastructure that allows mediators to be composed from...

The SIFT Information Dissemination System (2000)

Tak Yan, Hector Garcia-molina

Information dissemination is a powerful mechanism for finding information in wide-area environments. An information dissemination server accepts long-term user queries, collects new documents from...

Finding Replicated Web Collections (2000)

Junghoo Cho, Narayanan Shivakumar, Hector Garcia-molina

Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we...

Synchronizing a Database to Improve Freshness (2000)

Junghoo Cho, Hector Garcia-molina

In this paper we study how to refresh a local copy of an autonomous data source to maintain the copy up-to-date. As the size of the data grows, it becomes more difficult to maintain the copy...

Efficient Resumption of Interrupted Warehouse Loads (2000)

Wilburt Juan Labio, Janet L. Wiener, Hector Garcia-molina, Vlad Gorelik

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to...

Efficient Resumption of Interrupted Warehouse Loads (2000)

Wilburt Juan Labio, Janet L. Wiener, Hector Garcia-molina, Vlad Gorelik

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to...

Estimating Frequency of Change (2000)

Junghoo Cho, Hector Garcia-molina

In this paper, we make the case for estimating the change frequency of data to improve Web crawlers, Web caches and to help data mining. We first identify various scenarios, where different applications...

Power browser: Efficient web browsing for PDAs (2000)

Orkut Buyukkokten, Hector Garcia-molina, Andreas Paepcke, Terry Winograd

We have designed and implemented new Web browsing facilities to support effective navigation on Personal Digital Assistants (PDAs) with limited capabilities: low bandwidth, small display, and slow...

Performance issues in incremental warehouse maintenance (2000)

Wilburt Juan Labio, Jun Yang, Yingwei Cui, Hector Garcia-molina, Jennifer Widom

A well-known challenge in data warehousing is the efficient incremental maintenance of warehouse data in the presence of source data updates. In this paper, we identify several critical data...

Beyond document similarity: Understanding value-based search and browsing technologies (2000)

Andreas Paepcke, Hector Garcia-molina, Gerard Rodriguez-mula, Junghoo Cho

In the face of small, one or two word queries, high volumes of diverse documents on the Web are overwhelming search and ranking technologies that are based on document similarity measures. The...

Implementing a reliable digital object archive (2000)

Brian Cooper, Arturo Crespo, Hector Garcia-molina

An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing, indexing, and...

Optimizing large join queries in mediation systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of...

Clindex: Clustering for similarity queries in high-dimensional spaces (1999)

Chen Li, Edward Chang, Hector Garcia-molina, James Ze Wang, Gio Wiederhold

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to find many of the...

Medic: Memory and disk cache for multimedia clients (1999)

Edward Chang, Hector Garcia-molina

In this paper we do focus on the client side, presenting a combined memory-disk buffering algorithm that allows the client to dynamically and effectively deal with variable data rates and delays. We...

Optimizing large join queries in mediation systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and...

Computing Capabilities of Mediators (1999)

Ramana Yerneni, Chen Li, Hector Garcia-molina, Jeffrey Ullman

In data-integration systems, the queries supported by a mediator are affected by the query-processing limitations of the sources being integrated. Existing mediation systems employ a variety of...

Coping with Limited Capabilities of Sources (1999)

Hector Garcia-molina, Ramana Yerneni

In various contexts (e.g., the Internet), the query-processing capabilities of data sources may be limited. Middleware systems based on a mediation architecture are employed to provide powerful...

Exploiting geographical location information of web pages (1999)

Orkut Buyukkokten, Junghoo Cho, Hector Garcia-molina, Luis Gravano, Narayanan Shivakumar

Many information sources on the web are relevant primarily to specific geographical communities. For instance, web sites containing information on restaurants, theatres and apartment rentals are...

Optimizing Large Join Queries in Mediation Systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and...

Exploiting Geographical Location Information of Web Pages (1999)

Orkut Buyukkokten, Junghoo Cho, Hector Garcia-molina, Luis Gravano, Narayanan Shivakumar

Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are...

Performance Issues in Incremental Warehouse Maintenance (1999)

Wilburt Juan Labio, Jun Yang, Yingwei Cui, Hector Garcia-molina, Jennifer Widom

A well-known challenge in data warehousing is the efficient incremental maintenance of warehouse data in the presence of source data updates. In this paper, we identify several critical data...

Approximate Query Translation across Heterogeneous Information Sources (1999)

Kevin Chang, Hector Garcia-molina

In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources...

A Sound and Complete Algorithm for Distributed Commerce Transactions (1999)

Steven P. Ketchpel, Hector Garcia-molina

In a multi-party transaction such as fulfilling an information request from multiple sources (also called a distributed commerce transaction), agents face risks from dealing with untrusted agents....

Capability-Sensitive Query Processing on Internet Sources (1999)

Hector Garcia-molina, Wilburt Labio, Ramana Yerneni

On the Internet, the limited query-processing capabilities of sources make answering even the simplest queries challenging. In this paper, we present a scheme called GenCompact for generating...

Computing Capabilities of Mediators (1999)

Ramana Yerneni, Chen Li, Hector Garcia-molina, Jeffrey Ullman

In data-integration systems, the queries supported by a mediator are affected by the query-processing limitations of the sources being integrated. Existing mediation systems employ a variety of...

Modeling Archival Repositories for Digital Libraries (1999)

Arturo Crespo, Hector Garcia-molina

This paper studies the archival problem: how a digital library can preserve electronic documents over long periods of time. We analyze how an archival repository can fail and we present different...

Implementing a Reliable Digital Object Archive (1999)

Brian Cooper, Arturo Crespo, Hector Garcia-molina

An Archival Repository reliably stores digital objects for long periods of time (decades or centuries). The archival nature of the system requires new techniques for storing, indexing, and...

Exploiting Geographical Location Information of Web Pages (1999)

Orkut Buyukkokten, Junghoo Cho, Hector Garcia-molina, Luis Gravano, Narayanan Shivakumar

Many information sources on the web are relevant primarily to specific geographical communities. For instance, web sites containing information on restaurants, theatres and apartment rentals are...

Clindex: Clustering for Similarity Queries in High-Dimensional Spaces (1999)

Chen Li, Edward Chang, Hector Garcia-molina, James Ze Wang, Gio Wiederhold

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to find many of the...

Computing Capabilities of Mediators (1999)

Ramana Yerneni, Chen Li, Hector Garcia-molina, Jeffrey Ullman

Existing data-integration systems based on the mediation architecture employ a variety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute...

Capability Sensitive Query Processing on Internet Sources (1999)

Hector Garcia-molina, Wilburt Labio, Ramana Yerneni

On the Internet, query processing capabilities of sources may be limited in diverse ways, and this makes answering even the simplest queries challenging. In this paper, we present a scheme called...

Optimizing Large Join Queries in Mediation Systems (1999)

Chen Li, Ramana Yerneni, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and...

Self-Maintainability of Graph Structured Views (1999)

Yue Zhuge, Hector Garcia-molina

Materialized views need to be maintained in response to changes of base data. It is desirable to do this maintenance without accessing base data, to reduce overhead and contention for base data. In...

Optimizing Large Join Queries in Mediation Systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and...

MEDIC: A Memory Disk Cache for Multimedia Clients (1999)

Edward Chang, Hector Garcia-molina

In this paper we propose an integrated memory and disk cache for multimedia clients. The cache cushions the multimedia decoder from input rate fluctuations and mismatches, and because data can be...

Optimizing large join queries in mediation systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and...

Clindex: Clustering for similarity queries in high-dimensional spaces (1999)

Chen Li, Edward Chang, Hector Garcia-molina, James Ze Wang, Gio Wiederhold

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to...

Optimizing large join queries in mediation systems (1999)

Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-molina

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of...

Computing capabilities of mediators (1999)

Ramana Yerneni, Chen Li, Hector Garcia-molina, Jeffrey Ullman

In data-integration systems, the queries supported by a mediator are affected by the query-processing limitations of the sources being integrated. Existing mediation systems employ a variety...

The Asilomar Report on Database Research (1998)

Bernstein, Phil, Brodie, Michael, Ceri, Stefano, DeWitt, David, Franklin, Mike, Garcia-Molina, Hector, ...

The database research community is rightly proud of success in basic research, and its remarkable record of technology transfer. Now the field needs to radically broaden its research focus to attack...

Capability based mediation in TSIMMIS (1998)

Chen Li, Ramana Yerneni, Vasilis Vassalos, Hector Garcia-molina, Yannis Papakonstantinou, Jeffrey Ullman, ...

The TSIMMIS system [1] integrates data from multiple heterogeneous sources and provides users with seamless integrated views of the data. It translates a user query on

Expiring data in a warehouse (1998)

Hector Garcia-molina, Wilburt Juan Labio, Jun Yang

Data warehouses collect data into materialized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this paper, we handle this by expiring or...

Computing iceberg queries efficiently (1998)

Min Fang, Narayanan Shivakumar, Hector Garcia-molina, Rajeev Motwani, Jeffrey D. Ullman

Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the...

Distributed and Parallel Computing Issues in Data Warehousing (1998)

Hector Garcia-molina, Wilburt J. Labio, Janet L. Wiener, Yue Zhuge

A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. The warehouse data is used for decision-support or data mining....

Proximity search in databases (1998)

Roy Goldman, Narayanan Shivakumar, Suresh Venkatasubramanian, Hector Garcia-molina

An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire database for objects...

2d BubbleUp - Managing parallel disks for media servers. Stanford (1998)

Edward Chang, Hector Garcia-molina, Chen Li

In this study we present a scheme called two-dimensional BubbleUp (2DB) for managing parallel disks in a multimedia server. Its goal is to reduce initial latency for interactive multimedia...

Efficient crawling through URL ordering (1998)

Junghoo Cho, Hector Garcia-molina

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more “important” pages first. Obtaining important pages rapidly can be very useful when a...

Computing iceberg queries efficiently (1998)

Min Fang, Narayanan Shivakumar, Hector Garcia-molina, Rajeev Motwani, Jeffrey D. Ullman

Many applications compute aggregate functions (such as COUNT, SUM) over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg...

Efficient Crawling Through URL Ordering (1998)

Junghoo Cho, Hector Garcia-molina

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when...

Shrinking the Warehouse Update Window (1998)

Wilburt Juan Labio, Ramana Yerneni, Hector Garcia-molina

Warehouse views need to be updated when source data changes. Due to the constantly increasing size of warehouses and the rapid rates of change, there is increasing pressure to reduce the time taken...

Expiring Data in a Warehouse (1998)

Hector Garcia-molina, Wilburt Juan Labio, Jun Yang

Data warehouses collect data into materialized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this paper, we handle this by expiring or...

Graph Structured Views and Their Incremental Maintenance (1998)

Yue Zhuge, Hector Garcia-molina

We study the problem of maintaining materialized views of graph structured data. The base data consists of records containing identifiers of other records. The data could represent traditional...

Consistency Algorithms for Multi-Source Warehouse View Maintenance (1998)

Yue Zhuge, Hector Garcia-molina, Janet L. Wiener

A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources...

Graph Structured Views and Their Incremental Maintenance (1998)

Yue Zhuge, Hector Garcia-molina

We study the problem of maintaining materialized views of graph structured data. The base data consists of records containing identifiers of other records. The data could represent traditional...

Archival Storage for Digital Libraries (1998)

Arturo Crespo, Hector Garcia-Molina

We propose an architecture for Digital Library Repositories that assures long-term archival storage of digital objects. The architecture is formed by a federation of independent but collaborating...

SWAPEROO: A Simple Wallet Architecture for Payments, Exchanges, Refunds, and Other Operations (1998)

Neil Daswani, Dan Boneh, Hector Garcia-molina, Steven Ketchpel, Andreas Paepcke

Most existing digital wallet implementations support a single or a limited set of proprietary financial instruments and protocols for electronic commerce transactions, preventing a user from having...

Accounting for Memory Use, Cost, Throughput, and Latency in the Design of a Media Server (1998)

Edward Chang, Hector Garcia-molina

Conventional wisdom holds that reducing disk latency leads to higher disk utilization, maximizing disk utilization leads to higher throughput, employing a faster disk leads to better performance. All...

Computing Iceberg Queries Efficiently (1998)

Min Fang, Narayanan Shivakumar, Hector Garcia-molina, Rajeev Motwani, Jeffrey D. Ullman

Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the...

Cost-Based Media Server Design (1998)

Edward Chang, Hector Garcia-molina

Conventional wisdom holds that reducing disk latency leads to higher disk utilization, maximizing disk utilization leads to higher throughput, and employing a faster disk leads to better performance....

Distributed and Parallel Computing Issues in Data Warehousing (Invited Talk) (1998)

Hector Garcia-molina, Wilburt J. Labio, Janet L. Wiener, Yue Zhuge

A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. The warehouse data is used for decision-support or data mining....

Safeguarding and Charging for Information on the Internet (1998)

Hector Garcia-molina, Steven P. Ketchpel, Narayanan Shivakumar

With the growing acceptance of the Internet as a new dissemination medium, several new and interesting challenges arise in building a digital commerce infrastructure. In this article we discuss some...

Efficient Query Subscription Processing in a Multicast Environment (1998)

Arturo Crespo, Orkut Buyukkokten, Hector Garcia-molina

This paper examines query subscription merging in a distributed environment where multicast channels are used to deliver information. It describes methods for reducing the cost of delivering...

Finding Near-Replicas of Documents on the Web (1998)

Narayanan Shivakumar, Hector Garcia-molina

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers, web archivers and in the presentation of search results,...

Expiring Data in a Warehouse (1998)

Hector Garcia-molina, Wilburt Labio, Jun Yang

Data warehouses collect data into materialized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this paper, we handle this by expiring or...

The Networked Information Economy: Applied And Theoretical Frameworks For Electronic Commerce (1998)

Steven Paul Ketchpel, Hector Garcia-molina

This thesis addresses two areas in electronic commerce. The first is the software engineering practice of designing and developing applications quickly. The use of the object-oriented interfaces...

2D BubbleUp: Managing Parallel Disks for Media Servers (1998)

Edward Chang, Hector Garcia-molina, Chen Li

In this study we present a scheme called two-dimensional BubbleUp (2DB) for managing parallel disks in a multimedia server. Its goal is to reduce initial latency for interactive multimedia...

Performance Analysis of WHIPS Incremental Maintenance (1998)

Yue Zhuge, Hector Garcia-molina

Incremental maintenance incorporates new changes automatically and continuously into a data warehouse, and seems to be the best maintenance solution for very large warehouses. However, the...

Finding Near-Replicas of Documents on the Web (1998)

Narayanan Shivakumar, Hector Garcia-molina

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers, web archivers and in the presentation of search results,...

Efficient Query Subscription Processing in a Broadcast Environment (1998)

Arturo Crespo, Orkut Buyukkokten, Hector Garcia-molina

This paper introduces techniques for reducing data dissemination costs of query subscriptions. This is achieved by merging queries with overlapping, but not necessarily equal, answers. The paper...

Expiring Data from the Warehouse (1998)

Wilburt Juan Labio, Hector Garcia-molina

Data warehouses are used to collect and analyze data from remote sources. The data collected often originate from transactional information and can become very large. This paper presents a framework...

Efficient Crawling Through URL Ordering (1998)

Junghoo Cho, Hector Garcia-molina, Lawrence Page

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when...

Performance Analysis of WHIPS Incremental Maintenance (1998)

Yue Zhuge, Hector Garcia-molina

We consider materialized view maintenance, where views are integrated from multiple distributed data sources and stored in a data warehouse. Incremental maintenance incorporates new changes...

Capability Based Mediation in TSIMMIS (1998)

Chen Li, Ramana Yerneni, Vasilis Vassalos, Hector Garcia-molina, Yannis Papakonstantinou, Jeffrey Ullman

In this paper, we show how the TSIMMIS mediator takes into account the capabilities of the sources to generate feasible query plans for user queries. Section 2 explains how the mediator processes user...

SWAPEROO: A Simple Wallet Architecture for Payments, Exchanges, Refunds, and Other Operations (1998)

Neil Daswani, Dan Boneh, Hector Garcia-molina, Steven Ketchpel, Andreas Paepcke

Most existing digital wallet implementations support a single or a limited set of proprietary financial instruments and protocols for electronic commerce transactions, preventing a user from having...

2d BubbleUp - Managing parallel disks for media servers. Stanford (1998)

Edward Chang, Hector Garcia-molina, Chen Li

In this study we present a scheme called two-dimensional BubbleUp (2DB) for managing parallel disks in a multimedia server. Its goal is to reduce initial latency for int