Alon Halevy

Publication List Details

Period

2000 - 2009

Number

93

Co-Authors

Harnessing the Deep Web: Present and Future (2009)

Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy

The Deep Web refers to content hidden behind HTML forms. In order to get to such content, a user has to perform a form submission with valid input values. The name Deep Web arises from the fact that...

Harnessing the Deep Web: Present and Future (2009)

Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy

Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search...

Enterprise Information Integration: Successes, Challenges and Controversies (2009)

Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, ...

The goal of EII systems is to provide uniform access to multiple data sources without having to first load them into a data warehouse. Since the late 1990’s, several EII products have appeared in...

Google’s Deep-Web Crawl (2009)

Jayant Madhavan, David Ko, Łucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Halevy

The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web,...

Enterprise Information Integration: Successes, Challenges and Controversies (2008)

Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, ...

The goal of EII systems is to provide uniform access to multiple data sources without having to first load them into a data warehouse. Since the late 1990’s, several EII products have appeared...

A Platform for Personal Information Management and Integration (2008)

Xin (Luna) Dong, Alon Halevy

The explosion of the amount of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focused on...

Learning to Match Ontologies on the Semantic Web, The VLDB Journal (2008)

Anhai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy

On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between...

Protein (2008)

Alon Halevy

<cd> <title> The best of … </title>

Associate Editors (2008)

Philip A. Bernstein, Nishant Dani, Badriddine Khessib, Ramesh Manne, David Shutt, Jayant Madhavan, ...

A funny thing happened on the way to a billion: Alfredo Alba,

SEMEX: Mining for Personal Information Integration (2008)

Xin Dong, Alon Halevy, Ema Nemes, Stephan B. Sigurdsson, Pedro Domingos

Personal information management is one of the key applications of the semantic web. Whereas today’s devices store data according to applications, ideal personal information management...

The Piazza Peer Data Management Project (2008)

Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...

A major problem in today’s information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...

Associate Editors (2008)

Masaru Kitsuregawa, Betty Salzberg, Gonzalo Navarro, Ricardo Baeza-Yates, Erkki Sutinen, Jorma Tarhio, ...

Integrating Diverse Information Management Systems: A Brief Survey

These slides are based in part on slides from (2008)

Craig Knoblock, José Luis Ambite, ...

The material in these notes is copyrighted by its respective authors. It does not count as published. For more information on ICAPS, please visit www.icaps-conference.org. Planning on the Web

Semantic Email (2008)

Luke McDowell, Oren Etzioni, Alon Halevy, Henry Levy

This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...

Webtables: Exploring the power of tables on the web (2008)

Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang

The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from Google’s...

Bootstrapping pay-as-you-go data integration systems (2008)

Anish Das Sarma, Xin Dong, Alon Halevy

Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront...

Google’s deep-web crawl (2008)

Jayant Madhavan, David Ko, Łucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Halevy

The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web,...

The Piazza Peer Data Management Project (2007)

Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...

A major problem in today’s information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...

Evolving the Semantic Web with Mangrove (2007)

Luke McDowell, Oren Etzioni, Steven D. Gribble, Alon Halevy, Henry Levy, William Pentney, ...

Despite numerous proposals for its creation, the semantic web has yet to achieve widespread adoption. Recently, some researchers have argued that participation in the semantic web is too difficult...

Research on Statistical Relational Learning at the University of Washington (2007)

Pedro Domingos, Yeuhi Abe, Corin Anderson, Anhai Doan, Dieter Fox, Alon Halevy, ...

This paper presents an overview of the research on learning statistical models from relational data being carried out at the University of Washington. Our work falls into five main directions:...

Learning to Map between Ontologies on the Semantic Web (2007)

Anhai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy

Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing....

Associate Editors (2007)

Masaru Kitsuregawa, Betty Salzberg, Mary Fern, Atsuyuki Morishima, Dan Suciu, Wang-Chiew Tan, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

Research on Statistical Relational Learning (2007)

Pedro Domingos, Yeuhi Abe, Corin Anderson, Anhai Doan, Dieter Fox, ...

This paper presents an overview of the research on learning statistical models of relational data being carried out at the University of Washington. Our work falls into five main directions: learning...

Semantic Email (2007)

Luke McDowell, Oren Etzioni, Alon Halevy, Henry Levy

This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...

Indexing Dataspaces (2007)

Xin Dong, Alon Halevy

Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume...

CID Name Quarter CSE444 Databases fall (2007)

Alon Halevy, Charles Undergrad, Dan Grad

• Data integration: – Connecting disparate data sources – Great progress in last decade • But we’re still missing the point: – Dataspaces: a new abstraction • A few connections to my...

Web-scale Data Integration: You Can Only Afford to Pay As You Go (2007)

Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, ...

The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...

Abstract (2007)

Ana-Maria Popescu, Oren Etzioni, Alon Halevy, Dan Weld

This is to certify that I have examined this copy of a doctoral dissertation by

Soliciting User Feedback in a Dataspace System (2007)

Shawn R. Jeffery, Michael Franklin, Alon Halevy

personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the...

Visualization of heterogeneous data (2007)

Mike Cammarano, Xin (Luna) Dong, Bryan Chan, Jeff Klingner, Justin Talbot, Alon Halevy, ...

Both the Resource Description Framework (RDF), used in the semantic web, and Maya Viz u-forms represent data as a graph of objects connected by labeled edges. Existing systems for...

Abstract (2007)

Jing Liu, Xin Dong, Alon Halevy

There is a growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms...

Principles of dataspace systems (2006)

Alon Halevy, Michael Franklin

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient,...

Data integration: The teenage years (2006)

Alon Halevy

Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises that own...

Structured Data Meets the Web: A Few Observations (2006)

Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, ...

The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...

Personal information management with SEMEX (2005)

Yuhan Cai, Xin Luna Dong, Alon Halevy, Jing Michelle Liu, Jayant Madhavan

The explosion of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focused on the WWW,...

Corpus-based schema matching (2005)

Jayant Madhavan, Philip A. Bernstein, Anhai Doan, Alon Halevy

Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have...

Corpus-based schema matching (2005)

Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy, Pradeep Shenoy

Schema matching is the problem of determining a set of correspondences that identify similar elements in two different schemas. In this paper we propose a novel method for matching schemas that...

Date: (2005)

Gerome Miklau, Dan Suciu, Alon Halevy, John Zahorjan, ...

This is to certify that I have examined this copy of a doctoral dissertation by

Digital Library Information-Technology Infrastructures (2005)

Yannis Ioannidis, David Maier, Serge Abiteboul, Peter Buneman, Susan Davidson, Edward Fox, ...

This paper charts a research agenda on systems-oriented issues in digital libraries. It focuses on the most central and generic system issues, including system architecture, user-level functionality,...

Semantic Email (2004)

Luke McDowell, Oren Etzioni, Alon Halevy, Henry Levy

This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...

Similarity search for web services (2004)

Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang

Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new...

Specifying semantic email processes (2004)

Luke McDowell, Oren Etzioni, Alon Halevy

Prior work has shown that semantic email processes (SEPs) can be an effective tool for automating email-mediated tasks that are currently performed manually in a tedious, time-consuming, and...

Semantic email: theory and applications (2004)

Luke McDowell, Oren Etzioni, Alon Halevy

This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of a...

Semex: Toward on-the-fly personal information integration (2004)

Xin Dong, Alon Halevy, Ema Nemes, Stephan B. Sigurdsson, Pedro Domingos

On-the-fly information integration attempts to change the basic cost-benefit equation associated with building information integration applications. This paper argues that on-the-fly can be...

CID Name Quarter CSE444 Databases fall CSE541 Operating systems winter (2004)

Alon Halevy, Charles Undergrad, Dan Grad

<cd> <title> The best of … </title> <artist> Carreras </artist> </cd> <artist> Pavarotti </artist> <artist> Domingo </artist>...

Co-Chairs of Supervisory Committee: (2004)

Luke K. McDowell, Oren Etzioni, Alon Halevy, ...

and have found that it is complete and satisfactory in all respects,

iMAP: Discovering Complex Semantic Matches between Database Schemas (2004)

Robin Dhamankar, Yoonkyong Lee, Anhai Doan, Alon Halevy, Pedro Domingos

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have...

Efficient Query Reformulation in Peer Data Management Systems (2004)

Igor Tatarinov, Alon Halevy

Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peer's domain of interest,...

Semantic Email: Theory and Applications (2004)

Luke McDowell, Oren Etzioni, Alon Halevy

This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of a...

The Specification of Agent Behavior by Ordinary People: A Case Study (2004)

Luke McDowell, Oren Etzioni, Alon Halevy

The development of intelligent agents is a key part of the Semantic Web vision, but how does an ordinary person tell an agent what to do? One approach to this problem is to use RDF templates that are...

iMAP: discovering complex semantic matches between database schemas (2004)

Robin Dhamankar, Yoonkyong Lee, Anhai Doan, Alon Halevy, Pedro Domingos

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have...

The Lowell Database Research Self Assessment (2003)

Serge Abiteboul, Rakesh Agrawal, Phil Bernstein, Mike Carey, Stefano Ceri, Bruce Croft, ...

A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the...

The Piazza Peer Data Management Project (2003)

Igor Tatarinov, Zachary G. Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...

A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...

Schema Mediation in Peer Data Management Systems (2003)

Alon Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov

Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems:...

Crossing the Structure Chasm (2003)

Oren Etzioni, Alon Halevy, Anhai Doan, Zachary G. Ives, Jayant Madhavan, Luke McDowell, ...

It has frequently been observed that most of the world's data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others....

Learning to match the schemas of data sources: A multistrategy approach (2003)

Anhai Doan, Pedro Domingos, Alon Halevy

The problem of integrating data from multiple data sources, either on the Internet or within enterprises, has received much attention in the database and AI communities. The focus has been on...

Crossing the structure chasm (2003)

Alon Halevy, Oren Etzioni, Anhai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, ...

It has frequently been observed that most of the world’s data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others....

The Piazza Peer Data Management Project (2003)

Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, ...

A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...

Serge Abiteboul, Rakesh Agrawal, Phil Bernstein, Mike Carey, Stefano Ceri, Bruce Croft, David DeWitt, Mike Franklin, (2003)

Serge Abiteboul, Rakesh Agrawal, Phil Bernstein, Mike Carey, Stefano Ceri, Bruce Croft, ...

This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of...

Learning to Match Ontologies on the Semantic Web (2003)

Anhai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy

On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them....

Semantic Email: Adding Lightweight Data Manipulation Capabilities to the Email Habitat (2003)

Oren Etzioni, Alon Halevy, Henry Levy, Luke McDowell

The Piazza Peer Data Management Project (2003)

Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...

A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...

Ontology Matching: A Machine Learning Approach (2003)

Anhai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy

Finally, we describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings. 1. A Motivating Example: the Semantic Web. The current World-Wide...

(SI-2003) (2003)

Anhai Doan, Alon Halevy, Natasha Noy

In numerous distributed environments, including today's World-Wide Web, organizational intranets, and the emerging Semantic Web, the applications will inevitably use the information described by...

Learning to Map Between Ontologies on the Semantic Web (2002)

Anhai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy

Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine-understandable data, opening myriad opportunities for automated information processing....

Efficiently ordering query plans for data integration (2002)

Anhai Doan, Alon Halevy

The goal of a data integration system is to provide a uniform interface to a multitude of data sources. Given a user query formulated in this interface, the system translates it into a set of query...

An evolutionary approach to the semantic web (2002)

Oren Etzioni, Steve Gribble, Alon Halevy, Henry Levy, Luke McDowell

Proposals for creating a semantic web have been around at least since 1995 [Dobson and Burrill, 1995]. A wide range of semantic markup languages have been proposed including RDF, N3, SHOE, DAML, and...

Efficient Query Processing for Data Integration (2002)

Alon Halevy, Daniel Weld, Dan Suciu, Zachary G. Ives

A major problem today is that important data is scattered throughout dozens of separately evolved data sources, in a form that makes the "big picture" difficult to obtain. Data integration...

What can databases do for peer-to-peer (2001)

Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu

The Internet community has recently been focused on peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision — a decentralized community of machines pooling their resources to...

What can databases do for peer-to-peer (2001)

Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu

The Internet community has recently been focused on peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision: a decentralized community of machines pooling their resources to benefit...

Reconciling schemas of disparate data sources: A machine-learning approach (2001)

Anhai Doan, Pedro Domingos, Alon Halevy

A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of...

What can peer-to-peer do for databases, and vice versa (2001)

Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu

The latest focus of the Internet community has centered around peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision: a decentralized community of machines pooling their...

MiniCon: A Scalable Algorithm for Answering Queries Using Views (2001)

Rachel Pottinger, Alon Halevy

The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations....

MiniCon: A scalable algorithm for answering queries using views, The VLDB Journal 10: 182–198, DOI 10.1007/s007780100048 (2001)

Rachel Pottinger, Alon Halevy

The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the...

Query containment for data integration systems (2000)

Todd Millstein, Alon Halevy, Marc Friedman

The problem of query containment is fundamental to many aspects of database systems, including query optimization, determining independence of queries from updates, and rewriting queries using views. In...