Harnessing the Deep Web: Present and Future (2009)
Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy
The Deep Web refers to content hidden behind HTML forms. In order to get to such content, a user has to perform a form submission with valid input values. The name Deep Web arises from the fact that...
Harnessing the Deep Web: Present and Future (2009)
Madhavan, Jayant, Afanasiev, Loredana, Antova, Lyublena, Halevy, Alon
Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search...
Enterprise Information Integration: Successes, Challenges and Controversies (2009)
Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, ...
The goal of EII systems is to provide uniform access to multiple data sources without having to first load them into a data warehouse. Since the late 1990’s, several EII products have appeared in...
Google’s Deep-Web Crawl (2009)
Jayant Madhavan, David Ko, Łucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Halevy
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web,...
Enterprise Information Integration: Successes, Challenges and Controversies (2008)
Naveen Ashish, Dina Bitton, Michael Carey, Denise Draper, Jeff Pollock, ...
The goal of EII Systems is to provide uniform access to multiple data sources without having to first load them into a data warehouse. Since the late 1990’s, several EII products have appeared...
A Platform for Personal Information Management and Integration (2008)
The explosion of the amount of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focused on...
AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between...
Philip A. Bernstein, Nishant Dani, Badriddine Khessib, Ramesh Manne, David Shutt, Jayant Madhavan, ...
A funny thing happened on the way to a billion... Alfredo Alba,
SEMEX: Mining for Personal Information Integration (2008)
Xin Dong, Alon Halevy, Ema Nemes, Stephan B. Sigurdsson, Pedro Domingos
Personal information management is one of the key applications of the semantic web. Whereas today’s devices store data according to applications, ideal personal information management...
The Piazza Peer Data Management Project (2008)
Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...
A major problem in today’s information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...
Masaru Kitsuregawa, Betty Salzberg, Gonzalo Navarro, Ricardo Baeza-Yates, Erkki Sutinen, Jorma Tarhio, ...
Integrating Diverse Information Management Systems: A Brief Survey ...
These slides are based in part on slides from (2008)
Craig Knoblock, José Luis Ambite, ...
The material in these notes is copyrighted by its respective authors. It does not count as published. For more information on ICAPS, please visit www.icaps-conference.org. Planning on the Web
Semantic Email (2008)
Luke McDowell, Oren Etzioni, Alon Halevy, Henry Levy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...
Webtables: Exploring the power of tables on the web (2008)
Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from Google’s...
Bootstrapping pay-as-you-go data integration systems (2008)
Anish Das Sarma, Xin Dong, Alon Halevy
Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront...
Databases and Web 2.0 panel at VLDB 2007 (2008)
Amer-Yahia, Sihem, Markl, Volker, Halevy, Alon, Doan, AnHai, Alonso, Gustavo, Kossmann, Donald, ...
Google’s deep-web crawl (2008)
Jayant Madhavan, David Ko, Łucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Halevy
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web,...
The Piazza Peer Data Management Project (2007)
Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...
A major problem in today’s information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...
Evolving the Semantic Web with Mangrove (2007)
Luke McDowell, Oren Etzioni, Steven D. Gribble, Alon Halevy, Henry Levy, William Pentney, ...
Despite numerous proposals for its creation, the semantic web has yet to achieve widespread adoption. Recently, some researchers have argued that participation in the semantic web is too difficult...
Research on Statistical Relational Learning at the University of Washington (2007)
Pedro Domingos, Yeuhi Abe, Corin Anderson, AnHai Doan, Dieter Fox, Alon Halevy, ...
This paper presents an overview of the research on learning statistical models from relational data being carried out at the University of Washington. Our work falls into five main directions:...
Learning to Map between Ontologies on the Semantic Web (2007)
AnHai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy
Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing....
Masaru Kitsuregawa, Betty Salzberg, Mary Fern, Atsuyuki Morishima, Dan Suciu, Wang-Chiew Tan, ...
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...
Research on Statistical Relational Learning (2007)
Pedro Domingos, Yeuhi Abe, Corin Anderson, AnHai Doan, Dieter Fox, ...
This paper presents an overview of the research on learning statistical models of relational data being carried out at the University of Washington. Our work falls into five main directions: learning...
Semantic Email (2007)
Luke McDowell, Oren Etzioni, Alon Halevy, Henry Levy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...
Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume...
CID Name Quarter CSE444 Databases fall (2007)
Alon Halevy, Charles Undergrad, Dan Grad
• Data integration: – Connecting disparate data sources – Great progress in last decade • But we’re still missing the point: – Dataspaces: a new abstraction • A few connections to my...
Web-scale Data Integration: You Can Only Afford to Pay As You Go (2007)
Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, ...
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...
Ana-Maria Popescu, Oren Etzioni, Alon Halevy, Dan Weld
This is to certify that I have examined this copy of a doctoral dissertation by
Soliciting User Feedback in a Dataspace System (2007)
Shawn R. Jeffery, Michael Franklin, Alon Halevy
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the...
Visualization of heterogeneous data (2007)
Mike Cammarano, Xin (Luna) Dong, Bryan Chan, Jeff Klingner, Justin Talbot, Alon Halevy, ...
Both the Resource Description Framework (RDF), used in the semantic web, and Maya Viz u-forms represent data as a graph of objects connected by labeled edges. Existing systems for...
Jing Liu, Xin Dong, Alon Halevy
There is a growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms...
Principles of dataspace systems (2006)
Alon Halevy, Michael Franklin
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient,...
Structured Data Meets the Web: A Few Observations (2006)
Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, ...
and offer some principles for addressing them in a general fashion.
Data integration: The teenage years (2006)
Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises that own...
Structured Data Meets the Web: A Few Observations (2006)
Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, ...
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...
Personal information management with SEMEX (2005)
Yuhan Cai, Xin Luna Dong, Alon Halevy, Jing Michelle Liu, Jayant Madhavan
The explosion of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focused on the WWW,...
Corpus-based schema matching (2005)
Jayant Madhavan, Philip A. Bernstein, AnHai Doan, Alon Halevy
Schema Matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have...
Corpus-based schema matching (2005)
Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy, Pradeep Shenoy
Schema matching is the problem of determining a set of correspondences that identify similar elements in two different schemas. In this paper we propose a novel method for matching schemas that...
Gerome Miklau, Dan Suciu, Alon Halevy, John Zahorjan, ...
This is to certify that I have examined this copy of a doctoral dissertation by
Digital Library Information-Technology Infrastructures (2005)
Ioannidis, Yannis, Maier, David, Abiteboul, Serge, Buneman, Peter, Davidson, Susan, Fox, Edward, ...
This paper charts a research agenda on systems-oriented issues in digital libraries. It focuses on the most central and generic system issues, including system architecture, user-level functionality,...
McDowell, Luke, Etzioni, Oren, Halevy, Alon, Levy, Henry
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF...
Rethinking the Conference Reviewing Process - Panel (2004)
Franklin, Michael J., Widom, Jennifer, Weikum, Gerhard, Bernstein, Philip A., Halevy, Alon, DeWitt, David J., ...
Similarity search for web services (2004)
Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang
Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new...
Specifying semantic email processes (2004)
Luke McDowell, Oren Etzioni, Alon Halevy
Prior work has shown that semantic email processes (SEPs) can be an effective tool for automating email-mediated tasks that are currently performed manually in a tedious, time-consuming, and...
Semantic email: theory and applications (2004)
Luke McDowell, Oren Etzioni, Alon Halevy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of a...
Semex: Toward on-the-fly personal information integration (2004)
Xin Dong, Alon Halevy, Ema Nemes, Stephan B. Sigurdsson, Pedro Domingos
On-the-fly information integration attempts to change the basic cost-benefit equation associated with building information integration applications. This paper argues that on-the-fly can be...
CID Name Quarter CSE444 Databases fall CSE541 Operating systems winter (2004)
Alon Halevy, Charles Undergrad, Dan Grad
<cd> <title> The best of … </title> <artist> Carreras </artist> </cd> <artist> Pavarotti </artist> <artist> Domingo </artist>...
Co-Chairs of Supervisory Committee: (2004)
Luke K. McDowell, Oren Etzioni, Alon Halevy, ...
and have found that it is complete and satisfactory in all respects,
iMAP: Discovering Complex Semantic Matches between Database Schemas (2004)
Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have...
Efficient Query Reformulation in Peer Data Management Systems (2004)
Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peer's domain of interest,...
Semantic Email: Theory and Applications (2004)
Luke McDowell, Oren Etzioni, Alon Halevy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of a...
The Specification of Agent Behavior by Ordinary People: A Case Study (2004)
Luke McDowell, Oren Etzioni, Alon Halevy
The development of intelligent agents is a key part of the Semantic Web vision, but how does an ordinary person tell an agent what to do? One approach to this problem is to use RDF templates that are...
iMAP: discovering complex semantic matches between database schemas (2004)
Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have...
The Lowell Database Research Self Assessment (2003)
Abiteboul, Serge, Agrawal, Rakesh, Bernstein, Phil, Carey, Mike, Ceri, Stefano, Croft, Bruce, ...
A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the...
The Piazza Peer Data Management Project (2003)
Tatarinov, Igor, Ives, Zachary G, Madhavan, Jayant, Halevy, Alon, Suciu, Dan, Dalvi, Nilesh, ...
A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...
Schema Mediation in Peer Data Management Systems (2003)
Halevy, Alon, Ives, Zachary G, Suciu, Dan, Tatarinov, Igor
Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems:...
Crossing the Structure Chasm (2003)
Etzioni, Oren, Halevy, Alon, Doan, AnHai, Ives, Zachary G, Madhavan, Jayant, McDowell, Luke, ...
It has frequently been observed that most of the world's data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others....
Learning to match the schemas of data sources: A multistrategy approach (2003)
AnHai Doan, Pedro Domingos, Alon Halevy
The problem of integrating data from multiple data sources, either on the Internet or within enterprises, has received much attention in the database and AI communities. The focus has been on...
Crossing the structure chasm (2003)
Alon Halevy, Oren Etzioni, AnHai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, ...
It has frequently been observed that most of the world’s data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others....
The Piazza Peer Data Management Project (2003)
Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, ...
A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...
Serge Abiteboul, Rakesh Agrawal, Phil Bernstein, Mike Carey, Stefano Ceri, Bruce Croft, ...
This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of...
Learning to Match Ontologies on the Semantic Web (2003)
AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them....
Semantic Email: Adding Lightweight Data Manipulation Capabilities to the Email Habitat (2003)
Oren Etzioni, Alon Halevy, Henry Levy, Luke McDowell
The Piazza Peer Data Management Project (2003)
Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, ...
A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing...
Ontology Matching: A Machine Learning Approach (2003)
AnHai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy
Finally, we describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings. 1 A Motivating Example: the Semantic Web The current World-Wide...
AnHai Doan, Alon Halevy, Natasha Noy
In numerous distributed environments, including today's World-Wide Web, organizational intranets, and the emerging Semantic Web, the applications will inevitably use the information described by...
Learning to Map Between Ontologies on the Semantic Web (2002)
Doan, AnHai, Madhavan, Jayant, Domingos, Pedro, Halevy, Alon
Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine-understandable data, opening myriad opportunities for automated information processing....
Efficiently ordering query plans for data integration (2002)
The goal of a data integration system is to provide a uniform interface to a multitude of data sources. Given a user query formulated in this interface, the system translates it into a set of query...
An evolutionary approach to the semantic web (2002)
Oren Etzioni, Steve Gribble, Alon Halevy, Henry Levy, Luke McDowell
Proposals for creating a semantic web have been around at least since 1995 [Dobson and Burrill, 1995]. A wide range of semantic markup languages have been proposed including RDF, N3, SHOE, DAML, and...
Efficient Query Processing for Data Integration (2002)
Alon Halevy, Daniel Weld, Dan Suciu, Zachary G. Ives
A major problem today is that important data is scattered throughout dozens of separately evolved data sources, in a form that makes the "big picture" difficult to obtain. Data integration...
What can databases do for peer-to-peer (2001)
Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu
The Internet community has recently been focused on peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision — a decentralized community of machines pooling their resources to...
What can databases do for peer-to-peer (2001)
Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu
The Internet community has recently been focused on peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision, a decentralized community of machines pooling their resources to benefit...
Reconciling schemas of disparate data sources: A machine-learning approach (2001)
AnHai Doan, Pedro Domingos, Alon Halevy
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of...
What can peer-to-peer do for databases, and vice versa (2001)
Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu
The latest focus of the Internet community has centered around peer-to-peer systems like Napster, Gnutella, and Freenet. The grand vision, a decentralized community of machines pooling their...
MiniCon: A Scalable Algorithm for Answering Queries Using Views (2001)
Rachel Pottinger, Alon Halevy
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations....
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the...
Query containment for data integration systems (2000)
Todd Millstein, Alon Halevy, Marc Friedman
The problem of query containment is fundamental to many aspects of database systems, including query optimization, determining independence of queries from updates, and rewriting queries using views. In...