Building Query Optimizers for Information Extraction: The SQoUT Project (2009)
Alpa Jain, Panagiotis Ipeirotis, Luis Gravano
Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relations from documents,...
Names and Similarities on the Web: Fact Extraction in the Fast Lane (2009)
Marius Paşca (Google Inc.), Dekang Lin, Jeffrey Bigham, Andrei Lifchits, Alpa Jain
In a new approach to large-scale extraction of facts from unstructured text, distributional similarities become an integral part of both the iterative acquisition of high-coverage contextual...
Alpa Jain, Panagiotis G. Ipeirotis, Luis Gravano, Anhai Doan
Information extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce...
Exploring a Few Good Tuples From a Text Database (2008)
Alpa Jain, Divesh Srivastava
Information extraction from text databases is a useful paradigm to populate relational tables and unlock the considerable value hidden in plain-text documents. However, information extraction can be...
A Quality-Aware Optimizer for Information Extraction (2008)
Alpa Jain, Panagiotis G. Ipeirotis
Large amounts of structured information are buried in unstructured text. Information extraction systems can extract structured relations from the documents and enable sophisticated, SQL-like queries...
Acronym-Expansion Recognition and Ranking on the Web (2008)
The paper presents a study on large-scale automatic extraction of acronyms and associated expansions from Web data and from the user interactions with this data through Web search engines. We...
Optimizing SQL Queries over Text Databases (2008)
Alpa Jain, Anhai Doan, Luis Gravano
Text documents often embed data that is structured in nature, and we can expose this structured data using information extraction technology. By processing a text database with...
Services) project is building Columbia PSL's “proof-of-concept” realization of NICCI. NICCI (Network-centric Infrastructure for Command, Control and Intelligence) is a prospective DARPA...