08421 Working Group: Lineage/Provenance (2009)
Das Sarma, Anish, Deshpande, Amol, Hubauer, Thomas, Ilyas, Ihab F., König-Ries, Birgitta, Renz, Matthias, ...
The following summary tries to capture a collection of state-of-the-art techniques and challenges for future work on lineage management in uncertain and probabilistic databases that we discussed in...
08421 Working Group: Classification, Representation and Modeling (2009)
Das Sarma, Anish, De Keijzer, Ander, Deshpande, Amol, Haas, Peter J., Ilyas, Ihab F., Koch, Christoph, ...
This report briefly summarizes the discussions carried out in the working group on classification, representation and modeling of uncertain data. The discussion was divided into two subgroups: the...
TopX – AdHoc and Feedback Tasks (2008)
Martin Theobald, Andreas Broschart, Ralf Schenkel, Silvana Solomon, Gerhard Weikum
This paper describes the setup and results of our contributions
This paper proposes a demo of the TopX search engine, an extensive framework for unified indexing, querying, and ranking of large collections of unstructured, semistructured, and structured data....
TopX: Efficient and versatile top-k query processing for semistructured data (2008)
Martin Theobald, Fakultät I Prof, Dr. Thorsten Herfet, ...
TopX is a top-k search engine for text and XML data. In contrast...
TopX: Efficient and versatile top-k query processing for semistructured data (2008)
Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Gerhard Weikum
Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval...
TopX: Efficient and versatile top-k query processing for semistructured data (2008)
Martin Theobald, Ralf Schenkel, Gerhard Weikum
This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and...
Databases with uncertainty and lineage (2008)
Benjelloun, Omar, Das Sarma, Anish, Halevy, Alon Y., Theobald, Martin, Widom, Jennifer
Broschart, Andreas, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard
This paper describes the setup and results of the Max-Planck-Institut für Informatik's contributions for the INEX 2007 AdHoc Track task. The runs were produced with TopX, a search engine for...
Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases (2008)
Das Sarma, Anish, Theobald, Martin, Widom, Jennifer
We study the problem of computing query results with confidence values in ULDBs: relational databases with uncertainty and lineage. ULDBs, which subsume probabilistic databases, offer an alternative...
Photospread: a spreadsheet for managing photos (2008)
Kandel, Sean, Abelson, Eric, Garcia-Molina, Hector, Paepcke, Andreas, Theobald, Martin
TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data (2008)
Theobald, Martin, Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Weikum, Gerhard
Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over...
Trio-One: Layering Uncertainty and Lineage on a (2007)
Michi Mutsuzaki, Martin Theobald, Ander De Keijzer, Jennifer Widom, Parag Agrawal, Omar Benjelloun, ...
Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS...
Efficient Text Proximity Search (2007)
Schenkel, Ralf, Broschart, Andreas, Hwang, Seungwon, Theobald, Martin, Weikum, Gerhard
In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches...
COMPASS: A Concept-Based Web Search Engine for HTML, XML, and Deep Web Data (2007)
Graupmann, Jens, Biwer, Michael, Zimmer, Christian, Zimmer, Patrick, Bender, Matthias, Theobald, Martin, ...
TopX - Adhoc Track and Feedback Task (2007)
Theobald, Martin, Broschart, Andreas, Schenkel, Ralf, Solomon, Silvana, Weikum, Gerhard
This paper describes the setup and results of the Max-Planck-Institut für Informatik’s contributions for the INEX 2006 AdHoc Track and Feedback task. The runs were produced with the TopX...
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard
This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and structured...
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard
This paper proposes a demo of the TopX search engine, an extensive framework for unified indexing, querying, and ranking of large collections of unstructured, semistructured, and structured data....
TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous...
Io-top-k: Index-access optimized top-k query processing (2006)
Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, Gerhard Weikum
Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k...
Feedback-Driven Structural Query Expansion for Ranked Retrieval of XML Data (2006)
Ralf Schenkel, Martin Theobald
Relevance Feedback is an important way to enhance retrieval quality by integrating relevance information provided by a user. In XML retrieval, feedback engines usually generate an expanded...
der Naturwissenschaftlich-Technischen Fakultät I (2006)
Martin Theobald, Fakultät I Prof, Dr. Thorsten Herfet, Vorsitzender Prüfungskommission, ...
TopX is a top-k search engine for text and XML data. In contrast...
Structural feedback for keyword-based xml retrieval (2006)
Ralf Schenkel, Martin Theobald
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a...
IO-Top-k: Index-Access Optimized Top-k Query Processing (2006)
Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Dayal, Umeshwar, ...
Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data....
IO-Top-k at TREC 2006: Terabyte Track (2006)
Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Voorhees, Ellen M., ...
IO-Top-k: Index-access Optimized Top-k Query Processing (2006)
Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Dayal, Umeshwar, ...
Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data....
Structural Feedback for Keyword-Based XML Retrieval (2006)
Schenkel, Ralf, Theobald, Martin, Lalmas, Mounia, MacFarlane, Andy, Rüger, Stefan M., Tombros, Anastasios, ...
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...
Relevance Feedback for Structural Query Expansion (2006)
Schenkel, Ralf, Theobald, Martin, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...
IO-Top-k at TREC 2006: Terabyte Track (2006)
Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Voorhees, Ellen M., ...
This paper describes the setup and results of our contribution to the TREC 2006 Terabyte Track. Our implementation was based on the algorithms proposed in [IO-Top-k: Index-Access Optimized Top-K...
Feedback-Driven Structural Query Expansion for Ranked Retrieval of XML Data (2006)
Schenkel, Ralf, Theobald, Martin, Ioannidis, Yannis, Scholl, Marc H., Schmidt, Joachim W., Matthes, Florian, ...
Relevance Feedback is an important way to enhance retrieval quality by integrating relevance information provided by a user. In XML retrieval, feedback engines usually generate an expanded query from...
TopX & XXL at INEX 2005 (Ad-Hoc Track) (2006)
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, ...
We participated with two different and independent search engines in this year's INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses...
TopX - AdHoc Track and Feedback Task (2006)
Theobald, Martin, Broschart, Andreas, Schenkel, Ralf, Solomon, Silvana, Weikum, Gerhard, Fuhr, Norbert, ...
This paper describes the setup and results of our contributions to the INEX 2006 AdHoc and Feedback tasks.
Martin Theobald, Ralf Schenkel, Gerhard Weikum
We participated with two different and independent search engines in this year’s INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this...
Relevance feedback for structural query expansion (2005)
Ralf Schenkel, Martin Theobald
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a...
An Efficient and Versatile Query Engine for TopX Search (2005)
Martin Theobald, Ralf Schenkel, Gerhard Weikum
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold...
An efficient and versatile query engine for TopX search (2005)
Martin Theobald, Ralf Schenkel, Gerhard Weikum
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold...
Efficient and self-tuning incremental query expansion for top-k query processing (2005)
We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion...
An Efficient and Versatile Query Engine for TopX Search (2005)
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Böhm, Klemens, Jensen, Christian S., Haas, Laura M., ...
Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing (2005)
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Baeza-Yates, Ricardo A., Ziviani, Nivio, Marchionini, Gary, ...
We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion...
Learning Word-to-Concept Mappings for Automatic Text Classification (2005)
Ifrim, Georgiana, Theobald, Martin, Weikum, Gerhard, De Raedt, Luc, Wrobel, Stefan
For both classification and retrieval of natural language text documents, the standard document representation is a term vector where a term is simply a morphological normal form of the corresponding...
TopX & XXL at INEX 2005 (2005)
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, ...
We participated with two different and independent search engines in this year's INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses...
Relevance Feedback for Structural Query Expansion (2005)
Schenkel, Ralf, Theobald, Martin, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...
Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification (2005)
Mavroeidis, Dimitrios, Tsatsaronis, George, Vazirgiannis, Michalis, Theobald, Martin, Weikum, Gerhard, Jorge, Alípio, ...
The introduction of hierarchical thesauri (HT) that contain significant semantic information, has led researchers to investigate their potential for improving performance of the text classification...
BINGO! and Daffodil: Personalized Exploration of Digital Libraries and Web Sources (2004)
Theobald, Martin ; Klas, Claus-Peter
Daffodil is a digital library system targeted at strategic support of advanced users during the information search process. It provides user-customizable "stratagems" for exploring and managing...
Towards a Statistically Semantic Web (2004)
Weikum, Gerhard, Graupmann, Jens, Schenkel, Ralf, Theobald, Martin, Atzeni, Paolo, Chu, Wesley, ...
The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this...
Top-k Query Evaluation with Probabilistic Guarantees (2004)
Theobald, Martin, Weikum, Gerhard, Schenkel, Ralf, Nascimento, Mario A., Özsu, M. Tamer, Kossmann, Donald, ...
Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating...
COMPASS: A Concept-based Web Search Engine for HTML, XML, and Deep Web Data (2004)
Graupmann, Jens, Biwer, Michael, Zimmer, Christian, Zimmer, Patrick, Bender, Matthias, Theobald, Martin, ...
COMPASS: A concept-based Web search engine for HTML, XML, and deep Web data (2004)
Jens Graupmann, Michael Biwer, Christian Zimmer, Patrick Zimmer, Matthias Bender, Martin Theobald, ...
Top-k Query Evaluation with Probabilistic Guarantees (2004)
Martin Theobald, Gerhard Weikum, Ralf Schenkel
Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating...
BINGO! and DAFFODIL: Personalized Exploration of Digital Libraries and Web Sources (2004)
Martin Theobald, Claus-Peter Klas
Daffodil is a digital library system targeted at strategic support of advanced users during the information search process. It provides user-customizable "stratagems" for exploring and...
COMPASS: A Concept-based Web Search Engine for HTML, XML, and Deep Web Data (2004)
Graupmann, Jens, Biwer, Michael, Zimmer, Christian, Zimmer, Patrick, Bender, Matthias, Theobald, Martin, ...
Classification and Focused Crawling for Semistructured Data (2003)
Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Blanken, Henk, Grabs, Torsten, Schek, Hans-Jörg, ...
Graupmann, Jens, Sizov, Sergej, Theobald, Martin, Freytag, Johann Christoph, Lockemann, Peter C., Abiteboul, Serge, ...
The BINGO! System for Information Portal Generation and Expert Web Search (2003)
Sizov, Sergej, Theobald, Martin, Siersdorfer, Stefan, Weikum, Gerhard, Graupmann, Jens, Biwer, Michael, ...
The bingo! system for information portal generation and expert web search (2003)
Sergej Sizov, Michael Biwer, Jens Graupmann, Stefan Siersdorfer, Martin Theobald, Gerhard Weikum, ...
This paper presents the BINGO! focused crawler, an advanced tool for information portal generation and expert Web search. In contrast to standard search engines such as Google which are solely based...
This paper investigates how to automatically classify schemaless XML data into a user-defined topic directory. The main focus is on constructing appropriate feature spaces on which a classifier...
The BINGO! Focused Crawler: From Bookmarks to Archetypes (2002)
Sergej Sizov, Stefan Siersdorfer, Martin Theobald, Gerhard Weikum
Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. Consider an advanced Web user, say a researcher or a student, who is looking for the...
Efficient Top-k Query Processing for Text, Semistructured, and Structured Data
TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a...