Martin Theobald

08421 Working Group: Lineage/Provenance (2009)

Das Sarma, Anish, Deshpande, Amol, Hubauer, Thomas, Ilyas, Ihab F., König-Ries, Birgitta, Renz, Matthias, ...

The following summary tries to capture a collection of state-of-the-art techniques and challenges for future work on lineage management in uncertain and probabilistic databases that we discussed in...

08421 Working Group: Classification, Representation and Modeling (2009)

Das Sarma, Anish, De Keijzer, Ander, Deshpande, Amol, Haas, Peter J., Ilyas, Ihab F., Koch, Christoph, ...

This report briefly summarizes the discussions carried out in the working group on classification, representation and modeling of uncertain data. The discussion was divided into two subgroups: the...

TopX – AdHoc and Feedback Tasks (2008)

Martin Theobald, Andreas Broschart, Ralf Schenkel, Silvana Solomon, Gerhard Weikum

This paper describes the setup and results of our contributions

Textual Databases, H.3 [Information Storage and Retrieval]: General General Terms: Performance (2008)

Martin Theobald

This paper proposes a demo of the TopX search engine, an extensive framework for unified indexing, querying, and ranking of large collections of unstructured, semistructured, and structured data....

TopX: Efficient and versatile top-k query processing for semistructured data (2008)

Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Gerhard Weikum

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval...

TopX: Efficient and versatile top-k query processing for semistructured data (2008)

Martin Theobald, Ralf Schenkel, Gerhard Weikum

This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and...

TopX @ INEX 2007 (2008)

Broschart, Andreas, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard

This paper describes the setup and results of the Max-Planck-Institut für Informatik's contributions for the INEX 2007 AdHoc Track task. The runs were produced with TopX, a search engine for...

Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases (2008)

Das Sarma, Anish, Theobald, Martin, Widom, Jennifer

We study the problem of computing query results with confidence values in ULDBs: relational databases with uncertainty and lineage. ULDBs, which subsume probabilistic databases, offer an alternative...

TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data (2008)

Theobald, Martin, Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Weikum, Gerhard

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over...

Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS (2007)

Michi Mutsuzaki, Martin Theobald, Ander De Keijzer, Jennifer Widom, Parag Agrawal, Omar Benjelloun, ...

Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS...

TopX @ INEX 2007 (2007)

Broschart, Andreas, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard

This paper describes the setup and results of the Max-Planck-Institut für Informatik's contributions for the INEX 2007 AdHoc Track task. The runs were produced with TopX, a search engine for...

Efficient Text Proximity Search (2007)

Schenkel, Ralf, Broschart, Andreas, Hwang, Seungwon, Theobald, Martin, Weikum, Gerhard

In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches...

TopX - Adhoc Track and Feedback Task (2007)

Theobald, Martin, Broschart, Andreas, Schenkel, Ralf, Solomon, Silvana, Weikum, Gerhard

This paper describes the setup and results of the Max-Planck-Institut für Informatik’s contributions for the INEX 2006 AdHoc Track and Feedback task. The runs were produced with the TopX...

TopX - Efficient and Versatile Top-k Query Processing for Text, Semistructured, and Structured Data (2007)

Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard

This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and structured...

The TopX DB&IR engine (2007)

Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard

This paper proposes a demo of the TopX search engine, an extensive framework for unified indexing, querying, and ranking of large collections of unstructured, semistructured, and structured data....

TopX: efficient and versatile top-k query processing for text, structured, and semistructured data (2006)

Theobald, Martin

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous...

IO-Top-k: Index-access optimized top-k query processing (2006)

Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, Gerhard Weikum

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k...

Feedback-Driven Structural Query Expansion for Ranked Retrieval of XML Data (2006)

Ralf Schenkel, Martin Theobald

Relevance Feedback is an important way to enhance retrieval quality by integrating relevance information provided by a user. In XML retrieval, feedback engines usually generate an expanded...

Structural feedback for keyword-based XML retrieval (2006)

Ralf Schenkel, Martin Theobald

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a...

IO-Top-k: Index-Access Optimized Top-k Query Processing (2006)

Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Dayal, Umeshwar, ...

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data....

IO-Top-k: Index-access Optimized Top-k Query Processing (2006)

Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Dayal, Umeshwar, ...

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data....

Structural Feedback for Keyword-Based XML Retrieval (2006)

Schenkel, Ralf, Theobald, Martin, Lalmas, Mounia, MacFarlane, Andy, Rüger, Stefan M., Tombros, Anastasios, ...

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...

Relevance Feedback for Structural Query Expansion (2006)

Schenkel, Ralf, Theobald, Martin, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...

IO-Top-k at TREC 2006: Terabyte Track (2006)

Bast, Holger, Majumdar, Debapriyo, Schenkel, Ralf, Theobald, Martin, Weikum, Gerhard, Voorhees, Ellen M., ...

This paper describes the setup and results of our contribution to the TREC 2006 Terabyte Track. Our implementation was based on the algorithms proposed in [IO-Top-k: Index-Access Optimized Top-K...

Feedback-Driven Structural Query Expansion for Ranked Retrieval of XML Data (2006)

Schenkel, Ralf, Theobald, Martin, Ioannidis, Yannis, Scholl, Marc H., Schmidt, Joachim W., Matthes, Florian, ...

Relevance Feedback is an important way to enhance retrieval quality by integrating relevance information provided by a user. In XML retrieval, feedback engines usually generate an expanded query from...

TopX & XXL at INEX 2005 (Ad-Hoc Track) (2006)

Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, ...

We participated with two different and independent search engines in this year's INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses...

TopX - AdHoc Track and Feedback Task (2006)

Theobald, Martin, Broschart, Andreas, Schenkel, Ralf, Solomon, Silvana, Weikum, Gerhard, Fuhr, Norbert, ...

This paper describes the setup and results of our contributions to the INEX 2006 AdHoc and Feedback tasks.

TopX & XXL at INEX 2005 (2005)

Martin Theobald, Ralf Schenkel, Gerhard Weikum

We participated with two different and independent search engines in this year’s INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this...

Relevance feedback for structural query expansion (2005)

Ralf Schenkel, Martin Theobald

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a...

An Efficient and Versatile Query Engine for TopX Search (2005)

Martin Theobald, Ralf Schenkel, Gerhard Weikum

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold...

An efficient and versatile query engine for TopX search (2005)

Martin Theobald, Ralf Schenkel, Gerhard Weikum

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold...

Efficient and self-tuning incremental query expansion for top-k query processing (2005)

Martin Theobald

We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion...

Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing (2005)

Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Baeza-Yates, Ricardo A., Ziviani, Nivio, Marchionini, Gary, ...

We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion...

Learning Word-to-Concept Mappings for Automatic Text Classification (2005)

Ifrim, Georgiana, Theobald, Martin, Weikum, Gerhard, De Raedt, Luc, Wrobel, Stefan

For both classification and retrieval of natural language text documents, the standard document representation is a term vector where a term is simply a morphological normal form of the corresponding...

TopX & XXL at INEX 2005 (2005)

Theobald, Martin, Schenkel, Ralf, Weikum, Gerhard, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, ...

We participated with two different and independent search engines in this year's INEX round: The XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses...

Relevance Feedback for Structural Query Expansion (2005)

Schenkel, Ralf, Theobald, Martin, Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to...

Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification (2005)

Mavroeidis, Dimitrios, Tsatsaronis, George, Vazirgiannis, Michalis, Theobald, Martin, Weikum, Gerhard, Jorge, Alípio, ...

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance of the text classification...

BINGO! and Daffodil: Personalized Exploration of Digital Libraries and Web Sources (2004)

Theobald, Martin ; Klas, Claus-Peter

Daffodil is a digital library system targeted at strategic support of advanced users during the information search process. It provides user-customizable "stratagems" for exploring and managing...

Towards a Statistically Semantic Web (2004)

Weikum, Gerhard, Graupmann, Jens, Schenkel, Ralf, Theobald, Martin, Atzeni, Paolo, Chu, Wesley, ...

The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this...

Top-k Query Evaluation with Probabilistic Guarantees (2004)

Theobald, Martin, Weikum, Gerhard, Schenkel, Ralf, Nascimento, Mario A., Özsu, M. Tamer, Kossmann, Donald, ...

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating...

Top-k Query Evaluation with Probabilistic Guarantees (2004)

Martin Theobald, Gerhard Weikum, Ralf Schenkel

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating...

BINGO! and DAFFODIL: Personalized Exploration of Digital Libraries and Web Sources (2004)

Martin Theobald, Claus-Peter Klas

Daffodil is a digital library system targeted at strategic support of advanced users during the information search process. It provides user-customizable "stratagems" for exploring and...

The BINGO! system for information portal generation and expert web search (2003)

Sergej Sizov, Michael Biwer, Jens Graupmann, Stefan Siersdorfer, Martin Theobald, Gerhard Weikum, ...

This paper presents the BINGO! focused crawler, an advanced tool for information portal generation and expert Web search. In contrast to standard search engines such as Google which are solely based...

Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data (2003)

Martin Theobald

This paper investigates how to automatically classify schemaless XML data into a user-defined topic directory. The main focus is on constructing appropriate feature spaces on which a classifier...

The BINGO! Focused Crawler: From Bookmarks to Archetypes (2002)

Sergej Sizov, Stefan Siersdorfer, Martin Theobald, Gerhard Weikum

Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. Consider an advanced Web user, say a researcher or a student, who is looking for the...

Efficient Top-k Query Processing for Text, Semistructured, and Structured Data

Theobald, Martin

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a...