Building a Distributed Full-Text Index for the Web (2008)

Abstract
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed indexing system that we have implemented.

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.1892
Source http://www-db.stanford.edu/~byang/pubs/www10paper.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Categories H.3.4 [Information Systems]: Systems and Software - Distributed systems
Keywords Distributed indexing, Text retrieval, Inverted files, Pipelining, Embedded databases
Type text
Language English
Relation 10.1.1.87.9634, 10.1.1.18.1519, 10.1.1.18.8282, 10.1.1.85.7719, 10.1.1.50.122, 10.1.1.24.8162, 10.1.1.96.1350, 10.1.1.21.2605, 10.1.1.56.2407, 10.1.1.42.5298, 10.1.1.52.4403, 10.1.1.111.2655