
Distributed information retrieval (2008)

Abstract
The dramatic growth of the Internet has created a new problem for users: locating relevant sources of documents. This article presents a framework for this problem, which we call the text-source discovery problem, and experimentally analyzes a solution to it. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Then, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS (Glossary of Servers Server) in two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, a decentralized version of the system. We describe in detail the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective at determining promising text sources for a given query.
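The two-phase approach (sources export summaries, then the service ranks sources per query) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each source exports only its document count and, for each word, the number of documents containing it, and the service ranks sources for a Boolean AND query by estimating how many documents would match if words occurred independently. All names (`SourceSummary`, `export_summary`, `estimate_matches`, `rank_sources`) are hypothetical, and this is not the authors' bGlOSS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Content summary a text source exports to the central service (phase 1).

    Assumption: the summary holds only the number of documents and a
    per-word document frequency, not the documents themselves.
    """
    name: str
    num_docs: int
    doc_freq: dict[str, int] = field(default_factory=dict)


def export_summary(name: str, docs: list[str]) -> SourceSummary:
    """Build a summary by counting, for each word, how many documents contain it."""
    freq: dict[str, int] = {}
    for doc in docs:
        for word in set(doc.lower().split()):  # count each word once per document
            freq[word] = freq.get(word, 0) + 1
    return SourceSummary(name, len(docs), freq)


def estimate_matches(summary: SourceSummary, terms: list[str]) -> float:
    """Estimate how many documents match an AND query of `terms`.

    Hypothetical estimator: assumes words occur independently, so the
    estimate is num_docs times the product of each term's document fraction.
    """
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in terms:
        estimate *= summary.doc_freq.get(term.lower(), 0) / summary.num_docs
    return estimate


def rank_sources(summaries: list[SourceSummary],
                 terms: list[str]) -> list[tuple[str, float]]:
    """Phase 2: order sources by estimated matches, dropping zero estimates."""
    scored = [(s.name, estimate_matches(s, terms)) for s in summaries]
    nonzero = [pair for pair in scored if pair[1] > 0]
    return sorted(nonzero, key=lambda pair: pair[1], reverse=True)
```

For example, with `a = export_summary("db-a", ["distributed retrieval", "text retrieval", "databases"])` and `b = export_summary("db-b", ["retrieval", "cooking"])`, the query `["retrieval"]` ranks `db-a` (estimate 2.0) ahead of `db-b` (estimate 1.0), while `["text", "retrieval"]` returns only `db-a`. Only compact summaries cross the network, which is what lets a single service cover many sources.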

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.1382
Source http://www.cs.cmu.edu/~tomasic/doc/1999/GravanoGarciaTomasicTODS1999.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Keywords Categories and Subject Descriptors: H.3 [Information Systems]: Information Storage and Retrieval; General Terms: Performance, Measurement; Additional Key Words and Phrases: Internet search and retrieval, digital libraries, text databases
Type text
Language English
Relation 10.1.1.46.8448, 10.1.1.40.7959, 10.1.1.21.478, 10.1.1.21.2462, 10.1.1.31.1173, 10.1.1.127.4459, 10.1.1.29.8868, 10.1.1.33.2482, 10.1.1.134.4887, 10.1.1.38.4885, 10.1.1.51.7726, 10.1.1.38.7069, 10.1.1.47.7079, 10.1.1.32.2269, 10.1.1.33.7078, 10.1.1.46.2302, 10.1.1.56.801, 10.1.1.18.8800, 10.1.1.21.1701, 10.1.1.17.3123, 10.1.1.13.9060, 10.1.1.54.5826