Almost-Constant-Time Clustering of Arbitrary Corpus Subsets (1997)
Abstract
Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. This algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. This algorithm is useful for clustering arbitrary subsets of large corpora (obtained, for instance, by a Boolean search) quickly enough to be useful in an interactive setting.

1 Introduction

Document clustering has emerged as an important tool for the presentation and navigation of document collections. For example, the Scatter/Gather browsing paradigm clusters documents into topic-coherent groups and presents descriptive textual summaries to the user [2]. Informed by the summaries, the user may select clusters, thereby forming a subcollection, for iterative examination. The clustering and reclustering is done ...
Publication details