Projections for Efficient Document Clustering (1997)

Abstract
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for reducing the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while, surprisingly, the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.

1 Introduction

Clustering is becoming increasingly widespread: it is finding applications in browsing [8, 7], in improving the performance of similarity search tools [16, 19], and in automatically generating thesauri [5, 6]. In query analysis, clu...
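
For readers unfamiliar with the idea, the truncation technique named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that a "profile" is a sparse mapping from terms to weights, that truncation means keeping only the k highest-weighted terms, and that similarity is measured by cosine. The function names truncate and cosine, the example weights, and the value of k are all hypothetical and chosen only for illustration.

```python
import math


def truncate(profile, k):
    """Keep only the k highest-weighted terms of a sparse profile.

    Assumption: a profile is a dict mapping term -> weight. Ties are
    broken alphabetically so the result is deterministic.
    """
    top = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def cosine(a, b):
    """Cosine similarity between two sparse profiles (dicts)."""
    # Iterate over the shorter profile: the cost of the dot product is
    # proportional to its length, which is why shorter profiles are cheaper.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example data, not taken from the paper.
document = {"cluster": 3.0, "speed": 2.0, "distance": 1.0, "the": 0.1}
centroid = {"cluster": 2.5, "distance": 1.5, "quality": 1.0,
            "of": 0.2, "the": 0.1}

full_similarity = cosine(document, centroid)
truncated_similarity = cosine(document, truncate(centroid, 3))
```

In this sketch the truncated centroid has three entries instead of five, so each similarity computation touches fewer terms. Whether the resulting clusters stay as good as with full profiles is the empirical question the paper addresses.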

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.3039
Source http://www-cs-students.stanford.edu/~csilvers/papers/metrics-sigir.ps
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.136.7906, 10.1.1.38.4937, 10.1.1.21.3062, 10.1.1.34.4297, 10.1.1.23.8248, 10.1.1.13.9326, 10.1.1.103.8568, 10.1.1.108.649, 10.1.1.33.1855, 10.1.1.130.5217, 10.1.1.11.7807, 10.1.1.130.8903, 10.1.1.25.5545, 10.1.1.72.8739, 10.1.1.80.8015, 10.1.1.101.5316, 10.1.1.32.6139, 10.1.1.73.5431, 10.1.1.84.257, 10.1.1.91.466, 10.1.1.111.1791, 10.1.1.36.5346, 10.1.1.39.7065, 10.1.1.64.3871, 10.1.1.109.2595, 10.1.1.38.6432, 10.1.1.40.7762, 10.1.1.60.4574, 10.1.1.80.6273, 10.1.1.86.4360, 10.1.1.130.4741, 10.1.1.101.6092, 10.1.1.106.4395, 10.1.1.106.9810, 10.1.1.107.4361, 10.1.1.109.3473, 10.1.1.65.1000, 10.1.1.69.6740, 10.1.1.69.7131, 10.1.1.77.63, 10.1.1.78.9841, 10.1.1.79.1078, 10.1.1.88.7234, 10.1.1.91.3125, 10.1.1.92.1239, 10.1.1.93.1692, 10.1.1.95.6155, 10.1.1.95.8833, 10.1.1.97.5860, 10.1.1.97.6431, 10.1.1.98.7112, 10.1.1.99.822, 10.1.1.99.8826, 10.1.1.112.3770, 10.1.1.101.2191, 10.1.1.119.3319, 10.1.1.119.3605, 10.1.1.121.3847, 10.1.1.122.6420, 10.1.1.123.1286, 10.1.1.123.5531, 10.1.1.125.6744, 10.1.1.136.5938, 10.1.1.137.7478, 10.1.1.138.8038, 10.1.1.42.5119, 10.1.1.38.7857, 10.1.1.31.7585, 10.1.1.25.6956, 10.1.1.9.6579, 10.1.1.9.9487, 10.1.1.1.8224, 10.1.1.61.447, 10.1.1.140.3011