| Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection ∗ (2008) | |||||||||||||||
Abstract | |||||||||||||||
| Nearest neighbor graphs are widely used in data mining and machine learning. The brute-force method to compute the exact kNN graph takes Θ(dn 2) time for n data points in the d dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in Θ(dn t) time for high dimensional data (large d). The exponent t depends on an internal parameter and is larger than one. Experiments show that a high quality graph usually requires a small t which is close to one. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs. 1 | |||||||||||||||
Publication details | |||||||||||||||
| |||||||||||||||