Publication View

Sharp kernel clustering algorithms and their associated Grothendieck inequalities (2009)

Abstract
In the kernel clustering problem we are given a (large) $n\times n$ symmetric positive semidefinite matrix $A=(a_{ij})$ with $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$ and a (small) $k\times k$ symmetric positive semidefinite matrix $B=(b_{ij})$. The goal is to find a partition $\{S_1,...,S_k\}$ of $\{1,... n\}$ which maximizes $ \sum_{i=1}^k\sum_{j=1}^k (\sum_{(p,q)\in S_i\times S_j}a_{pq})b_{ij}$. We design a polynomial time approximation algorithm that achieves an approximation ratio of $\frac{R(B)^2}{C(B)}$, where $R(B)$ and $C(B)$ are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij} = < v_i, v_j>$ is the Gram matrix representation of $B$ for some $v_1,...,v_k\in \R^k$ then $R(B)$ is the minimum radius of a Euclidean ball containing the points $\{v_1, ..., v_k\}$. The parameter $C(B)$ is defined as the maximum over all measurable partitions $\{A_1,...,A_k\}$ of $\R^{k-1}$ of the quantity $\sum_{i=1}^k\sum_{j=1}^k b_{ij}< z_i,z_j>$, where for $i\in \{1,...,k\}$ the vector $z_i\in \R^{k-1}$ is the Gaussian moment of $A_i$, i.e., $z_i=\frac{1}{(2\pi)^{(k-1)/2}}\int_{A_i}xe^{-\|x\|_2^2/2}dx$. We also show that for every $\eps > 0$, achieving an approximation guarantee of $(1-\e)\frac{R(B)^2}{C(B)}$ is Unique Games hard.

Publication details
Download http://arxiv.org/abs/0906.4816
Repository arXiv (United States)
Keywords Computer Science - Data Structures and Algorithms, Computer Science - Computational Complexity
Type text