Department of Mathematics Colloquium


Abstract 

Identifying clusters of similar objects in
data plays a significant role in a wide range of applications such as
information retrieval, pattern recognition, computational biology, and
image processing. A classical approach to clustering is to model a
given data set as a graph whose nodes correspond to items in the data
and edges indicate similarity between the corresponding endpoints, and
then try to decompose the graph into dense subgraphs. Unfortunately,
finding such a decomposition usually requires solving an intractable
combinatorial optimization problem. In this talk, I will discuss a
convex relaxation approach to identifying these dense subgraphs with
surprising recovery properties: if the data is randomly sampled from a
distribution of “clusterable” data, then the correct partition of the
data into clusters can be exactly recovered from the optimal solution
of the convex relaxation with high probability.
