Overlapping clustering, where a data point can be assigned to more than one cluster, is desirable in various applications, such as bioinformatics, information retrieval, and social network analysis. In this paper, we generalize the framework of correlation clustering to deal with overlapping clusters. In short, we formulate an optimization problem in which each point in the dataset is mapped to a small set of labels representing membership in different clusters. The number of labels need not be the same for all data points. The objective is to find a mapping such that the distances between points in the dataset agree as much as possible with the distances taken over their sets of labels. To define distances between sets of labels, we consider two measures: the set-intersection indicator and the Jaccard coefficient.
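To make the formulation concrete, the following is a minimal sketch of the two label-set measures and of one plausible form of the objective. It assumes that the input is a table of pairwise target values in [0, 1] and that disagreement is measured as a sum of absolute differences over all pairs; the names `target`, `labeling_cost`, and the toy data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def intersection_indicator(a, b):
    """Return 1.0 if the two label sets share at least one label, else 0.0."""
    return 1.0 if a & b else 0.0


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two label sets.

    Assumption: two empty sets are given the value 0.0.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def labeling_cost(target, labels, measure):
    """Sum over all pairs of |measure(labels[u], labels[v]) - target[(u, v)]|.

    `target` is keyed by pairs (u, v) with u < v (an assumed convention).
    """
    return sum(
        abs(measure(labels[u], labels[v]) - target[(u, v)])
        for u, v in combinations(sorted(labels), 2)
    )


# Toy instance: point "b" belongs to both clusters 1 and 2.
labels = {"a": {1}, "b": {1, 2}, "c": {2}}
target = {("a", "b"): 0.5, ("a", "c"): 0.0, ("b", "c"): 0.5}

jaccard_cost = labeling_cost(target, labels, jaccard)
indicator_cost = labeling_cost(target, labels, intersection_indicator)
```

In this toy instance the Jaccard measure reproduces the target values exactly, whereas the binary indicator cannot express the intermediate value 0.5 and therefore incurs a positive cost.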
To solve the problem, we propose a local-search algorithm. Each iterative-improvement step of the algorithm gives rise to a non-trivial optimization problem, which we solve with a greedy method for the set-intersection measure and with non-negative least squares for the Jaccard measure.
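The local-search idea can be sketched as follows, under the assumption that one improvement step re-optimizes the label set of a single point while all other label sets are held fixed. The per-point subproblem is solved here by brute-force enumeration of all label sets of size at most `p`, which is a stand-in for illustration only and is not the greedy method or the non-negative least squares step mentioned above. The parameters `k` (number of available labels) and `p` (maximum labels per point), the function names, and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient; two empty sets get 0.0 (an assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def point_cost(v, candidate, labels, target, measure):
    """Cost of the pairs involving v if v were given the label set `candidate`."""
    total = 0.0
    for u in labels:
        if u != v:
            pair = (u, v) if u < v else (v, u)
            total += abs(measure(candidate, labels[u]) - target[pair])
    return total


def total_cost(target, labels, measure):
    """Sum of absolute disagreements over all pairs of points."""
    return sum(
        abs(measure(labels[u], labels[v]) - target[(u, v)])
        for u, v in combinations(sorted(labels), 2)
    )


def local_search(target, points, k, p, measure, max_rounds=100):
    """Re-optimize one point at a time until no single-point change helps."""
    # All non-empty label sets with at most p of the k labels (brute force).
    candidates = [
        frozenset(c)
        for size in range(1, p + 1)
        for c in combinations(range(k), size)
    ]
    # Arbitrary starting point: one label per point, assigned round-robin.
    labels = {v: frozenset([i % k]) for i, v in enumerate(points)}
    for _ in range(max_rounds):
        improved = False
        for v in points:
            current = point_cost(v, labels[v], labels, target, measure)
            best = min(
                candidates,
                key=lambda c: point_cost(v, c, labels, target, measure),
            )
            # Accept only strict improvements, so the total cost never rises.
            if point_cost(v, best, labels, target, measure) < current - 1e-12:
                labels[v] = best
                improved = True
        if not improved:
            break
    return labels


points = ["a", "b", "c"]
target = {("a", "b"): 0.5, ("a", "c"): 0.0, ("b", "c"): 0.5}
result = local_search(target, points, k=2, p=2, measure=jaccard)
final_cost = total_cost(target, result, jaccard)
```

Because a move is accepted only when it strictly lowers the cost of the pairs involving the updated point, the total cost is non-increasing and the loop terminates. The brute-force enumeration grows quickly with `k` and `p`, which is one reason a specialized per-point solver is needed in practice.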