Spectral clustering techniques use the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in a lower-dimensional space. The similarity matrix is provided as input and consists of quantitative assessments of the relative similarity of each pair of points in the data.
When applied to image segmentation, spectral clustering is known as segmentation-based object categorization.
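As a concrete illustration of the input, the following sketch builds a similarity matrix from pairwise distances with a Gaussian (RBF) kernel; the toy data, the kernel choice, and the bandwidth sigma are illustrative assumptions, not part of the definition above.

```python
import numpy as np

# Hypothetical toy data: six 2-D points forming two loose groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])

# Gaussian (RBF) similarity: A_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
sigma = 1.0  # illustrative bandwidth
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(A, 0)  # drop self-similarity; A remains symmetric
print(A.round(3))       # near-block-diagonal: the two groups are visible
```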
Algorithms
Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix $A$ in which the elements $A_{ij} \geq 0$ represent a measure of the similarity between the data points with indices $i$ and $j$. The general principle of spectral clustering is to apply a standard clustering method (there are many such methods; k-means is discussed below) to the relevant eigenvectors of a Laplacian matrix (Kirchhoff matrix) of $A$. There are many different ways to define a Laplacian, with different mathematical interpretations, so the clustering will also have different interpretations. The relevant eigenvectors are the ones that correspond to the several smallest eigenvalues of the Laplacian, excluding the eigenvalue 0. For computational efficiency, these eigenvectors are often computed instead as the eigenvectors corresponding to the several largest eigenvalues of a function of the Laplacian.
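A minimal end-to-end sketch of this principle, assuming a dense similarity matrix, NumPy, and scikit-learn's KMeans for the final clustering step:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Embed points via the k eigenvectors of the unnormalized Laplacian
    L = D - A with the smallest eigenvalues, then run k-means on them."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    _, vecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    U = vecs[:, :k]               # rows are the spectral embedding of points
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Toy similarity matrix: two tight groups joined by weak links.
A = np.array([[0, 1, 1, 0.01, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0.01, 0],
              [0.01, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0.01, 1, 1, 0]], dtype=float)
print(spectral_clustering(A, 2))  # two clusters (label order may swap)
```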
One spectral clustering technique is the normalized cuts algorithm (or Shi–Malik algorithm), proposed by Jianbo Shi and Jitendra Malik [1], a widely used method for image segmentation. The algorithm partitions the points into two sets $(B_1, B_2)$ based on the eigenvector $v$ corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian given by

$L^{\mathrm{norm}} := I - D^{-1/2} A D^{-1/2},$

where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$.
A mathematically equivalent algorithm [2] uses the eigenvector corresponding to the largest eigenvalue of the random-walk normalized adjacency matrix $P = D^{-1} A$. The Meilă–Shi algorithm has been tested in the context of diffusion maps, which turn out to be related to computational quantum mechanics [3].
Another possibility is to use the Laplacian matrix defined as

$L := D - A$

rather than the symmetric normalized Laplacian.
Partitioning may be done in various ways, such as by computing the median $m$ of the components of the second-smallest eigenvector $v$ and placing all points whose component in $v$ is greater than $m$ in $B_1$, and the rest in $B_2$. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in the same fashion.
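A sketch of this median-split bipartition, under the assumptions that the graph is connected (all degrees positive) and that the similarity matrix fits in memory:

```python
import numpy as np

def shi_malik_bipartition(A):
    """Split on the eigenvector of the second-smallest eigenvalue of the
    symmetric normalized Laplacian L_norm = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                 # degrees; assumed positive
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_norm)  # ascending eigenvalues
    v = vecs[:, 1]                    # second-smallest eigenvector
    m = np.median(v)                  # the median split described above
    return v > m                      # True -> B1, False -> B2

# Two triangles joined by one weak edge.
A = np.array([[0, 1, 1, 0.1, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0.1, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(shi_malik_bipartition(A))  # one triangle in B1, the other in B2
```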
If the similarity matrix has not already been explicitly constructed, the efficiency of spectral clustering can be improved by solving the corresponding eigenvalue problem in a matrix-free fashion (without explicitly manipulating or even computing the similarity matrix), as in the Lanczos algorithm.
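For example, SciPy's eigsh (a Lanczos-type solver from ARPACK) accepts a LinearOperator, so the similarity matrix can be applied block by block without ever being stored; the RBF kernel, the synthetic data, and the block size below are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(0)
n = 500
X = np.vstack([rng.normal(0, 1, (n // 2, 2)),   # two synthetic blobs
               rng.normal(6, 1, (n // 2, 2))])

def matvec(v):
    """Apply the (never materialized) RBF similarity matrix to v."""
    out = np.empty_like(v)
    for i in range(0, n, 100):  # 100-row blocks keep memory bounded
        block = np.exp(-((X[i:i+100, None, :] - X[None, :, :]) ** 2).sum(-1))
        out[i:i+100] = block @ v
    return out

A_op = LinearOperator((n, n), matvec=matvec, dtype=float)
# Per the remark above: the largest eigenvalues of A play the role of the
# smallest eigenvalues of a Laplacian built from A.
vals, vecs = eigsh(A_op, k=3, which='LM')
```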
For large graphs, the second eigenvalue of the (normalized) graph Laplacian is often ill-conditioned, which leads to slow convergence of iterative eigenvalue solvers. Preconditioning is a key technique for improving convergence, for example in the matrix-free LOBPCG method. Spectral clustering has been successfully applied to large graphs by first identifying the network's community structure and then clustering the communities [4].
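A sketch of LOBPCG on a deliberately ill-conditioned case, using a simple Jacobi (diagonal) preconditioner as a stand-in for a serious one; the graph itself is an illustrative assumption:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

# A path graph of 1000 vertices with one weak middle edge, so the second
# Laplacian eigenvalue is tiny and iterative solvers converge slowly.
n = 1000
w = np.ones(n - 1)
w[n // 2] = 1e-3
A = sp.diags([w, w], [-1, 1], format='csr')
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

# Jacobi (diagonal) preconditioner: cheap, but better than nothing here.
M = sp.diags(1.0 / L.diagonal())

X0 = np.random.default_rng(1).normal(size=(n, 3))  # random initial block
vals, vecs = lobpcg(L, X0, M=M, tol=1e-6, maxiter=500, largest=False)
print(vals)  # smallest eigenvalues: ~0 plus the tiny second eigenvalue
```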
Spectral clustering is closely related to nonlinear dimensionality reduction, and dimensionality reduction techniques such as locally linear embedding can be used to reduce the error from noise or outliers in the observations [5].
Free software implementing spectral clustering is available in large open-source projects such as scikit-learn [6], MLlib for pseudo-eigenvector clustering using the power iteration method [7], and the R language [8].
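For instance, with scikit-learn (parameter values here are illustrative, not recommendations):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaved half-moons: plain k-means cannot separate them,
# but spectral clustering with an RBF affinity can.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity='rbf', gamma=20.0,
                            assign_labels='kmeans',
                            random_state=0).fit_predict(X)
```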
Connection with k-means
The kernel k-means problem is an extension of the k-means problem, in which the input points are mapped nonlinearly into a higher-dimensional feature space via a kernel function $k(x_i, x_j) = \phi^T(x_i)\,\phi(x_j)$. The weighted kernel k-means problem extends this further by defining a weight $w_r$ for each cluster as the reciprocal of the number of elements in the cluster,

$\max_{\{C_s\}} \sum_r w_r \sum_{x_i, x_j \in C_r} k(x_i, x_j).$
Let $F$ be the matrix of normalizing coefficients for each point in each cluster, with $F_{ij} = w_r$ if $i, j \in C_r$ and 0 otherwise. Let $K$ be the kernel matrix for all points. The weighted kernel k-means problem with $n$ points and $k$ clusters is given as the maximization problem

$\max_F \operatorname{trace}(KF)$

subject to

$F = G_{n \times k} G_{n \times k}^T, \qquad G^T G = I,$

where $\operatorname{rank}(G) = k$. In addition, there are identity constraints on the coefficients $F$:

$F \cdot \mathbb{I} = \mathbb{I}, \qquad F^T \cdot \mathbb{I} = \mathbb{I},$

where $\mathbb{I}$ denotes the vector of all ones.

The problem can be recast as

$\max_G \operatorname{trace}(G^T K G).$
This problem is equivalent to the spectral clustering problem when the identity constraints on $F$ are relaxed. In particular, the weighted kernel k-means problem can be reformulated as a spectral clustering (graph partitioning) problem, and vice versa. The output of the algorithms are eigenvectors that do not satisfy the identity requirements for the indicator variables defined by $F$. Hence, post-processing of the eigenvectors is required for the problems to be equivalent [9]. Transforming the spectral clustering problem into a weighted kernel k-means problem significantly reduces the computational cost [10].
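A sketch of the kernel k-means side of this equivalence, restricted for brevity to the unweighted special case (uniform $w_r$); the update uses only the kernel matrix $K$, never explicit feature-space coordinates:

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K.
    Distances to centroids are expanded through K alone:
    ||phi(x_i) - mu_r||^2 = K_ii - 2*mean_j K_ij + mean_{j,l} K_jl."""
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(0, k, n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for r in range(k):
            idx = labels == r
            m = idx.sum()
            if m == 0:
                continue  # empty cluster: leave its column at infinity
            dist[:, r] = (np.diag(K) - 2 * K[:, idx].sum(axis=1) / m
                          + K[np.ix_(idx, idx)].sum() / m ** 2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Illustrative usage with a linear kernel on two synthetic blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])
K = X @ X.T
print(kernel_kmeans(K, 2))
```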
Measures for comparing clusterings
Ravi Kannan, Santosh Vempala and Adrian Vetta [11] proposed a bicriteria measure for assessing the quality of a clustering. They say that a clustering is an (α, ε)-clustering if the conductance of each cluster is at least α and the weight of the inter-cluster edges is at most an ε fraction of the total weight of all edges in the graph. In the same paper they also consider two approximation algorithms.
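A simplified check of the bicriteria, with a loud caveat: the paper defines a cluster's conductance on the induced subgraph (a quantity that is hard to compute exactly), while this sketch substitutes the conductance of the cluster viewed as a cut in the whole graph:

```python
import numpy as np

def is_alpha_eps_clustering(A, labels, alpha, eps):
    """Approximate (alpha, eps)-clustering test on a weighted adjacency
    matrix A: every cluster's cut conductance must be at least alpha, and
    inter-cluster weight at most an eps fraction of the total weight."""
    total = A.sum() / 2.0                   # each undirected edge counted twice
    inter = 0.0
    for c in np.unique(labels):
        S = labels == c
        cut = A[np.ix_(S, ~S)].sum()        # weight leaving cluster c
        inter += cut
        vol = min(A[S].sum(), A[~S].sum())  # smaller side's volume
        if vol > 0 and cut / vol < alpha:
            return False
    return inter / 2.0 <= eps * total       # inter-cluster edges counted twice
```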
See also
- Cluster analysis
- Spectral Graph Theory
Notes
- Shi, Malik, 2000.
- Meilă, Shi, 2001, pp. 873–879.
- Scott, Therani, Wang, 2017, pp. 1–17.
- Zare, Shooshtari, Gupta, Brinkman, 2010, p. 403.
- Arias-Castro, Chen, Lerman, 2011, pp. 1537–1587.
- 2.3. Clustering - scikit-learn 0.20.2 documentation
- Clustering - RDD-based API - Spark 2.4.0 Documentation
- CRAN - Package kernlab
- Dhillon, Guan, Kulis, 2004, pp. 551–556.
- Dhillon, Guan, Kulis, 2007, pp. 1–14.
- Kannan, Vempala, Vetta, 2000, pp. 497–515.
Literature
- Marina Meilă, Jianbo Shi. Learning Segmentation by Random Walks // Neural Information Processing Systems. - 2001. - Vol. 13 (NIPS 2000).
- Jianbo Shi, Jitendra Malik. Normalized Cuts and Image Segmentation // IEEE Transactions on PAMI. - 2000. - August (Vol. 22, issue 8).
- T. C. Scott, Madhusudan Therani, Xing M. Wang. Data Clustering with Quantum Mechanics // Mathematics. - 2017. - Vol. 5, no. 1. - P. 1–17. - DOI: 10.3390/math5010005.
- Habil Zare, P. Shooshtari, A. Gupta, R. Brinkman. Data reduction for spectral clustering to analyze high throughput flow cytometry data // BMC Bioinformatics. - 2010. - Vol. 11. - P. 403. - DOI: 10.1186/1471-2105-11-403. - PMID 20667133.
- E. Arias-Castro, G. Chen, G. Lerman. Spectral clustering based on local linear approximations // Electronic Journal of Statistics. - 2011. - Vol. 5. - P. 1537–1587. - DOI: 10.1214/11-ejs651.
- I. S. Dhillon, Y. Guan, B. Kulis. Kernel k-means: spectral clustering and normalized cuts // Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. - 2004. - P. 551–556.
- Inderjit Dhillon, Yuqiang Guan, Brian Kulis. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach // IEEE Transactions on Pattern Analysis and Machine Intelligence. - 2007. - November (Vol. 29, issue 11). - DOI: 10.1109/tpami.2007.1115.
- Ravi Kannan, Santosh Vempala, Adrian Vetta. On Clusterings: Good, Bad and Spectral // Journal of the ACM. - 2000. - Vol. 51. - DOI: 10.1145/990308.990313.