Spectral clustering with tensors

NSF Project Information Page

People Involved

Goals and objectives

Society is awash in complex data that is recorded by high-resolution sensors and through our traces of online activity. One of the key challenges in turning this data into actionable insight is finding hidden groups in these data. For instance, finding similar types of land based on its spectral composition, or finding groups of like-minded people in a social network. While there are a variety of methods to solve these problems that are known already, there are none that take advantage of the multi-dimensional features of the data in an integrated way. The work in this proposal will provide an entirely new type of data clustering, or grouping, method the exploits the rich features of these complex modern datasets. Having access to such a method will help society improve its understanding of the results of complex scientific experiments, produce new insights into the common patterns in social networks, and extract valuable information from large databases of sensor information. The methods developed will be general, and thus, they will have broad applicability that will enrich our capability to use data to benefit society.

Spectral clustering is a technique to identify cohesive groups in a database based on simple pairwise relationships between the items. In a social network, for example, people are connected based on pair-wise friendship relationships. This two-way, or two-dimensional, association between people does not capture the richness of real-world social interactions that often involve multiple people. These higher-order interactions among groups will be represented as tensors and hyper-graphs in this project. For instance, an interaction among three people corresponds to a hyper-edge, which will be represented by a three-dimensional tensor. Thus, this project will investigate novel formulations of the spectral clustering problem that work on multi-dimensional relationships represented with tensors. These new methods and their variations will be evaluated based on the insights that they provide into (i) hyper-spectral imaging data, where each pixel has a large number of frequencies measured; (ii) in social network data, where groups of interactions create higher-order relationships; and (iii) in complex simulation data, where parameteric variation creates multi-dimensional data and relationships based on space, time and parameters. This project will also investigate fast algorithms to identify the clusters based on the new formulations of spectral clustering, as well as relations between these algorithms and emerging work in computational topology.

Publications

Higher-order organization of complex networks. Journal paper. Austin Benson, David F. Gleich, and Jure Leskovec. Science, 353(6295):163-166, 2016. [ bib | DOI | local ]

Triangular alignment (TAME): A tensor-based approach for higher-order network alignment. Journal paper. Shahin Mohammadi, David F. Gleich, Tamara G. Kolda, and Ananth Grama. Transactions on Computational Biology and Bioinformatics, In press. Preprint on arXiv. [ bib | http ]

Generalized tensor spectral co-clustering. Preprint on arXiv. Tao Wu, Austin Benson, and David F. Gleich. arXiv, cs.SI:1603.00395, 2016. [ bib | http ]

The spacey random walk: a stochastic process for higher-order data. Preprint on arXiv. Austin Benson, David F. Gleich, and Lek-Heng Lim. arXiv, cs.NA:1602.02102, 2016. [ bib | http ]

Multilinear PageRank. Journal paper. David F. Gleich, Lek-Heng Lim, and Yongyang Yu. SIAM Journal on Matrix Analysis and Applications, 36(4):1507-1541, 2015. [ bib | DOI | local | http ]

Tensor spectral clustering for partitioning higher-order network structures. Conference proceedings. Austin Benson, David F. Gleich, and Jure Leskovec. In SIAM Data Mining, 2015. [ bib | http ]

Software