
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - COLORAMAP (Constrained Low-Rank Matrix Approximations: Theoretical and Algorithmic Developments for Practitioners)

Teaser

What is the problem/issue being addressed? This project focuses on the following class of constrained low-rank matrix approximation (CLRMA) problems: given an m-by-n matrix M and a factorization rank r, find an m-by-r matrix U and an n-by-r matrix V such that M ≈ UV^T, where U and/or V must satisfy constraints such as nonnegativity or sparsity.

Summary

What is the problem/issue being addressed?
This project focuses on the following class of constrained low-rank matrix approximation (CLRMA) problems: given an m-by-n matrix M and a factorization rank r, find an m-by-r matrix U and an n-by-r matrix V such that M ≈ UV^T, where U and/or V must satisfy constraints such as nonnegativity or sparsity.
Why is it important for society?
CLRMA models are very flexible and can be used for many problems, in particular in data analysis: background subtraction in video sequences, recommender systems (predicting the preferences of users for some items based on their preferences for other items), image segmentation, document classification, and audio source separation (more generally, blind source separation). The class of CLRMA problems contains, for example, PCA, k-means, subspace clustering, NMF, and ICA, to cite a few.
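As a concrete instance of CLRMA, here is a minimal sketch of NMF with the classical multiplicative updates of Lee and Seung, using the M ≈ UV^T convention above. This is an illustrative textbook baseline, not one of the algorithms developed in the project:

```python
import numpy as np

def nmf_mu(M, r, n_iter=200, seed=0):
    """Approximate a nonnegative m-by-n matrix M as U @ V.T with U (m-by-r) and
    V (n-by-r) nonnegative, via the classical multiplicative updates for the
    Frobenius-norm objective ||M - U V^T||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))
    V = rng.random((n, r))
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        # each update keeps the factor nonnegative and the objective nonincreasing
        V *= (M.T @ U) / (V @ (U.T @ U) + eps)
        U *= (M @ V) / (U @ (V.T @ V) + eps)
    return U, V
```

The multiplicative form guarantees that nonnegativity of the factors is preserved from a positive initialization, which is why it remains a standard first baseline for NMF.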

What are the overall objectives?
This project has four main building blocks:
(1) 'Computational complexity', whose goal is to better understand CLRMA problems: can we solve these problems, and under which conditions?
(2) 'Provably correct algorithms', whose goal is to design algorithms that provably recover the optimal solution. CLRMA problems are difficult in general but, under some assumptions, some of them can be solved efficiently.
(3) 'Heuristics', whose goal is to design algorithms (without optimality guarantees) for difficult instances.
(4) 'Applications', whose goal is to apply the developed algorithms to real-world problems such as the ones mentioned above.

Work performed

Our main results, organized along the four building blocks, are as follows:
(1) Computational complexity: we proved NP-hardness of CLRMA with the l1 and l-infinity norms, two important variants. Using geometric intuition, we also gained a better understanding of sparse CLRMA problems and characterized situations where the optimal solution is unique (up to permutation and scaling).
(2) Provably correct algorithms: We designed a fast, provably correct algorithm for NMF under the separability condition. We also developed algorithms for minimum-volume NMF, which make it possible to solve NMF under assumptions weaker than separability.
(3) Heuristics: We developed several heuristic algorithms for the following CLRMA problems: NMF (using momentum/extrapolation) and distributionally robust NMF; subspace clustering (using sampling and multilayer graphs); positive semidefinite factorization (particularly useful for extended formulations); sparse component analysis; and dictionary-based factorizations.
Moreover, we came up with a new application of CLRMA. We proposed a completely new model based on factorizations of stable matrices, namely A = (J-R)Q, where J is skew-symmetric (J^T = -J) and R and Q are positive semidefinite. All previous work operated directly on the matrix A, trying to control its eigenvalues. With our formulation, a matrix is stable if and only if it admits a factorization of the form (J-R)Q with the above-mentioned constraints. This work led to many other results, e.g., for matrix pairs and positive-real systems.
(4) Applications: We applied all the algorithms mentioned above to real-world problems, with a focus on document classification, audio source separation, and hyperspectral unmixing.
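The separability condition mentioned in item (2) can be illustrated with the standard successive projection algorithm (SPA), which selects columns of M that act as the "pure" columns of the factor. This is a textbook sketch for intuition, not the project's own fast algorithm:

```python
import numpy as np

def spa(M, r):
    """Successive Projection Algorithm: under the separability assumption
    (every column of M is a conic combination of r selected columns of M),
    greedily return the indices of those r columns."""
    R = M.astype(float).copy()
    indices = []
    for _ in range(r):
        # pick the column with the largest residual norm: a vertex of the cone
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        u = R[:, j] / np.linalg.norm(R[:, j])
        # project all columns onto the orthogonal complement of the pick
        R = R - np.outer(u, u @ R)
        indices.append(j)
    return indices
```

When the non-pure columns are strict convex combinations of the pure ones (and the pure columns are linearly independent), this greedy selection provably recovers the pure columns exactly.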

Final results

*Progress beyond the state of the art*
For the NP-hardness proofs, we reduced known NP-hard problems to ours, as is standard in the literature; achieving this, however, required new and nontrivial reduction techniques. Our provably correct algorithms for separable and minimum-volume NMF were shown to outperform the state of the art, and the heuristic algorithms we developed consistently compared favourably with it as well. Likewise, our new CLRMA model for finding the nearest stable matrix to an unstable one outperformed the state of the art. This translated into better performance on the real-world applications considered.
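The stability characterization underlying the nearest-stable-matrix model is easy to verify numerically. The following minimal check (taking Q positive definite, a slight strengthening of the semidefinite condition for numerical safety) confirms that a matrix of the form (J-R)Q has no eigenvalue with positive real part:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
S = rng.standard_normal((n, n))
J = S - S.T                      # skew-symmetric: J.T == -J
B = rng.standard_normal((n, n))
R = B @ B.T                      # positive semidefinite
C = rng.standard_normal((n, n))
Q = C @ C.T + 1e-3 * np.eye(n)   # positive definite (PSD plus a small shift)
A = (J - R) @ Q

# every eigenvalue of (J-R)Q lies in the closed left half-plane
assert np.max(np.linalg.eigvals(A).real) <= 1e-8
```

The intuition: Q^{1/2} A Q^{-1/2} = Q^{1/2} (J-R) Q^{1/2} is similar to A, and its symmetric part -Q^{1/2} R Q^{1/2} is negative semidefinite, which forces all eigenvalue real parts to be nonpositive.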

*Expected results until the end of the project*
Currently we are focusing on two aspects:
(1) Provably correct algorithms: we are developing two new CLRMA models that can be solved provably up to global optimality.
(2) Heuristic algorithms: we are developing a general algorithm applicable to most CLRMA models. Its novelty lies in the use of several extrapolated points, the fact that the objective function is nonincreasing, and global convergence to stationary points. We are also developing more specialized algorithms, in particular for sparse CLRMA problems and symmetric ones (that is, M is symmetric and U = V).
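The combination of extrapolation with a nonincreasing objective can be sketched on the nonnegative least squares subproblem that underlies many CLRMA algorithms. The sketch below (an illustration under standard assumptions, not the project's general algorithm) takes a projected gradient step from an extrapolated point and falls back to a plain step whenever the objective would increase:

```python
import numpy as np

def extrapolated_pgd_nnls(M, W, n_iter=300, beta=0.5):
    """Minimize f(H) = 0.5 * ||M - W H||_F^2 over H >= 0 by projected gradient
    descent with extrapolation (momentum) and a restart rule that keeps the
    objective value nonincreasing across iterations."""
    L = np.linalg.norm(W.T @ W, 2)        # Lipschitz constant of grad f
    H = np.zeros((W.shape[1], M.shape[1]))
    H_prev = H.copy()
    f_prev = np.inf
    for _ in range(n_iter):
        Y = H + beta * (H - H_prev)       # extrapolated point
        H_new = np.maximum(Y - W.T @ (W @ Y - M) / L, 0.0)
        f_new = 0.5 * np.linalg.norm(M - W @ H_new) ** 2
        if f_new > f_prev:                # restart: plain descent step instead,
            H_new = np.maximum(H - W.T @ (W @ H - M) / L, 0.0)  # guaranteed descent
            f_new = 0.5 * np.linalg.norm(M - W @ H_new) ** 2
        H_prev, H, f_prev = H, H_new, f_new
    return H
```

The restart test is what makes the objective provably nonincreasing: a plain projected gradient step with step size 1/L never increases f, so the extrapolated step is only accepted when it does at least as well.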

Website & more info

More info: https://sites.google.com/site/nicolasgillis/projects/overview.