At the core of Artificial Intelligence (AI) lies a set of elaborate non-linear, data-driven or implicitly defined machine learning methods and algorithms. These, however, largely rely on "small-dimensional intuitions" and heuristics which have recently been shown to be mostly inappropriate, as they behave strikingly differently in large dimensions (see for instance the case of kernel spectral clustering in Fig. 1, or semi-supervised learning in Fig. 2). Recent advances in tools from large dimensional statistics, random matrix theory and statistical physics have provided a series of answers to this curse of dimensionality, by proposing a renewed understanding of elementary ML methods for big data, along with novel algorithms that strikingly improve them (in the context of community detection, graph-based semi-supervised learning, subspace clustering, etc.). Of particular interest is the
random matrix analysis of simple neural network structures.
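To make this breakdown of small-dimensional intuition concrete, here is a minimal self-contained sketch (a toy experiment under illustrative assumptions, not the actual setting of Fig. 1): for a two-class isotropic Gaussian mixture N(+/-mu, I_p) with ||mu|| held fixed as p grows, the normalized pairwise distances ||x_i - x_j||^2 / p all concentrate around the same value. The within-class versus between-class gap decays like 1/p while the random fluctuations only decay like 1/sqrt(p), so individual kernel entries become asymptotically uninformative about the classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_stats(p, n=200, mu_norm=2.0):
    """Two-class Gaussian mixture N(+/-mu, I_p), with ||mu|| held fixed as p grows."""
    mu = np.zeros(p)
    mu[0] = mu_norm
    labels = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p)) + np.where(labels[:, None] == 0, mu, -mu)
    # Normalized squared pairwise distances ||x_i - x_j||^2 / p, via the Gram matrix.
    G = X @ X.T
    sq = (np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G) / p
    iu = np.triu_indices(n, k=1)
    same = (labels[:, None] == labels[None, :])[iu]
    d = sq[iu]
    return d[same].mean(), d[~same].mean(), d.std()

for p in (10, 100, 1000, 10000):
    w, b, s = distance_stats(p)
    print(f"p={p:6d}  within={w:.3f}  between={b:.3f}  "
          f"class gap={b - w:.4f}  fluctuation={s:.4f}")
```

Random matrix theory resolves the apparent paradox: although each kernel entry is asymptotically blind to the classes, the class information accumulates in the spectrum of the full n x n kernel matrix, which is precisely where the improved algorithms mentioned above operate.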
More importantly, while mostly relying on simple data models (i.i.d. Gaussian vectors, simple mixture models, etc.), these tools remain adequate for, and resilient to, realistic datasets, as they provably exhibit universality features. Precisely, leveraging a new approach to concentration of measure theory, these results fully explain how advanced ML algorithms, such as deep learners and GANs, behave on realistic data (see Fig. 3).
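A minimal sketch of the concentration-of-measure mechanism at play (an illustrative stand-in, not the chairs' actual framework): a GAN generator is by construction a Lipschitz map applied to Gaussian noise, and Lipschitz images of Gaussian vectors are "concentrated vectors". Below, a random spectrally normalized ReLU network (an assumed toy generator) replaces a trained one, and we check that a 1-Lipschitz observable, here the Euclidean norm, fluctuates O(1) whatever the dimension p:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_lipschitz_generator(p, depth=3):
    """Random ReLU network with spectrally normalized weights: each layer,
    and hence the whole map, is 1-Lipschitz."""
    Ws = [rng.standard_normal((p, p)) for _ in range(depth)]
    Ws = [W / np.linalg.norm(W, 2) for W in Ws]   # divide by operator norm
    def g(Z):                                     # Z: (n, p) batch of latent vectors
        for W in Ws:
            Z = np.maximum(Z @ W.T, 0.0)          # ReLU is itself 1-Lipschitz
        return Z
    return g

for p in (16, 64, 256, 1024):
    Z = rng.standard_normal((5000, p))            # latent Gaussian inputs
    out = random_lipschitz_generator(p)(Z)
    norm_in = np.linalg.norm(Z, axis=1)
    norm_out = np.linalg.norm(out, axis=1)
    # Gaussian concentration: 1-Lipschitz observables fluctuate O(1), whatever p.
    print(f"p={p:5d}  ||z||: mean={norm_in.mean():6.2f}, std={norm_in.std():.3f}   "
          f"||g(z)||: mean={norm_out.mean():6.2f}, std={norm_out.std():.3f}")
```

While the mean of the input norm grows like sqrt(p), the standard deviation of both observables stays of order one, independently of p. This is the universality mechanism by which Gaussian-based random matrix predictions carry over to such realistic, generator-produced data.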
The GSTATS and MIAI LargeDATA chairs aim to gather these findings into a coherent new random matrix paradigm for big data machine learning. In particular, the project relies on the following key innovative theoretical directions:
- (i) large dimensional statistics (random matrix theory) for the analysis and improvement of non-linear optimization, kernel methods, generalized linear mixed models, etc. (a minimal illustration of this regime is sketched after this list);
- (ii) concentration of measure theory and universality for the understanding of deep learning;
- (iii) statistical physics methods for sparse graph mining, clustering, and neural network analysis.
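As a minimal illustration of the large-dimensional regime underlying direction (i), the following textbook experiment (standard random matrix material, not project code) shows why classical intuition fails when the number of features p is commensurate with the number of samples n: even for data with identity population covariance, the sample covariance eigenvalues do not concentrate around 1 but spread over the Marchenko-Pastur support:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample covariance of n i.i.d. observations of N(0, I_p): every population
# eigenvalue equals 1, yet as soon as c = p/n is non-negligible the sample
# eigenvalues spread over [(1 - sqrt(c))^2, (1 + sqrt(c))^2].
for n, p in ((4000, 40), (4000, 1000), (4000, 2000)):
    X = rng.standard_normal((n, p))
    sample_eigs = np.linalg.eigvalsh(X.T @ X / n)
    c = p / n
    lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    print(f"c = p/n = {c:4.2f}:  sample eigenvalues in "
          f"[{sample_eigs.min():.3f}, {sample_eigs.max():.3f}],  "
          f"MP support [{lo:.3f}, {hi:.3f}]")
```

This is exactly the regime where fixed-p asymptotics mislead, and where the random matrix tools of direction (i) take over.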