Nathan Noiry - Personal webpage

Welcome to my webpage!

I am a Senior Research Scientist at Owkin. Our goal is to understand complex biology through AI, bringing together domain expertise, traditional biostatistical methods and advanced machine learning techniques. Ultimately, we aim to develop cutting-edge precision medicine tools.

Before that, I was a Postdoctoral Researcher at Telecom Paris, working in the S2A research team (Signal, Statistics and Learning), which is part of the LTCI Laboratory (Communication and Information Theory), within the IDS department (Image, Data and Signal).

I worked on Machine Learning, Deep Learning and Transfer Learning, with a particular interest in Trustworthy AI: biases, fairness and robustness. In line with these subjects, I worked in collaboration with Idemia to reduce the demographic biases in face recognition.

I also co-founded a startup, althiqa, together with Victor Storchan. We are dedicated to AI Evaluation and to simplifying AI reporting for data scientists.
Don't hesitate to reach out if you have any questions!

I completed a PhD in probability theory at Modal'X, under the supervision of Nathanaël Enriquez and Laurent Ménard. I worked on random matrix theory and random graph theory. Here is the manuscript.

You can find a detailed CV here.

Contact

Telecom Paris
Bureau 5C14
19 place Marguerite Perey
91120 Palaiseau, France

Email: noirynathan 'at' gmail.com

Research

My research interests lie at the intersection between probability, statistics and machine learning.
So far, the topics I work / have worked on include:

Transfer Learning, Covariate Shift;
Biases and Fair Learning in NLP and Computer Vision;
Supervised and Self-supervised Contrastive Learning;
Matching and Online Matching;
Spectra of large random graphs;
Eigenvalues and eigenvectors of large deformed random matrices;
Asymptotic analysis of exploration algorithms on sparse random graphs.

Ongoing works / Under review

A fast softmax-based adversarial attack detector
with Marine Picot, Pablo Piantanida and Pierre Colombo.

A Functional Perspective on Multi-Layer Out-of-Distribution Detection
with Eduardo D. C. Gomes, Pierre Colombo, Guillaume Staerman and Pablo Piantanida.

A simple unsupervised data depth-based method to detect adversarial images
with Marine Picot, Guillaume Staerman, Federica Granese, Francisco Messina, Pablo Piantanida and Pierre Colombo.

Toward Stronger Textual Attack Detectors
with Pierre Colombo, Marine Picot, Guillaume Staerman and Pablo Piantanida.

The Glass Ceiling of Automatic Evaluation in Natural Language Generation
with Pierre Colombo, Maxime Peyrard, Robert West and Pablo Piantanida.

A Novel Information Theoretic Objective to Disentangle Representations for Fair Classification
with Pierre Colombo, Guillaume Staerman and Pablo Piantanida.

Papers

What are the best systems? New perspectives on NLP Benchmarking
with Pierre Colombo, Ekhine Irurozki and Stéphan Clémençon.
NeurIPS (2022). Links: journal.
Beyond Mahalanobis-Based Scores for Textual OOD Detection
with Pierre Colombo, Eduardo Gomes, Guillaume Staerman and Pablo Piantanida.
NeurIPS (2022). Links: journal.
Mitigating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model
with Jean-Rémy Conti, Vincent Despiegel, Stéphane Gentric and Stéphan Clémençon.
ICML (2022). Links: journal.
Learning Disentangled Textual Representations via Statistical Measures of Similarity
with Pierre Colombo, Guillaume Staerman and Pablo Piantanida.
ACL oral (2022). Links: journal.
Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm
with Vianney Perchet and Flore Sentenac.
NeurIPS (2021). Links: pdf, arXiv, journal.
Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics
with Myrto Limnios and Stéphan Clémençon.
LIDTA (2021). Links: journal.
Learning from Biased Data: A Semi-Parametric Approach
with Patrice Bertail, Stéphan Clémençon and Yannick Guyonvarch.
ICML (2021). Links: pdf.
Large deviations for spectral measures of some spiked matrices
with Alain Rouault.
RMTA (2021). Links: pdf, arXiv.
A solvable class of renewal processes and its applications
with Nathanaël Enriquez.
Electronic Communications in Probability, Volume 25 (2020). Links: pdf, arXiv, journal.
Depth First Exploration of a Configuration Model
with Nathanaël Enriquez, Gabriel Faraud and Laurent Ménard.
EJP (2022). Links: pdf, arXiv.
Spectral Measures of Spiked Random Matrices
Journal of Theoretical Probability (2020). Links: pdf, arXiv.
Spectral asymptotic expansion of Wishart matrices with exploding moments
ALEA Latin American Journal of Probability and Mathematical Statistics, Volume XV, Number 2 (2018), pp. 897-911. Links: pdf, journal.

Others

I gave a talk at the Dataiku seminar on OOD detection in the age of Transformers. You can find it here.
For the International Conference on Machine Learning (ICML), I gave a short talk about the article Learning from Biased Data: A Semi-Parametric Approach. You can find it here.
You can also have a look at the poster.
For the conference Random Matrices and Random Graphs, organized by the GDR MEGA (Matrices Et Graphes Aléatoires) at the CIRM (2019), I made a poster on spectral measures of spiked random matrices. You can find it here.
I gave a talk at the 2018 edition of the conference Les probabilités de demain. It was a short introduction to the Wigner and Marchenko-Pastur laws and their generalizations, in connection with my work on Wishart matrices with exploding moments. You can find the recording (in French) here.

Teaching

2020-2021: Supervision of a group of students in internship at Safran
For the Masters Big Data and Artificial Intelligence, Télécom Paris.
Subjects: Active Learning, Semantic Segmentation, Sampling...
2021-2022: Machine Learning Practical Sessions with Python
For the Masters Big Data and Artificial Intelligence, Télécom Paris.
Among others: k-NN, LDA, Logistic Regression, SVM, Boosting, Random Forests, NMF...
2020-2021: Supervision of a group of students in internship at Sicara
For the Masters Big Data and Artificial Intelligence, Télécom Paris.
Subjects: Computer Vision, Detection of edges, Detection of keypoints...
2020-2021: Supervision of a group of students in internship at Air Liquide
For the Masters Big Data and Artificial Intelligence, Télécom Paris.
Subject: Causal Machine Learning and Counterfactual Analysis for medical applications.
2020-2021: Machine Learning Practical Sessions with Python
For the Masters Big Data and Artificial Intelligence, Télécom Paris.
Among others: k-NN, LDA, Logistic Regression, SVM, Boosting, Random Forests, NMF...
2019-2020: Statistical Methods Tutorials
For bachelors in mathematics and economics, Paris Nanterre.
Among others: descriptive statistics, hypothesis testing...
2017-2019: Algebra, Analysis and Optimization Tutorials
For bachelors in mathematics and economics, Paris Nanterre.
Among others: algebraic structures, multivariate calculus, Lagrange’s method...

Miscellaneous

Organization duties

Co-organization of the meetups of the Data Science and Artificial Intelligence for Digitalized Industry and Services chair (DSAIDIS), where academic researchers in Machine Learning present their work to the industrial partners of Telecom Paris.
With Laure Dumaz and Guillaume Barraquand, I co-organize the monthly seminar of the GDR MEGA. More precisely, I am responsible for the morning session, which consists of a mini-course (1h30) intended for PhD students. For more information about the seminar, you can go to the dedicated website.

Writings

My master's thesis (in French): Le théorème de la loi circulaire.
A short note on Wigner's Theorem in the Journal de mathématiques des élèves de l'ENS Lyon. You can find the document (in French) here.