Probability II, Spring 2020.
University of Washington, Seattle.
This is the second quarter of a sequence in probability theory. This quarter, we study joint probability distributions, independent random variables, and conditional distributions. We also cover representations of probability distributions beyond densities and cumulative distribution functions, namely moment generating functions. We then study the convergence of random variables, in particular the central limit theorem.
Lecture slides
Lecture notes
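
As a small illustration of the central limit theorem mentioned in the course description, the sketch below compares the exact distribution of a standardized count of heads in n fair coin tosses with the standard normal distribution. It is not taken from the course materials; the function names and the choice of a fair coin are assumptions made for this example.

```python
import math

def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fair_coin_standardized_cdf(n, z):
    """P((S_n - n/2) / sqrt(n/4) <= z), where S_n counts heads in n fair tosses."""
    mean = n / 2.0
    std = math.sqrt(n) / 2.0
    # Largest head count k whose standardized value is at most z.
    k_max = math.floor(mean + z * std)
    if k_max < 0:
        return 0.0
    k_max = min(k_max, n)
    # Exact binomial probability, computed with integer arithmetic.
    favorable = sum(math.comb(n, k) for k in range(k_max + 1))
    return favorable / 2 ** n

# Gap between the exact probability and the normal approximation at z = 1.
gaps = {
    n: abs(fair_coin_standardized_cdf(n, 1.0) - standard_normal_cdf(1.0))
    for n in (10, 100, 1000)
}
for n, gap in gaps.items():
    print(n, gap)
```

The gap need not decrease monotonically in n, because the binomial distribution is discrete, but it stays small and vanishes as n grows.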

Statistical Learning: Modeling, Prediction, And Computing, Winter 2020.
University of Washington, Seattle. Co-taught with Zaid Harchaoui.
The course presents advanced statistical machine learning methods from a functional estimation (nonparametric statistics) viewpoint. The course covers the theoretical analysis of kernel-based methods, as well as their practical implementation using gradient-based optimization algorithms and numerical linear algebra algorithms. The course also covers an introduction to recent theoretical analyses of deep networks.

Teaching Assistant

Convex Optimization, 2014-2017.
Master Mathematics, Vision, Learning, École Normale Supérieure Paris-Saclay, Paris.
Taught by Alexandre d’Aspremont.

Oral Interrogations in Maths, 2013-2014.
Classes Préparatoires in Mathematics and Physics, Lycée Janson de Sailly, Paris.


Automatic Differentiation, Statistical Machine Learning for Data Scientists, University of Washington, Seattle.
Lecture on automatic differentiation with code examples covering how to compute gradients of a chain of computations, how to use automatic differentiation software, and how to use automatic differentiation beyond gradient computations.
slides notebook
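
To make the first topic concrete, here is a minimal sketch of computing the gradient of a chain of computations by a forward pass followed by a backward pass. It is not the code from the lecture; the chain sin, then square, then exp, and all function names are assumptions chosen for illustration.

```python
import math

# Each layer is a pair (function, derivative); the chain is x -> sin -> square -> exp.
LAYERS = [
    (math.sin, math.cos),
    (lambda u: u * u, lambda u: 2.0 * u),
    (math.exp, math.exp),
]

def chain_value_and_grad(x, layers=LAYERS):
    """Return f(x) and f'(x) for f = layers[-1] o ... o layers[0]."""
    # Forward pass: store the input seen by every layer.
    inputs = []
    value = x
    for fn, _ in layers:
        inputs.append(value)
        value = fn(value)
    # Backward pass: multiply local derivatives from the output back to the input.
    grad = 1.0
    for (_, dfn), u in zip(reversed(layers), reversed(inputs)):
        grad *= dfn(u)
    return value, grad

def finite_difference(x, layers=LAYERS, eps=1e-6):
    """Central finite-difference estimate of f'(x), used only as a sanity check."""
    def f(t):
        for fn, _ in layers:
            t = fn(t)
        return t
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

value, grad = chain_value_and_grad(0.7)
print(value, grad, finite_difference(0.7))
```

For this scalar chain the backward pass is just the chain rule applied from the output to the input. Automatic differentiation software applies the same idea to vector-valued layers, where the products become vector-Jacobian products.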

Optimization for deep learning, Jul. 2018.
Summer School on Fundamentals of Data Analysis, University of Wisconsin-Madison, Madison.
Interactive Jupyter notebook for 30 attendees on the basics of optimization for deep learning: automatic differentiation, convergence guarantees of SGD, and an illustration of the effect of batch normalization.
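
As a toy illustration of SGD convergence, the sketch below runs stochastic gradient descent on a one-dimensional least-squares problem. It is not taken from the notebook; the data, step size, and function name are hypothetical and chosen so that convergence is easy to check.

```python
import random

def sgd_least_squares(xs, ys, step_size=0.5, num_steps=200, seed=0):
    """Run SGD on (1/2n) * sum_i (w * x_i - y_i)^2, sampling one point per step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(num_steps):
        i = rng.randrange(len(xs))
        # Gradient of the sampled loss 0.5 * (w * x_i - y_i)^2 with respect to w.
        grad = (w * xs[i] - ys[i]) * xs[i]
        w -= step_size * grad
    return w

# Hypothetical noiseless data generated by y = 3 * x.
xs = [0.5, 0.75, 1.0, 1.25, 1.5]
ys = [3.0 * x for x in xs]
w_hat = sgd_least_squares(xs, ys)
print(w_hat)
```

Because the data are noiseless, each step multiplies the error w - 3 by 1 - step_size * x_i^2, which has absolute value at most 0.875 here, so a constant step size converges. With noisy data, a constant step size would generally only reach a neighborhood of the minimizer.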