SIR models with Kermack and McKendrick

Time to bake and to read Kermack and McKendrick!


It seems about the right time to read Kermack & McKendrick, 1927, “A contribution to the Mathematical Theory of Epidemics”. It is an early article on the “Susceptible-Infected-Removed” or “SIR” model, a milestone in the mathematical modelling of infectious disease. In this blog post, I will go through the article, describe the model and the data considered by the authors (plague in Bombay in 1905-1906), which will turn out to be a questionable choice. Some references and R code are given at the end of the article. All of this comes with the disclaimer that I have no expertise in epidemiology.

Continue reading “SIR models with Kermack and McKendrick”

Remote seminars


Hi all,

It seems like the current environment is perfect for the growth of remote seminars. Most of them seem to be free to attend, some of them require registration. I’ve collected some links to seminars with topics related to statistics on this page: I will try to keep the page up to dates with more links as new seminars are being created, with an emphasis on topics at least loosely related to the topics usually covered in our blog posts. Don’t hesitate to send links via comments or emails.

Urgent call for modellers to support epidemic modelling


Hi all,

As many scientists who are not usually working in epidemiology are trying to contribute to the fight against the current pandemic while getting magnets stuck in their nose,  the Royal Society has a call here:  for “modellers to support epidemic modelling” with a deadline on April 2nd (5pm British Summer Time).

More details are given here: and they specifically welcome non-UK based scientists.

Statistical inference on MCMC traces


Hi everyone,

and Happy New Year! This post is about some statistical inferences that one can do using as “data” the output of MCMC algorithms. Consider the trace plot above. It has been generated by Metropolis–Hastings using a Normal random walk proposal, with a standard deviation “sigma”, on a certain target. Suppose that you are given a function that evaluates the pdf of that target. Can you retrieve the value of sigma used to generate that chain?

Continue reading “Statistical inference on MCMC traces”

Course on Bayesian machine learning in Paris

Trends for papers appearing in JMLR and at NeurIPS.

Rémi Bardenet and I are starting a new course on Bayesian machine learning at the MVA master (Mathématiques, Vision et Apprentissage) at ENS Paris-Saclay. Details on the syllabus can be found on the MVA webpage and on this Github repository. In this post, I shortly describe what motivated us for proposing this course and provide results of topic modeling of recent machine learning (conference) papers mentioning Bayesian in their abstract.Continue reading “Course on Bayesian machine learning in Paris”

Turning a sum of expectations into a single expectation with geometric series

Screenshot 2020-01-07 at 17.11.07
A technical lemma that is eager to learn how to turn a sum of expectations into a single expectation with geometric series.

At the dawn of 2020, in case anyone in the stat/ML community is not aware yet of Francis Bach’s blog started last year: this is a great place to learn about general tricks in machine learning explained with easy words. This month’s post The sum of a geometric series is all you need! shows how ubiquitous geometric series are in stochastic gradient descent, among others. In this post, I describe just another situation where the sum of a geometric series can be useful in statistics.

Turning a sum of expectations into a single expectation

I also found the sum of geometric series useful for turning a sum of expectations into a single expectation, by the linearity of expectation. More specifically, for a random variable X compactly supported on [-1,1],

\sum_{k=1}^\infty \mathbb{E}[X^k] = \mathbb{E}\big[\sum_{k=1}^\infty X^k\big] = \mathbb{E}\big[\frac{X}{1-X}\big],

where the sum can be infinite. Continue reading “Turning a sum of expectations into a single expectation with geometric series”

French Habilitation: Bayesian statistical learning and applications

A jury thrilled with Bayesian statistical learning

Long time no see, Statisfaction!

I’m glad to write about my habilitation entitled Bayesian statistical learning and applications I defended yesterday at Inria Grenoble. This Habilitation à Diriger des Recherches (HDR) is the highest degree issued through a university examination in France. If I am to believe official texts, the HDR recognizes a candidate’s “high scientific level, the originality of approach in a field of science, ability to master a research strategy in a sufficiently broad scientific or technological field and ability to supervise young researchers”. No less! In French academia, HDR is a necessary condition for supervising a Ph.D., being a reviewer for a Ph.D., and applying to prof positions. Part of the process consists of writing a manuscript summarizing the research done since the Ph.D. Mine is organized in three parts:

Chapter 1. Bayesian Nonparametric Mixture Modeling. This chapter is devoted to mixture models that are central to Bayesian nonparametrics. It focusses on BNP mixture models for (i) survival analysis, (ii) image segmentation and (ii) ecotoxicological applications.

Chapter 2. Approximate Bayesian Inference. This chapter is concerned in large parts with computational aspects of Bayesian inference. It describes: (i) conditional approaches in the form of truncation-based approximations for the Pitman–Yor process and for completely random measures, (ii) a marginal approach based on approximations of the predictive distribution of Gibbs-type processes, (iii) an approximate Bayesian computation (ABC) algorithm using the energy distance as data discrepancy.

Chapter 3. Distributional Properties of Statistical and Machine Learning Models. This chapter is concerned with general distributional properties of statistical and machine learning models, including (i) sub-Gaussian and sub-Weibull properties, (ii) a prior analysis of Bayesian neural networks and (iii) theoretical properties of asymmetric copulas.Continue reading “French Habilitation: Bayesian statistical learning and applications”

JRSS: Series B read paper and comparison with other unbiased estimators

Reading in the New York public library, by Amy Stewart

Hi all,

The paper “unbiased Markov chain Monte Carlo with couplings” co-written with John O’Leary and Yves Atchadé has been accepted as a read paper in JRSS: Series B, to be presented on December 11 at the Royal Statistical Society. Comments can be submitted (400 words max) until two weeks after, that is December 28; see the instructions here. The main contribution is on the removal of the burn-in bias of Markov chain Monte Carlo (MCMC) estimators, using coupled chains. We argue that 1) the burn-bias can be removed for a controllable loss of efficiency, and 2) this can be done in many settings, for low or high-dimensional, discrete or continuous target distributions, provided that the underlying MCMC algorithm works well enough. This might have consequences on the use of parallel processors for MCMC, but also on the estimation of Monte Carlo errors. We believe that there are a lot of open questions related to our work, on the theoretical, methodological and applied sides, and  hope that the discussion will be interesting.

All the ingredients of our approach were already in the literature, from coupling of Markov chains to debiasing techniques, to the analysis of budget-constrained simulations, but we also propose multiple contributions. Below, I summarize the main methodological contributions of the paper, the way I see it at least, and I  try to contrast our approach with those of Glynn & Rhee (2014) and Agapiou, Roberts & Vollmer (2018).

Continue reading “JRSS: Series B read paper and comparison with other unbiased estimators”

BayesBag, and how to approximate it

Hi all,

This post describes how unbiased MCMC can help in approximating expectations with respect to “BayesBag”, an alternative to standard posterior distributions mentioned in Peter Bühlmann‘s discussion of Big Bayes Stories (which was a special issue of Statistical Science). Essentially BayesBag is the result of “bagging” applied to “Bayesian inference”. In passing, here is an R script implementing this on a model written in the Stan language (as in this previous post), namely a Negative Binomial regression, and using a pure R implementation of unbiased HMC (joint work with Jeremy Heng). The script produces the following figure:

2019-10-stan-negbinreg-bayesbagwhich shows, for two parameters of the model, the cumulative distribution function (CDF) under standard Bayes (blue thin line) and under BayesBag (wider red line). BayesBag results in distributions on the parameter space that are more “spread out” than standard Bayes.

Continue reading “BayesBag, and how to approximate it”

Coding algorithms in R for models written in Stan

Stanislaw Ulam’s auto-biography, “adventures of a mathematician”, originally published in 1976

Hi all,

On top of recommending the excellent autobiography of Stanislaw Ulam, this post is about using the software Stan, but not directly to perform inference, instead to obtain R functions to evaluate a target’s probability density function and its gradient. With which, one can implement custom methods, while still benefiting from the great work of the Stan team on the “modeling language” side. As a proof of concept I have implemented a plain Hamiltonian Monte Carlo sampler for a random effect logistic regression model (taken from a course on Multilevel Models by Germán Rodríguez), a coupling of that HMC algorithm (as in “Unbiased Hamiltonian Monte Carlo with couplings“, see also this very recent article on the topic of coupling HMC), and then upper bounds on the total variation distance between the chain and its limiting distribution, as in “Estimating Convergence of Markov chains with L-Lag Couplings“.

The R script is here: and is meant to be as simple as possible, and self-contained; warning, this is all really proof of concept and not thoroughly tested.

Basically the R script starts like a standard script that would use rstan for inference; it runs the default algorithm of Stan for a little while, then extracts some info from the “stanfit” object.  With these, a pure R implementation of TV upper bounds for a naive HMC algorithm follows, that relies on functions called “stan_logtarget”  and “stan_gradlogtarget” to evaluate the target log-pdf and its gradient.

The script takes a few minutes to run in total. Some time is first needed to compile the Stan code, and to run Stan for a few steps. Then some time spent towards the end of the script to generate 250 independent meeting times with a lag of 500 between the chains; the exact run time will of course depend a lot on your number of available processors (on my machine it takes around one minute). The script produces this plot:


This plot suggests that vanilla HMC as implemented in the script converges in less than 1000 iterations to its stationary distribution. This is probably quite conservative, but it’s still usable.

In passing, upon profiling the code of the function that generates each meeting time, it appears that half of the time is spent in Stan‘s “grad_log_prob” function (which computes the gradient of the log pdf of the target). This implies that not that much efficiency is lost in the fact that the algorithms are coded in pure R, at least for this model.

%d bloggers like this: