Statistical inference with the Wasserstein distance

Posted in Statistics by Pierre Jacob on 27 January 2017
NGS Picture ID:1440112

Nature transporting piles of sand.

Hi! It’s been too long!

In a recent arXiv entryEspen Bernton, Mathieu Gerber and Christian P. Robert and I explore the use of the Wasserstein distance to perform parameter inference in generative models. A by-product is an ABC-type approach that bypasses the choice of summary statistics. Instead, one chooses a metric on the observation space. Our work fits in the minimum distance estimation framework and is particularly related to “On minimum Kantorovich distance estimators”, by Bassetti, Bodini and Regazzini. A recent and very related paper is “Wasserstein training of restricted Boltzmann machines“, by Montavon, Müller and Cuturi, who have similar objectives but are not considering purely generative models. Similarly to that paper, we make heavy use of recent breakthroughs in numerical methods to approximate Wasserstein distances, breakthroughs which were not available to Bassetti, Bodini and Regazzini in 2006.

Here I’ll describe the main ideas in a simple setting.  If you’re excited about ABCasymptotic properties of minimum Wasserstein estimators, Hilbert space-filling curves, delay reconstructions and Takens’ theorem, or SMC samplers with r-hit kernels, check our paper!


Gaussian variates truncated to a finite interval

Posted in General by nicolaschopin on 29 December 2016

Alan Rogers, an anthropologist at University of Utah, got in touch with me about my paper on the simulation of truncated Gaussian distributions (journal version, arxiv version). The method I proposed in this paper works for either a finite interval [a,b], or a semi-finite one [a,+inf[, but my C code implements only the latter, and Alan needed the former.

Alan thus decided to re-implement my method and several others (including Christian Robert’s accept-reject algorithm proposed in this paper) in C; see here:

Alan also sent me this interesting plot that compares the different methods. The color of a dot at position (a,b) corresponds to the fastest method for simulating N(0,1) truncated to [a,b];


CPU time comparison of truncated Normal samplers (c=chopin, r=robert, g=N(0,1) proposal, e=exponential proposal)

A few personal remarks:

  • My method is an accept-reject algorithm, where the proposal is a mixture of uniform distributions on rectangles. The point is to have a large probability that the number of basic operations (multiplication, division) needed to return the draw is small. However, the improvement brought by such a method might be be observable only in compiled languages. In an interpreted language such as R, Matlab and Python, loops over basic operations come with a certain overhead, which might cancel any improvement. This was the experience of a colleague who tried to implement it in Julia.
  • Even in C, this comparison might depend on several factors (computer, compiler, libraries, and so on). If I remember correctly, the choice of the random generator in particular may have a significant impact. (I used the GSL library which makes it easy to try different generators for the same piece of code.)
  • Also bear in mind that some progress has been made for computing the inverse CDF of a unit Gaussian distribution. Hence the basic inverse CDF method, while not being the fastest approach, works reasonably well these days, especially (again) in interpreted languages. (Update: Alan tells me the inverse CDF methods remains 10 times slower for his C implementation, based on the GSL library.)


Faà di Bruno’s note on eponymous formula, trilingual version

Posted in General by Julyan Arbel on 20 December 2016


The Italian mathematician Francesco Faà di Bruno was born in Alessandria (Piedmont, Italy) in 1825 and died in Turin in 1888. At the time of his birth, Piedmont used to be part of the Kingdom of Sardinia, led by the Dukes of Savoy. Italy was then unified in 1861, and the Kingdom of Sardinia became the Kingdom of Italy, of which Turin was declared the first capital. At that time, Piedmontese used to commonly speak both Italian and French.

Faà di Bruno is probably best known today for the eponymous formula which generalizes the derivative of a composition of two functions, \phi\circ \psi, to any order:

(\phi\circ \psi)^{(n)} = \sum \frac{n!}{m_1!\,\ldots m_n!}\phi^{(m_1+\,\cdots \,+m_n)}\circ \psi \cdot \prod_{i=1}^n\left(\frac{\psi^{(j)}}{j!}\right)^{m_j}

over n-tuples (m_1,\,\ldots \,, m_n) satisfying \sum_{j=1}^{n}j m_j = n.

Faà di Bruno published his formula in two notes:

  • Faà Di Bruno, F. (1855). Sullo sviluppo delle funzioni. Annali di Scienze Matematiche e Fisiche, 6:479–480. Google Books link.
  • Faà Di Bruno, F. (1857). Note sur une nouvelle formule de calcul différentiel. Quarterly Journal of Pure and Applied Mathematics, 1:359–360. Google Books link.

They both date from December 1855, and were signed in Paris. They are similar and essentially state the formula without a proof. I have arXived a note which contains a translation from the French version to English (reproduced below), as well as the two original notes in French and in Italian. I’ve used for this the Erasmus MMXVI font, thanks Xian for sharing! (more…)

post-doc positions at ENSAE

Posted in General by nicolaschopin on 27 November 2016



interested in doing a post-doc with me on anything related to Bayesian Computation? Please let me know, as there is currently a call for post-doc grants at the ENSAE, see below.

Nicolas Chopin

The Labex ECODEC is a research consortium in Economics and Decision Sciences common to three leading French higher education institutions based in the larger Paris area: École polytechnique, ENSAE and HEC Paris. The Labex Ecodec offers:

  • One-year postdoctoral fellowships for 2017-2018

  • Two-year postdoctoral fellowships for 2017-2019

The monthly gross salary of postdoctoral fellowships is 3 000 €.

Candidates are invited to contact as soon as possible members of the research group (see below) with whom they intend to work.

Research groups concerned by the call:

Area 1: Secure Careers in a Global Economy

Area 2: Financial Market Failures and Regulation

Area 3: Product Market Regulation and Consumer Decision-Making

Area 4: Evaluating the Impact of Public Policies and Firms’ Decisions

Area 5: New Challenges for New Data

Details of axis can be found on the website:

Deadlines for application:

31st December 2016

Screening of applications and decisions can be made earlier for srong candidates who need an early decision.

The application should be sent to in PDF. Please mention the area number on which you apply in the subject.

The application package includes:

  1. A cover letter with the name of a potential supervisor among the group;
  2. A research statement;
  3. A letter from the potential supervisor in support of the project;
  4. A Curriculum vita (with the address of the candidate, phone and e-mail contact).
  5. The Ph.D. dissertation or papers/preprint;
  6. Reference letters, including one from the PhD advisor. A letter from a member of the research group with whom the candidate is willing to interact will be appreciated.


Please note that HEC, Genes, and X PhD students are not eligible to apply for this call.

Selection will be based on excellence and a research project matching the group’s research agenda.

Area 1 “Secure careers in a Global Economy”: Pierre Cahuc (ENSAE), Dominique Rouziès (HEC), Isabelle Méjean (École polytechnique)

Area 2: “Financial Market Failures and Regulation”: François Derrien (HEC), Jean-David Fermanian, (ENSAE) Edouard Challe (École polytechnique)

Area 3: “Decision-Making and Market Regulation”: Nicolas Vieille (HEC), Philippe Choné (ENSAE), Marie-Laure Allain (École polytechnique)

Area 4: “Evaluating the Impact of Public Policies and Firm’s Decisions”: Bruno Crépon (ENSAE), Yukio Koriyama (École polytechnique), Daniel Halbheer (HEC)

Area 5:  “New Challenges for New Data”: Anna Simoni (ENSAE), Gilles Stoltz (HEC)


MathSciNet reviews on Bayesian papers

Posted in General by Julyan Arbel on 18 October 2016



I recently started to review papers on Mathematical Reviews / MathSciNet a decided I would post the reviews here from time to time. Here are the first three which deal with (i) objective Bayes priors for discrete parameters, (ii) random probability measures and inference on species variety and (iii) Bayesian nonparametric asymptotic theory and contraction rates.

The paper deals with objective prior derivation in the discrete parameter setting. Previous treatment of this problem includes J. O. Berger, J.-M. Bernardo and D. Sun [J. Amer. Statist. Assoc. 107 (2012), no. 498, 636–648; MR2980073] who rely on embedding the discrete parameter into a continuous parameter space and then applying reference methodology (J.-M. Bernardo [J. Roy. Statist. Soc. Ser. B 41 (1979), no. 2, 113–147; MR0547240]). The main contribution here is to propose an all purpose objective prior based on the Kullback–Leibler (KL) divergence. More specifically, the prior \pi(\theta) at any parameter value \theta is obtained as follows: (i) compute the minimum KL divergence over \theta'\neq \theta between models indexed by \theta' and \theta; (ii) set \pi(\theta) proportional to a sound transform of the minimum obtained in (i). A good property of the proposed approach is that it is not problem specific. This objective prior is derived in five models (including binomial and hypergeometric) and is compared to the priors known in the literature. The discussion suggests possible extension to the continuous parameter setting.

A. Lijoi, R. H. Mena and I. Prünster [Biometrika 94 (2007), no. 4, 769–786; MR2416792] recently introduced a Bayesian nonparametric methodology for estimating the species variety featured by an additional unobserved sample of size m given an initial observed sample. This methodology was further investigated by S. Favaro, Lijoi and Prünster [Biometrics 68 (2012), no. 4, 1188–1196; MR3040025; Ann. Appl. Probab. 23 (2013), no. 5, 1721–1754; MR3114915]. Although it led to explicit posterior distributions under the general framework of Gibbs-type priors [A. V. Gnedin and J. W. Pitman (2005), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12, 83–102, 244–245;MR2160320], there are situations of practical interest where m is required to be very large and the computational burden for evaluating these posterior distributions makes impossible their concrete implementation. This paper presents a solution to this problem for a large class of Gibbs-type priors which encompasses the two parameter Poisson-Dirichlet prior and, among others, the normalized generalized Gamma prior. The solution relies on the study of the large m asymptotic behaviour of the posterior distribution of the number of new species in the additional sample. In particular a simple characterization of the limiting posterior distribution is introduced in terms of a scale mixture with respect to a suitable latent random variable; this characterization, combined with the adaptive rejection sampling, leads to derive a large m approximation of any feature of interest from the exact posterior distribution. The results are implemented through a simulation study and the analysis of a dataset in linguistics.

A novel prior distribution is proposed for adaptive Bayesian estimation, meaning that the associated posterior distribution contracts to the truth with the exact optimal rate and at the same time is adaptive regardless of the unknown smoothness. The prior is termed \textit{block prior} and is defined on the Fourier coefficients \{\theta_j\} of a curve f by independently assigning 0-mean Gaussian distributions on blocks of coefficients \{\theta_j\}_{j\in B_k} indexed by some B_k, with covariance matrix proportional to the identity matrix; the proportional coefficient is itself assigned a prior distribution g_k. Under conditions on g_k, it is shown that (i) the prior puts sufficient prior mass near the true signal and (ii) automatically concentrates on its effective dimension. The main result of the paper is a rate-optimal posterior contraction theorem obtained in a general framework for a modified version of a block prior. Compared to the closely related block spike and slab prior proposed by M. Hoffmann, J. Rousseau and J. Schmidt-Hieber [Ann. Statist. 43 (2015), no. 5, 2259–2295; MR3396985] which only holds for the white noise model, the present result can be applied in a wide range of models. This is illustrated through applications to five mainstream models: density estimation, white noise model, Gaussian sequence model, Gaussian regression and spectral density estimation. The results hold under Sobolev smoothness and their extension to more flexible Besov smoothness is discussed. The paper also provides a discussion on the absence of an extra log term in the posterior contraction rates (thus achieving the exact minimax rate) with a comparison to other priors commonly used in the literature. These include rescaled Gaussian processes [A. W. van der Vaart and H. van Zanten, Electron. J. Stat. 1 (2007), 433–448; MR2357712; Ann. Statist. 37 (2009), no. 5B, 2655–2675; MR2541442] and sieve priors [V. Rivoirard and J. Rousseau, Bayesian Anal. 7 (2012), no. 2, 311–333; MR2934953; J. Arbel, G. Gayraud and J. Rousseau, Scand. J. Stat. 40 (2013), no. 3, 549–570; MR3091697].

Collegio Carlo Alberto

Posted in General by Julyan Arbel on 12 September 2016

The Collegio in the center.

I have spent three years as a postdoc at the Collegio Carlo Alberto. This was a great time during which I have been able to interact with top colleagues and to prepare my applications in optimal conditions. Now that I have left for Inria Grenoble, here is a brief picture presentation of the Collegio. (more…)

Coupling of particle filters: smoothing

Posted in Statistics by Pierre Jacob on 20 July 2016



Two trajectories made for each other.


Hi again!

In this post, I’ll explain the new smoother introduced in our paper Coupling of Particle Filters with Fredrik Lindsten and Thomas B. Schön from Uppsala University. Smoothing refers to the task of estimating a latent process x_{0:T} = (x_0,\ldots, x_T) of length T, given noisy measurements of it, y_{1:T} = (y_0,\ldots, y_T); the smoothing distribution refers to p(dx_{0:T}|y_{1:T}). The setting is state-space models (what else?!), with a fixed parameter assumed to have been previously estimated.


Coupling of particle filters: likelihood curves

Posted in Statistics by Pierre Jacob on 19 July 2016


In this post, I’ll write about coupling particle filters, as proposed in our recent paper with Fredrik Lindsten and Thomas B. Schön from Uppsala University, available on arXiv; and also in this paper by colleagues at NUS. The paper is about a methodology with multiple direct consequences. In this first post, I’ll focus on correlated likelihood estimators; in a later post, I’ll describe a new smoothing algorithm. Both are described in detail in the article. We’ve been blessed to have been advertised by xi’an’s og, so glory is just around the corner.


Back to blogging

Posted in General by Pierre Jacob on 9 July 2016

My new desk.

My last post dates back to May 2015… thanks to JB and Julyan for keeping the place busy! I’m not (quite) dead and intend to go back to posting stuff every now and then. And by the way, congrats to both for their new jobs!

Last July, I’ve also started a new job,  as an assistant professor in the Department of Statistics at Harvard University, after having spent two years in Oxford. At some point, I might post something on the cultural difference between the European English and American communities of statisticians.

In the coming weeks, I’ll tell you all about a new paper entitled Coupling of Particle Filters,  co-written with Fredrik Lindsten and Thomas B. Schön from Uppsala University in Sweden. We are excited about this coupling idea because it’s simple and yet brings massive gains in many important aspects of inference for state space models (including both parameter inference and smoothing). I’ll be talking about it at the World Congress in Probability and Statistics in Toronto next week and at JSM in Chicago, early in August.

I’ll also try to write about another exciting project, joint work with Christian Robert, Chris Holmes and Lawrence Murray, on modularization, cutting feedback, the infamous cut function of BUGS and all that funny stuff. I’ve talked about it in ISBA 2016, and intend to put the associated tech report on arXiv over the summer.

Stay tuned!

3D density plot in R with Plotly

Posted in General, R by Julyan Arbel on 30 June 2016


In Bayesian nonparametrics, many models address the problem of density regression, including covariate dependent processes. These were settled by the pioneering works by [current ISBA president] MacEachern (1999) who introduced the general class of dependent Dirichlet processes. The literature on dependent processes was developed in numerous models, such as nonparametric regression, time series data, meta-analysis, to cite but a few, and applied to a wealth of fields such as, e.g., epidemiology, bioassay problems, genomics, finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arxiv).

Several densities can be plotted on the same 3D plot thanks to the Plotly R library, “an interactive, browser-based charting library built on the open source JavaScript graphing library, plotly.js.”

In our ecological example, the model provides a series of densities on the Y axis (in our case, posterior density of species diversity), indexed by some covariate X (a pollutant). See file density_plot.txt. The following Plotly R code

mydata = read.csv("density_plot.txt")
df =
plot_ly(df, x = Y, y = X, z = Z, group = X, type = "scatter3d", mode = "lines") 

provides a graph as below. For the interactive version, see the RPubs page here.

Capture d’écran 2016-06-30 à 12.15.57


%d bloggers like this: