# Statisfaction

## Update on inference with Wasserstein distances

Posted in Statistics by Pierre Jacob on 15 August 2017

You have to read the arXiv report to understand this figure. There’s no way around it.

Hi again,

As described in an earlier postEspen BerntonMathieu Gerber and Christian P. Robert and I are exploring Wasserstein distances for parameter inference in generative models. Generally, ABC and indirect inference are fun to play with, as they make the user think about useful distances between data sets (i.i.d. or not), which is sort of implicit in classical likelihood-based approaches. Thinking about distances between data sets can be a helpful and healthy exercise, even if not always necessary for inference. Viewing data sets as empirical distributions leads to considering the Wasserstein distance, and we try to demonstrate in the paper that it leads to an appealing inferential toolbox.

In passing, the first author Espen Bernton will be visiting Marco Cuturi,  Christian Robert, Nicolas Chopin and others in Paris from September to January; get in touch with him if you’re over there!

We have just updated the arXiv version of the paper, and the main modifications are as follows.

## Unbiased MCMC with couplings

Posted in Statistics by Pierre Jacob on 14 August 2017

Two chains meeting at time 10, and staying faithful forever. ❤

Hi,

With John O’Leary and Yves Atchadé , we have just arXived our work on removing the bias of MCMC estimators. Here I’ll explain what this bias is about, and the benefits of removing it.

## Particle methods in Statistics

Posted in General, Statistics by Pierre Jacob on 30 June 2017

A statistician sampling from a posterior distribution with particle methods

Hi there,

In this post, just in time for the summer, I propose a reading list for people interested in discovering the fascinating world of particle methods, aka sequential Monte Carlo methods, and their use in statistics. I also take the opportunity to advertise the SMC workshop in Uppsala (30 Aug – 1 Sept), which features an amazing list of speakers, including my postdoctoral collaborator Jeremy Heng:

www.it.uu.se/conferences/smc2017

## Likelihood calculation for the g-and-k distribution

Posted in R, Statistics by Pierre Jacob on 11 June 2017

Histogram of 1e5 samples from the g-and-k distribution, and overlaid probability density function

Hello,

An example often used in the ABC literature is the g-and-k distribution (e.g. reference [1] below), which is defined through the inverse of its cumulative distribution function (cdf). It is easy to simulate from such distributions by drawing uniform variables and applying the inverse cdf to them. However, since there is no closed-form formula for the probability density function (pdf) of the g-and-k distribution, the likelihood is often considered intractable. It has been noted in [2] that one can still numerically compute the pdf, by 1) numerically inverting the quantile function to get the cdf, and 2)  numerically differentiating the cdf, using finite differences, for instance. As it happens, this is very easy to implement, and I coded up an R tutorial at:

github.com/pierrejacob/winference/blob/master/inst/tutorials/tutorial_gandk.pdf

for anyone interested. This is part of the winference package that goes with our tech report on ABC with the Wasserstein distance  (joint work with Espen Bernton, Mathieu Gerber and Christian Robert, to be updated very soon!). This enables standard MCMC algorithms for the g-and-k example. It is also very easy to compute the likelihood for the multivariate extension of [3], since it only involves a fixed number of one-dimensional numerical inversions and differentiations (as opposed to a multivariate inversion).

Surprisingly, most of the papers that present the g-and-k example do not compare their ABC approximations to the posterior; instead, they typically compare the proposed ABC approach to existing ones. Similarly, the so-called Ricker model is commonly used in the ABC literature, and its posterior can be tackled efficiently using particle MCMC methods; as well as the M/G/1 model, which can be tackled either with particle MCMC methods or with tailor-made MCMC approaches such as [4].

These examples can still have great pedagogical value in ABC papers, but it would perhaps be nice to see more comparisons to the ground truth when it’s available; ground truth here being the actual posterior distribution.

1. Fearnhead, P. and Prangle, D. (2012) Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B, 74, 419–474.
2. Rayner, G. D. and MacGillivray, H. L. (2002) Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions. Statistics and Computing, 12, 57–75.
3. Drovandi, C. C. and Pettitt, A. N. (2011) Likelihood-free Bayesian estimation of multivari- ate quantile distributions. Computational Statistics & Data Analysis, 55, 2541–2556.
4. Shestopaloff, A. Y. and Neal, R. M. (2014) On Bayesian inference for the M/G/1 queue with efficient MCMC sampling. arXiv preprint arXiv:1401.5548.

## Sub-Gaussian property for the Beta distribution (part 1)

Posted in General by Julyan Arbel on 2 May 2017

With my friend Olivier Marchal (mathematician, not filmmaker, nor the cop), we have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.

The notion, introduced by Jean-Pierre Kahane, is as follows:

A random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:

$\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.$

Such a constant $\sigma^2$ is called a proxy variance, and we say that $X$ is $\sigma^2$-sub-Gaussian. If $X$ is sub-Gaussian, one is usually interested in the optimal proxy variance:

$\sigma_{\text{opt}}^2(X)=\min\{\sigma^2\geq 0\text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.$

Note that the variance always gives a lower bound on the optimal proxy variance: $\text{Var}[X]\leq \sigma_{\text{opt}}^2(X)$. In particular, when $\sigma_{\text{opt}}^2(X)=\text{Var}[X]$, $X$ is said to be strictly sub-Gaussian.

The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than a Gaussian. This is actually a characterization of the property. Let $Z\sim\mathcal{N}(0,1)$. Then:

$X \text{ is sub-Gaussian } \iff \exists c, \forall x\geq0:\, \mathsf{P}(|X-\mathbb{E}[X]|\geq x) \leq c\mathsf{P}(|Z|\geq x).$

That equivalence clearly implies exponential upper bounds for the tails of the distribution since a Gaussian $Z\sim\mathcal{N}(0,\sigma^2)$ satisfies

$\mathsf{P}(Z\ge x)\le\exp(-\frac{x^2}{2\sigma^2}).$

That can also be seen directly: for a $\sigma^2$-sub-Gaussian variable $X$,

$\forall\, \lambda>0\,:\,\,\mathsf{P}(X-\mu\geq x) = \mathsf{P}(e^{\lambda(X-\mu)}\geq e^{\lambda x})\leq \frac{\mathbb{E}[e^{\lambda(X-\mu)}]}{e^{\lambda x}}\quad\text{by Markov inequality,}$

$\leq\exp(\frac{\sigma^2\lambda^2}{2}-\lambda x)\quad\text{by sub-Gaussianity.}$

The polynomial function $\lambda\mapsto \frac{\sigma^2\lambda^2}{2}-\lambda x$ is minimized on $\mathbb{R}_+$ at $\lambda = \frac{x}{\sigma^2}$, for which we obtain

$\mathsf{P}(X-\mu\geq x) \leq\exp(-\frac{x^2}{2\sigma^2})$.

In that sense, the sub-Gaussian property of any compactly supported random variable $X$ comes for free since in that case the tails are obviously lighter than those of a Gaussian. A simple general proxy variance is given by Hoeffding’s lemma. Let $X$ be supported on $[a,b]$ with $\mathbb{E}[X]=0$. Then for any $\lambda\in\mathbb{R}$,

$\mathbb{E}[\exp(\lambda X)]\leq\exp\left(\frac{(b-a)^2}{8}\lambda^2\right)$

so $X$ is $\frac{(b-a)^2}{4}$-sub-Gaussian.

Back to the Beta where $[a,b]=[0,1]$, this shows the Beta is $\frac{1}{4}$-sub-Gaussian. The question of finding the optimal proxy variance is a more challenging issue. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound $\frac{1}{4(\alpha+\beta+1)}$. It matches with Hoeffding’s bound for the extremal case $\alpha\to0$, $\beta\to0$, where the Beta random variable concentrates on the two-point set $\{0,1\}$ (and when Hoeffding’s bound is tight).

In getting the bound $\frac{1}{4(\alpha+\beta+1)}$, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I’ll say more about getting the optimal proxy variance in a next post soon.

Cheers!

Julyan

## ABC in Banff

Posted in General, Seminar/Conference, Statistics by Pierre Jacob on 6 March 2017

Banff, also known as not the worst location for a scientific meeting.

Hi all,

Last week I attended a wonderful meeting on Approximate Bayesian Computation in Banff, which gathered a nice crowd of ABC users and enthusiasts, including lots of people outside of computational stats, whom I wouldn’t have met otherwise. Christian blogged about it there. My talk on Inference with Wasserstein distances is available as a video here (joint work with Espen Bernton, Mathieu Gerber and Christian Robert, the paper is here). In this post, I’ll summarize a few (personal) points and questions on ABC methods, after recalling the basics of ABC (ahem).

## Statistical inference with the Wasserstein distance

Posted in Statistics by Pierre Jacob on 27 January 2017

Nature transporting piles of sand.

Hi! It’s been too long!

In a recent arXiv entryEspen Bernton, Mathieu Gerber and Christian P. Robert and I explore the use of the Wasserstein distance to perform parameter inference in generative models. A by-product is an ABC-type approach that bypasses the choice of summary statistics. Instead, one chooses a metric on the observation space. Our work fits in the minimum distance estimation framework and is particularly related to “On minimum Kantorovich distance estimators”, by Bassetti, Bodini and Regazzini. A recent and very related paper is “Wasserstein training of restricted Boltzmann machines“, by Montavon, Müller and Cuturi, who have similar objectives but are not considering purely generative models. Similarly to that paper, we make heavy use of recent breakthroughs in numerical methods to approximate Wasserstein distances, breakthroughs which were not available to Bassetti, Bodini and Regazzini in 2006.

Here I’ll describe the main ideas in a simple setting.  If you’re excited about ABCasymptotic properties of minimum Wasserstein estimators, Hilbert space-filling curves, delay reconstructions and Takens’ theorem, or SMC samplers with r-hit kernels, check our paper!

## Gaussian variates truncated to a finite interval

Posted in General by nicolaschopin on 29 December 2016

Alan Rogers, an anthropologist at University of Utah, got in touch with me about my paper on the simulation of truncated Gaussian distributions (journal version, arxiv version). The method I proposed in this paper works for either a finite interval [a,b], or a semi-finite one [a,+inf[, but my C code implements only the latter, and Alan needed the former.

Alan thus decided to re-implement my method and several others (including Christian Robert’s accept-reject algorithm proposed in this paper) in C; see here:

https://github.com/alanrogers/dtnorm

Alan also sent me this interesting plot that compares the different methods. The color of a dot at position (a,b) corresponds to the fastest method for simulating N(0,1) truncated to [a,b];

CPU time comparison of truncated Normal samplers (c=chopin, r=robert, g=N(0,1) proposal, e=exponential proposal)

A few personal remarks:

• My method is an accept-reject algorithm, where the proposal is a mixture of uniform distributions on rectangles. The point is to have a large probability that the number of basic operations (multiplication, division) needed to return the draw is small. However, the improvement brought by such a method might be be observable only in compiled languages. In an interpreted language such as R, Matlab and Python, loops over basic operations come with a certain overhead, which might cancel any improvement. This was the experience of a colleague who tried to implement it in Julia.
• Even in C, this comparison might depend on several factors (computer, compiler, libraries, and so on). If I remember correctly, the choice of the random generator in particular may have a significant impact. (I used the GSL library which makes it easy to try different generators for the same piece of code.)
• Also bear in mind that some progress has been made for computing the inverse CDF of a unit Gaussian distribution. Hence the basic inverse CDF method, while not being the fastest approach, works reasonably well these days, especially (again) in interpreted languages. (Update: Alan tells me the inverse CDF methods remains 10 times slower for his C implementation, based on the GSL library.)

## Faà di Bruno’s note on eponymous formula, trilingual version

Posted in General by Julyan Arbel on 20 December 2016

The Italian mathematician Francesco Faà di Bruno was born in Alessandria (Piedmont, Italy) in 1825 and died in Turin in 1888. At the time of his birth, Piedmont used to be part of the Kingdom of Sardinia, led by the Dukes of Savoy. Italy was then unified in 1861, and the Kingdom of Sardinia became the Kingdom of Italy, of which Turin was declared the first capital. At that time, Piedmontese used to commonly speak both Italian and French.

Faà di Bruno is probably best known today for the eponymous formula which generalizes the derivative of a composition of two functions, $\phi\circ \psi$, to any order:

$(\phi\circ \psi)^{(n)} = \sum \frac{n!}{m_1!\,\ldots m_n!}\phi^{(m_1+\,\cdots \,+m_n)}\circ \psi \cdot \prod_{i=1}^n\left(\frac{\psi^{(j)}}{j!}\right)^{m_j}$

over $n$-tuples $(m_1,\,\ldots \,, m_n)$ satisfying $\sum_{j=1}^{n}j m_j = n.$

Faà di Bruno published his formula in two notes:

• Faà Di Bruno, F. (1855). Sullo sviluppo delle funzioni. Annali di Scienze Matematiche e Fisiche, 6:479–480. Google Books link.
• Faà Di Bruno, F. (1857). Note sur une nouvelle formule de calcul différentiel. Quarterly Journal of Pure and Applied Mathematics, 1:359–360. Google Books link.

They both date from December 1855, and were signed in Paris. They are similar and essentially state the formula without a proof. I have arXived a note which contains a translation from the French version to English (reproduced below), as well as the two original notes in French and in Italian. I’ve used for this the Erasmus MMXVI font, thanks Xian for sharing! (more…)

## post-doc positions at ENSAE

Posted in General by nicolaschopin on 27 November 2016

Hi,

interested in doing a post-doc with me on anything related to Bayesian Computation? Please let me know, as there is currently a call for post-doc grants at the ENSAE, see below.

Nicolas Chopin

The Labex ECODEC is a research consortium in Economics and Decision Sciences common to three leading French higher education institutions based in the larger Paris area: École polytechnique, ENSAE and HEC Paris. The Labex Ecodec offers:

• One-year postdoctoral fellowships for 2017-2018

• Two-year postdoctoral fellowships for 2017-2019

The monthly gross salary of postdoctoral fellowships is 3 000 €.

Candidates are invited to contact as soon as possible members of the research group (see below) with whom they intend to work.

Research groups concerned by the call:

Area 1: Secure Careers in a Global Economy

Area 2: Financial Market Failures and Regulation

Area 3: Product Market Regulation and Consumer Decision-Making

Area 4: Evaluating the Impact of Public Policies and Firms’ Decisions

Area 5: New Challenges for New Data

Details of axis can be found on the website:

http://labex-ecodec.fr/

31st December 2016

Screening of applications and decisions can be made earlier for srong candidates who need an early decision.

The application should be sent to application@labex-ecodec.fr in PDF. Please mention the area number on which you apply in the subject.

The application package includes:

1. A cover letter with the name of a potential supervisor among the group;
2. A research statement;
3. A letter from the potential supervisor in support of the project;
4. A Curriculum vita (with the address of the candidate, phone and e-mail contact).
5. The Ph.D. dissertation or papers/preprint;
6. Reference letters, including one from the PhD advisor. A letter from a member of the research group with whom the candidate is willing to interact will be appreciated.

Please note that HEC, Genes, and X PhD students are not eligible to apply for this call.

Selection will be based on excellence and a research project matching the group’s research agenda.

Area 1 “Secure careers in a Global Economy”: Pierre Cahuc (ENSAE), Dominique Rouziès (HEC), Isabelle Méjean (École polytechnique)

Area 2: “Financial Market Failures and Regulation”: François Derrien (HEC), Jean-David Fermanian, (ENSAE) Edouard Challe (École polytechnique)

Area 3: “Decision-Making and Market Regulation”: Nicolas Vieille (HEC), Philippe Choné (ENSAE), Marie-Laure Allain (École polytechnique)

Area 4: “Evaluating the Impact of Public Policies and Firm’s Decisions”: Bruno Crépon (ENSAE), Yukio Koriyama (École polytechnique), Daniel Halbheer (HEC)

Area 5:  “New Challenges for New Data”: Anna Simoni (ENSAE), Gilles Stoltz (HEC)