A big problem in our community

Posted in General, Seminar/Conference, Statistics by Pierre Jacob on 14 December 2017

“Tout va très bien”, meaning “all is well”, by Franquin.

Hi all,

Kristian Lum, who was already one of my Statistics superheroes for her many interesting papers and great talks, bravely wrote the following text about her experience as a young statistician going to conferences:

I can’t thank Kristian enough for speaking out. Her experience is both shocking and hardly surprising. Many, many academics report similar stories. This simply can’t go on like that.

I happen to have gone to the conferences mentioned by Kristian, and my experience as a young man was completely different. It was all about meeting interesting people, discussing ideas, being challenged, and having good times. Nobody harassed, touched or assaulted me. There was some flirting, as I guess is natural when hundreds of people are put in sunny places far away from home, but I was never the victim of any misconduct or abuse of power. So instead of driving me out of the field, conferences became important, enriching and rewarding moments of my professional life.

Looking back at those conferences I feel sick, and heartbroken, at the thought that some of my peers were having such a difficult time, because of predators who don’t ever  face the consequences of their actions. Meanwhile I was part of the silent majority.

The recent series of revelations about sexual harassment and assaults in other professional environments indicate that this is not specific to our field, nor to academia. But this does not make it any more acceptable. I know for a fact that many leaders of our field take this issue extremely seriously (as Kristian mentions too),  but clearly much much more needs to be done. The current situation is just shameful; strong and  coordinated actions will be needed to fix it. Thanks again to Kristian for the wake-up call.


Dynamic publication list for research webpage using arXiv, HAL, or bibtex2html

Posted in General by Julyan Arbel on 23 October 2017

Well of course, dynamic is conditional upon some manual feeding. If you put your papers on arXiv or HAL, then those two propose dynamic widgets. If you maintain a .bib file of your papers, you can use tools like bibtex2html. This is not dynamic at all, but it allows for finer tuning of url links you might want to add than with arXiv or HAL options. I review below those three options. (more…)

ISBA elections, let’s go voting

Posted in General by Julyan Arbel on 16 October 2017

So it seems even Thomas B. went voting.

The International Society for Bayesian Analysis (ISBA), is running elections until November, 15. This year, two contributors on this blog, Nicolas Chopin and myself, are running for an ISBA Section office. The sections of the society, nine in number as of today, gather researchers with common research interests: Computation, Objective Bayes, Nonparametrics, etc.

Here are our candidate statements:


Tagged with: ,

Statistical learning in models made of modules

Posted in General, Statistics by Pierre Jacob on 9 September 2017



Graph of variables in a model made of two modules: the first with parameter theta1 and data Y1, and the second with parameter theta2 and data Y2, defined conditionally upon theta1.



With Lawrence Murray, Chris Holmes and Christian Robert, we have recently arXived a paper entitled “Better together? Statistical learning in models made of modules”. Christian blogged about it already. The context is the following: parameters of a first model appear as inputs in another model. The question is whether to consider a “joint model approach”, where all parameters are estimated simultaneously with all of the data. Or if one should instead follow a “modular approach”, where the first parameters are estimated with the first model only, ignoring the second model. Examples of modular approaches include the “cut distribution“, or “two-step estimators” (e.g. Chapter 6 of Newey & McFadden (1994)). In many fields, modular approaches are preferred, because the second model is suspected of being more misspecified than the first one. Misspecification of the second model can “contaminate” the joint model, with dire consequences on inference, as described e.g. in Bayarri, Berger & Liu (2009). Other reasons include computational constraints and the lack of simultaneous availability of all models and associated data. In the paper, we try to make sense of the defects of the joint model approach and we propose a principled, quantitative way of choosing between joint and modular approaches.


School of Statistics for Astrophysics, Autrans, France, October 9-13

Posted in General by Julyan Arbel on 7 September 2017

Didier Fraix-Burnet (IPAG), Stéphane Girard (Inria) and myself are organising a School of Statistics for Astrophysics, Stat4Astro, to be held in October in France. The primary goal of the School is to train astronomers to the use of modern statistical techniques. It also aims at bridging the gap between the two communities by emphasising on the practice during works in common, to give firm grounds to the theoretical lessons, and to initiate works on problems brought by the participants. There have been two previous sessions of this school, one on regression and one on clustering. The speakers of this edition, including Christian Robert, Roberto Trotta and David van Dyk, will focus on the Bayesian methodology, with the moral support of the Bayesian Society, ISBA. The interest of this statistical approach in astrophysics probably comes from its necessity and its success in determining the cosmological parameters from observations, especially from the cosmic background fluctuations.  The cosmological community has thus been very active in this field (see for instance the Cosmostatistics Initiative COIN).

But the Bayesian methodology, complementary to the more classical frequentist one, has many applications in physics in general due to its faculty to incorporate a priori knowledge into the inference computation, such as the uncertainties brought by the observational processes.

As for sophisticated statistical techniques, astronomers are not familiar with Bayesian methodology in general, while it is becoming more and more widespread and useful in the literature. This school will form the participants to both a strong theoretical background and a solid practice of Bayesian inference:

  • Introduction to R and Bayesian Statistics (Didier Fraix-Burnet, Institut de Planétologie et d’Astrophysique de Grenoble)
  • Foundations of Bayesian Inference (David van Dyk, Imperial College London)
  • Markov chain Monte Carlo (David van Dyk, Imperial College London)
  • Model Building (David van Dyk, Imperial College London)
  • Nested Sampling, Model Selection, and Bayesian Hierarchical Models (Roberto Trotta, Imperial College London)
  • Approximate Bayesian Computation (Christian Robert, Univ. Paris-Dauphine, Univ. Warwick and Xi’an (!))
  • Bayesian Nonparametric Approaches to Clustering (Julyan Arbel, Université Grenoble Alpes and Inria)

Feel free to register, we are not fully booked yet!


Particle methods in Statistics

Posted in General, Statistics by Pierre Jacob on 30 June 2017
Welding it together

A statistician sampling from a posterior distribution with particle methods

Hi there,

In this post, just in time for the summer, I propose a reading list for people interested in discovering the fascinating world of particle methods, aka sequential Monte Carlo methods, and their use in statistics. I also take the opportunity to advertise the SMC workshop in Uppsala (30 Aug – 1 Sept), which features an amazing list of speakers, including my postdoctoral collaborator Jeremy Heng:


Sub-Gaussian property for the Beta distribution (part 1)

Posted in General by Julyan Arbel on 2 May 2017


With my friend Olivier Marchal (mathematician, not filmmaker, nor the cop), we have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.

The notion, introduced by Jean-Pierre Kahane, is as follows:

A random variable X with finite mean \mu=\mathbb{E}[X] is sub-Gaussian if there is a positive number \sigma such that:

\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.

Such a constant \sigma^2 is called a proxy variance, and we say that X is \sigma^2-sub-Gaussian. If X is sub-Gaussian, one is usually interested in the optimal proxy variance:

 \sigma_{\text{opt}}^2(X)=\min\{\sigma^2\geq 0\text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.

Note that the variance always gives a lower bound on the optimal proxy variance: \text{Var}[X]\leq \sigma_{\text{opt}}^2(X). In particular, when \sigma_{\text{opt}}^2(X)=\text{Var}[X], X is said to be strictly sub-Gaussian.

The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than a Gaussian. This is actually a characterization of the property. Let Z\sim\mathcal{N}(0,1). Then:

X \text{ is sub-Gaussian } \iff \exists c, \forall x\geq0:\, \mathsf{P}(|X-\mathbb{E}[X]|\geq x) \leq c\mathsf{P}(|Z|\geq x).

That equivalence clearly implies exponential upper bounds for the tails of the distribution since a Gaussian Z\sim\mathcal{N}(0,\sigma^2) satisfies

\mathsf{P}(Z\ge x)\le\exp(-\frac{x^2}{2\sigma^2}).

That can also be seen directly: for a \sigma^2-sub-Gaussian variable X,

\forall\, \lambda>0\,:\,\,\mathsf{P}(X-\mu\geq x) = \mathsf{P}(e^{\lambda(X-\mu)}\geq e^{\lambda x})\leq \frac{\mathbb{E}[e^{\lambda(X-\mu)}]}{e^{\lambda x}}\quad\text{by Markov inequality,}

\leq\exp(\frac{\sigma^2\lambda^2}{2}-\lambda x)\quad\text{by sub-Gaussianity.}

The polynomial function \lambda\mapsto \frac{\sigma^2\lambda^2}{2}-\lambda x is minimized on \mathbb{R}_+ at \lambda = \frac{x}{\sigma^2}, for which we obtain

\mathsf{P}(X-\mu\geq x) \leq\exp(-\frac{x^2}{2\sigma^2}).

In that sense, the sub-Gaussian property of any compactly supported random variable X comes for free since in that case the tails are obviously lighter than those of a Gaussian. A simple general proxy variance is given by Hoeffding’s lemma. Let X be supported on [a,b] with \mathbb{E}[X]=0. Then for any \lambda\in\mathbb{R},

\mathbb{E}[\exp(\lambda X)]\leq\exp\left(\frac{(b-a)^2}{8}\lambda^2\right)

so X is \frac{(b-a)^2}{4}-sub-Gaussian.

Back to the Beta where [a,b]=[0,1], this shows the Beta is \frac{1}{4}-sub-Gaussian. The question of finding the optimal proxy variance is a more challenging issue. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound \frac{1}{4(\alpha+\beta+1)}. It matches with Hoeffding’s bound for the extremal case \alpha\to0, \beta\to0, where the Beta random variable concentrates on the two-point set \{0,1\} (and when Hoeffding’s bound is tight).

In getting the bound \frac{1}{4(\alpha+\beta+1)}, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I’ll say more about getting the optimal proxy variance in a next post soon.



Tagged with:

ABC in Banff

Posted in General, Seminar/Conference, Statistics by Pierre Jacob on 6 March 2017

Banff, also known as not the worst location for a scientific meeting.

Hi all,

Last week I attended a wonderful meeting on Approximate Bayesian Computation in Banff, which gathered a nice crowd of ABC users and enthusiasts, including lots of people outside of computational stats, whom I wouldn’t have met otherwise. Christian blogged about it there. My talk on Inference with Wasserstein distances is available as a video here (joint work with Espen Bernton, Mathieu Gerber and Christian Robert, the paper is here). In this post, I’ll summarize a few (personal) points and questions on ABC methods, after recalling the basics of ABC (ahem).


Gaussian variates truncated to a finite interval

Posted in General by nicolaschopin on 29 December 2016

Alan Rogers, an anthropologist at University of Utah, got in touch with me about my paper on the simulation of truncated Gaussian distributions (journal version, arxiv version). The method I proposed in this paper works for either a finite interval [a,b], or a semi-finite one [a,+inf[, but my C code implements only the latter, and Alan needed the former.

Alan thus decided to re-implement my method and several others (including Christian Robert’s accept-reject algorithm proposed in this paper) in C; see here:

Alan also sent me this interesting plot that compares the different methods. The color of a dot at position (a,b) corresponds to the fastest method for simulating N(0,1) truncated to [a,b];


CPU time comparison of truncated Normal samplers (c=chopin, r=robert, g=N(0,1) proposal, e=exponential proposal)

A few personal remarks:

  • My method is an accept-reject algorithm, where the proposal is a mixture of uniform distributions on rectangles. The point is to have a large probability that the number of basic operations (multiplication, division) needed to return the draw is small. However, the improvement brought by such a method might be be observable only in compiled languages. In an interpreted language such as R, Matlab and Python, loops over basic operations come with a certain overhead, which might cancel any improvement. This was the experience of a colleague who tried to implement it in Julia.
  • Even in C, this comparison might depend on several factors (computer, compiler, libraries, and so on). If I remember correctly, the choice of the random generator in particular may have a significant impact. (I used the GSL library which makes it easy to try different generators for the same piece of code.)
  • Also bear in mind that some progress has been made for computing the inverse CDF of a unit Gaussian distribution. Hence the basic inverse CDF method, while not being the fastest approach, works reasonably well these days, especially (again) in interpreted languages. (Update: Alan tells me the inverse CDF methods remains 10 times slower for his C implementation, based on the GSL library.)


Faà di Bruno’s note on eponymous formula, trilingual version

Posted in General by Julyan Arbel on 20 December 2016


The Italian mathematician Francesco Faà di Bruno was born in Alessandria (Piedmont, Italy) in 1825 and died in Turin in 1888. At the time of his birth, Piedmont used to be part of the Kingdom of Sardinia, led by the Dukes of Savoy. Italy was then unified in 1861, and the Kingdom of Sardinia became the Kingdom of Italy, of which Turin was declared the first capital. At that time, Piedmontese used to commonly speak both Italian and French.

Faà di Bruno is probably best known today for the eponymous formula which generalizes the derivative of a composition of two functions, \phi\circ \psi, to any order:

(\phi\circ \psi)^{(n)} = \sum \frac{n!}{m_1!\,\ldots m_n!}\phi^{(m_1+\,\cdots \,+m_n)}\circ \psi \cdot \prod_{i=1}^n\left(\frac{\psi^{(j)}}{j!}\right)^{m_j}

over n-tuples (m_1,\,\ldots \,, m_n) satisfying \sum_{j=1}^{n}j m_j = n.

Faà di Bruno published his formula in two notes:

  • Faà Di Bruno, F. (1855). Sullo sviluppo delle funzioni. Annali di Scienze Matematiche e Fisiche, 6:479–480. Google Books link.
  • Faà Di Bruno, F. (1857). Note sur une nouvelle formule de calcul différentiel. Quarterly Journal of Pure and Applied Mathematics, 1:359–360. Google Books link.

They both date from December 1855, and were signed in Paris. They are similar and essentially state the formula without a proof. I have arXived a note which contains a translation from the French version to English (reproduced below), as well as the two original notes in French and in Italian. I’ve used for this the Erasmus MMXVI font, thanks Xian for sharing! (more…)

%d bloggers like this: