Sub-Gaussian property for the Beta distribution (part 1)

Posted in General by Julyan Arbel on 2 May 2017


With my friend Olivier Marchal (mathematician, not filmmaker, nor the cop), we have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.

The notion, introduced by Jean-Pierre Kahane, is as follows:

A random variable X with finite mean \mu=\mathbb{E}[X] is sub-Gaussian if there is a positive number \sigma such that:

\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.

Such a constant \sigma^2 is called a proxy variance, and we say that X is \sigma^2-sub-Gaussian. If X is sub-Gaussian, one is usually interested in the optimal proxy variance:

 \sigma_{\text{opt}}^2(X)=\min\{\sigma^2\geq 0\text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.

Note that the variance always gives a lower bound on the optimal proxy variance: \text{Var}[X]\leq \sigma_{\text{opt}}^2(X). In particular, when \sigma_{\text{opt}}^2(X)=\text{Var}[X], X is said to be strictly sub-Gaussian.

The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than those of a Gaussian. This is actually a characterization of the property:

X \text{ is sub-Gaussian } \iff \exists c>0, \exists \sigma>0, \forall x\geq0:\, \mathsf{P}(|X-\mathbb{E}[X]|\geq x) \leq c\,\mathsf{P}(|Z|\geq x)\quad\text{where } Z\sim\mathcal{N}(0,\sigma^2).

That equivalence clearly implies exponential upper bounds for the tails of the distribution since a Gaussian Z\sim\mathcal{N}(0,\sigma^2) satisfies

\mathsf{P}(Z\ge x)\le\exp(-\frac{x^2}{2\sigma^2}).

That can also be seen directly: for a \sigma^2-sub-Gaussian variable X,

\forall\, \lambda>0\,:\,\,\mathsf{P}(X-\mu\geq x) = \mathsf{P}(e^{\lambda(X-\mu)}\geq e^{\lambda x})\leq \frac{\mathbb{E}[e^{\lambda(X-\mu)}]}{e^{\lambda x}}\quad\text{by Markov's inequality,}

\leq\exp(\frac{\sigma^2\lambda^2}{2}-\lambda x)\quad\text{by sub-Gaussianity.}

The polynomial function \lambda\mapsto \frac{\sigma^2\lambda^2}{2}-\lambda x is minimized on \mathbb{R}_+ at \lambda = \frac{x}{\sigma^2}, for which we obtain

\mathsf{P}(X-\mu\geq x) \leq\exp(-\frac{x^2}{2\sigma^2}).
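As a quick sanity check of the optimization step, here is a minimal Python sketch. It compares the exact Gaussian tail with the bound \exp(-\frac{x^2}{2\sigma^2}) and evaluates the exponent at \lambda = \frac{x}{\sigma^2}. The function names are mine and purely illustrative.

```python
import math

def gaussian_tail(x, sigma=1.0):
    """Exact P(Z >= x) for Z ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))

def chernoff_exponent(lam, x, sigma=1.0):
    """The polynomial lam -> sigma^2 lam^2 / 2 - lam x from the Markov step."""
    return 0.5 * sigma ** 2 * lam ** 2 - lam * x

def chernoff_bound(x, sigma=1.0):
    """Tail bound exp(-x^2 / (2 sigma^2)), obtained at lam = x / sigma^2 (x >= 0)."""
    return math.exp(chernoff_exponent(x / sigma ** 2, x, sigma))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.0):
        print(x, gaussian_tail(x), chernoff_bound(x))
```

The Gaussian itself is the boundary case of the definition (its moment generating function attains the bound with equality), so the comparison shows how much the Markov step alone gives away.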

In that sense, the sub-Gaussian property of any compactly supported random variable X comes for free since in that case the tails are obviously lighter than those of a Gaussian. A simple general proxy variance is given by Hoeffding’s lemma. Let X be supported on [a,b] with \mathbb{E}[X]=0. Then for any \lambda\in\mathbb{R},

\mathbb{E}[\exp(\lambda X)]\leq\exp\left(\frac{(b-a)^2}{8}\lambda^2\right)

so X is \frac{(b-a)^2}{4}-sub-Gaussian.
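Hoeffding's lemma is easy to probe numerically on the simplest bounded variable, a centered two-point distribution on \{a,b\}. The sketch below is only an illustration under that assumption (the helper names are hypothetical); it checks the inequality on a grid of \lambda values.

```python
import math

def two_point_mgf(lam, a, b):
    """E[exp(lam * X)] for the mean-zero variable X supported on {a, b}, a < 0 < b."""
    p = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    return (1.0 - p) * math.exp(lam * a) + p * math.exp(lam * b)

def hoeffding_bound(lam, a, b):
    """Right-hand side of Hoeffding's lemma: exp((b - a)^2 lam^2 / 8)."""
    return math.exp((b - a) ** 2 * lam ** 2 / 8.0)

if __name__ == "__main__":
    a, b = -0.2, 0.8
    for lam in (-5.0, -1.0, 0.5, 2.0, 5.0):
        print(lam, two_point_mgf(lam, a, b), hoeffding_bound(lam, a, b))
```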

Back to the Beta, where [a,b]=[0,1]: this shows that the Beta is \frac{1}{4}-sub-Gaussian. Finding the optimal proxy variance is more challenging. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound \frac{1}{4(\alpha+\beta+1)}. It matches Hoeffding’s bound in the extremal case \alpha\to0, \beta\to0, where the Beta random variable concentrates on the two-point set \{0,1\} (and where Hoeffding’s bound is tight).
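The bound \frac{1}{4(\alpha+\beta+1)} can be checked numerically against the definition. The following is a rough sketch, not the argument of the note: it computes the Beta moment generating function through the power series of the confluent hypergeometric function {}_1F_1(\alpha;\alpha+\beta;\lambda) and reports the largest gap between the log centered moment generating function and \frac{\lambda^2\sigma^2}{2} on a grid. The names `beta_mgf` and `max_gap` are illustrative choices of mine.

```python
import math

def beta_mgf(lam, a, b, tol=1e-16, max_terms=10_000):
    """E[exp(lam * X)] for X ~ Beta(a, b), via the series of 1F1(a; a + b; lam)."""
    if lam < 0:
        # 1 - X ~ Beta(b, a): flip the sign so that all series terms are positive
        return math.exp(lam) * beta_mgf(-lam, b, a, tol, max_terms)
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (a + b + k) * lam / (k + 1)
        total += term
        if term < tol * total:
            break
    return total

def max_gap(a, b, lams):
    """Largest value of log E[exp(lam (X - mu))] - lam^2 sigma^2 / 2 over the grid,
    with sigma^2 = 1 / (4 (a + b + 1)); it should never be positive."""
    mu = a / (a + b)
    proxy = 1.0 / (4.0 * (a + b + 1.0))
    return max(math.log(beta_mgf(l, a, b)) - l * mu - 0.5 * proxy * l * l
               for l in lams)

if __name__ == "__main__":
    grid = [-30 + 0.5 * i for i in range(121)]
    for a, b in [(0.5, 0.5), (2.0, 5.0), (10.0, 3.0)]:
        print(a, b, max_gap(a, b, grid))
```

Since the grid contains \lambda=0, where the gap is exactly zero, the printed maximum should be zero up to rounding.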

In getting the bound \frac{1}{4(\alpha+\beta+1)}, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I’ll say more about getting the optimal proxy variance in an upcoming post.



Faà di Bruno’s note on eponymous formula, trilingual version

Posted in General by Julyan Arbel on 20 December 2016


The Italian mathematician Francesco Faà di Bruno was born in Alessandria (Piedmont, Italy) in 1825 and died in Turin in 1888. At the time of his birth, Piedmont was part of the Kingdom of Sardinia, led by the Dukes of Savoy. Italy was then unified in 1861, and the Kingdom of Sardinia became the Kingdom of Italy, of which Turin was declared the first capital. At that time, Piedmontese commonly spoke both Italian and French.

Faà di Bruno is probably best known today for the eponymous formula which generalizes the derivative of a composition of two functions, \phi\circ \psi, to any order:

(\phi\circ \psi)^{(n)} = \sum \frac{n!}{m_1!\,\cdots\, m_n!}\,\phi^{(m_1+\,\cdots \,+m_n)}\circ \psi \cdot \prod_{j=1}^n\left(\frac{\psi^{(j)}}{j!}\right)^{m_j}

over n-tuples (m_1,\,\ldots \,, m_n) satisfying \sum_{j=1}^{n}j m_j = n.
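To see the index set at work, here is a small Python sketch (function names are mine) that enumerates the n-tuples (m_1,\ldots,m_n) with \sum_j j m_j = n together with the coefficient n!/\prod_j m_j!\,(j!)^{m_j}, and then evaluates the n-th derivative of a composition from the values of the derivatives of \phi and \psi.

```python
import math

def faa_di_bruno_terms(n):
    """Yield (m, coeff) for every tuple m = (m_1, ..., m_n) with sum_j j * m_j = n,
    where coeff = n! / prod_j (m_j! * (j!)**m_j)."""
    def rec(j, remaining, m):
        if j > n:
            if remaining == 0:
                yield tuple(m)
            return
        for mj in range(remaining // j + 1):
            yield from rec(j + 1, remaining - j * mj, m + [mj])

    for m in rec(1, n, []):
        denom = 1
        for j, mj in enumerate(m, start=1):
            denom *= math.factorial(mj) * math.factorial(j) ** mj
        yield m, math.factorial(n) // denom

def composite_derivative(n, phi_derivs, psi_derivs):
    """n-th derivative of phi o psi at a point x, given
    phi_derivs[k] = phi^(k)(psi(x)) and psi_derivs[j] = psi^(j)(x)."""
    total = 0.0
    for m, coeff in faa_di_bruno_terms(n):
        term = coeff * phi_derivs[sum(m)]
        for j, mj in enumerate(m, start=1):
            term *= psi_derivs[j] ** mj
        total += term
    return total

if __name__ == "__main__":
    for m, coeff in faa_di_bruno_terms(4):
        print(m, coeff)
```

For \phi=\exp and \psi(x)=x^2 at x=1, calling `composite_derivative(2, [math.e] * 3, [1.0, 2.0, 2.0])` reproduces the hand computation (2+4x^2)e^{x^2}=6e. The coefficients for a given n sum to the n-th Bell number, since each one counts set partitions with a prescribed block structure.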

Faà di Bruno published his formula in two notes:

  • Faà Di Bruno, F. (1855). Sullo sviluppo delle funzioni. Annali di Scienze Matematiche e Fisiche, 6:479–480. Google Books link.
  • Faà Di Bruno, F. (1857). Note sur une nouvelle formule de calcul différentiel. Quarterly Journal of Pure and Applied Mathematics, 1:359–360. Google Books link.

They both date from December 1855, and were signed in Paris. They are similar and essentially state the formula without a proof. I have arXived a note which contains a translation from the French version to English, as well as the two original notes in French and in Italian. I used the Erasmus MMXVI font for this; thanks to Xian for sharing!

MathSciNet reviews on Bayesian papers

Posted in General by Julyan Arbel on 18 October 2016



I recently started to review papers on Mathematical Reviews / MathSciNet and decided I would post the reviews here from time to time. Here are the first three, which deal with (i) objective Bayes priors for discrete parameters, (ii) random probability measures and inference on species variety and (iii) Bayesian nonparametric asymptotic theory and contraction rates.

The paper deals with objective prior derivation in the discrete parameter setting. Previous treatments of this problem include J. O. Berger, J.-M. Bernardo and D. Sun [J. Amer. Statist. Assoc. 107 (2012), no. 498, 636–648; MR2980073], who rely on embedding the discrete parameter into a continuous parameter space and then applying reference methodology (J.-M. Bernardo [J. Roy. Statist. Soc. Ser. B 41 (1979), no. 2, 113–147; MR0547240]). The main contribution here is to propose an all-purpose objective prior based on the Kullback–Leibler (KL) divergence. More specifically, the prior \pi(\theta) at any parameter value \theta is obtained as follows: (i) compute the minimum KL divergence over \theta'\neq \theta between models indexed by \theta' and \theta; (ii) set \pi(\theta) proportional to a sound transform of the minimum obtained in (i). A good property of the proposed approach is that it is not problem specific. This objective prior is derived in five models (including binomial and hypergeometric) and is compared to the priors known in the literature. The discussion suggests a possible extension to the continuous parameter setting.
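Steps (i) and (ii) can be sketched in a few lines of Python for a finite family of models on a common finite support. This is only an illustration: the review does not specify the transform, so the default `math.expm1` (that is, d \mapsto e^d - 1) is an assumption on my part, chosen as an increasing transform that vanishes at zero, and the function names are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between two pmfs on the same finite
    support (q is assumed positive wherever p is)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def min_kl_prior(models, transform=math.expm1):
    """Prior over model indices: (i) minimum KL divergence from each model to any
    other model, (ii) normalized transform of that minimum.
    The transform is an illustrative assumption, not taken from the review."""
    weights = []
    for i, p in enumerate(models):
        d_min = min(kl(p, q) for j, q in enumerate(models) if j != i)
        weights.append(transform(d_min))
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    models = [[0.5, 0.3, 0.2], [0.45, 0.35, 0.2], [0.1, 0.1, 0.8]]
    print(min_kl_prior(models))
```

In this toy family the third model is far from the other two, so it receives most of the prior mass, while the two nearly indistinguishable models receive little.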

A. Lijoi, R. H. Mena and I. Prünster [Biometrika 94 (2007), no. 4, 769–786; MR2416792] recently introduced a Bayesian nonparametric methodology for estimating the species variety featured by an additional unobserved sample of size m given an initial observed sample. This methodology was further investigated by S. Favaro, Lijoi and Prünster [Biometrics 68 (2012), no. 4, 1188–1196; MR3040025; Ann. Appl. Probab. 23 (2013), no. 5, 1721–1754; MR3114915]. Although it led to explicit posterior distributions under the general framework of Gibbs-type priors [A. V. Gnedin and J. W. Pitman (2005), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12, 83–102, 244–245; MR2160320], there are situations of practical interest where m is required to be very large and the computational burden of evaluating these posterior distributions makes their concrete implementation impossible. This paper presents a solution to this problem for a large class of Gibbs-type priors which encompasses the two parameter Poisson-Dirichlet prior and, among others, the normalized generalized Gamma prior. The solution relies on the study of the large m asymptotic behaviour of the posterior distribution of the number of new species in the additional sample. In particular, a simple characterization of the limiting posterior distribution is introduced in terms of a scale mixture with respect to a suitable latent random variable; this characterization, combined with adaptive rejection sampling, makes it possible to derive a large m approximation of any feature of interest from the exact posterior distribution. The results are illustrated through a simulation study and the analysis of a dataset in linguistics.

A novel prior distribution is proposed for adaptive Bayesian estimation, meaning that the associated posterior distribution contracts to the truth with the exact optimal rate and at the same time is adaptive regardless of the unknown smoothness. The prior is termed \textit{block prior} and is defined on the Fourier coefficients \{\theta_j\} of a curve f by independently assigning 0-mean Gaussian distributions on blocks of coefficients \{\theta_j\}_{j\in B_k} indexed by some B_k, with covariance matrix proportional to the identity matrix; the proportional coefficient is itself assigned a prior distribution g_k. Under conditions on g_k, it is shown that (i) the prior puts sufficient prior mass near the true signal and (ii) automatically concentrates on its effective dimension. The main result of the paper is a rate-optimal posterior contraction theorem obtained in a general framework for a modified version of a block prior. Compared to the closely related block spike and slab prior proposed by M. Hoffmann, J. Rousseau and J. Schmidt-Hieber [Ann. Statist. 43 (2015), no. 5, 2259–2295; MR3396985] which only holds for the white noise model, the present result can be applied in a wide range of models. This is illustrated through applications to five mainstream models: density estimation, white noise model, Gaussian sequence model, Gaussian regression and spectral density estimation. The results hold under Sobolev smoothness and their extension to more flexible Besov smoothness is discussed. The paper also provides a discussion on the absence of an extra log term in the posterior contraction rates (thus achieving the exact minimax rate) with a comparison to other priors commonly used in the literature. These include rescaled Gaussian processes [A. W. van der Vaart and H. van Zanten, Electron. J. Stat. 1 (2007), 433–448; MR2357712; Ann. Statist. 37 (2009), no. 5B, 2655–2675; MR2541442] and sieve priors [V. Rivoirard and J. Rousseau, Bayesian Anal. 7 (2012), no. 2, 311–333; MR2934953; J. Arbel, G. Gayraud and J. Rousseau, Scand. J. Stat. 40 (2013), no. 3, 549–570; MR3091697].

Collegio Carlo Alberto

Posted in General by Julyan Arbel on 12 September 2016

The Collegio in the center.

I spent three years as a postdoc at the Collegio Carlo Alberto. This was a great time during which I was able to interact with top colleagues and to prepare my applications in optimal conditions. Now that I have left for Inria Grenoble, here is a brief picture presentation of the Collegio.

3D density plot in R with Plotly

Posted in General, R by Julyan Arbel on 30 June 2016


In Bayesian nonparametrics, many models address the problem of density regression, including covariate dependent processes. These originate in the pioneering work of [current ISBA president] MacEachern (1999), who introduced the general class of dependent Dirichlet processes. The literature on dependent processes has since been developed for numerous models, such as nonparametric regression, time series data and meta-analysis, to cite but a few, and applied to a wealth of fields such as epidemiology, bioassay problems, genomics and finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arxiv).

Several densities can be plotted on the same 3D plot thanks to the Plotly R library, “an interactive, browser-based charting library built on the open source JavaScript graphing library, plotly.js.”

In our ecological example, the model provides a series of densities on the Y axis (in our case, posterior density of species diversity), indexed by some covariate X (a pollutant). See file density_plot.txt. The following Plotly R code

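library(plotly)
# density_plot.txt is expected to provide the columns X (covariate), Y and Z used below
mydata = read.csv("density_plot.txt")
df = as.data.frame(mydata)
# one 3D line per value of X; the formula syntax (~) and split are what plotly >= 4.0 expects
plot_ly(df, x = ~Y, y = ~X, z = ~Z, split = ~X, type = "scatter3d", mode = "lines")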

provides a graph as below. For the interactive version, see the RPubs page here.

[Figure: screenshot of the 3D density plot]


Bayesian demography

Posted in General by Julyan Arbel on 26 May 2016

“For about two centuries, Bayesian demography remained largely dormant. Only in recent decades has there been a revival of demographers’ interest in Bayesian methods, following the methodological and computational developments of Bayesian statistics. The area is currently growing fast, especially with the United Nations (UN) population projections becoming probabilistic—and Bayesian.”    Bijak and Bryant (2016)

It is interesting to see that Bayesian statistics has been infiltrating demography in recent years. The review paper Bayesian demography 250 years after Bayes by Bijak and Bryant (Population Studies, 2016) stresses that promising areas of application include demographic forecasts, problems with limited data, and highly structured and complex models. As an indication of this growing interest, the ISBA meeting to be held next June will showcase a course and a session devoted to the field (given and organized by Adrian Raftery).

With Vianney Costemalle from INSEE, we recently made a modest contribution to the field by proposing a Bayesian model (paper in French) which helps reconcile apparently inconsistent population datasets. The aim is to estimate annual migration flows to France (note that the work covers the period 2004-2011, owing to a long publication process, and as a consequence does not take into account recent migration events). We follow the United Nations (UN) definition of a long-term migrant: someone who settles in a foreign country for at least one year. At least two datasets can be used for this purpose: 1) the population census C, annual since 2004, and 2) data from residence permits R.

Workshop on Bayesian Nonparametrics in Turin

Posted in General by Julyan Arbel on 24 February 2016

Where is the Dirichlet process?

The second Statalks, a series of Italian workshops aimed at Master's students, PhD students, post-docs and young researchers, took place at Collegio Carlo Alberto on February 19. This edition was dedicated to Bayesian Nonparametrics. The first two presentations were introductory tutorials while the last four focused on theory and applications. All six were clearly biased according to the scientific interests of our group. Below are the program and the slides.

  1. A gentle introduction to Bayesian Nonparametrics I (Antonio Canale)
  2. A gentle introduction to Bayesian Nonparametrics II (Julyan Arbel)
  3. Dependent processes in Bayesian Nonparametrics (Matteo Ruggiero)
  4. Asymptotics for discrete random measures (Pierpaolo De Blasi)
  5. Applications to Ecology and Marketing (Antonio Canale)
  6. Species sampling models (Julyan Arbel)


Champions League round of 16 draw: what are the odds?

Posted in General, Sport by Julyan Arbel on 11 December 2015


[This is a guest post by my friend and colleague Bernardo Nipoti from Collegio Carlo Alberto, Juventus Turin.]

The matches of the group stage of the UEFA Champions League have just finished and next Monday, the 14th of December 2015, the draw in Nyon will decide the eight matches that make up the first round of the knockout phase.

As explained on the UEFA website, rules are simple:

  1. two seeding pots have been formed: one consisting of group winners and the other of runners-up;
  2. no team can play a club from their group or any side from their own association;
  3. due to a decision by the UEFA Executive Committee, teams from Russia and Ukraine cannot meet.

The two pots are:

Group winners: Real Madrid (ESP), Wolfsburg (GER), Atlético Madrid (ESP), Manchester City (ENG), Barcelona (ESP, holders), Bayern München (GER), Chelsea (ENG), Zenit (RUS);
Group runners-up: Paris Saint-Germain (FRA), PSV Eindhoven (NED), Benfica (POR), Juventus (ITA), Roma (ITA), Arsenal (ENG), Dynamo Kyiv (UKR), Gent (BEL).

Given these few constraints, are there some matches that are more likely to be drawn than others? For example, supporters of Barcelona might wonder whether the seven possible teams (PSG, PSV, Benfica, Juventus, Arsenal, Dynamo Kyiv and Gent) are all equally likely to be the next opponent of their favorite team.
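One simple way to explore the question is to enumerate every assignment of runners-up to group winners that respects the three rules, and to count how often each match appears. The Python sketch below does this under two assumptions of mine: the two pots above are listed in group order (A to H), and all valid sets of eight matches are equally likely. The actual draw is sequential, so its probabilities may differ somewhat from these frequencies.

```python
from itertools import permutations

# (team, association), assumed to be listed in group order A-H
WINNERS = [("Real Madrid", "ESP"), ("Wolfsburg", "GER"), ("Atlético Madrid", "ESP"),
           ("Manchester City", "ENG"), ("Barcelona", "ESP"), ("Bayern München", "GER"),
           ("Chelsea", "ENG"), ("Zenit", "RUS")]
RUNNERS_UP = [("Paris Saint-Germain", "FRA"), ("PSV Eindhoven", "NED"), ("Benfica", "POR"),
              ("Juventus", "ITA"), ("Roma", "ITA"), ("Arsenal", "ENG"),
              ("Dynamo Kyiv", "UKR"), ("Gent", "BEL")]

def allowed(i, j):
    """True if group winner i may be drawn against runner-up j."""
    wa, ra = WINNERS[i][1], RUNNERS_UP[j][1]
    if i == j:                      # same group
        return False
    if wa == ra:                    # same association
        return False
    if {wa, ra} == {"RUS", "UKR"}:  # UEFA Executive Committee decision
        return False
    return True

def pairing_probabilities():
    """Number of valid draws and, for each winner, the frequency of each opponent
    under a uniform distribution over valid draws (a simplifying assumption)."""
    n = len(WINNERS)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for perm in permutations(range(n)):
        if all(allowed(i, perm[i]) for i in range(n)):
            total += 1
            for i in range(n):
                counts[i][perm[i]] += 1
    return total, [[c / total for c in row] for row in counts]

if __name__ == "__main__":
    total, probs = pairing_probabilities()
    barcelona = [team for team, _ in WINNERS].index("Barcelona")
    for (team, _), p in zip(RUNNERS_UP, probs[barcelona]):
        print(f"Barcelona v {team}: {p:.3f}")
```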


List of predatory publishers

Posted in General, publishing by Julyan Arbel on 3 December 2015

Yet another predatory publisher?

I was recently invited to referee a paper for a journal I had never heard of before: the International Journal of Biological Instrumentation, published by VIBGYOR Online Publishers. This publisher happens to be on Jeffrey Beall's blacklist of predatory publishers, which inventories:

Potential, possible, or probable predatory scholarly open-access publishers.

I have kindly declined the invitation. Thanks Igor for the link.



Some thoughts on the life of a mathematician, by Villani

Posted in General by Julyan Arbel on 3 November 2015

Some time ago, Cédric Villani came to Turin to deliver two talks: one intended for youngsters (high-school level, say), the other for a wider audience, as a recipient of the Peano Prize. He commented live, in Italian per favore:

“Grazie mille! Un grande piacere e un grande onore per me!”

I attended both. The reason why I attended the first being that I am acting as a research advisor for Math en Jeans groups. Villani spoke about his book, Birth of a Theorem, or Théorème Vivant. He also shared a list of se7en thoughts/tips about doing research, with illustrations. I find them quite inspiring, here they are.

  1. Documentation/literature
    Illustrating this by showing Faà di Bruno’s formula Wikipedia page. I like this quote, since the formula enters moment computation for objects I’m using every day. And also because Faà di Bruno lived in Italian Piedmont, precisely in Turin.
  2. Motivation
    “The most important and the most mysterious.”
  3. Favorable environment
    Showing pictures of several places where he worked, including Institut Henri Poincaré. Not sure that this one is the most favorable environment for scientific productivity (as a Director I mean).
  4. Exchanges
    Meaning between scientists, not trade. Explaining briefly about polymath projects. And displaying a snapshot of Gowers’s Weblog as an illustration of how diverse exchanges he means. I also believe that blogs are a great information medium 🙂
  5. Constraints
    With snapshots of Musica Ricercata sheet music. And a paragraph of La disparition, a novel without the letter e by Georges Perec. Writing this makes me realize how foolish such an enterprise would look in mathematics.
  6. Work & Intuition
    Interesting to see these two at the same level.
  7. Perseverance & Luck
    Same comment as for point 6.

