Sub-Gaussian property for the Beta distribution (part 3, final)
In this third and last post about the sub-Gaussian property for the Beta distribution [1] (post 1 and post 2), I would like to show the interplay with the Bernoulli distribution as well as some connections with optimal transport (OT is a hot topic in general, and also on this blog with Pierre’s posts on Wasserstein ABC).
Sub-Gaussian property for the Beta distribution (part 2)
As a follow-up on my previous post on the sub-Gaussian property for the Beta distribution [1], I’ll give here a visual illustration of the proof.
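A random variable $X$ with finite mean $\mu = \mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:
$$\mathbb{E}[\exp(\lambda (X - \mu))] \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \quad \text{for all } \lambda \in \mathbb{R}.$$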
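We focus on $X$ being a $\mathrm{Beta}(\alpha, \beta)$ random variable. Its moment generating function $\mathbb{E}[\exp(\lambda X)]$ is known as the Kummer function, or confluent hypergeometric function ${}_1F_1(\alpha; \alpha + \beta; \lambda)$. So $X$ is $\sigma^2$-sub-Gaussian as soon as the difference function
$$u_\sigma(\lambda) = \exp\left(\frac{\alpha}{\alpha + \beta}\lambda + \frac{\sigma^2}{2}\lambda^2\right) - {}_1F_1(\alpha; \alpha + \beta; \lambda)$$
remains nonnegative on $\mathbb{R}$. This difference function $u_\sigma$ is plotted on the right panel above for a fixed choice of the parameters $(\alpha, \beta)$. In the plot, $\sigma^2$ is varying from green for the variance $\mathrm{Var}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$ (which is a lower bound to the optimal proxy variance) to blue for the value $\frac{1}{4(\alpha + \beta + 1)}$, a simple upper bound given by Elder (2016) [2]. The idea of the proof is simple: the optimal proxy variance corresponds to the value of $\sigma^2$ for which $u_\sigma$ admits a double zero, as illustrated with the red curve (black dot). The left panel shows the curves with $\sigma^2$ varying, interpolating from green for $\sigma^2 = \mathrm{Var}[X]$ to blue for $\sigma^2 = \frac{1}{4(\alpha + \beta + 1)}$, with only one curve qualifying as the optimal proxy variance in red.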
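The criterion can also be checked numerically. The following is a minimal Python sketch, not the code behind the figure: the parameter values (alpha, beta) = (2, 5), the grid of $\lambda$ values and all function names are assumptions made for illustration. It evaluates the Beta moment generating function through its power series and reports the smallest value of the difference function on a grid.

```python
import math


def beta_mgf(alpha, beta, lam, terms=400):
    """E[exp(lam * X)] for X ~ Beta(alpha, beta), i.e. 1F1(alpha; alpha + beta; lam)."""
    if lam < 0:
        # 1 - X ~ Beta(beta, alpha), so E[e^{lam X}] = e^{lam} E[e^{-lam (1 - X)}].
        # This avoids cancellation in the alternating power series.
        return math.exp(lam) * beta_mgf(beta, alpha, -lam, terms)
    total, term = 1.0, 1.0
    for k in range(terms):
        # Ratio of consecutive terms of the 1F1 power series.
        term *= (alpha + k) / (alpha + beta + k) * lam / (k + 1)
        total += term
    return total


def difference(alpha, beta, sigma2, lam):
    """exp(mu * lam + sigma2 * lam^2 / 2) minus the Beta moment generating function."""
    mu = alpha / (alpha + beta)
    return math.exp(mu * lam + 0.5 * sigma2 * lam ** 2) - beta_mgf(alpha, beta, lam)


def min_difference(alpha, beta, sigma2, lam_max=30, steps_per_unit=10):
    """Smallest value of the difference function on a regular grid over [-lam_max, lam_max]."""
    n = lam_max * steps_per_unit
    return min(difference(alpha, beta, sigma2, i / steps_per_unit) for i in range(-n, n + 1))


if __name__ == "__main__":
    alpha, beta = 2.0, 5.0  # hypothetical parameters, not those of the figure
    variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    upper = 1.0 / (4.0 * (alpha + beta + 1))
    for name, sigma2 in [("variance", variance), ("upper bound", upper)]:
        print(name, sigma2, min_difference(alpha, beta, sigma2))
```

For an asymmetric Beta such as this one, the minimum should come out negative when $\sigma^2$ is set to the variance and nonnegative when it is set to the upper bound. A grid check of this kind is only a sanity test; it cannot replace the proof.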
References
[1] Marchal and Arbel (2017), On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1–14. Code on GitHub.
[2] Elder (2016), Bayesian Adaptive Data Analysis Guarantees from Subgaussianity, https://arxiv.org/abs/1611.00065
Dynamic publication list for research webpage using arXiv, HAL, or bibtex2html
Well of course, dynamic is conditional upon some manual feeding. If you put your papers on arXiv or HAL, then both offer dynamic widgets. If you maintain a .bib file of your papers, you can use tools like bibtex2html. This is not dynamic at all, but it allows finer control over the URL links you might want to add than the arXiv or HAL options do. I review those three options below.
ISBA elections, let’s go voting
The International Society for Bayesian Analysis (ISBA) is running elections until November 15. This year, two contributors on this blog, Nicolas Chopin and myself, are running for an ISBA Section office. The sections of the society, nine in number as of today, gather researchers with common research interests: Computation, Objective Bayes, Nonparametrics, etc.
Here are our candidate statements:
New R user community in Grenoble, France
Nine R user communities already exist in France and there is a much larger number of R communities around the world. It was time for Grenoble to start its own!
The goal of the R user group is to facilitate the identification of local useRs, to initiate contacts, and to organise experience and knowledge sharing sessions. The group is open to any local useR interested in learning and sharing knowledge about R.
The group’s website features a map and table with members of the R group. Members with specific skills related to the use of R are referenced in a table and can be contacted by other members. A gitter allows members to discuss R issues and a calendar presents the upcoming events.
School of Statistics for Astrophysics, Autrans, France, October 9-13
Didier Fraix-Burnet (IPAG), Stéphane Girard (Inria) and I are organising a School of Statistics for Astrophysics, Stat4Astro, to be held in October in France. The primary goal of the School is to train astronomers in the use of modern statistical techniques. It also aims at bridging the gap between the two communities by emphasising practice during joint working sessions, to give firm grounds to the theoretical lessons, and to initiate work on problems brought by the participants. There have been two previous sessions of this school, one on regression and one on clustering. The speakers of this edition, including Christian Robert, Roberto Trotta and David van Dyk, will focus on the Bayesian methodology, with the moral support of the Bayesian Society, ISBA. The interest of this statistical approach in astrophysics probably comes from its necessity and its success in determining the cosmological parameters from observations, especially from the cosmic background fluctuations. The cosmological community has thus been very active in this field (see for instance the Cosmostatistics Initiative COIN).
But the Bayesian methodology, complementary to the more classical frequentist one, has many applications in physics in general owing to its ability to incorporate a priori knowledge into the inference computation, such as the uncertainties brought by the observational processes.
As with other sophisticated statistical techniques, astronomers are generally not familiar with Bayesian methodology, even though it is becoming more and more widespread and useful in the literature. This school will give participants both a strong theoretical background and solid practice in Bayesian inference:
 Introduction to R and Bayesian Statistics (Didier Fraix-Burnet, Institut de Planétologie et d’Astrophysique de Grenoble)
 Foundations of Bayesian Inference (David van Dyk, Imperial College London)
 Markov chain Monte Carlo (David van Dyk, Imperial College London)
 Model Building (David van Dyk, Imperial College London)
 Nested Sampling, Model Selection, and Bayesian Hierarchical Models (Roberto Trotta, Imperial College London)
 Approximate Bayesian Computation (Christian Robert, Univ. Paris-Dauphine, Univ. Warwick and Xi’an (!))
 Bayesian Nonparametric Approaches to Clustering (Julyan Arbel, Université Grenoble Alpes and Inria)
Feel free to register, we are not fully booked yet!
Julyan
Sub-Gaussian property for the Beta distribution (part 1)
With my friend Olivier Marchal (mathematician, not filmmaker, nor the cop), we have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.
The notion, introduced by Jean-Pierre Kahane, is as follows:
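A random variable $X$ with finite mean $\mu = \mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:
$$\mathbb{E}[\exp(\lambda (X - \mu))] \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \quad \text{for all } \lambda \in \mathbb{R}.$$
Such a constant $\sigma^2$ is called a proxy variance, and we say that $X$ is $\sigma^2$-sub-Gaussian. If $X$ is sub-Gaussian, one is usually interested in the optimal proxy variance:
$$\sigma_{\mathrm{opt}}^2(X) = \min\{\sigma^2 \ge 0 \text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.$$
Note that the variance always gives a lower bound on the optimal proxy variance: $\mathrm{Var}[X] \le \sigma_{\mathrm{opt}}^2(X)$. In particular, when $\sigma_{\mathrm{opt}}^2(X) = \mathrm{Var}[X]$, $X$ is said to be strictly sub-Gaussian.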
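The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than a Gaussian. This is actually a characterization of the property. Let $Z \sim \mathcal{N}(0, \sigma^2)$ denote a centered Gaussian with variance $\sigma^2$. Then:
$$X \text{ is sub-Gaussian} \iff \exists\, c > 0,\ \sigma^2 > 0 \text{ such that } \forall x \ge 0:\ \mathbb{P}(|X - \mathbb{E}[X]| \ge x) \le c\, \mathbb{P}(|Z| \ge x).$$
That equivalence clearly implies exponential upper bounds for the tails of the distribution since a Gaussian $Z \sim \mathcal{N}(0, \sigma^2)$ satisfies
$$\mathbb{P}(Z \ge x) \le \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \text{for all } x \ge 0.$$
That can also be seen directly: for a $\sigma^2$-sub-Gaussian variable $X$ with mean $\mu = \mathbb{E}[X]$, any $x \ge 0$ and any $\lambda > 0$,
$$\mathbb{P}(X - \mu \ge x) = \mathbb{P}\left(e^{\lambda (X - \mu)} \ge e^{\lambda x}\right) \le \frac{\mathbb{E}[e^{\lambda (X - \mu)}]}{e^{\lambda x}} \le \exp\left(\frac{\sigma^2 \lambda^2}{2} - \lambda x\right).$$
The polynomial function $\lambda \mapsto \frac{\sigma^2 \lambda^2}{2} - \lambda x$ is minimized on $\mathbb{R}_+$ at $\lambda = \frac{x}{\sigma^2}$, for which we obtain
$$\mathbb{P}(X - \mu \ge x) \le \exp\left(-\frac{x^2}{2\sigma^2}\right).$$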
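The optimization step in this Chernoff-type argument is easy to check numerically. Here is a small Python sketch, with hypothetical values of $x$ and $\sigma^2$ and function names of my own choosing, that minimizes the exponent $\frac{\sigma^2 \lambda^2}{2} - \lambda x$ over a grid of $\lambda \ge 0$ and compares the result with the closed form $-\frac{x^2}{2\sigma^2}$.

```python
def chernoff_exponent(lam, x, sigma2):
    """Exponent sigma2 * lam^2 / 2 - lam * x appearing in the tail bound."""
    return 0.5 * sigma2 * lam ** 2 - lam * x


def best_exponent_on_grid(x, sigma2, lam_max=50.0, steps=100_000):
    """Minimise the exponent over a regular grid of lam values in [0, lam_max]."""
    grid = (lam_max * i / steps for i in range(steps + 1))
    return min(chernoff_exponent(lam, x, sigma2) for lam in grid)


x, sigma2 = 1.5, 0.25  # hypothetical values chosen for illustration
numeric = best_exponent_on_grid(x, sigma2)
closed_form = -x ** 2 / (2 * sigma2)  # exponent evaluated at lam = x / sigma2
print(numeric, closed_form)
```

The grid minimum can never fall below the closed form, and it gets close to it as soon as the grid contains a point near $\lambda = x / \sigma^2$.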
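In that sense, the sub-Gaussian property of any compactly supported random variable $X$ comes for free since in that case the tails are obviously lighter than those of a Gaussian. A simple general proxy variance is given by Hoeffding’s lemma. Let $X$ be supported on $[a, b]$ with $\mathbb{E}[X] = 0$. Then for any $\lambda \in \mathbb{R}$,
$$\mathbb{E}[\exp(\lambda X)] \le \exp\left(\frac{(b - a)^2}{8}\lambda^2\right),$$
so $X$ is $\frac{(b - a)^2}{4}$-sub-Gaussian.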
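As a quick illustration of Hoeffding’s lemma, the Python sketch below checks the inequality for a centered Bernoulli variable $B - p$, which is supported on $[-p, 1 - p]$ so that $b - a = 1$. The choice of this example, the grids of values and the function names are my own assumptions.

```python
import math


def centered_bernoulli_mgf(p, lam):
    """E[exp(lam * (B - p))] for B ~ Bernoulli(p); B - p has mean zero."""
    return (1 - p) * math.exp(-lam * p) + p * math.exp(lam * (1 - p))


def hoeffding_bound(a, b, lam):
    """Right-hand side exp((b - a)^2 * lam^2 / 8) of Hoeffding's lemma."""
    return math.exp((b - a) ** 2 * lam ** 2 / 8)


def lemma_holds(p, lam_values, rel_tol=1e-12):
    """Check the inequality for every lam, up to a small relative tolerance."""
    return all(
        centered_bernoulli_mgf(p, lam) <= hoeffding_bound(-p, 1 - p, lam) * (1 + rel_tol)
        for lam in lam_values
    )


lam_values = [i / 10 for i in range(-200, 201)]
results = {p: lemma_holds(p, lam_values) for p in (0.1, 0.3, 0.5, 0.9)}
print(results)
```

For $p = 1/2$ the moment generating function is $\cosh(\lambda / 2)$, and the proxy variance $\frac{1}{4}$ given by the lemma equals the variance, which is the sense in which the bound is tight for a symmetric two-point distribution.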
Back to the $\mathrm{Beta}(\alpha, \beta)$ where $[a, b] = [0, 1]$, this shows the Beta is $\frac{1}{4}$-sub-Gaussian. The question of finding the optimal proxy variance is a more challenging issue. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound $\frac{1}{4(\alpha + \beta + 1)}$. It matches with Hoeffding’s bound for the extremal case $\alpha \to 0$, $\beta \to 0$, where the Beta random variable concentrates on the two-point set $\{0, 1\}$ (and when Hoeffding’s bound is tight).
In getting the bound $\frac{1}{4(\alpha + \beta + 1)}$, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I’ll say more about getting the optimal proxy variance in a next post soon.
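To get a feel for how the three quantities compare, here is a short Python sketch that computes the variance of the Beta distribution, the upper bound $\frac{1}{4(\alpha + \beta + 1)}$ and Hoeffding’s proxy variance $\frac{1}{4}$. The function names and the parameter values are hypothetical choices for illustration.

```python
def beta_variance(alpha, beta):
    """Variance of Beta(alpha, beta): a lower bound on the optimal proxy variance."""
    s = alpha + beta
    return alpha * beta / (s ** 2 * (s + 1))


def beta_proxy_upper_bound(alpha, beta):
    """Upper bound 1 / (4 * (alpha + beta + 1)) on the optimal proxy variance."""
    return 1.0 / (4.0 * (alpha + beta + 1))


HOEFFDING_PROXY = 0.25  # (b - a)^2 / 4 for a variable supported on [0, 1]

for alpha, beta in [(0.001, 0.001), (1.0, 1.0), (2.0, 5.0), (3.0, 3.0)]:
    print(alpha, beta, beta_variance(alpha, beta),
          beta_proxy_upper_bound(alpha, beta), HOEFFDING_PROXY)
```

The optimal proxy variance is sandwiched between the first two numbers. When $\alpha = \beta$ they coincide, and when both parameters tend to zero the upper bound approaches Hoeffding’s $\frac{1}{4}$.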
Cheers!
Julyan
Faà di Bruno’s note on eponymous formula, trilingual version
The Italian mathematician Francesco Faà di Bruno was born in Alessandria (Piedmont, Italy) in 1825 and died in Turin in 1888. At the time of his birth, Piedmont was part of the Kingdom of Sardinia, led by the Dukes of Savoy. Italy was then unified in 1861, and the Kingdom of Sardinia became the Kingdom of Italy, of which Turin was declared the first capital. At that time, the Piedmontese commonly spoke both Italian and French.
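Faà di Bruno is probably best known today for the eponymous formula which generalizes the derivative of a composition of two functions, $f \circ g$, to any order:
$$\frac{d^n}{dx^n} f(g(x)) = \sum \frac{n!}{m_1!\, m_2! \cdots m_n!}\, f^{(m_1 + \cdots + m_n)}(g(x)) \prod_{j=1}^{n} \left(\frac{g^{(j)}(x)}{j!}\right)^{m_j},$$
where the sum is over $n$-tuples $(m_1, \ldots, m_n)$ of nonnegative integers satisfying $\sum_{j=1}^{n} j\, m_j = n$.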
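The index set of the sum is easier to grasp on small cases. The Python sketch below, whose function name is my own, enumerates the tuples $(m_1, \ldots, m_n)$ with $\sum_j j\, m_j = n$ together with the integer coefficient $\frac{n!}{\prod_j m_j!\, (j!)^{m_j}}$ that multiplies $f^{(m_1 + \cdots + m_n)}(g(x)) \prod_j (g^{(j)}(x))^{m_j}$.

```python
from math import factorial


def faa_di_bruno_terms(n):
    """Yield (m, coefficient) for every tuple m = (m_1, ..., m_n) with sum_j j * m_j = n."""
    def solutions(j, remaining):
        # Choose m_j, ..., m_n so that sum_{i >= j} i * m_i equals `remaining`.
        if j > n:
            if remaining == 0:
                yield ()
            return
        for m_j in range(remaining // j + 1):
            for rest in solutions(j + 1, remaining - j * m_j):
                yield (m_j,) + rest

    for m in solutions(1, n):
        denom = 1
        for j, m_j in enumerate(m, start=1):
            denom *= factorial(m_j) * factorial(j) ** m_j
        yield m, factorial(n) // denom


for m, coeff in faa_di_bruno_terms(3):
    print(m, coeff)
```

For $n = 3$ this lists the three terms of the third derivative of $f(g(x))$. Taking $f = g = \exp$ at $x = 0$ makes every derivative factor equal to $e$ or $1$, so the coefficients for a given $n$ sum to the $n$-th Bell number.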
Faà di Bruno published his formula in two notes:
 Faà Di Bruno, F. (1855). Sullo sviluppo delle funzioni. Annali di Scienze Matematiche e Fisiche, 6:479–480. Google Books link.
 Faà Di Bruno, F. (1857). Note sur une nouvelle formule de calcul différentiel. Quarterly Journal of Pure and Applied Mathematics, 1:359–360. Google Books link.
They both date from December 1855, and were signed in Paris. They are similar and essentially state the formula without a proof. I have arXived a note which contains a translation from the French version to English (reproduced below), as well as the two original notes in French and in Italian. I’ve used for this the Erasmus MMXVI font, thanks Xian for sharing!
MathSciNet reviews on Bayesian papers
I recently started to review papers on Mathematical Reviews / MathSciNet and decided I would post the reviews here from time to time. Here are the first three, which deal with (i) objective Bayes priors for discrete parameters, (ii) random probability measures and inference on species variety and (iii) Bayesian nonparametric asymptotic theory and contraction rates.
 An objective approach to prior mass functions for discrete parameter spaces, by Villa, C. and Walker, S. G., J. Amer. Statist. Assoc. 110 (2015), no. 511, 1072–1082.
The paper deals with objective prior derivation in the discrete parameter setting. Previous treatment of this problem includes J. O. Berger, J.M. Bernardo and D. Sun [J. Amer. Statist. Assoc. 107 (2012), no. 498, 636–648; MR2980073] who rely on embedding the discrete parameter into a continuous parameter space and then applying reference methodology (J.M. Bernardo [J. Roy. Statist. Soc. Ser. B 41 (1979), no. 2, 113–147; MR0547240]). The main contribution here is to propose an all-purpose objective prior based on the Kullback–Leibler (KL) divergence. More specifically, the prior $\pi(\theta)$ at any parameter value $\theta$ is obtained as follows: (i) compute the minimum KL divergence over $\theta' \neq \theta$ between models indexed by $\theta$ and $\theta'$; (ii) set $\pi(\theta)$ proportional to a sound transform of the minimum obtained in (i). A good property of the proposed approach is that it is not problem specific. This objective prior is derived in five models (including binomial and hypergeometric) and is compared to the priors known in the literature. The discussion suggests possible extension to the continuous parameter setting.
 A note on nonparametric inference for species variety with Gibbs-type priors, by Favaro, Stefano and James, Lancelot F., Electron. J. Stat. 9 (2015), no. 2, 2884–2902.
A. Lijoi, R. H. Mena and I. Prünster [Biometrika 94 (2007), no. 4, 769–786; MR2416792] recently introduced a Bayesian nonparametric methodology for estimating the species variety featured by an additional unobserved sample of size $m$ given an initial observed sample. This methodology was further investigated by S. Favaro, Lijoi and Prünster [Biometrics 68 (2012), no. 4, 1188–1196; MR3040025; Ann. Appl. Probab. 23 (2013), no. 5, 1721–1754; MR3114915]. Although it led to explicit posterior distributions under the general framework of Gibbs-type priors [A. V. Gnedin and J. W. Pitman (2005), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12, 83–102, 244–245; MR2160320], there are situations of practical interest where $m$ is required to be very large and the computational burden of evaluating these posterior distributions makes their concrete implementation impossible. This paper presents a solution to this problem for a large class of Gibbs-type priors which encompasses the two-parameter Poisson-Dirichlet prior and, among others, the normalized generalized Gamma prior. The solution relies on the study of the large $m$ asymptotic behaviour of the posterior distribution of the number of new species in the additional sample. In particular, a simple characterization of the limiting posterior distribution is introduced in terms of a scale mixture with respect to a suitable latent random variable; this characterization, combined with adaptive rejection sampling, yields a large $m$ approximation of any feature of interest of the exact posterior distribution. The results are illustrated through a simulation study and the analysis of a dataset in linguistics.
 Rate exact Bayesian adaptation with modified block priors, by Gao, Chao and Zhou, Harrison H., Ann. Statist. 44 (2016), no. 1, 318–345.
A novel prior distribution is proposed for adaptive Bayesian estimation, meaning that the associated posterior distribution contracts to the truth at the exact optimal rate and at the same time adapts to the unknown smoothness. The prior is termed block prior and is defined on the Fourier coefficients of a curve by independently assigning zero-mean Gaussian distributions to blocks of coefficients indexed by some $k$, with covariance matrix proportional to the identity matrix; the proportionality coefficient is itself assigned a prior distribution $g_k$. Under conditions on $g_k$, it is shown that the prior (i) puts sufficient mass near the true signal and (ii) automatically concentrates on its effective dimension. The main result of the paper is a rate-optimal posterior contraction theorem obtained in a general framework for a modified version of a block prior. Compared to the closely related block spike and slab prior proposed by M. Hoffmann, J. Rousseau and J. Schmidt-Hieber [Ann. Statist. 43 (2015), no. 5, 2259–2295; MR3396985], whose result only holds for the white noise model, the present result can be applied in a wide range of models. This is illustrated through applications to five mainstream models: density estimation, white noise model, Gaussian sequence model, Gaussian regression and spectral density estimation. The results hold under Sobolev smoothness and their extension to more flexible Besov smoothness is discussed. The paper also provides a discussion on the absence of an extra log term in the posterior contraction rates (thus achieving the exact minimax rate) with a comparison to other priors commonly used in the literature. These include rescaled Gaussian processes [A. W. van der Vaart and H. van Zanten, Electron. J. Stat. 1 (2007), 433–448; MR2357712; Ann. Statist. 37 (2009), no. 5B, 2655–2675; MR2541442] and sieve priors [V. Rivoirard and J. Rousseau, Bayesian Anal. 7 (2012), no. 2, 311–333; MR2934953; J. Arbel, G. Gayraud and J. Rousseau, Scand. J. Stat. 40 (2013), no. 3, 549–570; MR3091697].
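Going back to the first review above, the two-step recipe for the KL-based prior is simple enough to sketch in Python on a toy example. Everything specific here is an assumption of mine: the three candidate models (binomial distributions with two trials), the function names, and in particular the transform $d \mapsto e^{d} - 1$, which is only a placeholder since the review just says "a sound transform".

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two pmfs given as equal-length lists; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_based_prior(models, transform=lambda d: math.exp(d) - 1.0):
    """Prior mass of each model: transform of its minimum KL divergence to any other model."""
    weights = []
    for i, p in enumerate(models):
        # Step (i): KL divergence from model i to its nearest competitor.
        nearest = min(kl_divergence(p, q) for j, q in enumerate(models) if j != i)
        # Step (ii): apply the (assumed) transform; normalisation happens below.
        weights.append(transform(nearest))
    total = sum(weights)
    return [w / total for w in weights]


def binomial2_pmf(theta):
    """pmf of Binomial(2, theta) on {0, 1, 2}; a hypothetical toy model."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]


thetas = [0.2, 0.5, 0.8]  # hypothetical discrete parameter space
prior = kl_based_prior([binomial2_pmf(t) for t in thetas])
print(dict(zip(thetas, prior)))
```

A model that is far from all of its competitors in KL divergence receives more prior mass, which is the intuition behind step (ii).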
Collegio Carlo Alberto
I spent three years as a postdoc at the Collegio Carlo Alberto. It was a great time, during which I was able to interact with top colleagues and to prepare my applications in optimal conditions. Now that I have left for Inria Grenoble, here is a brief presentation of the Collegio in pictures.