Sub-Gaussian property for the Beta distribution (part 1)

Posted in General by Julyan Arbel on 2 May 2017


My friend Olivier Marchal (the mathematician, not the filmmaker, nor the cop) and I have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.

The notion, introduced by Jean-Pierre Kahane, is as follows:

A random variable X with finite mean \mu=\mathbb{E}[X] is sub-Gaussian if there is a positive number \sigma such that:

\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.

Such a constant \sigma^2 is called a proxy variance, and we say that X is \sigma^2-sub-Gaussian. If X is sub-Gaussian, one is usually interested in the optimal proxy variance:

 \sigma_{\text{opt}}^2(X)=\min\{\sigma^2\geq 0\text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.

Note that the variance always gives a lower bound on the optimal proxy variance: \text{Var}[X]\leq \sigma_{\text{opt}}^2(X). In particular, when \sigma_{\text{opt}}^2(X)=\text{Var}[X], X is said to be strictly sub-Gaussian.
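To make the definition concrete, here is a minimal sketch that checks the defining inequality on a finite grid of \lambda values for a uniform variable on [0,1] (that is, a Beta(1,1)), whose variance is 1/12. The function names and the grid are illustrative choices, not taken from the note, and a grid check is only a numerical sanity check, not a proof.

```python
import math

def centered_log_mgf_uniform01(lam):
    """log E[exp(lam * (U - 1/2))] for U uniform on [0, 1] (closed form)."""
    h = lam / 2.0
    if h == 0.0:
        return 0.0
    return math.log(math.sinh(h) / h)

def satisfies_bound_on_grid(log_mgf, sigma2, lambdas, tol=1e-12):
    """Check log E[exp(lam (X - mu))] <= lam^2 sigma2 / 2 at every grid point."""
    return all(log_mgf(lam) <= 0.5 * sigma2 * lam * lam + tol for lam in lambdas)

# Hypothetical grid: lambda from -20 to 20 in steps of 0.25.
lambdas = [k / 4.0 for k in range(-80, 81)]
var_uniform = 1.0 / 12.0

# The variance itself passes the check on this grid ...
ok_at_variance = satisfies_bound_on_grid(centered_log_mgf_uniform01, var_uniform, lambdas)
# ... while a candidate below the variance is violated for small lambda.
ok_below_variance = satisfies_bound_on_grid(centered_log_mgf_uniform01, 0.08, lambdas)
print(ok_at_variance, ok_below_variance)
```

The second check illustrates the lower bound \text{Var}[X]\leq \sigma_{\text{opt}}^2(X): any candidate smaller than the variance already fails near \lambda=0.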

The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than a Gaussian. This is actually a characterization of the property. Let Z\sim\mathcal{N}(0,1). Then:

X \text{ is sub-Gaussian } \iff \exists c>0,\ \exists \sigma>0,\ \forall x\geq0:\, \mathsf{P}(|X-\mathbb{E}[X]|\geq x) \leq c\,\mathsf{P}(|\sigma Z|\geq x).

That equivalence implies exponential upper bounds on the tails of the distribution, since a centered Gaussian Z_\sigma\sim\mathcal{N}(0,\sigma^2) satisfies, for all x\geq0,

\mathsf{P}(Z_\sigma\ge x)\le\exp\left(-\frac{x^2}{2\sigma^2}\right).

That can also be seen directly: for a \sigma^2-sub-Gaussian variable X,

\forall\, \lambda>0\,:\,\,\mathsf{P}(X-\mu\geq x) = \mathsf{P}(e^{\lambda(X-\mu)}\geq e^{\lambda x})\leq \frac{\mathbb{E}[e^{\lambda(X-\mu)}]}{e^{\lambda x}}\quad\text{by Markov's inequality,}

\leq\exp(\frac{\sigma^2\lambda^2}{2}-\lambda x)\quad\text{by sub-Gaussianity.}

For x\geq0, the quadratic function \lambda\mapsto \frac{\sigma^2\lambda^2}{2}-\lambda x is minimized on \mathbb{R}_+ at \lambda = \frac{x}{\sigma^2} (where its derivative \sigma^2\lambda-x vanishes), for which we obtain

\mathsf{P}(X-\mu\geq x) \leq\exp(-\frac{x^2}{2\sigma^2}).
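The optimization step can be sanity-checked numerically. The sketch below minimizes the exponent over a grid of \lambda values and compares the result with the closed form -x^2/(2\sigma^2); it then compares the resulting bound with the exact tail of a centered Gaussian of variance \sigma^2, which is \sigma^2-sub-Gaussian since its moment generating function equals \exp(\lambda^2\sigma^2/2). The values of sigma2 and x, the grid, and the function names are arbitrary illustrative choices.

```python
import math

def chernoff_exponent(lam, x, sigma2):
    """Exponent sigma2 * lam^2 / 2 - lam * x obtained after Markov + sub-Gaussianity."""
    return 0.5 * sigma2 * lam * lam - lam * x

def grid_minimizer(x, sigma2, lam_max=50.0, steps=50_000):
    """Brute-force minimization of the exponent over lam in [0, lam_max]."""
    best_lam, best_val = 0.0, 0.0
    for k in range(steps + 1):
        lam = lam_max * k / steps
        val = chernoff_exponent(lam, x, sigma2)
        if val < best_val:
            best_lam, best_val = lam, val
    return best_lam, best_val

# Hypothetical values chosen for illustration only.
sigma2, x = 0.25, 0.3
lam_grid, val_grid = grid_minimizer(x, sigma2)
lam_closed = x / sigma2                 # closed-form minimizer
val_closed = -x * x / (2.0 * sigma2)    # closed-form minimum of the exponent
print(lam_grid, lam_closed, val_grid, val_closed)

def gaussian_upper_tail(t, sigma2):
    """Exact P(Z >= t) for Z ~ N(0, sigma2)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0 * sigma2))

# The exact Gaussian tail should sit below exp(-t^2 / (2 sigma2)) for t >= 0.
ts = [0.1 * k for k in range(0, 31)]
tail_below_bound = all(
    gaussian_upper_tail(t, sigma2) <= math.exp(-t * t / (2.0 * sigma2)) for t in ts
)
print(tail_below_bound)
```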

In that sense, the sub-Gaussian property of any compactly supported random variable X comes for free since in that case the tails are obviously lighter than those of a Gaussian. A simple general proxy variance is given by Hoeffding’s lemma. Let X be supported on [a,b] with \mathbb{E}[X]=0. Then for any \lambda\in\mathbb{R},

\mathbb{E}[\exp(\lambda X)]\leq\exp\left(\frac{(b-a)^2}{8}\lambda^2\right)

so X is \frac{(b-a)^2}{4}-sub-Gaussian.
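As a quick numerical illustration of Hoeffding's lemma, the sketch below takes a centered Bernoulli variable X = B - p with B\sim\text{Bernoulli}(p), which is supported on [-p, 1-p] (so b-a=1), and checks the inequality on a grid of p and \lambda. The function names and the grid are illustrative assumptions; the check does not replace the proof of the lemma.

```python
import math

def centered_bernoulli_log_mgf(lam, p):
    """log E[exp(lam * (B - p))] for B ~ Bernoulli(p)."""
    return math.log((1.0 - p) * math.exp(-lam * p) + p * math.exp(lam * (1.0 - p)))

def hoeffding_log_bound(lam, a, b):
    """Logarithm of Hoeffding's bound: (b - a)^2 * lam^2 / 8."""
    return (b - a) ** 2 * lam * lam / 8.0

# Hypothetical grid of success probabilities and lambda values.
ps = [0.1 * k for k in range(1, 10)]
lambdas = [0.5 * k for k in range(-60, 61)]

hoeffding_holds = all(
    centered_bernoulli_log_mgf(lam, p) <= hoeffding_log_bound(lam, -p, 1.0 - p) + 1e-12
    for p in ps
    for lam in lambdas
)
print(hoeffding_holds)

# For p = 1/2 the variance p * (1 - p) equals the Hoeffding proxy variance (b - a)^2 / 4.
p = 0.5
print(p * (1.0 - p), (1.0 - 0.0) ** 2 / 4.0)
```

For p = 1/2 the variance and the Hoeffding proxy variance coincide at 1/4, so the bound cannot be improved in that case.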

Back to the Beta distribution, where [a,b]=[0,1]: centering does not change the width of the support, so Hoeffding’s lemma shows that the Beta is \frac{1}{4}-sub-Gaussian. Finding the optimal proxy variance is more challenging. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound \frac{1}{4(\alpha+\beta+1)}. It matches Hoeffding’s bound in the extremal case \alpha\to0, \beta\to0, where the Beta random variable concentrates on the two-point set \{0,1\} (and where Hoeffding’s bound is tight).
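The bound \frac{1}{4(\alpha+\beta+1)} can be checked numerically for a few parameter values. The sketch below evaluates the Beta moment generating function through its power series, \mathbb{E}[e^{\lambda X}]=\sum_{k\geq0}\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\frac{\lambda^k}{k!}, and compares the centered log-MGF with \frac{\lambda^2}{8(\alpha+\beta+1)} on a grid. The parameter pairs, the grid, the tolerance and all function names are illustrative assumptions, and this is a sanity check rather than the argument used in the note.

```python
import math

def beta_mgf(lam, alpha, beta):
    """E[exp(lam * X)] for X ~ Beta(alpha, beta), summed from its power series."""
    if lam < 0:
        # 1 - X ~ Beta(beta, alpha): rewriting keeps every series term positive.
        return math.exp(lam) * beta_mgf(-lam, beta, alpha)
    total, term, k = 1.0, 1.0, 0
    while term > 1e-17 * total and k < 10_000:
        term *= (alpha + k) / (alpha + beta + k) * lam / (k + 1)
        total += term
        k += 1
    return total

def beta_centered_log_mgf(lam, alpha, beta):
    """log E[exp(lam * (X - mean))] for X ~ Beta(alpha, beta)."""
    mean = alpha / (alpha + beta)
    return math.log(beta_mgf(lam, alpha, beta)) - lam * mean

def simple_proxy_variance(alpha, beta):
    """The simple upper bound 1 / (4 (alpha + beta + 1)) quoted in the text."""
    return 1.0 / (4.0 * (alpha + beta + 1.0))

def beta_variance(alpha, beta):
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))

# Hypothetical parameter pairs and lambda grid.
params = [(0.5, 0.5), (1.0, 1.0), (2.0, 5.0), (5.0, 2.0), (0.3, 3.0)]
lambdas = [0.5 * k for k in range(-40, 41)]

bound_holds = all(
    beta_centered_log_mgf(lam, a, b) <= 0.5 * simple_proxy_variance(a, b) * lam * lam + 1e-10
    for (a, b) in params
    for lam in lambdas
)
# The variance never exceeds the bound, with equality when alpha == beta.
variance_below_bound = all(
    beta_variance(a, b) <= simple_proxy_variance(a, b) + 1e-15 for (a, b) in params
)
print(bound_holds, variance_below_bound)
```

The variance comparison follows from \alpha\beta\leq(\alpha+\beta)^2/4, with equality exactly when \alpha=\beta.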

In getting the bound \frac{1}{4(\alpha+\beta+1)}, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I’ll say more about getting the optimal proxy variance in an upcoming post.



