# Statisfaction

## Sub-Gaussian property for the Beta distribution (part 3, final)

Posted in General, R by Julyan Arbel on 26 December 2017

When a Beta random variable wants to act like a Bernoulli: convergence of optimal proxy variance.

In this third and last post about the sub-Gaussian property for the Beta distribution [1] (see post 1 and post 2), I would like to show the interplay with the Bernoulli distribution, as well as some connections with optimal transport (OT is a hot topic in general, and also on this blog, with Pierre's posts on Wasserstein ABC).

## Sub-Gaussian property for the Beta distribution (part 2)

Posted in R by Julyan Arbel on 20 December 2017

Left: What makes the Beta optimal proxy variance (red) so special? Right: The difference function has a double zero (black dot).

As a follow-up to my previous post on the sub-Gaussian property for the Beta distribution [1], I'll give here a visual illustration of the proof.

A random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:

$\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.$

We focus on $X$ being a Beta$(\alpha,\beta)$ random variable. Its moment generating function $\mathbb{E}[\exp(\lambda X)]$ is known as the Kummer function, or confluent hypergeometric function ${}_1F_1(\alpha,\alpha+\beta,\lambda)$. So $X$ is $\sigma^2$-sub-Gaussian if and only if the difference function

$u_\sigma(\lambda)=\exp\left(\frac{\alpha}{\alpha+\beta}\lambda+\frac{\sigma^2}{2}\lambda^2\right)-{}_1F_1(\alpha,\alpha+\beta,\lambda)$

remains nonnegative on $\mathbb{R}$ (it always vanishes at $\lambda=0$). This difference function $u_\sigma(\cdot)$ is plotted in the right panel above for parameters $(\alpha,\beta)=(1,1.3)$. In the plot, $\sigma^2$ varies from green, for the variance $\text{Var}[X]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ (which is a lower bound on the optimal proxy variance), to blue, for the value $\frac{1}{4(\alpha+\beta+1)}$, a simple upper bound conjectured by Elder (2016) [2] and proved in [1]. The idea of the proof is simple: the optimal proxy variance is the value of $\sigma^2$ for which $u_\sigma(\cdot)$ admits a double zero, as illustrated by the red curve (black dot); for any smaller $\sigma^2$, $u_\sigma$ becomes negative at that point. The left panel shows the corresponding curves as functions of the mean $\mu = \frac{\alpha}{\alpha+\beta}$, interpolating from green for $\text{Var}[X]$ to blue for $\frac{1}{4(\alpha+\beta+1)}$, with only one curve, in red, qualifying as the optimal proxy variance.
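To make the double-zero characterization concrete, here is a minimal Python sketch (standard library only, and not the code accompanying [1]). It relies on the fact, immediate from the definition, that $X$ is $\sigma^2$-sub-Gaussian exactly when $2(\log\mathbb{E}[e^{\lambda X}]-\mu\lambda)/\lambda^2\le\sigma^2$ for all $\lambda\neq0$, so the optimal proxy variance is the supremum of that ratio, and the maximizing $\lambda$ is where $u_\sigma$ touches zero. The Kummer function is summed as a power series, with Kummer's transformation used for negative arguments; the function names, the grid range `lam_max` and the step are illustrative assumptions, and a grid search only approximates the supremum.

```python
import math


def hyp1f1_nonneg(a, b, z):
    """Kummer's series 1F1(a; b; z) = sum_n (a)_n / (b)_n * z**n / n!, for z >= 0 and a, b > 0.

    All terms are nonnegative here, so summing the series is numerically stable.
    """
    term, total, n = 1.0, 1.0, 0
    while True:
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        n += 1
        # Once n > z the terms decrease; stop when they no longer affect the sum.
        if n > z and term <= 1e-17 * total:
            return total


def beta_log_mgf(lam, alpha, beta):
    """log E[exp(lam X)] = log 1F1(alpha; alpha + beta; lam) for X ~ Beta(alpha, beta)."""
    if lam >= 0:
        return math.log(hyp1f1_nonneg(alpha, alpha + beta, lam))
    # Kummer's transformation 1F1(a; b; z) = e^z 1F1(b - a; b; -z) avoids cancellation for z < 0.
    return lam + math.log(hyp1f1_nonneg(beta, alpha + beta, -lam))


def u_sigma(lam, sigma2, alpha, beta):
    """Difference function u_sigma(lam): X is sigma2-sub-Gaussian iff it is >= 0 on R."""
    mu = alpha / (alpha + beta)
    return math.exp(mu * lam + sigma2 * lam**2 / 2) - math.exp(beta_log_mgf(lam, alpha, beta))


def beta_optimal_proxy_variance(alpha, beta, lam_max=40.0, step=0.01):
    """Grid approximation of sup_{lam != 0} 2 (log E[exp(lam X)] - mu lam) / lam**2.

    Returns (sigma2_opt, lam_star); lam_star approximates the double zero of u_sigma.
    """
    mu = alpha / (alpha + beta)
    best, lam_star = 0.0, None
    n = int(round(lam_max / step))
    for k in range(-n, n + 1):
        if k == 0:
            continue
        lam = k * step
        val = 2 * (beta_log_mgf(lam, alpha, beta) - mu * lam) / lam**2
        if val > best:
            best, lam_star = val, lam
    return best, lam_star


alpha, beta = 1.0, 1.3
s2, lam_star = beta_optimal_proxy_variance(alpha, beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(var, s2, 1 / (4 * (alpha + beta + 1)))  # variance, numerical optimum, upper bound
print(lam_star, u_sigma(lam_star, s2, alpha, beta))  # u vanishes (up to rounding) at lam_star
```

For $(\alpha,\beta)=(1,1.3)$, the first printed line should show the numerical optimum sitting strictly between the variance and $\frac{1}{4(\alpha+\beta+1)}$, as in the plot.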

#### References

[1] Marchal, O. and Arbel, J. (2017). On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1–14. Code on GitHub.

[2] Elder, S. (2016). Bayesian Adaptive Data Analysis Guarantees from Subgaussianity. arXiv preprint, https://arxiv.org/abs/1611.00065


## Sub-Gaussian property for the Beta distribution (part 1)

Posted in General by Julyan Arbel on 2 May 2017

With my friend Olivier Marchal (the mathematician, not the filmmaker nor the cop), we have just arXived a note on the sub-Gaussianity of the Beta and Dirichlet distributions.

The notion, introduced by Jean-Pierre Kahane, is as follows:

A random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:

$\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.$

Such a constant $\sigma^2$ is called a proxy variance, and we say that $X$ is $\sigma^2$-sub-Gaussian. If $X$ is sub-Gaussian, one is usually interested in the optimal proxy variance:

$\sigma_{\text{opt}}^2(X)=\min\{\sigma^2\geq 0\text{ such that } X \text{ is } \sigma^2\text{-sub-Gaussian}\}.$
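Since $X$ is $\sigma^2$-sub-Gaussian exactly when $2\log\mathbb{E}[\exp(\lambda(X-\mu))]/\lambda^2\le\sigma^2$ for every $\lambda\neq0$, the minimum above can equivalently be written as a supremum over $\lambda\neq0$. The Python sketch below (function names and grid settings are hypothetical, chosen for illustration) approximates it for a finitely supported $X$, and compares a Bernoulli$(p)$ example with the known closed form $\frac{1-2p}{2\log((1-p)/p)}$ for $p\neq\frac12$.

```python
import math


def centered_log_mgf(lam, values, probs):
    """log E[exp(lam (X - mu))] for X taking `values` with probabilities `probs`."""
    mu = sum(v * p for v, p in zip(values, probs))
    exponents = [lam * (v - mu) for v in values]
    m = max(exponents)  # log-sum-exp shift to avoid overflow for large |lam|
    return m + math.log(sum(p * math.exp(e - m) for p, e in zip(probs, exponents)))


def discrete_optimal_proxy_variance(values, probs, lam_max=20.0, step=1e-3):
    """Grid approximation of sigma_opt^2 = sup_{lam != 0} 2 log E[exp(lam (X - mu))] / lam**2."""
    n = int(round(lam_max / step))
    return max(
        2 * centered_log_mgf(k * step, values, probs) / (k * step) ** 2
        for k in range(-n, n + 1)
        if k != 0
    )


p = 0.2
s2 = discrete_optimal_proxy_variance([0.0, 1.0], [1 - p, p])
# Variance, numerical supremum, and the Bernoulli closed form.
print(p * (1 - p), s2, (1 - 2 * p) / (2 * math.log((1 - p) / p)))
```

With $p=0.2$ the printed values should be the variance $0.16$ followed by two nearly equal numbers close to $0.2164$; with $p=\frac12$ the optimum equals the variance $\frac14$.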

Note that the variance always gives a lower bound on the optimal proxy variance: $\text{Var}[X]\leq \sigma_{\text{opt}}^2(X)$. When $\sigma_{\text{opt}}^2(X)=\text{Var}[X]$, $X$ is said to be strictly sub-Gaussian.
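A short argument for this lower bound: the moment generating function of a sub-Gaussian $X$ is finite on all of $\mathbb{R}$, so both sides of the defining inequality can be expanded around $\lambda=0$ (the first-order term vanishes because $\mathbb{E}[X-\mu]=0$):

$1+\frac{\lambda^2}{2}\text{Var}[X]+o(\lambda^2)=\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)=1+\frac{\lambda^2\sigma^2}{2}+o(\lambda^2).$

Subtracting $1$, dividing by $\lambda^2/2$ and letting $\lambda\to0$ gives $\text{Var}[X]\le\sigma^2$ for every proxy variance $\sigma^2$, and in particular for $\sigma_{\text{opt}}^2(X)$.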

The sub-Gaussian property is closely related to the tails of the distribution. Intuitively, being sub-Gaussian amounts to having tails lighter than those of a Gaussian, and this intuition is in fact a characterization of the property. Let $Z\sim\mathcal{N}(0,1)$. Then:

$X \text{ is sub-Gaussian } \iff \exists c>0,\ \forall x\geq0:\, \mathsf{P}(|X-\mathbb{E}[X]|\geq x) \leq c\,\mathsf{P}(|Z|\geq x/c).$

This equivalence implies Gaussian-type upper bounds for the tails of the distribution, since a Gaussian $Z\sim\mathcal{N}(0,\sigma^2)$ satisfies, for all $x\geq0$,

$\mathsf{P}(Z\ge x)\le\exp(-\frac{x^2}{2\sigma^2}).$
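The Gaussian tail inequality can be checked numerically with the standard library, since $\mathsf{P}(Z\ge x)=\frac12\operatorname{erfc}\big(x/\sqrt{2\sigma^2}\big)$ for $Z\sim\mathcal{N}(0,\sigma^2)$. A small Python sketch (the function names are illustrative):

```python
import math


def gaussian_upper_tail(x, sigma2):
    """Exact P(Z >= x) for Z ~ N(0, sigma2), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2 * sigma2))


def gaussian_tail_bound(x, sigma2):
    """The bound exp(-x**2 / (2 sigma2)), valid for x >= 0."""
    return math.exp(-x**2 / (2 * sigma2))


sigma2 = 1.0
for x in [0.0, 0.5, 1.0, 2.0, 3.0]:
    # Exact tail next to the bound; the first is always the smaller one.
    print(x, gaussian_upper_tail(x, sigma2), gaussian_tail_bound(x, sigma2))
```

For $\sigma^2=1$ and $x=2$, for instance, the exact tail is about $0.023$ while the bound is about $0.135$: the bound is valid but not sharp.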

That can also be seen directly: for a $\sigma^2$-sub-Gaussian variable $X$,

$\forall\, \lambda>0\,:\,\,\mathsf{P}(X-\mu\geq x) = \mathsf{P}(e^{\lambda(X-\mu)}\geq e^{\lambda x})\leq \frac{\mathbb{E}[e^{\lambda(X-\mu)}]}{e^{\lambda x}}\quad\text{by Markov's inequality,}$

$\leq\exp(\frac{\sigma^2\lambda^2}{2}-\lambda x)\quad\text{by sub-Gaussianity.}$

For $x\geq0$, the quadratic function $\lambda\mapsto \frac{\sigma^2\lambda^2}{2}-\lambda x$ is minimized on $\mathbb{R}_+$ at $\lambda = \frac{x}{\sigma^2}$, where it equals $-\frac{x^2}{2\sigma^2}$, so that we obtain

$\mathsf{P}(X-\mu\geq x) \leq\exp(-\frac{x^2}{2\sigma^2})$.
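The minimization step can be double-checked by brute force. This Python sketch (hypothetical helper names; the values $x=0.3$, $\sigma^2=0.05$ and the grid are arbitrary) compares a grid minimizer of the exponent with the closed-form minimizer $\lambda=x/\sigma^2$ and the resulting exponent $-x^2/(2\sigma^2)$:

```python
import math


def chernoff_exponent(lam, x, sigma2):
    """Exponent sigma2 * lam**2 / 2 - lam * x appearing in the bound above."""
    return sigma2 * lam**2 / 2 - lam * x


def grid_minimizer(x, sigma2, lam_max=50.0, step=1e-3):
    """Brute-force minimizer of the exponent over a grid of lam > 0 (a check, not a method)."""
    n = int(round(lam_max / step))
    return min((k * step for k in range(1, n + 1)), key=lambda lam: chernoff_exponent(lam, x, sigma2))


x, sigma2 = 0.3, 0.05
lam_hat = grid_minimizer(x, sigma2)
print(lam_hat, x / sigma2)  # grid minimizer vs closed-form minimizer x / sigma2
print(chernoff_exponent(x / sigma2, x, sigma2), -x**2 / (2 * sigma2))  # both equal -x^2 / (2 sigma2)
print(math.exp(-x**2 / (2 * sigma2)))  # resulting tail bound on P(X - mu >= x)
```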

In that sense, any compactly supported random variable $X$ is sub-Gaussian for free, since its tails vanish beyond the support and are thus obviously lighter than those of a Gaussian. A simple proxy variance valid in general is given by Hoeffding's lemma. Let $X$ be supported on $[a,b]$ with $\mathbb{E}[X]=0$. Then, for any $\lambda\in\mathbb{R}$,

$\mathbb{E}[\exp(\lambda X)]\leq\exp\left(\frac{(b-a)^2}{8}\lambda^2\right)$

so $X$ is $\frac{(b-a)^2}{4}$-sub-Gaussian.
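As an illustration of Hoeffding's lemma, the following Python sketch checks the inequality on a grid of $\lambda$ for a mean-zero two-point distribution on $\{a,b\}$, here a centered Bernoulli$(0.3)$; the example distribution, the grid and the helper names are illustrative assumptions.

```python
import math


def two_point_mgf(lam, a, b):
    """E[exp(lam X)] for X in {a, b} (a < 0 < b) with E[X] = 0, i.e. P(X = b) = -a / (b - a)."""
    p_b = -a / (b - a)
    return (1 - p_b) * math.exp(lam * a) + p_b * math.exp(lam * b)


def hoeffding_bound(lam, a, b):
    """Right-hand side of Hoeffding's lemma: exp((b - a)**2 * lam**2 / 8)."""
    return math.exp((b - a) ** 2 * lam**2 / 8)


a, b = -0.3, 0.7  # centered Bernoulli(0.3): X = B - 0.3
lams = [k / 10 for k in range(-100, 101)]
# Small relative tolerance only for floating-point rounding at lam = 0, where both sides equal 1.
ok = all(two_point_mgf(l, a, b) <= hoeffding_bound(l, a, b) * (1 + 1e-12) for l in lams)
print(ok)
```

Two-point distributions are a natural test case: the lemma is sharp for the symmetric two-point distribution, whose variance is exactly $\frac{(b-a)^2}{4}$.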

Back to the Beta distribution, where $[a,b]=[0,1]$: this shows that the Beta is $\frac{1}{4}$-sub-Gaussian. Finding the optimal proxy variance is a more challenging question. In addition to characterizing the optimal proxy variance of the Beta distribution in the note, we provide the simple upper bound $\frac{1}{4(\alpha+\beta+1)}$. It matches Hoeffding's bound in the extremal case $\alpha\to0$, $\beta\to0$, where the Beta random variable concentrates on the two-point set $\{0,1\}$ (a setting where Hoeffding's bound can be tight).
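To see what the two proxy variances give for the Beta tails, the Python sketch below compares, for integer parameters, the exact tail $\mathsf{P}(X-\mu\ge x)$ with the bound $\exp(-x^2/(2\sigma^2))$ for $\sigma^2=\frac14$ (Hoeffding) and $\sigma^2=\frac{1}{4(\alpha+\beta+1)}$. The exact tail uses the classical order-statistic identity $\mathsf{P}(X\le t)=\mathsf{P}(\text{Bin}(\alpha+\beta-1,t)\ge\alpha)$, valid only for integer $\alpha,\beta\ge1$; the choice $(\alpha,\beta)=(2,5)$ and the function names are illustrative.

```python
import math


def beta_upper_tail(t, alpha, beta):
    """P(X >= t) for X ~ Beta(alpha, beta) with integer alpha, beta >= 1.

    Uses the order-statistic identity P(X <= t) = P(Binomial(alpha + beta - 1, t) >= alpha).
    """
    n = alpha + beta - 1
    return sum(math.comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(alpha))


def subgaussian_tail_bound(x, sigma2):
    """P(X - mu >= x) <= exp(-x**2 / (2 sigma2)) for a sigma2-sub-Gaussian X."""
    return math.exp(-x**2 / (2 * sigma2))


alpha, beta = 2, 5
mu = alpha / (alpha + beta)
for x in [0.1, 0.2, 0.3, 0.4]:
    exact = beta_upper_tail(mu + x, alpha, beta)
    hoeffding = subgaussian_tail_bound(x, 1 / 4)
    refined = subgaussian_tail_bound(x, 1 / (4 * (alpha + beta + 1)))
    print(f"x={x}: exact={exact:.4f}  sigma2=1/4: {hoeffding:.4f}  sigma2=1/(4(a+b+1)): {refined:.4f}")
```

Both bounds hold, and the one based on $\frac{1}{4(\alpha+\beta+1)}$ is considerably tighter: at $x=0.4$ it gives about $0.077$ instead of about $0.73$, while the exact tail there is about $0.014$.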

In obtaining the bound $\frac{1}{4(\alpha+\beta+1)}$, we prove a recent conjecture made by Sam Elder in the context of Bayesian adaptive data analysis. I'll say more about obtaining the optimal proxy variance in an upcoming post.

Cheers!

Julyan
