Statisfaction

Dirichlet Process for dummies

Posted in Statistics by Julyan Arbel on 16 November 2010

A promise is a promise: here is a post on the so-called Dirichlet process, or DP.

What is it? A stochastic process whose realizations are probability measures. So it is a distribution on distributions. This is a nice feature for nonparametric Bayesians, who can thus use the DP as a prior when the parameter itself is a probability measure.

As mentioned in an earlier post, the foundational paper that introduced the DP, and still a nice read today, is Ferguson, T.S. (1973), A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1, 209-230. I will not go into much detail here, but will mainly stress the discreteness of the DP.

First, a DP, say on $\mathbb{R}$, has two parameters: a precision parameter $M$, and a base measure $G_0$ on $\mathbb{R}$. Basically, $G_0$ is the mean of the process, and $M$ measures the inverse of its variance. Formally, we write $G\sim DP(M,G_0)$ for a draw from the DP. Then, for every measurable subset $A$ of $\mathbb{R}$, $E(G(A))=G_0(A)$, and $V(G(A))=\frac{1}{M+1}G_0(A)(1-G_0(A))$. Actually, a more accurate statement says that $G(A)\sim \text{Beta}(MG_0(A),M(1-G_0(A)))$.
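The mean and variance above follow from the Beta marginal, and this can be checked by simulation. Here is a minimal sketch using only the Python standard library. The function name `beta_marginal_moments` and the values $M=3$ and $G_0(A)=0.3$ are illustrative assumptions, not taken from Ferguson's paper.

```python
import random
import statistics

def beta_marginal_moments(M, g0_A, n_draws=100_000, seed=0):
    """Draw G(A) ~ Beta(M*G0(A), M*(1 - G0(A))) and return (mean, variance)."""
    rng = random.Random(seed)
    draws = [rng.betavariate(M * g0_A, M * (1.0 - g0_A)) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.pvariance(draws)

# Illustrative values (assumptions): precision M = 3 and G0(A) = 0.3.
M, g0_A = 3.0, 0.3
mean, var = beta_marginal_moments(M, g0_A)

print(mean, g0_A)                           # empirical vs. theoretical mean
print(var, g0_A * (1.0 - g0_A) / (M + 1.0))  # empirical vs. theoretical variance
```

Both printed pairs should agree up to Monte Carlo error, and increasing $M$ shrinks the variance, which is why $M$ is called a precision parameter.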

A realization is almost surely discrete. In other words, it is a mixture of Dirac masses. Let us explain its explicit expression as a countable mixture, due to Sethuraman (1994). Let $V_i\sim\text{Beta}(1,M)$, and $\theta_i\sim G_0$, mutually independent. Define $p_1=V_1$, and $p_i=V_i\prod_{j<i}(1-V_j)$ for $i\geq 2$. Then $G$ can be written as $G=\sum_{i\geq 1} p_i \delta_{\theta_i}$. This is called the Sethuraman representation, also referred to as “stick-breaking”. The reason for the name is in the definition of the weights $p_i$: each can be seen as the length of a piece of a stick of unit length, broken into infinitely many pieces. The first piece is of length $V_1$. The remaining part has length $1-V_1$, and is broken at a fraction $V_2$ of its length, which defines a second piece of length $p_2=V_2(1-V_1)$. And so forth. We see easily that this builds a sequence of $p_i$s that sum to 1, because the remaining part at step $n$ has length $\prod_{j\leq n}(1-V_j)$, which goes to 0 almost surely.
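The stick-breaking construction translates almost line by line into code. The sketch below uses only the Python standard library and truncates the infinite sum after `n_atoms` pieces, which is an approximation introduced here for illustration. The function name `stick_breaking`, the truncation level, and the choice of $M=3$ with a standard normal $G_0$ are assumptions of this example.

```python
import random

def stick_breaking(M, n_atoms, base_sampler, seed=0):
    """Truncated stick-breaking draw: returns (weights, atoms, leftover).

    weights[i] plays the role of p_{i+1} = V_{i+1} * prod_{j <= i} (1 - V_j),
    atoms[i] is a draw theta_{i+1} ~ G0, and leftover is the length of the
    stick that remains unbroken after n_atoms steps.
    """
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0                      # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, M)      # V_i ~ Beta(1, M)
        weights.append(v * remaining)    # break off a fraction V_i of what is left
        atoms.append(base_sampler(rng))  # theta_i ~ G0
        remaining *= 1.0 - v
    return weights, atoms, remaining

# Illustrative choices (assumptions): M = 3, G0 standard normal, 200 atoms.
weights, atoms, leftover = stick_breaking(
    M=3.0, n_atoms=200, base_sampler=lambda rng: rng.gauss(0.0, 1.0)
)

print(sum(weights) + leftover)  # equals 1 up to floating-point error
print(leftover)                 # mass ignored by the truncation
```

The leftover mass has expectation $(M/(M+1))^n$ after $n$ steps, so for moderate $M$ a few hundred atoms already capture essentially all of $G$.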

Now let us illustrate this with the nice plots of Eric Barrat. He chooses a standard normal for $G_0$, which is a common choice, and $M=3$. A way to get a graphical view of a realization $G$ is to represent each Dirac mass by its weight: