Statisfaction

Dirichlet Process for dummies

Posted in Statistics by Julyan Arbel on 16 November 2010

A promise is a promise: here is a post on the so-called Dirichlet process, or DP.

What is it? A stochastic process whose realizations are probability measures, so it is a distribution on distributions. This is a nice feature for nonparametric Bayesians, who can thus use the DP as a prior when the parameter itself is a probability measure.

As mentioned in an earlier post, the foundational paper that introduced the DP, and still a nice read today, is Ferguson, T.S. (1973), A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1, 209-230. I will not go into much detail here, but will mainly stress the discreteness of the DP.

First, a DP, say on \mathbb{R}, has two parameters: a precision parameter M>0 and a base probability measure G_0 on \mathbb{R}. Basically, G_0 is the mean of the process, and M acts as an inverse variance: the larger M, the more tightly G concentrates around G_0. Formally, we write G\sim DP(M,G_0) for a realization of the DP. Then, for every measurable subset A of \mathbb{R}, E(G(A))=G_0(A) and V(G(A))=\frac{1}{M+1}G_0(A)(1-G_0(A)). More precisely, G(A)\sim \text{Beta}(MG_0(A),M(1-G_0(A))), and the two moments above are just the mean and variance of this Beta distribution.
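As a quick sanity check, the mean and variance of a Beta(MG_0(A), M(1-G_0(A))) distribution can be compared with the two formulas above using exact rational arithmetic. This is only a sketch: the helper name beta_mean_var and the values M=3 and G_0(A)=3/10 are illustrative assumptions, not taken from Ferguson's paper.

```python
from fractions import Fraction


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution (standard formulas)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Illustrative values (assumptions): precision M and the mass G_0(A)
# that the base measure gives to some hypothetical set A.
M = Fraction(3)
g0_A = Fraction(3, 10)

# G(A) ~ Beta(M * G_0(A), M * (1 - G_0(A)))
mean, var = beta_mean_var(M * g0_A, M * (1 - g0_A))

# Compare with E(G(A)) = G_0(A) and V(G(A)) = G_0(A)(1 - G_0(A)) / (M + 1)
print(mean == g0_A)
print(var == g0_A * (1 - g0_A) / (M + 1))
```

Both comparisons hold because the two Beta parameters sum to M, so the usual Beta variance ab/((a+b)^2(a+b+1)) simplifies to G_0(A)(1-G_0(A))/(M+1).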

A realization is almost surely discrete. In other words, it is a mixture of Dirac masses. Let us make this explicit with the representation as a countable mixture, due to Sethuraman (1994). Let V_i\sim\text{Beta}(1,M) and \theta_i\sim G_0, for i\geq 1, all mutually independent. Define p_1=V_1, and p_i=V_i\prod_{j<i} (1-V_j) for i\geq 2. Then G can be written as G=\sum_{i\geq 1} p_i \delta_{\theta_i}. This is called the Sethuraman representation, also referred to as “stick-breaking”. The reason for the name lies in the definition of the weights p_i: each can be seen as the length of a piece of a stick of unit length, broken into infinitely many pieces. The first piece has length V_1. The remaining part has length 1-V_1, and is broken at a fraction V_2 of its length, which defines a second piece of length p_2=V_2(1-V_1). And so forth. We easily see that this builds a sequence of weights p_i that sum to 1, because the remaining part at step n has length \prod_{j\leq n}(1-V_j), which goes to 0 almost surely.
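The stick-breaking construction of the weights can be sketched in a few lines of Python, using only the standard library. The function name stick_breaking_weights, the truncation at 200 pieces, the seed and the value M=3 are assumptions made for illustration; the exact construction has infinitely many pieces.

```python
import random


def stick_breaking_weights(M, n_atoms, rng):
    """Return the first n_atoms weights p_1..p_n and the leftover stick length."""
    weights = []
    remaining = 1.0  # length of the part of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, M)    # V_i ~ Beta(1, M)
        weights.append(v * remaining)  # p_i = V_i * prod_{j<i} (1 - V_j)
        remaining *= 1.0 - v           # prod_{j<=i} (1 - V_j)
    return weights, remaining


rng = random.Random(0)  # fixed seed, only for reproducibility
weights, remaining = stick_breaking_weights(M=3.0, n_atoms=200, rng=rng)

print(sum(weights) + remaining)  # equals 1 up to floating-point rounding
print(remaining)                 # leftover stick length after 200 breaks
```

At every step the weights produced so far plus the leftover length add up to 1, which is the numerical counterpart of the argument that the p_i sum to 1 once the leftover vanishes.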

Now let us illustrate this with the nice plots of Eric Barrat. He chooses a standard normal for G_0, which is quite usual, and M=3. A way to get a graphical view of a realization G is to represent each Dirac mass by its weight.
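A realization of this kind can be simulated with the same setting, G_0 standard normal and M=3. The sketch below is not the code behind the original plots: the function name draw_dp_atoms, the stopping tolerance and the seed are assumptions, and the draw is truncated once the unbroken stick is negligibly short. It returns the pairs (\theta_i, p_i), which are what one would feed to a plotting routine to draw each Dirac mass as a spike of height p_i at location \theta_i.

```python
import random


def draw_dp_atoms(M, base_sampler, rng, tol=1e-10):
    """Truncated stick-breaking draw of G ~ DP(M, G_0).

    Stops once the unbroken part of the stick is shorter than tol, and
    returns a list of (theta_i, p_i) pairs.
    """
    atoms = []
    remaining = 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, M)           # V_i ~ Beta(1, M)
        theta = base_sampler(rng)             # theta_i ~ G_0
        atoms.append((theta, v * remaining))  # Dirac mass at theta_i, weight p_i
        remaining *= 1.0 - v
    return atoms


rng = random.Random(2010)  # fixed seed, only for reproducibility
atoms = draw_dp_atoms(M=3.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)

# Show the five heaviest Dirac masses of this realization.
largest = sorted(atoms, key=lambda atom: atom[1], reverse=True)[:5]
for theta, p in largest:
    print(f"atom at {theta:+.3f} with weight {p:.3f}")
```

With a small M such as 3, a handful of atoms typically carries most of the mass, which is why the discreteness of G is so visible on such plots.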

2 Responses

  1. villani said, on 12 June 2011 at 14:12

    Nice post! Very compact, yet clear.

  2. […] for the next GTB meeting at Crest, 3rd May, I will present Peter Orbanz‘ work on Projective limit random probabilities on Polish spaces. It will follow my previous presentation about Bayesian nonparametrics on the Dirichlet process. […]

