3D density plot in R with Plotly

Posted in General, R by Julyan Arbel on 30 June 2016


In Bayesian nonparametrics, many models address the problem of density regression, including covariate dependent processes. These were settled by the pioneering works by [current ISBA president] MacEachern (1999) who introduced the general class of dependent Dirichlet processes. The literature on dependent processes was developed in numerous models, such as nonparametric regression, time series data, meta-analysis, to cite but a few, and applied to a wealth of fields such as, e.g., epidemiology, bioassay problems, genomics, finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arxiv).

Several densities can be plotted on the same 3D plot thanks to the Plotly R library, “an interactive, browser-based charting library built on the open source JavaScript graphing library, plotly.js.”

In our ecological example, the model provides a series of densities on the Y axis (in our case, posterior density of species diversity), indexed by some covariate X (a pollutant). See file density_plot.txt. The following Plotly R code

mydata = read.csv("density_plot.txt")
df =
plot_ly(df, x = Y, y = X, z = Z, group = X, type = "scatter3d", mode = "lines") 

provides a graph as below. For the interactive version, see the RPubs page here.

Capture d’écran 2016-06-30 à 12.15.57


Dirichlet Process for dummies

Posted in Statistics by Julyan Arbel on 16 November 2010

A promise is a promise, here is a post on the so-called Dirichlet process, or DP.

What is it? a stochastic process, whose realizations are probability measures. So it is a distribution on distributions. A nice feature for nonparametric Bayesians, who can thus use the DP as a prior when the parameter itself is a probability measure.

As mentionned in an earlier post, a foundational paper and still a nice reading today, which introduced the DP, is Ferguson, T.S. (1973), A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1, 209-230. I will not go in very deep details here, but mainly will stress the discreteness of the DP.

First, a DP, say on \mathbb{R}, has two parameters, a precision parameter M, and a base measure G_0 on \mathbb{R}. Basically, G_0 is the mean of the process, and M measures the inverse of its variance. Formally, we write G\sim DP(M,G_0) for a value of the DP. Then, for all measurable subset A of \mathbb{R}, E(G(A))=G_0(A), and V(G(A))=\frac{1}{M+1}G_0(A)(1-G_0(A)). Actually, a more acurate statement says that G(A)\sim \text{Beta}(MG_0(A),M(1-G_0(A)).

A realization is almost surely discrete. In other words, it is a mixture of Dirac masses. Let us explain this explicit expression as a countable mixture, due to Sethuraman (1994). Let V_i\sim\text{Beta}(1,M), and \theta_i\sim G_0, mutually independent. Define p_1=V_1, and p_i=V_i\prod_{j<i} (1-V_j). Then G writes G=\sum p_i \delta_{\theta_i}. This is called the Sethuraman representation, also refered to as “stick-breaking”. The reason for the name is in the definition of the weights p_i: each can be seen as the length of a part of a stick of unit lenght, broken in infinitely many sticks. The first stick is of length V_1. The remaining part has length 1-V_1, and is broken at V_2 of its length, which defines a second stick of length p_2=V_2(1-V_1). And so forth. We see easily that this builds a sequence of p_is that sum to 1, because the remaining part at step n has length \prod_{j\leq n}(1-V_j), which goes to 0 almost surely.

Now let us illustrate this with the nice plots of Eric Barrat. He chooses a standard normal for G_0, which is quite usual, and M=3. A way to get a graphical view of a realization G is to represent a Dirac mass by its weight:

Séminaire Parisien de statistique – October

Posted in Seminar/Conference by Julyan Arbel on 28 October 2010

Last  Monday I attended my first Séminaire Parisien de statistique this year. It takes place at Institut Henri Poincaré, whose famous Director is one of the 2010 Fields Medalists, Cédric Villani.

To quote Valencia discussant favorite adjectives, the three talks were terrific and thought-provoking.

Eric Moulines presented Multiple Try Methods in MCMC algorithms. Instead of a unique proposal at each step, one proposes several values, among which only one is kept. The proposals can be chosen independant, but introducing dependence in the right way speeds up the rate of convergence to the stationary distribution. An interesting feature of this algorithm, espacially for Pierre, is that it allows parallel computation (in multiple propositions) whereas the standard Metropolis-Hastings algorithm is essentially sequential. See as well Pierre, Christian and Murray Smith’s block Independent Metropolis-Hastings algorithm for further details.

Jean-Marc Bardet introduced a way to detect ruptures in time series. He focuses on causal time series, ie they can be written only in terms of present and past innovations, for example AR(\infty). A rupture at time t means the parameters change at t.

The must-see talk for me was Eric Barat presentation on BNP modeling for sapce-time emission tomography. For new comer, BNP means more than a bank: Bayesian nonparametric. It is nice to see a very efficient application of BNP methods to a medical field. Eric kindly gives his slides (cf below) which I recommend, espacially the section on random probability measures: he reviews properties of the Dirichlet process, various representations (Chinese restaurant, Stick-breaking), and extends to the Pitman-Yor process and Pitman-Yor mixture. Then he gives posterior simulations by Gibbs sampling. I am interested in dependent over time models, and I am thankful for Eric for his pointer to a recent article of Chung and Dunson on local Dirichlet process, a nifty and simple  construction of a Dependent Dirichlet process.

In a few days, I will try to make clear what the Dirichlet process is!

Dirichlet process and related priors

Posted in Seminar/Conference by Julyan Arbel on 23 September 2010

Johann Dirichlet

My contribution this year to MCB seminar at CREST is about nonparametric Bayes (today at 2 pm, room 14). I shall start with 1) a few words of history, then introduce 2) the Dirichlet Process by several of its numerous defining properties. I will next introduce an extension of the Dirichlet Process, namely 3) the DP mixtures, useful for applications like 4) density estimation. Last, I will show 5) posterior MCMC simulations for a density model and give some 6) reference textbooks.

1) History

Kjell Doksum

Ferguson, T.S. (1973), A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1, 209-230.
Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. The Annals of Probability, 2 183-201.
Blackwell, D. (1973). Discreteness of Ferguson selections. The Annals of Statistics, 1 356-358.
Blackwell, D. and MacQueen, J. (1973). Ferguson distributions via Polya urn schemes. The Annals of Statistics, 1 353-355.

2) Dirichlet Process defining properties

Mainly based on Peter Müller’s slides of class 2 at Santa Cruz this summer.

Dirichlet distribution on finite partitions
Stick-breaking/ Sethuraman representation
Polya urn analogy for the predictive probability function
Normalization of a Gamma Process

3) Dirichlet Process Mixtures (DPM)

Convolution of a DP with a continuous kernel to circumvent its discretness.

4) Density estimation with DPMs

5) Posterior MCMC simulation

Based on Peter Müller’s slides of class 6.

6) Textbooks

Bayesian nonparametrics, 2003, J. K. Ghosh, R. V. Ramamoorthi.
Bayesian nonparametrics, 2010, Nils Lid Hjort, Chris Holmes, Peter Müller, Stephen G. Walker, and contributions by Subhashis Ghosal, Antonio Lijoi, Igor Prünster, Yee Whye Teh, Michael I. Jordan, Jim Griffin, David B. Dunson, Fernando Quintana.

NP Bayes slides

Posted in Seminar/Conference by Julyan Arbel on 1 September 2010

I attended in late August a conference on Bayesian nonparametric statistical methods in Santa Cruz, California. The main lecturer was Dr. Peter Muller. He featured the following topics

• Dirichlet processes and the Chinese restaurant process
• Dirichlet process mixture models
• Polya trees
• Dependent Dirichlet processes
• Species sampling models
• Product partition models
• Beta processes and the Indian buffet process
• Computational tools

If you are interested, the slides of his ten talks can be found here, or here for a zip file.

%d bloggers like this: