Statisfaction

Collegio Carlo Alberto

Posted in General by Julyan Arbel on 12 September 2016
[Photo: the Collegio in the center.]

I have spent three years as a postdoc at the Collegio Carlo Alberto. It was a great time, during which I was able to interact with top colleagues and to prepare my applications in optimal conditions. Now that I have left for Inria Grenoble, here is a brief presentation of the Collegio in pictures.

3D density plot in R with Plotly

Posted in General, R by Julyan Arbel on 30 June 2016


In Bayesian nonparametrics, many models address the problem of density regression, including covariate-dependent processes. These go back to the pioneering work of [current ISBA president] MacEachern (1999), who introduced the general class of dependent Dirichlet processes. Dependent processes have since been developed for numerous models, such as nonparametric regression, time series and meta-analysis, to cite but a few, and applied to a wealth of fields, e.g. epidemiology, bioassay problems, genomics and finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arXiv).

Several densities can be plotted on the same 3D plot thanks to the Plotly R library, “an interactive, browser-based charting library built on the open source JavaScript graphing library, plotly.js.”

In our ecological example, the model provides a series of densities on the Y axis (in our case, posterior density of species diversity), indexed by some covariate X (a pollutant). See file density_plot.txt. The following Plotly R code

library(plotly)
mydata <- read.csv("density_plot.txt")
df <- as.data.frame(mydata)
# one 3D line per value of the covariate X
# (newer versions of plotly use the formula interface, e.g. x = ~Y, y = ~X, z = ~Z, split = ~X)
plot_ly(df, x = Y, y = X, z = Z, group = X, type = "scatter3d", mode = "lines")

provides a graph as below. For the interactive version, see the RPubs page here.

[Screenshot: the resulting interactive 3D density plot.]

Why do shrinking priors shrink?

Posted in General, Statistics by JB Salomond on 19 October 2015


Hello there!

While I was in Amsterdam, I took the opportunity to go and work with the Leiden crowd, and more particularly with Stéphanie van der Pas and Johannes Schmidt-Hieber. Since Stéphanie had already obtained neat results for the Horseshoe prior and Johannes had obtained some super cool results for the spike and slab prior, they were the first choice to team up with to work on sparse models. And guess what? We have just arXived a paper in which we study the sparse Gaussian sequence model

X_i = \theta_i + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0,1), \quad i=1,...,n,

where only a small number p_n \ll n of the \theta_i are non-zero.
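As a toy illustration (my own, not taken from the paper), such a nearly black sequence can be simulated in R as follows, assuming for instance n = 500 observations with p_n = 25 non-zero means:

# Toy data from the sparse Gaussian sequence model (illustration only)
set.seed(1)
n   <- 500                      # number of observations
p_n <- 25                       # number of non-zero means, p_n << n
theta0 <- numeric(n)            # the "nearly black" true mean vector
theta0[1:p_n] <- 7              # non-zero signals, well above the noise level
X <- theta0 + rnorm(n)          # observations X_i = theta_i + epsilon_i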

There is a rapidly growing literature on shrinkage priors for such models; just look at Polson and Scott (2012), Caron and Doucet (2008), or Carvalho, Polson, and Scott (2010) among many, many others, or simply have a look at the program of the last BNP conference. There is also a growing literature on the theoretical properties of some of these priors. The Horseshoe prior was studied in Pas, Kleijn, and Vaart (2014), an extension of the Horseshoe was then studied in Ghosh and Chakrabarti (2015), and recently the spike and slab Lasso was studied in Rocková (2015) (see also Xi'an's 'Og).

All these results are super nice, but still we want to know: why do some shrinking priors shrink so well while others do not?! As we are all mathematicians here, I will reformulate this last question: under which conditions on the prior does the posterior contract at the minimax rate[1]?

We considered a Gaussian scale mixture prior on the sequence (\theta_i)

\theta_i \sim p(\theta_i) = \int \frac{e^{-\theta_i^2/(2\sigma^2)}}{\sqrt{2\pi \sigma^2}} \pi(\sigma^2) d\sigma^2

since this family of priors encompasses all the ones studied in the papers mentioned above (and more), it seemed general enough.
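As a concrete example (an illustration I am adding, not code from the paper), the Horseshoe prior is recovered by letting \sigma follow a half-Cauchy distribution, so a draw from the prior is obtained in two steps; the other priors in the family only differ through the mixing density \pi:

# Draw from a Gaussian scale mixture prior, here the Horseshoe:
# sigma ~ half-Cauchy(0, 1), then theta | sigma ~ N(0, sigma^2)
n <- 500
sigma <- abs(rcauchy(n))                      # local scales
theta_prior <- rnorm(n, mean = 0, sd = sigma)
# e.g. an exponential mixing density on sigma^2 would give the Laplace prior instead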

Our main contribution is to give conditions on \pi such that the posterior converges at the right rate. We showed that in order to recover the parameters \theta_i that are non-zero, the prior should have tails that decay at most exponentially fast, which is similar to the condition imposed for the Spike and Slab prior. Another expected condition is that the prior should put enough mass around 0, since our assumption is that the parameter vector \theta is nearly black, i.e. most of its components are 0.

More surprisingly, in order to recover the zero parameters correctly, one also needs some conditions on the tails of the prior. More specifically, the prior's tails cannot be too heavy: if they are, we can construct a prior that puts enough mass near 0 but whose posterior does not concentrate at the minimax rate.

We showed that these conditions are satisfied for many priors including the Horseshoe, the Horseshoe+, the Normal-Gamma and the Spike and Slab Lasso.

Gaussian scale mixtures are also quite simple to use in practice. As explained in Caron and Doucet (2008), a simple Gibbs sampler can be implemented to sample from the posterior. We conducted a simulation study to evaluate the sharpness of our conditions. We computed the \ell_2 loss for the Laplace prior, a global-local scale mixture of Gaussians (hereafter called the "bad" prior for simplicity), the Horseshoe and the Normal-Gamma prior. The first two do not satisfy our conditions, and the last two do. The results are reported in the picture further below.
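To make the sampling step concrete, here is a minimal sketch of such a Gibbs sampler (my own illustration, specialized to the Horseshoe prior through the usual inverse-gamma auxiliary-variable representation of the half-Cauchy scales; it is not the code used for the simulations in the paper):

# Gibbs sampler for the Horseshoe prior in the sparse Gaussian sequence model
# (illustration only; half-Cauchy scales are represented with inverse-gamma
#  auxiliary variables so that every full conditional is standard)
rinvgamma <- function(n, shape, rate) 1 / rgamma(n, shape = shape, rate = rate)

horseshoe_gibbs <- function(X, n_iter = 2000) {
  n <- length(X)
  lambda2 <- rep(1, n); nu <- rep(1, n); tau2 <- 1; xi <- 1
  theta_store <- matrix(NA_real_, n_iter, n)
  for (t in 1:n_iter) {
    # theta_i | rest ~ N(kappa_i X_i, kappa_i), kappa_i = lambda_i^2 tau^2 / (1 + lambda_i^2 tau^2)
    kappa <- lambda2 * tau2 / (1 + lambda2 * tau2)
    theta <- rnorm(n, mean = kappa * X, sd = sqrt(kappa))
    # local variances and their auxiliary variables
    lambda2 <- rinvgamma(n, shape = 1, rate = 1 / nu + theta^2 / (2 * tau2))
    nu      <- rinvgamma(n, shape = 1, rate = 1 + 1 / lambda2)
    # global variance and its auxiliary variable
    tau2 <- rinvgamma(1, shape = (n + 1) / 2, rate = 1 / xi + sum(theta^2 / lambda2) / 2)
    xi   <- rinvgamma(1, shape = 1, rate = 1 + 1 / tau2)
    theta_store[t, ] <- theta
  }
  theta_store
}

# e.g. posterior mean on the toy data X simulated earlier, discarding a burn-in:
# theta_hat <- colMeans(horseshoe_gibbs(X)[-(1:500), ])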

[Figure: \ell_2 risk in the simulation study for the Laplace, "bad", Horseshoe and Normal-Gamma priors.]

As we can see, priors that do and do not satisfy our conditions show different behaviours (it seems that the priors which do not fit our conditions have an \ell_2 risk larger than the minimax rate by a factor of n). This seems to indicate that our conditions are sharp.

At the end of the day, our results expand the class of shrinkage priors with theoretical guarantees for the posterior contraction rate. Not only can they be used to obtain the optimal posterior contraction rate for the Horseshoe+, the inverse-Gaussian and Normal-Gamma priors, but the conditions also provide some characterization of the properties of sparsity priors that lead to desirable behaviour. Essentially, the tails of the prior on the local variance should be at least as heavy as Laplace, but not too heavy, and there needs to be a sizable amount of mass around zero compared to the amount of mass in the tails, in particular when the underlying mean vector becomes more sparse.

References

Caron, François, and Arnaud Doucet. 2008. “Sparse Bayesian Nonparametric Regression.” In Proceedings of the 25th International Conference on Machine Learning, 88–95. ICML ’08. New York, NY, USA: ACM.

Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.

Ghosh, Prasenjit, and Arijit Chakrabarti. 2015. “Posterior Concentration Properties of a General Class of Shrinkage Estimators Around Nearly Black Vectors.”

Pas, S.L. van der, B.J.K. Kleijn, and A.W. van der Vaart. 2014. “The Horseshoe Estimator: Posterior Concentration Around Nearly Black Vectors.” Electron. J. Stat. 8: 2585–2618.

Polson, Nicholas G., and James G. Scott. 2012. “Good, Great or Lucky? Screening for Firms with Sustained Superior Performance Using Heavy-Tailed Priors.” Ann. Appl. Stat. 6 (1): 161–85.

Rocková, Veronika. 2015. “Bayesian Estimation of Sparse Signals with a Continuous Spike-and-Slab Prior.”


  [1] For those wondering why the heck we care about the minimax rate here, just remember that a posterior that contracts at the minimax rate induces an estimator which converges at the same rate. It also guarantees that confidence regions will not be too large.

Visiting scholar at Bocconi, Milan

Posted in Seminar/Conference by Julyan Arbel on 31 March 2011

I am currently visiting Bocconi University for two weeks (following Pierre's visit to Vancouver). The team here is very active in Bayesian nonparametrics, no surprise with Italy being the cradle of exchangeability through de Finetti's work. I talk with Sonia Petrone and Pietro Muliere, and can interact with many PhD students on topics related to mine (Maria Anna di Lucca, Steffen Ventz and Sara Wade, among others). The whole team plans to attend this summer's BNP conference in Veracruz, Mexico, for which I submitted a poster abstract today on multidimensional covariate-dependent Dirichlet processes. By the way, I am looking for someone to share a (cheaper-rate) double room in Veracruz…
Bocconi's building is very sophisticated: although it looks like a bunker from the outside, the architect had the good idea to separate the rooms with glass walls, which makes it possible, and quite nice, to scribble equations all over the walls. I will also give a talk at the PhD student seminar next Friday, presenting our work with Robin and Nicolas on the estimation of a regression for rank data, with an application to the Eurovision Song Contest.

Dirichlet Process for dummies

Posted in Statistics by Julyan Arbel on 16 November 2010

A promise is a promise, here is a post on the so-called Dirichlet process, or DP.

What is it? A stochastic process whose realizations are probability measures, so it is a distribution on distributions. This is a nice feature for nonparametric Bayesians, who can thus use the DP as a prior when the parameter itself is a probability measure.

As mentioned in an earlier post, a foundational paper which introduced the DP, and still a nice read today, is Ferguson, T.S. (1973), A Bayesian Analysis of Some Nonparametric Problems, The Annals of Statistics, 1, 209–230. I will not go into very deep detail here, but will mainly stress the discreteness of the DP.

First, a DP, say on \mathbb{R}, has two parameters: a precision parameter M and a base measure G_0 on \mathbb{R}. Basically, G_0 is the mean of the process, and M measures the inverse of its variance. Formally, we write G\sim DP(M,G_0) for a realization of the DP. Then, for every measurable subset A of \mathbb{R}, E(G(A))=G_0(A) and V(G(A))=\frac{1}{M+1}G_0(A)(1-G_0(A)). Actually, a more accurate statement is that G(A)\sim \text{Beta}(MG_0(A),M(1-G_0(A))).
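As a quick numerical check of these formulas (an example I am adding, with the same parameters as in the plots below): take G_0 the standard normal, M=3 and A=(-\infty,0], so that G_0(A)=1/2. Then G(A)\sim\text{Beta}(1.5,1.5), whose mean is 1/2=G_0(A) and whose variance is \frac{1.5\times 1.5}{(1.5+1.5)^2(1.5+1.5+1)}=\frac{1}{16}, which matches \frac{1}{M+1}G_0(A)(1-G_0(A))=\frac{1}{4}\times\frac{1}{4}=\frac{1}{16}.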

A realization is almost surely discrete; in other words, it is a mixture of Dirac masses. Let us explain its explicit expression as a countable mixture, due to Sethuraman (1994). Let V_i\sim\text{Beta}(1,M) and \theta_i\sim G_0, mutually independent. Define p_1=V_1 and p_i=V_i\prod_{j<i} (1-V_j). Then G writes G=\sum p_i \delta_{\theta_i}. This is called the Sethuraman representation, also referred to as “stick-breaking”. The reason for the name lies in the definition of the weights p_i: each can be seen as the length of a piece of a stick of unit length, broken into infinitely many pieces. The first piece has length V_1. The remaining part has length 1-V_1, and is broken at V_2 of its length, which defines a second piece of length p_2=V_2(1-V_1). And so forth. We easily see that this builds a sequence of p_i's that sum to 1, because the remaining part at step n has length \prod_{j\leq n}(1-V_j), which goes to 0 almost surely.

Now let us illustrate this with the nice plots of Eric Barrat. He chooses a standard normal for G_0, which is quite usual, and M=3. A way to get a graphical view of a realization G is to represent a Dirac mass by its weight:
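Since those plots are not reproduced here, here is a minimal R sketch of such a picture (my own code, using a truncated stick-breaking representation with a truncation level N large enough that the leftover stick length is negligible):

# Approximate draw from DP(M, G_0) by truncated stick-breaking,
# with each Dirac mass represented by a vertical segment of height its weight
set.seed(42)
M <- 3; N <- 500                        # precision parameter and truncation level
V     <- rbeta(N, 1, M)                 # stick-breaking proportions V_i ~ Beta(1, M)
p     <- V * cumprod(c(1, 1 - V[-N]))   # weights p_i = V_i * prod_{j < i} (1 - V_j)
theta <- rnorm(N)                       # atoms theta_i ~ G_0 = N(0, 1)
plot(theta, p, type = "h", xlab = expression(theta), ylab = "weight",
     main = "A realization of DP(3, N(0, 1))")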

YES Workshop in Eindhoven

Posted in Seminar/Conference by Julyan Arbel on 9 November 2010

Hi there,
this week I am attending a workshop in Eindhoven on Bayesian nonparametrics, YES IV. It is nice how closely it matches my research interests, going basically from asymptotics (rates of convergence of posterior distributions and of posterior risks) to the use of stochastic processes for defining priors (among others, Dirichlet, Beta and neutral-to-the-right processes). So I find it very stimulating.

I thank Ismaël Castillo, one of the two “local” organisers (along with Bas Kleijn), for offering me the opportunity to present my submitted paper on convergence rates. Echoing Pierre's last post, it took me a lot of “downs” to write the paper; it felt like I would never make it through the never-ending technicalities it involved. Then preparing the presentation was a lot of pressure. Unexpectedly, the talk yesterday went OK, since I was able to i) understand the questions and ii) answer most of them. Afterwards, it was super nice to get feedback from specialists in the field, like Harry van Zanten, Bas Kleijn and Eduard Belitser.

For people interested in it, here are my slides:
