Who has the biggest in Bayesian Nonparametrics?

Posted in General by Julyan Arbel on 2 September 2015
A graph of balls

This very fine title quotes a pretty hilarious banquet speech by David Dunson at the last BNP conference held in Raleigh last June. The graph is by François Caron who used it in his talk there. See below for his explanation.

After the summer break, back to work. The academic year to come looks promising from a BNP point of view, not least because three special issues have been announced: in Statistics & Computing (guest editors: Tamara Broderick (MIT), Katherine Heller (Duke), Peter Mueller (UT Austin)), in the Electronic Journal of Statistics (guest editor: Subhashis Ghosal (NCSU)), and in the International Journal of Approximate Reasoning (proposal deadline December 1st, guest editors: Alessio Benavoli (Lugano), Antonio Lijoi (Pavia) and Antonietta Mira (Lugano)).

BNP is also going to infiltrate MCMSki V, Lenzerheide, Switzerland, January 4-7 2016, with three sessions with a BNP flavor, in addition to plenary speakers David Dunson and Michael Jordan. The International Society for Bayesian Analysis World Meeting, 13-17 June 2016, should also host plenty of BNP sessions, as well as a De Finetti Lecture by Persi Diaconis (Stanford University).

Turing revisited in Turin, and Oxford

Posted in General by Julyan Arbel on 18 June 2015
Are you paying attention? Good. If you are not listening carefully, you will miss things. Important things.

With colleagues Stefano Favaro and Bernardo Nipoti from Turin and Yee Whye Teh from Oxford, we have just arXived an article on discovery probabilities. If you are looking for some info on a space shuttle, a cycling team or a TV channel, it’s the wrong place. Instead, discovery probabilities are central to ecology, biology and genomics where data can be seen as a population of individuals belonging to an (ideally) infinite number of species. Given a sample of size n, the l-discovery probability D_{n}(l) is the probability that the next individual observed matches a species with frequency l in the n-sample. For instance, the probability of observing a new species D_{n}(0) is key for devising sampling experiments.

By the way, why Alan Turing? Because, together with his fellow Bletchley Park researcher Irving John Good (who also appears as a character in The Imitation Game), Turing is known for the so-called Good-Turing estimator of the discovery probability

\displaystyle \check{\mathcal{D}}_{n}(l)=(l+1)\frac{m_{l+1,n}}{n}

which involves m_{l+1,n}, the number of species with frequency l+1 in the sample (i.e. the frequency of frequencies, if you follow me). As it happens, this estimator, defined in Good’s 1953 Biometrika paper, has been wildly popular among the ecology, biology and genomics communities ever since, at least in the small circles where wild popularity and probability aren’t mutually exclusive.
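To make the frequency of frequencies concrete, here is a minimal Python sketch of the Good-Turing estimate, written in its standard form (l+1) m_{l+1,n} / n. The function names and the toy sample are made up for illustration and are not taken from the article.

```python
from collections import Counter


def frequency_counts(sample):
    """Return {l: m_l}, the number of species observed exactly l times."""
    species_counts = Counter(sample)          # species -> frequency in the sample
    return Counter(species_counts.values())   # frequency l -> number of species m_l


def good_turing(sample, l):
    """Good-Turing estimate (l + 1) * m_{l+1,n} / n of the l-discovery probability."""
    n = len(sample)
    m = frequency_counts(sample)
    return (l + 1) * m.get(l + 1, 0) / n


# Toy sample of n = 10 individuals from 5 species (made-up labels).
sample = ["a", "a", "a", "b", "b", "c", "c", "d", "e", "a"]
print(good_turing(sample, 0))  # estimated probability of discovering a new species
print(good_turing(sample, 1))  # estimated probability of re-observing a singleton species
```

Note that the estimates over l = 0, ..., n-1 always sum to one, since the sum of l m_{l,n} over l is n. Note also that the estimate for frequency l is zero whenever no species has frequency l+1, even if some species have frequency l.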

Simple explicit estimators \hat{\mathcal{D}}_{n}(l) of discovery probabilities in the Bayesian nonparametric (BNP) framework of Gibbs-type priors were given by Lijoi, Mena and Prünster in a 2007 Biometrika paper. The main difference between the two estimators of D_{n}(l) is that the Good-Turing estimator involves n and m_{l+1,n} only, while the BNP estimator involves n, m_{l,n} (instead of m_{l+1,n}), and k_n, the total number of observed species. It has been shown in the literature that the BNP estimators are more reliable than the Good-Turing estimators.

How do we contribute? (i) we describe the posterior distribution of the discovery probabilities in the BNP model, which is pretty useful for deriving exact credible intervals of the estimates, and (ii) we investigate large n asymptotic behavior of the estimators.


Reading Bayesian classics — presentations

Posted in General by Julyan Arbel on 21 April 2015

The students did a great job presenting some Bayesian classics. I enjoyed reading the papers (pdfs can be found here), most of which I hadn’t read before, and I also enjoyed the students’ talks. I share here some of the best ones, as well as some illustrative excerpts from the papers. In chronological order (presentations on slideshare below):

  • W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

In this paper, we shall consider Markov chain methods of sampling that are generalizations of a method proposed by Metropolis et al. (1953), which has been used extensively for numerical problems in statistical mechanics.

  • Dennis V. Lindley and Adrian F.M. Smith. Bayes estimates for the linear model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), with discussion, 1–41, 1972.

From Prof. B. de Finetti’s discussion (note the valiant collaborator Smith!):

I think that the main point to stress about this interesting and important paper is its significance for the philosophical questions underlying the acceptance of the Bayesian standpoint as the true foundation for inductive reasoning, and in particular for statistical inference. So far as I can remember, the present paper is the first to emphasize the role of the Bayesian standpoint as a logical framework for the analysis of intricate statistical situation. […] I would like to express my warmest congratulations to my friend Lindley and his valiant collaborator Smith.


Statistics journals network

Posted in General, R, Statistics by Julyan Arbel on 16 April 2015
Statistical journals friendship (click for SVG format)

Xian blogged recently about the upcoming RSS read paper, Statistical Modelling of Citation Exchange Between Statistics Journals, by Cristiano Varin, Manuela Cattelan and David Firth, which follows the last JRSS B read paper by one of us! The data used in the paper (which can be downloaded here) are quite fascinating for us academics, obsessed as we are with academic rankings, for better or for worse (irony intended). They consist of cross-citation counts C = (C_{ij}) for 47 statistics journals (see the list and abbreviations on page 5): C_{ij} is the number of citations from articles published in journal j in 2010 to papers published in journal i in the 2001-2010 decade. The choice of the list of journals is discussed in the paper. Major journals missing include Bayesian Analysis (published from 2006) and The Annals of Applied Statistics (published from 2007).

I looked at the ratio of Total Citations Received to Total Citations Made. This is a super simple descriptive statistic, which happens to look rather similar to Figure 4, which plots Export Scores from the Stigler model (I can’t say more about it, as I haven’t read it in detail). The top five is the same, modulo a swap between Annals of Statistics and Biometrika. Of course, a big difference is that the Cited/Citing ratio isn’t endowed with a measure of uncertainty (below, left is my own plot, right is Fig. 4 in the paper).


I was surprised not to see a graph / network representation of the data in the paper. As it happens I wanted to try the gephi software for drawing graphs, used for instance by François Caron and Emily Fox in their sparse graphs paper. I got the above graph, where:

  • for the data, I used the citations matrix C renormalized by the total number of citations made, which I denote by \tilde C. This is a way to account for the size (number of papers published) of the journal. This is just a proxy though since the actual number of papers published by the journal is not available in the data. Without that correction, CSDA is way ahead of all the others.
  • the node size represents the Cited/Citing ratio
  • the edge width represents the renormalized \tilde C_{ij}. I’m unsure of what gephi does here, since it converts my directed graph into an undirected graph. I suppose that it displays only the largest of the two edges \tilde C_{ij} and \tilde C_{ji}.
  • for better visibility, I kept only the first decile of heaviest edges.
  • the clusters identified by four colors are modularity classes obtained by the Louvain method.
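The preprocessing steps in the list above can be sketched in a few lines of plain Python. This is only an illustration on a made-up 3-journal matrix, not the code used for the graph. In particular, dividing column j by the total citations made by journal j is my reading of "renormalized by the total number of citations made", and the journal names and counts are hypothetical.

```python
# Toy cross-citation counts (made-up numbers, three hypothetical journals).
# C[i][j] = citations from journal j to journal i, as in the post.
journals = ["A", "B", "C"]
C = [
    [10, 4, 6],
    [2, 5, 1],
    [8, 1, 3],
]
k = len(journals)

received = [sum(C[i]) for i in range(k)]                    # row sums: citations received by i
made = [sum(C[i][j] for i in range(k)) for j in range(k)]   # column sums: citations made by j
ratio = [received[i] / made[i] for i in range(k)]           # Cited/Citing ratio (node size)

# Assumption: normalize column j by the total number of citations made by journal j.
C_tilde = [[C[i][j] / made[j] for j in range(k)] for i in range(k)]

# Directed edges (weight, citing journal, cited journal), self-citations included,
# sorted from heaviest to lightest; keep the heaviest 10% (at least one edge).
edges = sorted(
    ((C_tilde[i][j], journals[j], journals[i]) for i in range(k) for j in range(k)),
    reverse=True,
)
n_keep = max(1, len(edges) // 10)
heaviest = edges[:n_keep]

print(dict(zip(journals, ratio)))
print(heaviest)
```

The resulting edge list could then be exported (for instance as a CSV of source, target and weight) for a graph drawing tool. The community detection step is not covered here.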

Some remarks

The two software journals included in the dataset are clear outliers:

  • the Journal of Statistical Software (JSS) is disconnected from the others, meaning it has no normalized citations \tilde C_{ij} in the first decile, except for its self-citations, which are quite large and give it the 4th highest Impact Factor of the whole list in 2010 (and apparently the highest in 2015).
  • the largest \tilde C_{ij} is the self citations of the STATA Journal (StataJ).


  • CSDA is the most central journal in the sense of the highest (unweighted) degree.

Some further thoughts

All that is just for the fun of it. As mentioned by the authors, citation counts are heavy-tailed, meaning that just a few papers account for much of the citations of a journal while most of the papers account for few citations. As a matter of fact, the total number of citations received is mostly driven by a few super-cited papers, and so is the renormalized matrix \tilde C that I use throughout for building the graph. One reason that could be put forward for why JRSS B does so well is the read papers: for instance, Spiegelhalter et al. (2002), DIC, alone received 11.9% of all JRSS B citations in 2010. Who’d bet on the number of citations this new read paper (JRSS A though) will receive?

Bayesian classics

Posted in Statistics by Julyan Arbel on 17 March 2015

Collegio Carlo Alberto on a sunny day

This week I’ll start my Bayesian Statistics master’s course at the Collegio Carlo Alberto. I realized that some of last year’s students got PhD positions in prestigious US universities. So I thought that letting this year’s students get a first grasp of some great Bayesian papers wouldn’t do any harm. The idea is that, in addition to the course, the students will pick a paper from a list and present it (or rather part of it) to the others and to me, which will let them earn some extra points for the final exam mark. It’s in the spirit of Xian’s Reading Classics Seminar (his list here).

I’ve made up the list below, inspired by the reference lists of two textbooks and biased by personal taste: Xian’s Bayesian Choice and Peter Hoff’s First Course in Bayesian Statistical Methods. See the pdf list and zipped folder for papers. Comments on the list are most welcome!


PS: reference n°1 isn’t a joke!

momentify R package at BAYSM14

Posted in General, R, Seminar/Conference, Statistics by Julyan Arbel on 20 September 2014

I presented an arXived paper from my postdoc at the very successful Young Bayesian Conference in Vienna. The big picture of the talk is simple: there are situations in Bayesian nonparametrics where you don’t know how to sample from the posterior distribution and can only compute posterior expectations (so-called marginal methods). So, for example, you cannot provide credible intervals. But sometimes all the moments of the posterior distribution are available as posterior expectations. So morally, you should be able to say more about the posterior distribution than just reporting the posterior mean. To be more specific, we consider a hazard (h) mixture model

\displaystyle h(t)=\int k(t;y)\mu(dy)

where k is a kernel, and the mixing distribution \mu is random and discrete (Bayesian nonparametric approach).

We consider the survival function S which is recovered from the hazard rate h by the transform

\displaystyle S(t)=\exp\Big(-\int_0^t h(s)ds\Big)

and some possibly censored survival data with survival function S. Then it turns out that all the posterior moments of the survival curve S(t) evaluated at any time t can be computed.
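As a small illustration of the two displays above, here is a Python sketch that evaluates h(t) and S(t) for one fixed discrete mixing measure \mu = \sum_j w_j \delta_{y_j}. The indicator kernel k(t;y) = 1 if t >= y (0 otherwise) is an assumption chosen because the integral is then available in closed form. The atoms and weights are made up, and nothing here is posterior inference.

```python
import math


def hazard(t, atoms, weights):
    """h(t) = sum_j w_j * k(t; y_j) for a discrete mixing measure mu = sum_j w_j * delta_{y_j}."""
    # Assumed kernel: k(t; y) = 1 if t >= y else 0 (an illustrative choice).
    return sum(w for y, w in zip(atoms, weights) if t >= y)


def survival(t, atoms, weights):
    """S(t) = exp(-int_0^t h(s) ds).

    With the indicator kernel, the integral equals sum_j w_j * max(t - y_j, 0).
    """
    cumulative_hazard = sum(w * max(t - y, 0.0) for y, w in zip(atoms, weights))
    return math.exp(-cumulative_hazard)


# Hypothetical atoms y_j and weights w_j of one realization of mu.
atoms = [0.0, 1.0, 2.5]
weights = [0.2, 0.5, 1.0]
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, hazard(t, atoms, weights), survival(t, atoms, weights))
```

Since \mu is random in the model, S(t) is a random variable for each t, which is why one can speak of its posterior moments.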

The nice trick of the paper is to use the representation of a distribution in a [Jacobi polynomial] basis where the coefficients are linear combinations of the moments. So one can sample from [an approximation of] the posterior, and with a posterior sample we can do everything! Including credible intervals.

I’ve wrapped up the few lines of code in an R package called momentify (not on CRAN). With a sequence of moments of a random variable supported on [0,1] as an input, the package does two things:

  • evaluates the approximate density
  • samples from it
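To give an idea of how a density can be evaluated from a sequence of moments, here is a minimal Python sketch. It is not the momentify code: it uses shifted Legendre polynomials on [0,1], which are the special case of Jacobi polynomials with both parameters equal to zero, and the function names are made up. The approximation is f_N(x) = \sum_{k=0}^N (2k+1) E[P_k(X)] P_k(x), where each E[P_k(X)] is a linear combination of the moments.

```python
from math import comb


def shifted_legendre_coeffs(k):
    """Monomial coefficients a_0..a_k of the shifted Legendre polynomial of degree k on [0, 1]."""
    return [(-1) ** (k + j) * comb(k, j) * comb(k + j, j) for j in range(k + 1)]


def density_from_moments(moments):
    """Return f_N(x) = sum_k (2k + 1) * E[P_k(X)] * P_k(x), with moments[j] = E[X^j], j = 0..N."""
    polys = [shifted_legendre_coeffs(k) for k in range(len(moments))]
    # E[P_k(X)] is a linear combination of the moments up to order k.
    expectations = [sum(a * moments[j] for j, a in enumerate(p)) for p in polys]

    def density(x):
        total = 0.0
        for k, (p, e) in enumerate(zip(polys, expectations)):
            p_x = sum(a * x ** j for j, a in enumerate(p))   # P_k(x)
            total += (2 * k + 1) * e * p_x
        return total

    return density


# Check on Beta(2, 1), whose density is 2x and whose moments are E[X^j] = 2 / (j + 2).
moments = [2 / (j + 2) for j in range(5)]
f = density_from_moments(moments)
print([round(f(x), 6) for x in (0.1, 0.5, 0.9)])
```

For a polynomial density such as 2x, the expansion is exact once N reaches the degree. For general densities, a truncated expansion of this kind is only an approximation and may take small negative values. Sampling from the approximation is not covered in this sketch.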

An example from the package, for a mixture of beta distributions with 2 to 7 moments, gives the following result:


Using R in LaTeX with knitr and RStudio

Posted in Geek, LaTeX, R by Julyan Arbel on 28 February 2013


Today, at the INSEE R user group (FL\tauR), I presented how to use knitr (the evolution of Sweave) for writing \LaTeX documents that are self-contained with respect to the source code: has your data changed? No big deal, just compile your .Rnw file again and you are done with an updated version of your paper! [Ctrl+Shift+I] is easy. Some benefits with respect to having two separate .R and .tex files: everything is integrated in a single piece of software (RStudio), and you can call variables in your text with the \Sexpr{} command. Slow compilation is no longer a real issue, since one can put “cache=TRUE” in the code chunk options to avoid re-evaluating unchanged chunks, which speeds things up.

I share the (brief) slides below. They won’t help much those who already use knitr, but they give the first steps for those who would like to give it a try.

Dropbox Space Race

Posted in Geek, General by Julyan Arbel on 1 December 2012


In addition to the referral program (you refer a new user, you win an extra 0.5 GB), the Dropbox Space Race will give you 3 GB of extra space (for 2 years) if you register with an email address from a competing university. The best schools will get more space. Here are the top 100 schools. Come on, there is no French school in the top 100!

Thanks Nicolas for the info.

Next R meeting in Paris INSEE: ggplot2 and parallel computing

Posted in R by Julyan Arbel on 12 June 2012

Our group of R users from INSEE, aka FL\tauR, meets monthly in Paris. The next meeting is on Wed 13 (tomorrow), 1-2 pm, room 539 (an ID is needed to come in, map to access INSEE \tauR), about ggplot2 and parallel computing. Since the first meeting in February, presentations have covered hot topics like web scraping, C in R, RStudio, SQLite databases and cartography (most of them in French). See you there!

Priors on probability measures

Posted in Seminar/Conference, Statistics by Julyan Arbel on 24 April 2012


For the next GTB meeting at Crest, on 3rd May, I will present Peter Orbanz’s work on Projective limit random probabilities on Polish spaces. It will follow my previous presentation about Bayesian nonparametrics on the Dirichlet process.

The article provides a means of constructing an arbitrary prior distribution on the set of probability measures by working on its finite-dimensional marginals. The vanilla example is the Dirichlet process, which is characterized by its Dirichlet distribution marginals on any finite partition of the space (other examples are the Normalized Inverse Gaussian Process and the Pólya Tree). The figure above illustrates the projective property of the marginals.
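For the Dirichlet case, the projective property can be checked by hand: merging cells of a partition sums the corresponding coordinates of the random vector, and the result is again Dirichlet with summed parameters. Here is a minimal Python sketch of that bookkeeping, with made-up parameters and helper names. It only illustrates the finite-dimensional marginals, not the construction in the article.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) vector by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def merge_cells(values, groups):
    """Sum the entries of `values` within each group of indices (coarsening a partition)."""
    return [sum(values[i] for i in group) for group in groups]


rng = random.Random(0)

# Hypothetical base measure: total mass 2 spread over a partition of the space into 4 cells.
fine_alphas = [0.5, 0.5, 0.6, 0.4]
groups = [(0, 1), (2, 3)]                          # coarser partition obtained by merging cells

fine_draw = sample_dirichlet(fine_alphas, rng)     # (P(A_1), ..., P(A_4))
coarse_draw = merge_cells(fine_draw, groups)       # (P(A_1 u A_2), P(A_3 u A_4))
coarse_alphas = merge_cells(fine_alphas, groups)   # parameters of the coarse marginal

print(fine_draw, coarse_draw, coarse_alphas)
```

By the aggregation property of the Dirichlet distribution, coarse_draw has the same law as a direct draw from Dirichlet(coarse_alphas), which is the kind of consistency across partitions that a projective family of marginals requires.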

Peter will speak at the ISBA 2012 Kyoto session On the uses of random probabilities in Bayesian inference, along with Ramses Mena and Antonio Lijoi. I’ll write more about that later on!

