My last post dates back to May 2015… thanks to JB and Julyan for keeping the place busy! I’m not (quite) dead and intend to go back to posting stuff every now and then. And by the way, congrats to both for their new jobs!
Last July, I also started a new job, as an assistant professor in the Department of Statistics at Harvard University, after having spent two years in Oxford. At some point, I might post something on the cultural differences between the European, English and American communities of statisticians.
In the coming weeks, I’ll tell you all about a new paper entitled Coupling of Particle Filters, co-written with Fredrik Lindsten and Thomas B. Schön from Uppsala University in Sweden. We are excited about this coupling idea because it’s simple and yet brings massive gains in many important aspects of inference for state space models (including both parameter inference and smoothing). I’ll be talking about it at the World Congress in Probability and Statistics in Toronto next week and at JSM in Chicago, early in August.
I’ll also try to write about another exciting project, joint work with Christian Robert, Chris Holmes and Lawrence Murray, on modularization, cutting feedback, the infamous cut function of BUGS and all that funny stuff. I talked about it at ISBA 2016, and intend to put the associated tech report on arXiv over the summer.
In Bayesian nonparametrics, many models address the problem of density regression, including covariate-dependent processes. These were initiated by the pioneering work of [current ISBA president] MacEachern (1999), who introduced the general class of dependent Dirichlet processes. The literature on dependent processes has since developed in numerous directions, such as nonparametric regression, time series data and meta-analysis, to cite but a few, and has been applied to a wealth of fields such as epidemiology, bioassay problems, genomics and finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arxiv).
In our ecological example, the model provides a series of densities on the Y axis (in our case, posterior density of species diversity), indexed by some covariate X (a pollutant). See file density_plot.txt. The following Plotly R code
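library(plotly)
# The file holds the columns X (covariate), Y (species diversity) and Z (posterior density)
mydata = read.csv("density_plot.txt")
df = as.data.frame(mydata)
# One 3D line per value of X; plotly >= 4.0 expects formulas (~) and `split` instead of bare names and `group`
plot_ly(df, x = ~Y, y = ~X, z = ~Z, split = ~X, type = "scatter3d", mode = "lines")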
provides a graph as below. For the interactive version, see the RPubs page here.
“For about two centuries, Bayesian demography remained largely dormant. Only in recent decades has there been a revival of demographers’ interest in Bayesian methods, following the methodological and computational developments of Bayesian statistics. The area is currently growing fast, especially with the United Nations (UN) population projections becoming probabilistic—and Bayesian.” Bijak and Bryant (2016)
It is interesting to see that Bayesian statistics has been infiltrating demography in recent years. The review paper Bayesian demography 250 years after Bayes by Bijak and Bryant (Population Studies, 2016) stresses that promising areas of application include demographic forecasts, problems with limited data, and highly structured and complex models. As an indication of this growing interest, the ISBA meeting to be held next June will showcase a course and a session devoted to the field (given and organized by Adrian Raftery).
With Vianney Costemalle from INSEE, we recently made a modest contribution to the field by proposing a Bayesian model (paper in French) which helps reconcile apparently inconsistent population datasets. The aim is to estimate annual migration flows to France. Note that the work covers the period 2004-2011 (the publication process was long) and as a consequence does not take recent migration events into account. We follow the United Nations (UN) definition of a long-term migrant: someone who settles in a foreign country for at least one year. At least two datasets can be used to this end: 1) the population census, annual since 2004, and 2) data from residence permits.
The second Statalks took place on February 19 at Collegio Carlo Alberto. Statalks is a series of Italian workshops aimed at Master's students, PhD students, post-docs and young researchers. This edition was dedicated to Bayesian Nonparametrics. The first two presentations were introductory tutorials while the last four focused on theory and applications. All six were clearly biased toward the scientific interests of our group. Below are the program and the slides.
- A gentle introduction to Bayesian Nonparametrics I (Antonio Canale)
- A gentle introduction to Bayesian Nonparametrics II (Julyan Arbel)
- Dependent processes in Bayesian Nonparametrics (Matteo Ruggiero)
- Asymptotics for discrete random measures (Pierpaolo De Blasi)
- Applications to Ecology and Marketing (Antonio Canale)
- Species sampling models (Julyan Arbel)
[This is a guest post by my friend and colleague Bernardo Nipoti from Collegio Carlo Alberto,
The group stage of the UEFA Champions League has just finished and next Monday, the 14th of December 2015, in Nyon, the draw will decide the eight matches that make up the first round of the knockout phase.
As explained on the UEFA website, rules are simple:
- two seeding pots have been formed: one consisting of group winners and the other of runners-up;
- no team can play a club from their group or any side from their own association;
- due to a decision by the UEFA Executive Committee, teams from Russia and Ukraine cannot meet.
The two pots are:
Group winners: Real Madrid (ESP), Wolfsburg (GER), Atlético Madrid (ESP), Manchester City (ENG), Barcelona (ESP, holders), Bayern München (GER), Chelsea (ENG), Zenit (RUS);
Group runners-up: Paris Saint-Germain (FRA), PSV Eindhoven (NED), Benfica (POR), Juventus (ITA), Roma (ITA), Arsenal (ENG), Dynamo Kyiv (UKR), Gent (BEL).
Given these few constraints, are there some matches that are more likely to be drawn than others? For example, supporters of Barcelona might wonder whether the seven possible teams (PSG, PSV, Benfica, Juventus, Arsenal, Dynamo Kyiv and Gent) are all equally likely to be the next opponent of their favorite team.
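To get a first feel for the question, here is a minimal Python sketch that enumerates every complete draw compatible with the three rules and counts how often each pairing appears. It rests on two assumptions that are not stated in the post: that the two pots are listed in group order (so the i-th winner and the i-th runner-up come from the same group, which is consistent with Roma being excluded for Barcelona), and that all valid complete draws are equally likely. The real ceremony draws balls one after the other, so its probabilities may well differ from this uniform simplification. The names `WINNERS`, `RUNNERS_UP`, `allowed` and `pair_probabilities` are purely illustrative.

```python
from collections import Counter
from itertools import permutations

# (club, association); position i in both lists is assumed to be the same group.
WINNERS = [("Real Madrid", "ESP"), ("Wolfsburg", "GER"), ("Atlético Madrid", "ESP"),
           ("Manchester City", "ENG"), ("Barcelona", "ESP"), ("Bayern München", "GER"),
           ("Chelsea", "ENG"), ("Zenit", "RUS")]
RUNNERS_UP = [("Paris Saint-Germain", "FRA"), ("PSV Eindhoven", "NED"), ("Benfica", "POR"),
              ("Juventus", "ITA"), ("Roma", "ITA"), ("Arsenal", "ENG"),
              ("Dynamo Kyiv", "UKR"), ("Gent", "BEL")]


def allowed(i, j):
    """Can winner i meet runner-up j under the three rules?"""
    w_assoc, r_assoc = WINNERS[i][1], RUNNERS_UP[j][1]
    if i == j:                                # same group (assumed ordering)
        return False
    if w_assoc == r_assoc:                    # same association
        return False
    if {w_assoc, r_assoc} == {"RUS", "UKR"}:  # Russia and Ukraine kept apart
        return False
    return True


def pair_probabilities():
    """Probability of each pairing when all valid complete draws are equally likely."""
    n = len(WINNERS)
    counts, total = Counter(), 0
    for perm in permutations(range(n)):       # perm[i] = runner-up facing winner i
        if all(allowed(i, perm[i]) for i in range(n)):
            total += 1
            for i, j in enumerate(perm):
                counts[(WINNERS[i][0], RUNNERS_UP[j][0])] += 1
    return {pair: c / total for pair, c in counts.items()}, total


if __name__ == "__main__":
    probs, total = pair_probabilities()
    print(f"{total} valid complete draws")
    for (winner, runner_up), p in sorted(probs.items(), key=lambda kv: -kv[1]):
        if winner == "Barcelona":
            print(f"Barcelona vs {runner_up}: {p:.3f}")
```

Brute force over the 8! = 40320 permutations is instantaneous here, so nothing smarter is needed. Replacing the uniform assumption by a simulation of the sequential procedure would be the natural next step.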
I was recently invited to referee a paper for a journal I had never heard of before: the International Journal of Biological Instrumentation, published by VIBGYOR Online Publishers. This publisher happens to be on Jeffrey Beall's blacklist of predatory publishers, which inventories:
Potential, possible, or probable predatory scholarly open-access publishers.
I politely declined the invitation. Thanks Igor for the link.
Some time ago, Cédric Villani came to Turin to deliver two talks: one intended for youngsters (high school level, say), and another for a wider audience, as the recipient of the Peano Prize. He commented on it live, in Italian per favore:
“Grazie mille! Un grande piacere e un grande onore per me!”
I attended both, the first because I am acting as a research advisor for Math en Jeans groups. Villani spoke about his book, Birth of a Theorem, or Théorème Vivant. He also shared a list of se7en thoughts/tips about doing research, with illustrations. I find them quite inspiring, so here they are.
Illustrating this by showing the Wikipedia page on Faà di Bruno’s formula. I like this choice, since the formula enters moment computations for objects I use every day, and also because Faà di Bruno lived in Italian Piedmont, precisely in Turin.
“The most important and the most mysterious.”
- Favorable environment
Showing pictures of several places where he worked, including Institut Henri Poincaré. Not sure that this one is the most favorable environment for scientific productivity (as a Director I mean).
Meaning exchanges between scientists, not trade. Explaining briefly about polymath projects, and displaying a snapshot of Gowers’s Weblog as an illustration of the kind of diverse exchanges he means. I also believe that blogs are a great information medium.
With snapshots of Musica Ricercata sheet music, and a paragraph of La disparition, a novel without the letter e by Georges Perec. Writing this makes me realize how foolish such an enterprise would look in mathematics.
- Work & Intuition
Interesting to see these two at the same level.
- Perseverance & Luck
Same comment as for point 6.
Hello there!
While I was in Amsterdam, I took the opportunity to go and work with the Leiden crowd, and more particularly with Stéphanie van der Pas and Johannes Schmidt-Hieber. Since Stéphanie had already obtained neat results for the Horseshoe prior and Johannes had obtained some super cool results for the spike and slab prior, they were the first choice to team up with to work on sparse models. And guess what? We have just arXived a paper in which we study the sparse Gaussian sequence model
$$X_i = \theta_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, 1), \qquad i = 1, \dots, n,$$
where only a small number of the $\theta_i$ are nonzero.
There is a rapidly growing literature on shrinkage priors for such models: just look at Polson and Scott (2012), Caron and Doucet (2008), Carvalho, Polson, and Scott (2010) among many, many others, or simply have a look at the program of the last BNP conference. There is also a growing literature on the theoretical properties of some of these priors. The Horseshoe prior was studied in Pas, Kleijn, and Vaart (2014), an extension of the Horseshoe was then studied in Ghosh and Chakrabarti (2015), and recently, the spike and slab Lasso was studied in Rocková (2015) (see also Xi’an’s Og).
All these results are super nice, but we still want to know: why do some shrinkage priors shrink so well while others do not?! As we are all mathematicians here, I will reformulate this last question: under which conditions on the prior does the posterior contract at the minimax rate¹?
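For readers who want the question spelled out, here is the form such statements usually take in the sparse normal means problem. The notation is an assumption of this sketch rather than something taken from the post: $n$ is the length of the sequence, $p_n = o(n)$ the number of nonzero means, $\ell_0[p_n]$ the set of vectors with at most $p_n$ nonzero entries, $\theta_0$ the true vector and $M$ a large enough constant. The first display is the classical minimax risk for nearly black vectors; the second is what "contracting at the minimax rate" means for a posterior $\Pi(\cdot \mid X)$.

```latex
% Minimax risk over nearly black vectors, as n -> infinity with p_n = o(n)
\inf_{\hat\theta} \sup_{\theta_0 \in \ell_0[p_n]}
  \mathbb{E}_{\theta_0} \bigl\| \hat\theta - \theta_0 \bigr\|_2^2
  = 2\, p_n \log\!\left(\frac{n}{p_n}\right) \bigl(1 + o(1)\bigr)

% Posterior contraction at that rate, for some large enough constant M
\sup_{\theta_0 \in \ell_0[p_n]}
  \mathbb{E}_{\theta_0}\, \Pi\!\left( \theta : \| \theta - \theta_0 \|_2^2
  > M\, p_n \log\!\left(\frac{n}{p_n}\right) \,\Big|\, X \right)
  \longrightarrow 0
```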
We considered a Gaussian scale mixture prior on the sequence,
$$\theta_i \mid \sigma_i^2 \sim \mathcal{N}(0, \sigma_i^2), \qquad \sigma_i^2 \sim \pi, \qquad i = 1, \dots, n,$$
since this family of priors encompasses all the ones studied in the papers mentioned above (and more), so it seemed to be general enough.
Our main contribution is to give conditions on $\pi$ such that the posterior contracts at the right rate. We showed that in order to recover the nonzero parameters, the prior should have tails that decay at most exponentially fast, which is similar to the condition imposed for the Spike and Slab prior. Another expected condition is that the prior should put enough mass around 0, since our assumption is that the vector of parameters is nearly black, i.e. most of its components are 0.
More surprisingly, in order to recover the zero parameters correctly, one also needs some conditions on the tails of the prior. More specifically, the prior’s tails cannot be too heavy: if they are, we can construct a prior that puts enough mass near 0 but whose posterior does not concentrate at the minimax rate.
We showed that these conditions are satisfied for many priors including the Horseshoe, the Horseshoe+, the Normal-Gamma and the Spike and Slab Lasso.
Gaussian scale mixtures are also quite simple to use in practice. As explained in Caron and Doucet (2008), a simple Gibbs sampler can be implemented to sample from the posterior. We conducted a simulation study to evaluate the sharpness of our conditions. We computed the loss for the Laplace prior, the global-local scale mixture of Gaussians (called hereafter the bad prior for simplicity), the Horseshoe and the Normal-Gamma prior. The first two do not satisfy our conditions, and the last two do. The results are reported in the following picture.
As we can see, priors that do and do not satisfy our condition show different behaviour (it seems that the priors that do not fit our conditions have a risk larger than the minimax rate of a factor of ). This seems to indicate that our conditions are sharp.
At the end of the day, our results expand the class of shrinkage priors with theoretical guarantees for the posterior contraction rate. Not only can they be used to obtain the optimal posterior contraction rate for the horseshoe+, the inverse-Gaussian and normal-gamma priors, but the conditions also provide some characterization of the properties of sparsity priors that lead to desirable behaviour. Essentially, the tails of the prior on the local variance should be at least as heavy as Laplace, but not too heavy, and there needs to be a sizable amount of mass around zero compared to the amount of mass in the tails, in particular when the underlying mean vector grows to be more sparse.
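To make the remark about the Gibbs sampler concrete, here is a minimal Python sketch for one member of the Gaussian scale mixture family. It is not the code used for the simulation study. It assumes unit noise variance and takes the Laplace prior, chosen only because both conditional distributions are available in closed form (it is one of the priors that do not satisfy our conditions): an exponential prior with rate lam²/2 on the local variance gives a Laplace(lam) marginal on each mean, and the inverse of the local variance then has an inverse-Gaussian conditional. The function name and the parameters `lam`, `n_iter`, `burn_in` and `seed` are illustrative choices.

```python
import numpy as np


def gibbs_laplace_scale_mixture(x, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for x_i = theta_i + N(0, 1) noise with
    theta_i | sigma2_i ~ N(0, sigma2_i) and sigma2_i ~ Exponential(rate = lam**2 / 2),
    which is a Laplace(lam) prior on theta_i once sigma2_i is integrated out.
    Returns an array of posterior draws of theta, one row per kept iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma2 = np.ones_like(x)
    draws = np.empty((n_iter - burn_in, x.size))
    for t in range(n_iter):
        # theta_i | sigma2_i, x_i ~ N(shrink_i * x_i, shrink_i),
        # with shrink_i = sigma2_i / (1 + sigma2_i)
        shrink = sigma2 / (1.0 + sigma2)
        theta = rng.normal(shrink * x, np.sqrt(shrink))
        # 1 / sigma2_i | theta_i ~ InverseGaussian(mean = lam / |theta_i|, shape = lam**2)
        abs_theta = np.maximum(np.abs(theta), 1e-10)  # guard against division by zero
        inv_sigma2 = rng.wald(lam / abs_theta, lam ** 2)
        sigma2 = 1.0 / np.maximum(inv_sigma2, 1e-300)
        if t >= burn_in:
            draws[t - burn_in] = theta
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta0 = np.zeros(100)
    theta0[:5] = 7.0                      # nearly black truth: 5 nonzero means
    x = theta0 + rng.normal(size=theta0.size)
    post_mean = gibbs_laplace_scale_mixture(x).mean(axis=0)
    print("squared error of posterior mean:", np.sum((post_mean - theta0) ** 2))
```

Other members of the family only change the update of the local variances; the Gaussian update of the means stays the same, which is what makes these priors convenient in practice.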
Caron, François, and Arnaud Doucet. 2008. “Sparse Bayesian Nonparametric Regression.” In Proceedings of the 25th International Conference on Machine Learning, 88–95. ICML ’08. New York, NY, USA: ACM.
Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–80.
Ghosh, Prasenjit, and Arijit Chakrabarti. 2015. “Posterior Concentration Properties of a General Class of Shrinkage Estimators Around Nearly Black Vectors.”
Pas, S.L. van der, B.J.K. Kleijn, and A.W. van der Vaart. 2014. “The Horseshoe Estimator: Posterior Concentration Around Nearly Black Vectors.” Electron. J. Stat. 8: 2585–2618.
Polson, Nicholas G., and James G. Scott. 2012. “Good, Great or Lucky? Screening for Firms with Sustained Superior Performance Using Heavy-Tailed Priors.” Ann. Appl. Stat. 6 (1): 161–85.
Rocková, Veronika. 2015. “Bayesian Estimation of Sparse Signals with a Continuous Spike-and-Slab Prior.”
¹ For those wondering why the heck we care about the minimax rate here, just remember that a posterior that contracts at the minimax rate induces an estimator which converges at the same rate. It also tells us that confidence regions will not be too large.
While everyone was away in July, James Ridgway and I posted our “leave (the) pima paper alone” paper on arxiv, in which we discuss to what extent probit/logit regression and not too big datasets (such as the now famous Pima Indians dataset) constitute a relevant benchmark for Bayesian computation.
The actual title of the paper is “Leave Pima Indians alone…”, but xian changed it to “Leave *the* Pima Indians alone…” when discussing it on his blog. Any opinion on whether it does sound better with “the”?
On a different note, one of our findings is that Expectation-Propagation works wonderfully for such models; yes it is an approximate method, but it is very fast, and the approximation error is consistently negligible on all the datasets we looked at.
James has just posted on CRAN the EPGLM package, which computes an EP approximation of the posterior of a logit or probit model. The documentation is a bit terse at the moment, but it is very straightforward to use.
Comments on the package, the paper, its grammar or Pima Indians are most welcome!
This very fine title quotes a pretty hilarious banquet speech by David Dunson at the last BNP conference held in Raleigh last June. The graph is by François Caron who used it in his talk there. See below for his explanation.
After the summer break, back to work. The academic year to come looks promising from a BNP point of view, not least because three special issues have been announced: in Statistics & Computing (guest editors: Tamara Broderick (MIT), Katherine Heller (Duke), Peter Mueller (UT Austin)), the Electronic Journal of Statistics (guest editor: Subhashis Ghoshal (NCSU)), and the International Journal of Approximate Reasoning (proposal deadline December 1st, guest editors: Alessio Benavoli (Lugano), Antonio Lijoi (Pavia) and Antonietta Mira (Lugano)).
BNP is also going to infiltrate MCMSki V, Lenzerheide, Switzerland, January 4-7 2016, with three sessions with a BNP flavor, in addition to plenary speakers David Dunson and Michael Jordan. The International Society for Bayesian Analysis World Meeting, 13-17 June 2016, should also host plenty of BNP sessions, as well as a De Finetti Lecture by Persi Diaconis (Stanford University).