I recently started to review papers on Mathematical Reviews / MathSciNet and decided I would post the reviews here from time to time. Here are the first three, which deal with (i) objective Bayes priors for discrete parameters, (ii) random probability measures and inference on species variety, and (iii) Bayesian nonparametric asymptotic theory and contraction rates.

- An objective approach to prior mass functions for discrete parameter spaces, by Villa, C. and Walker, S. G.,
*J. Amer. Statist. Assoc.* 110 (2015), no. 511, 1072–1082.

The paper deals with objective prior derivation in the discrete parameter setting. Previous treatments of this problem include J. O. Berger, J.-M. Bernardo and D. Sun [J. Amer. Statist. Assoc. 107 (2012), no. 498, 636–648; MR2980073], who rely on embedding the discrete parameter into a continuous parameter space and then applying the reference methodology (J.-M. Bernardo [J. Roy. Statist. Soc. Ser. B 41 (1979), no. 2, 113–147; MR0547240]). The main contribution here is to propose an all-purpose objective prior based on the Kullback–Leibler (KL) divergence. More specifically, the prior mass at any parameter value θ is obtained as follows: (i) compute the minimum KL divergence between the model indexed by θ and the models indexed by the other parameter values; (ii) set the prior mass at θ proportional to a sound transform of the minimum obtained in (i). A good property of the proposed approach is that it is not problem specific. This objective prior is derived in five models (including binomial and hypergeometric) and is compared to the priors known in the literature. The discussion suggests possible extensions to the continuous parameter setting.
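To fix ideas, here is a minimal sketch (mine, not the authors' code) of such a construction on a toy discrete model, a Poisson distribution with integer mean; the exp(·) − 1 transform used below is one sound transform considered in this line of work, and the support {1, …, 10} is an arbitrary illustration.

```python
import math

def kl_poisson(a, b):
    """KL divergence KL(Poisson(a) || Poisson(b))."""
    return a * math.log(a / b) + b - a

def objective_prior(support):
    """Prior mass at each mean value, proportional to
    exp(min KL to the other models) - 1."""
    masses = {}
    for lam in support:
        min_kl = min(kl_poisson(lam, other) for other in support if other != lam)
        masses[lam] = math.exp(min_kl) - 1.0
    total = sum(masses.values())
    return {lam: m / total for lam, m in masses.items()}

prior = objective_prior(range(1, 11))
```

Since the KL divergence is strictly positive between distinct models, every parameter value receives positive mass, and normalization is immediate on a finite support.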

- A note on nonparametric inference for species variety with Gibbs-type priors, by Favaro, Stefano and James, Lancelot F.,
*Electron. J. Stat.* 9 (2015), no. 2, 2884–2902.

A. Lijoi, R. H. Mena and I. Prünster [Biometrika 94 (2007), no. 4, 769–786; MR2416792] recently introduced a Bayesian nonparametric methodology for estimating the species variety featured by an additional unobserved sample of size m, given an initial observed sample. This methodology was further investigated by S. Favaro, Lijoi and Prünster [Biometrics 68 (2012), no. 4, 1188–1196; MR3040025; Ann. Appl. Probab. 23 (2013), no. 5, 1721–1754; MR3114915]. Although it led to explicit posterior distributions under the general framework of Gibbs-type priors [A. V. Gnedin and J. W. Pitman (2005), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12, 83–102, 244–245; MR2160320], there are situations of practical interest where m is required to be very large and the computational burden for evaluating these posterior distributions makes their concrete implementation impossible. This paper presents a solution to this problem for a large class of Gibbs-type priors which encompasses the two-parameter Poisson–Dirichlet prior and, among others, the normalized generalized Gamma prior. The solution relies on the study of the large m asymptotic behaviour of the posterior distribution of the number of new species in the additional sample. In particular, a simple characterization of the limiting posterior distribution is introduced in terms of a scale mixture with respect to a suitable latent random variable; this characterization, combined with adaptive rejection sampling, leads to a large m approximation of any feature of interest of the exact posterior distribution. The results are illustrated through a simulation study and the analysis of a dataset in linguistics.
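As an aside, the quantity at stake can be simulated naively under the two-parameter Poisson–Dirichlet prior by running its predictive (Chinese-restaurant-type) rule forward; the sketch below is not the paper's method — whose point is precisely to avoid this brute-force route when m is very large — and the values of (σ, θ) and of the observed sample summary (n, k) are illustrative.

```python
import random

def new_species_count(n, k, m, sigma, theta, rng):
    """Simulate the number of new species in an additional sample of size m,
    given n observations with k distinct species, under the two-parameter
    Poisson-Dirichlet predictive rule: a new species appears with
    probability (theta + sigma * k) / (theta + n)."""
    new = 0
    for _ in range(m):
        if rng.random() < (theta + sigma * k) / (theta + n):
            k += 1   # a new species is observed
            new += 1
        n += 1       # one more observation either way
    return new

rng = random.Random(0)
draws = [new_species_count(n=100, k=30, m=1000, sigma=0.5, theta=1.0, rng=rng)
         for _ in range(200)]
```

The cost of each draw grows linearly with m, which is exactly why a large m approximation of the posterior is valuable.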

- Rate exact Bayesian adaptation with modified block priors, by Gao, Chao and Zhou, Harrison H.,
*Ann. Statist.* 44 (2016), no. 1, 318–345.

A novel prior distribution is proposed for adaptive Bayesian estimation, meaning that the associated posterior distribution contracts to the truth at the exact optimal rate while being adaptive to the unknown smoothness. The prior is termed the *block prior* and is defined on the Fourier coefficients of a curve by independently assigning mean-zero Gaussian distributions to blocks of coefficients, with covariance matrix proportional to the identity matrix; the proportionality coefficient of each block is itself assigned a prior distribution g. Under conditions on g, it is shown that (i) the prior puts sufficient mass near the true signal and (ii) it automatically concentrates on its effective dimension. The main result of the paper is a rate-optimal posterior contraction theorem obtained in a general framework for a modified version of the block prior. Compared to the closely related block spike-and-slab prior proposed by M. Hoffmann, J. Rousseau and J. Schmidt-Hieber [Ann. Statist. 43 (2015), no. 5, 2259–2295; MR3396985], whose result only holds for the white noise model, the present result can be applied in a wide range of models. This is illustrated through applications to five mainstream models: density estimation, white noise model, Gaussian sequence model, Gaussian regression and spectral density estimation. The results hold under Sobolev smoothness and their extension to the more flexible Besov smoothness is discussed. The paper also provides a discussion on the absence of an extra *log term* in the posterior contraction rates (thus achieving the exact minimax rate), with a comparison to other priors commonly used in the literature. These include rescaled Gaussian processes [A. W. van der Vaart and H. van Zanten, Electron. J. Stat. 1 (2007), 433–448; MR2357712; Ann. Statist. 37 (2009), no. 5B, 2655–2675; MR2541442] and sieve priors [V. Rivoirard and J. Rousseau, Bayesian Anal. 7 (2012), no. 2, 311–333; MR2934953; J. Arbel, G. Gayraud and J. Rousseau, Scand. J. Stat. 40 (2013), no. 3, 549–570; MR3091697].
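To make the construction concrete, here is a small sketch of sampling a curve's Fourier coefficients from a block-prior-type distribution; the dyadic block sizes and the exponential choice for the variance prior g are my own illustrative assumptions, not the paper's specification.

```python
import math
import random

def sample_block_prior(n_coefs, rng):
    """Draw Fourier coefficients from a sketch of a block prior:
    coefficients are grouped into consecutive (here dyadic) blocks; within
    block k all coefficients are iid N(0, A_k), and the block variance A_k
    is itself drawn from a prior g (here, for illustration, Exponential(1))."""
    coefs = []
    start, k = 0, 0
    while start < n_coefs:
        size = 2 ** k                  # dyadic blocks, an illustrative choice
        a_k = rng.expovariate(1.0)     # stand-in for the prior g on the variance
        for _ in range(min(size, n_coefs - start)):
            coefs.append(rng.gauss(0.0, math.sqrt(a_k)))
        start += size
        k += 1
    return coefs

theta = sample_block_prior(100, random.Random(1))
```

Drawing one variance per block, rather than one per coefficient, is what lets the prior concentrate on the effective dimension of the signal.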


I have spent three years as a postdoc at the Collegio Carlo Alberto. This was a great time, during which I was able to interact with top colleagues and to prepare my applications in optimal conditions. Now that I have left for Inria Grenoble, here is a brief presentation of the Collegio in pictures.

Today, the Collegio is a research center specialized in Economics, Statistics and the Social Sciences in general. The Stat team focuses mainly on Bayesian nonparametrics (Antonio Canale, Pierpaolo De Blasi, Stefano Favaro, Guillaume Kon Kam King, Antonio Lijoi, Bernardo Nipoti, Igor Prünster, Matteo Ruggiero), algebraic statistics (Giovanni Pistone) and functional analysis (Bertrand Lods). An “Allievi Honor Program” trains some of the best university students in order to lead them to US PhD programs.

The Collegio started to train the Turin “elite” several centuries ago. Legend has it that the hidden child of Mussolini was sent to school there. Each year, the head of the class had the honor of hanging his portrait (it was a boys’ school, by the way) in the corridors of the Collegio, thus achieving posterity! But the painting was to be paid for by the family, which explains the uneven quality of the portraits.


Oddly, a collection of stuffed animals keeps company with the pupils’ portraits, along with a collection of minerals.

The place also used to host a weather station which collected some of the oldest known weather time series. Today, the rooftop observatory is one of the highlights of the place for visiting scholars, with its breathtaking view over the city and the Alps on clear days.



Hi again!

In this post, I’ll explain the new smoother introduced in our paper Coupling of Particle Filters with Fredrik Lindsten and Thomas B. Schön from Uppsala University. Smoothing refers to the task of estimating a latent process (x_1, …, x_T), given noisy measurements (y_1, …, y_T) of it; the smoothing distribution is the conditional distribution of the latent process given the measurements. The setting is state-space models (what else?!), with a fixed parameter θ assumed to have been previously estimated.

Our smoother builds upon two recent innovations: the first is the conditional particle filter of Andrieu, Doucet and Holenstein (2010), and the second is a debiasing technique of Glynn and Rhee (2014). The conditional particle filter (CPF) is a Markov kernel on the space of trajectories. The CPF kernel leaves the smoothing distribution invariant, so it can be iterated to obtain an MCMC sample approximating the smoothing distribution. On the other hand, the debiasing method takes a Markov kernel K and a test function h, and spits out an unbiased estimator of the integral of h with respect to the invariant distribution of K.

So here is the algorithm: we start with a pair of arbitrary trajectories, denoted by X_0 and Y_0. We first apply a step of CPF to X_0, yielding X_1 (and we do nothing to Y_0; bear with me!). Then we apply a “coupled CPF” kernel to the pair (X_1, Y_0), yielding a new pair (X_2, Y_1). What is a coupled CPF? It’s essentially a CPF kernel applied to both trajectories, using common random numbers and with a fancy resampling scheme, as alluded to in the last post.

Then we iterate the coupled CPF kernel, yielding pairs (X_3, Y_2), (X_4, Y_3), etc. It’s just a Markov chain on the space of pairs of trajectories… BUT! at some step τ, the two trajectories become identical: X_τ = Y_{τ−1}. Tadaaaah! See the figure above, where the two trajectories meet in a few steps. And once the two trajectories meet, they stay together forever after (so cute). This is exciting because it means that the infinite sum h(X_0) + Σ_{t ≥ 1} [h(X_t) − h(Y_{t−1})] is in fact a finite sum (just adding infinitely many zeros). Now it is easy to see that this infinite sum has expectation given precisely by the smoothing expectation of h. Indeed, provided that we can swap limit and expectation, and since X_t and Y_t have the same distribution for each t by the coupling construction, the sum is telescopic and we are left with the limit of E[h(X_t)], which is the smoothing expectation of h, simply because the CPF chain itself is ergodic.

What’s the point? Instead of running a long MCMC chain and staring blankly at it to decide how many iterations are enough, the above scheme can be run many times, completely in parallel. Each run takes a random but small number of steps, and produces an unbiased smoothing estimator. We can then average these estimators to get the final result, with accurate confidence intervals provided by our good friend the central limit theorem.
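The mechanics of the debiasing trick can be illustrated on a toy example that strips out the particle filter entirely: a two-state Markov chain whose kernel is driven by a uniform random number, so that feeding the same uniform to two chains makes them coalesce. This is only a sketch of the Glynn–Rhee construction described above, not our smoother.

```python
import random

# Toy two-state chain: from state x, move to 1 with probability P1[x].
P1 = {0: 0.1, 1: 0.8}   # stationary distribution: pi(1) = 0.1 / (0.1 + 0.2) = 1/3

def step(x, u):
    return 1 if u < P1[x] else 0

def unbiased_estimate(h, rng):
    """Glynn-Rhee-type estimator of the stationary expectation of h."""
    x = rng.randint(0, 1)        # X_0 from an arbitrary initial distribution
    y = rng.randint(0, 1)        # Y_0 from the same initial distribution
    total = h(x)                 # the h(X_0) term
    x = step(x, rng.random())    # advance X one step: the pair is now (X_1, Y_0)
    while x != y:                # once the chains meet, all remaining terms vanish
        total += h(x) - h(y)
        u = rng.random()         # common random number couples the two moves
        x, y = step(x, u), step(y, u)
    return total

rng = random.Random(42)
estimates = [unbiased_estimate(lambda s: s, rng) for _ in range(100000)]
mean = sum(estimates) / len(estimates)   # close to pi(1) = 1/3
```

Each run stops after a random but small number of steps, and averaging the independent runs recovers the stationary expectation, exactly as in the parallel scheme above.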

Thanks for reading, you deserve a Santana song on smoothing.


In this post, I’ll write about coupling particle filters, as proposed in our recent paper with Fredrik Lindsten and Thomas B. Schön from Uppsala University, available on arXiv; and also in this paper by colleagues at NUS. The paper is about a methodology with multiple direct consequences. In this first post, I’ll focus on correlated likelihood estimators; in a later post, I’ll describe a new smoothing algorithm. Both are described in detail in the article. We’ve been blessed to have been advertised by xi’an’s og, so glory is just around the corner.

Everybody is interested in estimating the likelihood of a parameter θ in general state-space models (right?). These likelihoods are typically approximated by algorithms called particle filters. They take θ as input and spit out a likelihood estimator. I ran particle filters five times for many parameter values θ, and plotted the resulting estimators in the following figure. The red curve indicates the exact log-likelihood.

These estimators are not so great! They all underestimate the log-likelihood by a large amount. Why? Here the model has a five-dimensional latent process over many time steps, so that the likelihood is effectively defined by a very high-dimensional integral. Since I’ve used only a small number N of particles in the filter, I obtain poor results. A first solution would be to use more particles, but the algorithmic complexity increases (linearly) with N. If the maximum number of particles that I can use is limited, what can I do?

Suppose that in fact we are interested in comparing two different likelihood values. This is a common task: think for instance of likelihood ratios in Metropolis–Hastings acceptance ratios. To make the estimator of a likelihood ratio more precise, we can introduce dependencies between the numerator and the denominator; an old variance reduction trick! More precisely, for two parameters θ and θ′, we can consider correlated likelihood estimators such that, if the first over/under-estimates the likelihood at θ, then the second also over/under-estimates the likelihood at θ′, so that overall the ratio can be more accurately estimated. This has been attempted many times in particle filtering, one of the first attempts being Mike Pitt’s. The basic idea is to use common random numbers for both particle filters, and then to fiddle with the resampling step, where the main difficulty lies. Intrinsically, the resampling step is a discrete operator that introduces discontinuities in the likelihood estimator, seen as a function of θ.
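Here is a self-contained sketch of the common-random-numbers idea on a toy one-dimensional linear-Gaussian state-space model (using plain systematic resampling driven by a shared uniform, not the fancier schemes of the paper): fixing the seed fixes all the random numbers, so the estimators at θ and θ′ are driven by the same randomness.

```python
import math
import random

def simulate_data(T, phi, rng):
    """Toy model: x_t = phi * x_{t-1} + N(0,1), y_t = x_t + N(0,1)."""
    x, ys = 0.0, []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys

def log_lik_estimate(ys, phi, seed, n=128):
    """Bootstrap particle filter log-likelihood estimator; the seed fixes all
    random numbers, so two calls with nearby phi values use common random
    numbers and produce positively correlated estimators."""
    rng = random.Random(seed)
    xs = [0.0] * n
    loglik = 0.0
    for y in ys:
        # propagation step, using (common) Gaussian draws
        xs = [phi * x + rng.gauss(0.0, 1.0) for x in xs]
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]
        total = sum(ws)
        loglik += math.log(total / n)
        # systematic resampling driven by a single (common) uniform
        u = rng.random() / n
        cum, i, resampled = ws[0] / total, 0, []
        for j in range(n):
            while cum < u + j / n and i < n - 1:
                i += 1
                cum += ws[i] / total
            resampled.append(xs[i])
        xs = resampled
    return loglik

data = simulate_data(T=50, phi=0.9, rng=random.Random(7))
l1 = log_lik_estimate(data, 0.90, seed=3)
l2 = log_lik_estimate(data, 0.91, seed=3)  # same seed: common random numbers
```

With a shared seed, l1 − l2 is far less noisy than with independent seeds; the residual discontinuities come from the discrete resampling step, which is exactly what the new resampling schemes target.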

Following Mike Pitt and other works, we propose new resampling schemes related to optimal transport and maximal coupling ideas. The new schemes result in estimators of the likelihood as shown in the following figure.

Note that we are still performing poorly in absolute terms, but now the comparison between two likelihood estimators for nearby values of θ is more faithful to the true likelihood. This turns out to have pretty drastic effects in the estimation of the score function, or in Metropolis–Hastings schemes such as the correlated pseudo-marginal algorithm.


My last post dates back to May 2015… thanks to JB and Julyan for keeping the place busy! I’m not (quite) dead and intend to go back to posting stuff every now and then. And by the way, congrats to both for their new jobs!

Last July, I also started a new job, as an assistant professor in the Department of Statistics at Harvard University, after spending two years in Oxford. At some point, I might post something on the cultural differences between the ~~European~~ English and American communities of statisticians.

In the coming weeks, I’ll tell you all about a new paper entitled Coupling of Particle Filters, co-written with Fredrik Lindsten and Thomas B. Schön from Uppsala University in Sweden. We are excited about this coupling idea because it’s simple and yet brings massive gains in many important aspects of inference for state space models (including both parameter inference and smoothing). I’ll be talking about it at the World Congress in Probability and Statistics in Toronto next week and at JSM in Chicago, early in August.

I’ll also try to write about another exciting project, joint work with Christian Robert, Chris Holmes and Lawrence Murray, on modularization, cutting feedback, the infamous cut function of BUGS and all that funny stuff. I talked about it at ISBA 2016, and intend to put the associated tech report on arXiv over the summer.

Stay tuned!


In Bayesian nonparametrics, many models address the problem of *density regression*, including covariate-dependent processes. The area was opened up by the pioneering work of [current ISBA president] MacEachern (1999), who introduced the general class of dependent Dirichlet processes. The literature on dependent processes has been developed for numerous settings, such as nonparametric regression, time series data and meta-analysis, to cite but a few, and applied to a wealth of fields such as, e.g., epidemiology, bioassay problems, genomics and finance. For references, see for instance the chapter by David Dunson in the Bayesian nonparametrics textbook (edited in 2010 by Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker). With Kerrie Mengersen and Judith Rousseau, we have proposed a dependent model in the same vein for modeling the influence of fuel spills on species diversity (arXiv).

Several densities can be plotted on the same 3D plot thanks to the Plotly R library, *“an interactive, browser-based charting library built on the open source JavaScript graphing library, plotly.js.”*

In our ecological example, the model provides a series of densities on the *Y* axis (in our case, posterior density of species diversity), indexed by some covariate *X* (a pollutant). See file density_plot.txt. The following Plotly R code

```r
library(plotly)
mydata = read.csv("density_plot.txt")
df = as.data.frame(mydata)
plot_ly(df, x = Y, y = X, z = Z, group = X, type = "scatter3d", mode = "lines")
```

provides a graph as below. For the interactive version, see the RPubs page here.


*“For about two centuries, Bayesian demography remained largely dormant. Only in recent decades has there been a revival of demographers’ interest in Bayesian methods, following the methodological and computational developments of Bayesian statistics. The area is currently growing fast, especially with the United Nations (UN) population projections becoming probabilistic—and Bayesian.”* Bijak and Bryant (2016)

It is interesting to see that Bayesian statistics has been infiltrating demography in recent years. The review paper Bayesian demography 250 years after Bayes by Bijak and Bryant (Population Studies, 2016) stresses that promising areas of application include *demographic forecasts, problems with limited data, and highly structured and complex models*. As an indication of this growing interest, the ISBA meeting to be held next June will showcase a course and a session devoted to the field (given and organized by Adrian Raftery).

With Vianney Costemalle from INSEE, we recently made a modest contribution to the field by proposing a Bayesian model (paper in French) which helps reconcile apparently inconsistent population datasets. The aim is to estimate annual migration flows to France (note that the work covers the period 2004–2011, due to a long publication process, and as a consequence does not take into account recent migration events). We follow the United Nations (UN) definition of a *long-term migrant*: someone who settles in a foreign country for at least one year. At least two datasets can be used to this aim: 1) the population census, annual since 2004, and 2) data from residence permits.

For every migration year t, both datasets provide longitudinal data for subsequent years t + j:

C(t, j) = #(migrants entered year t, counted in the census of year t + j),

P(t, j) = #(migrants entered year t, obtained a (first) permit in year t + j).

The two datasets are apparently in contradiction, in that the raw numbers would give as much as 50% more permits than census counts per year. But the census and permit counts actually do not account for the same populations. We interpret the longitudinal data counts C(t, j) and P(t, j) as fractions of the unknown total number of migrants entered in year t, denoted by N(t):

C(t, j) ~ Binomial(N(t), p_j),

P(t, j) ~ Binomial(N(t), q_j).

Note that we assume that the probabilities p_j and q_j are stationary over time (which would be too strong an assumption for recent data, but was OK over the period 2004–2011). These parameters take on the interpretation of presence in the census j year(s) after entrance for p_j, and of the proportion of permits delivered j year(s) after entrance for q_j. By definition, the p_j and the q_j each need to sum up to some total less than or equal to 1. We observed that leaving the total of the q_j unfixed led to non-convergence of the chains in posterior sampling, which we interpreted as non-identifiability of the parameters in this case. Thus, we estimated the model under different scenarios corresponding to fixed values of this total, the interpretation being that the remaining proportion of the migrants do not get a residence permit at all during their stay.

We have used noninformative priors on the parameters, and have implemented this model in JAGS (I did not know about Stan at that time).
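For concreteness, the core of such a model could look like the following JAGS-style sketch; this is not our actual code, and the names (N, C, P, p, q, lambda) as well as the Poisson prior on the yearly totals are illustrative assumptions.

```
model {
  for (t in 1:T) {
    N[t] ~ dpois(lambda[t])          # prior on the unknown total number of migrants
    for (j in 1:J) {
      C[t, j] ~ dbin(p[j], N[t])     # census counts j years after entry
      P[t, j] ~ dbin(q[j], N[t])     # first-permit counts j years after entry
    }
  }
  # noninformative priors on p[1:J], q[1:J] and lambda[1:T],
  # with the totals of p and q constrained as described above, complete the model.
}
```

JAGS handles the discrete latent totals N[t] directly, which is one reason the model is convenient to implement in it.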

Such an interplay of the two datasets, with a common (unknown) total number of migrants, is key to borrowing strength between the datasets and being able to estimate the probabilities p_j and q_j. Implementation in JAGS nicely accommodates disaggregating the data by age, gender and nationality, hence allowing for the estimation of these probabilities by categories of age, gender and nationality.


On February 19, the second edition of Statalks, a series of Italian workshops aimed at Master students, PhD students, post-docs and young researchers, took place at Collegio Carlo Alberto. This edition was dedicated to Bayesian nonparametrics. The first two presentations were introductory tutorials while the last four focused on theory and applications. All six were clearly biased toward the scientific interests of our group. Below are the program and the slides.

- A gentle introduction to Bayesian Nonparametrics I (Antonio Canale)
- A gentle introduction to Bayesian Nonparametrics II (Julyan Arbel)
- Dependent processes in Bayesian Nonparametrics (Matteo Ruggiero)
- Asymptotics for discrete random measures (Pierpaolo De Blasi)
- Applications to Ecology and Marketing (Antonio Canale)
- Species sampling models (Julyan Arbel)


*[This is a guest post by my friend and colleague Bernardo Nipoti from Collegio Carlo Alberto, Juventus Turin.]*

The matches of the group stage of the UEFA Champions League have just finished, and next Monday, the 14th of December 2015, in Nyon, a draw will decide the eight matches that will compose the first round of the knockout phase.

As explained on the UEFA website, rules are simple:

- two seeding pots have been formed: one consisting of group winners and the other of runners-up;
- no team can play a club from their group or any side from their own association;
- due to a decision by the UEFA Executive Committee, teams from Russia and Ukraine cannot meet.

The two pots are:

Group winners: Real Madrid (ESP), Wolfsburg (GER), Atlético Madrid (ESP), Manchester City (ENG), Barcelona (ESP, holders), Bayern München (GER), Chelsea (ENG), Zenit (RUS);

Group runners-up: Paris Saint-Germain (FRA), PSV Eindhoven (NED), Benfica (POR), Juventus (ITA), Roma (ITA), Arsenal (ENG), Dynamo Kyiv (UKR), Gent (BEL).

Given these few constraints, are there some matches that are more likely to be drawn than others? For example, supporters of Barcelona might wonder whether the seven possible teams (PSG, PSV, Benfica, Juventus, Arsenal, Dynamo Kyiv and Gent) are all equally likely to be the next opponent of their favorite team.

Although it surely could have been done analytically, I decided to tackle this problem with the brute force of simulation and, in a few seconds, I got the answer summarized in the next table.

Rows refer to teams in the group of winners while columns are dedicated to runner-up teams. Each cell gives the probability that the match between the two teams corresponding to its row and column will be drawn. For example, the probability that Barcelona and Juventus will play a re-match of last year’s final is approximated by 0.133. A zero in the table indicates that the corresponding match cannot be played since it would violate one of the aforementioned constraints (e.g. Zenit and Dynamo Kyiv cannot meet due to rule 3). As an example, I report the bar graphs displaying the distribution of the next opponent of Arsenal (top), Barcelona (center) and Juventus (bottom).

The most likely match turns out to be Zenit-Arsenal with an approximated probability of 0.232: this can be explained by the fact that Zenit has only 6 and Arsenal only 5 possible opponents.
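For what it’s worth, a table of this kind can also be obtained by exact enumeration of all valid assignments, assuming the draw is uniform over them — which is only an approximation of UEFA’s sequential draw procedure, and may explain small discrepancies with the simulated figures above. The group order below (winners and runners-up listed by group, A to H) is an assumption based on the order of the two lists above.

```python
from itertools import permutations

# Winners and runners-up with their associations, assumed listed in group order A-H.
winners = [("Real Madrid", "ESP"), ("Wolfsburg", "GER"), ("Atletico Madrid", "ESP"),
           ("Manchester City", "ENG"), ("Barcelona", "ESP"), ("Bayern Munchen", "GER"),
           ("Chelsea", "ENG"), ("Zenit", "RUS")]
runners = [("Paris Saint-Germain", "FRA"), ("PSV Eindhoven", "NED"), ("Benfica", "POR"),
           ("Juventus", "ITA"), ("Roma", "ITA"), ("Arsenal", "ENG"),
           ("Dynamo Kyiv", "UKR"), ("Gent", "BEL")]

def allowed(i, j):
    """Can winner i meet runner-up j? Same group (same index), same
    association, and the Russia-Ukraine pairing are all forbidden."""
    if i == j:
        return False
    if winners[i][1] == runners[j][1]:
        return False
    if {winners[i][1], runners[j][1]} == {"RUS", "UKR"}:
        return False
    return True

# Enumerate every valid assignment of runners-up to winners (8! = 40320 cases)
# and count, for each pair, in how many assignments it occurs.
counts = [[0] * 8 for _ in range(8)]
n_valid = 0
for perm in permutations(range(8)):
    if all(allowed(i, perm[i]) for i in range(8)):
        n_valid += 1
        for i in range(8):
            counts[i][perm[i]] += 1

probs = [[c / n_valid for c in row] for row in counts]
```

Forbidden pairings get probability exactly zero, and the most constrained teams (Arsenal with only 5 possible opponents, Zenit with 6) end up paired with elevated probability.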

Finally, if I had to bet on the outcome of next Monday’s draw, I would pick this list of eight matches:

Real Madrid – Roma

Wolfsburg – Benfica

Atlético Madrid – Paris Saint-Germain

Manchester City – Dynamo Kyiv

Barcelona – PSV Eindhoven

Bayern München – Gent

Chelsea – Juventus

Zenit – Arsenal

According to the probabilities reported in the above table, this is the most likely list of eight matches among all the outcomes that I observed in my simulation. Be aware though that the chances of winning the bet are very low, since I observed this exact outcome only 110 times out of one million simulated draws!

Bernardo


I have recently been invited to referee a paper for a journal I had never heard of before: the International Journal of Biological Instrumentation, published by VIBGYOR Online Publishers. This publisher happens to be on the blacklist of *predatory publishers* maintained by Jeffrey Beall, which inventories:

*“Potential, possible, or probable predatory scholarly open-access publishers.”*

I have kindly declined the invitation. Thanks Igor for the link.

Julyan
