## Different ways of using MCMC algorithms

Hi,

This post is about different ways of using Markov chain Monte Carlo (MCMC) algorithms for numerical integration or sampling. It can be a hard job to design an MCMC algorithm for a given target distribution. Once it’s finally implemented, it gives a way of sampling a new point X’ given an existing point X. From there, the algorithm can be used in various ways to construct estimators of integrals/distribution of interest. Some ways are more amenable to parallel computing than others. I give some examples with references below.

## Scaling of MCMC with dimension (experiments)

Hi all,

In this post, I’ll go through numerical experiments illustrating the scaling of some MCMC algorithms with respect to the dimension. I will focus on a simple setting, to illustrate some theoretical results developed by Gareth Roberts, Andrew Stuart, Alexandros Beskos, Jeff Rosenthal and many of their co-authors over many years, for instance here for random walk Metropolis-Hastings (RWMH, and see here more recently), here for Modified Adjusted Langevin Algorithm (MALA), here for Hamiltonian Monte Carlo (HMC).

## A big problem in our community

Hi all,

Kristian Lum, who was already one of my Statistics superheroes for her many interesting papers and great talks, bravely wrote the following text about her experience as a young statistician going to conferences:

https://medium.com/@kristianlum/statistics-we-have-a-problem-304638dc5de5

I can’t thank Kristian enough for speaking out. Her experience is both shocking and hardly surprising. Many, many academics report similar stories. This simply can’t go on like that.

I happen to have gone to the conferences mentioned by Kristian, and my experience as a young man was completely different. It was all about meeting interesting people, discussing ideas, being challenged, and having good times. Nobody harassed, touched or assaulted me. There was some flirting, as I guess is natural when hundreds of people are put in sunny places far away from home, but I was never the victim of any misconduct or abuse of power. So instead of driving me out of the field, conferences became important, enriching and rewarding moments of my professional life.

Looking back at those conferences I feel sick, and heartbroken, at the thought that some of my peers were having such a difficult time, because of predators who don’t ever face the consequences of their actions. Meanwhile I was part of the silent majority.

The recent series of revelations about sexual harassment and assaults in other professional environments indicate that this is not specific to our field, nor to academia. But this does not make it any more acceptable. I know for a fact that many leaders of our field take this issue extremely seriously (as Kristian mentions too), but clearly much much more needs to be done. The current situation is just shameful; strong and coordinated actions will be needed to fix it. Thanks again to Kristian for the wake-up call.

## Bayesian model comparison with vague or improper priors

Hi,

With Stephane Shao, Jie Ding and Vahid Tarokh we have just arXived a tech report entitled “Bayesian model comparison with the Hyvärinen score: computation and consistency“. Here I’ll explain the context, that is, scoring rules and Hyvärinen scores (originating in Hyvärinen’s score matching approach to inference), and then what we actually do in the paper.

## Approximating the cut distribution

Hi,

This post is about computational issues with the cut distribution for Bayesian inference in misspecified models. Some motivation was given in a previous post about a recent paper on modular Bayesian inference. The cut distribution, or variants of it, might play an important role in combining statistical models, especially in settings where one wants to propagate uncertainty while preventing misspecification from damaging estimation. The cut distribution can also be seen as a probabilistic analog of two-step point estimators. So the cut distribution is more than just a trick! And it raises interesting computational issues which I’ll describe here along with a solution via unbiased MCMC.

## Unbiased Hamiltonian Monte Carlo with couplings

With Jeremy Heng we have recently arXived a paper describing how to remove the burn-in bias of Hamiltonian Monte Carlo (HMC). This follows a recent work on unbiased MCMC estimators in general on which I blogged here. The case of HMC requires a specific yet very simple coupling. A direct consequence of this work is that Hamiltonian Monte Carlo can be massively parallelized: instead of running one chain for many iterations, one can run short coupled chains independently in parallel. The proposed estimators are consistent in the limit of the number of parallel replicates. This is appealing as the number of available processors increases much faster than clock speed, over recent years and for the years to come, for a number of reasons explained e.g. here.

## Statistical learning in models made of modules

Hi,

With Lawrence Murray, Chris Holmes and Christian Robert, we have recently arXived a paper entitled “Better together? Statistical learning in models made of modules”. Christian blogged about it already. The context is the following: parameters of a first model appear as inputs in another model. The question is whether to consider a “joint model approach”, where all parameters are estimated simultaneously with all of the data. Or if one should instead follow a “modular approach”, where the first parameters are estimated with the first model only, ignoring the second model. Examples of modular approaches include the “cut distribution“, or “two-step estimators” (e.g. Chapter 6 of Newey & McFadden (1994)). In many fields, modular approaches are preferred, because the second model is suspected of being more misspecified than the first one. Misspecification of the second model can “contaminate” the joint model, with dire consequences on inference, as described e.g. in Bayarri, Berger & Liu (2009). Other reasons include computational constraints and the lack of simultaneous availability of all models and associated data. In the paper, we try to make sense of the defects of the joint model approach and we propose a principled, quantitative way of choosing between joint and modular approaches.

## Sampling from a maximal coupling

Hi,

In a recent work on parallel computation for MCMC, and also in another one, and in fact also in an earlier one, my co-authors and I use a simple yet very powerful object that is standard in Probability but not so well-known in Statistics: the maximal coupling. Here I’ll describe what this is and an algorithm to sample from such couplings.

## Update on inference with Wasserstein distances

Hi again,

As described in an earlier post, Espen Bernton, Mathieu Gerber and Christian P. Robert and I are exploring Wasserstein distances for parameter inference in generative models. Generally, ABC and indirect inference are fun to play with, as they make the user think about useful distances between data sets (i.i.d. or not), which is sort of implicit in classical likelihood-based approaches. Thinking about distances between data sets can be a helpful and healthy exercise, even if not always necessary for inference. Viewing data sets as empirical distributions leads to considering the Wasserstein distance, and we try to demonstrate in the paper that it leads to an appealing inferential toolbox.

In passing, the first author Espen Bernton will be visiting Marco Cuturi, Christian Robert, Nicolas Chopin and others in Paris from September to January; get in touch with him if you’re over there!

We have just updated the arXiv version of the paper, and the main modifications are as follows.

## Unbiased MCMC with couplings

Hi,

With John O’Leary and Yves Atchadé , we have just arXived our work on removing the bias of MCMC estimators. Here I’ll explain what this bias is about, and the benefits of removing it.

2comments