Statisfaction

Different ways of using MCMC algorithms

Posted in General, Statistics by Pierre Jacob on 19 July 2018

2018-07-diagramasymptotics-mcmc

Hi,

This post is about different ways of using Markov chain Monte Carlo (MCMC) algorithms for numerical integration or sampling. It can be a hard job to design an MCMC algorithm for a given target distribution. Once it’s finally implemented, it gives a way of sampling a new point X’ given an existing point X. From there, the algorithm can be used in various ways to construct estimators of integrals/distribution of interest. Some ways are more amenable to parallel computing than others. I give some examples with references below.

Traditional MCMC and regeneration

The traditional way of using an MCMC algorithm consists of constructing Markov chains by repeatedly applying the algorithm, for as many iterations as possible. If parallel processors are available, then each processor can produce a chain. A cartoon representation of the construction is in the above diagram, with 3 chains. The estimators generated this way are typically made of some ergodic averages computed on each chain, then averaged across the chains. These estimators are valid as the number of iterations goes to infinity, in an “iterative asymptotic regime”. This was not a concern in the 80s-90s in the “MCMC revolution” when processors’ clock speeds were rapidly increasing. As clock speeds stagnate and available hardware becomes increasingly parallel, these estimators become problematic (“Popular algorithms from the sequential computing era may fade in popularity“). There is a pressing need for estimators that are valid in a “parallel asymptotic regime”, that is, in an asymptotic regime where most of the computations can be done in parallel.

  • Rosenthal, Jeffrey S. “Parallel computing and Monte Carlo algorithms.” Far east journal of theoretical statistics 4.2 (2000): 207-236. [PDF]

There have been various attempts at addressing that need. A classic way of producing parallel-friendly estimators is via the regeneration approach. It consists of identifying independent bits (or “tours”) within the trace of a Markov chain, bits that can be recombined across parallel chains. The resulting estimators can then be consistent in the limit of the number of parallel chains. The methods are pretty generically applicable, especially since the second paper below.

  • Mykland, Per, Luke Tierney, and Bin Yu. “Regeneration in Markov chain samplers.” Journal of the American Statistical Association 90.429 (1995): 233-241. [JSTOR]
  • Brockwell, Anthony E., and Joseph B. Kadane. “Identification of regeneration times in MCMC simulation, with application to adaptive schemes.” Journal of Computational and Graphical Statistics 14.2 (2005): 436-458. [PDF]

Annealed importance sampling and SMC samplers

Annealed importance sampling makes use of MCMC kernels in an importance sampling approach. It requires defining a sequence of target distributions leading to the one of interest, and there are pretty generic ways of doing that based on annealing/tempering. Then a number of (inhomogeneous) Markov chains are generated, evolve according to a sequence of Markov kernels, and the final estimators are asymptotically valid when the number of chains (also called particles in that literature) goes to infinity. Sequential Monte Carlo samplers enjoy the same “parallel asymptotic regime” and add some interactions between the chains through resampling steps. Because of the interactions, some level of communication is necessary between the processors. These methods are thus particularly suited to architectures in which processing units can communicate quickly with one another, such as Graphics Processing Units. A cartoon is below, representing 7 chains interacting over 5 steps.

2018-07-diagramasymptotics-smc

  • Neal, Radford M. “Annealed importance sampling.” Statistics and computing 11.2 (2001): 125-139. [PDF]
  • Chopin, Nicolas. “A sequential particle filter method for static models.” Biometrika 89.3 (2002): 539-552. [PDF]
  • Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. “Sequential monte carlo samplers.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.3 (2006): 411-436. [PDF]
  • Schweizer, Nikolaus. “Non-asymptotic error bounds for sequential MCMC and stability of Feynman-Kac propagators.” arXiv preprint arXiv:1204.2382 (2012). [PDF]
  • Schweizer, Nikolaus. “Non-asymptotic error bounds for sequential MCMC methods in multimodal settings.” arXiv preprint arXiv:1205.6733 (2012). [PDF]
  • Whiteley, Nick, Anthony Lee, and Kari Heine. “On the role of interaction in sequential Monte Carlo algorithms.” Bernoulli22.1 (2016): 494-529. [PDF]

Unbiased MCMC

The unbiased MCMC approach can be understood as yet another way of using MCMC algorithms. There, each processor constructs a pair of chains, for a random number of iterations related to some stopping time. From the pair of chains, an estimator is calculated with the guarantee that it is unbiased for the integral of interest. Then one can average independent copies of the estimators computed in parallel. This yields consistent estimators in the limit of the number of parallel copies, as illustrated by the cartoon below.

2018-07-diagramasymptotics-umcmc

  • Glynn, Peter W., and Chang-han Rhee. “Exact estimation for Markov chain equilibrium expectations.” Journal of Applied Probability 51.A (2014): 377-389. [PDF]
  • Jacob, Pierre E., Fredrik Lindsten, and Thomas B. Schön. “Smoothing with couplings of conditional particle filters.” arXiv preprint arXiv:1701.02002 (2017). [PDF]
  • Jacob, Pierre E., John O’Leary, and Yves F. Atchadé. “Unbiased Markov chain Monte Carlo with couplings.” arXiv preprint arXiv:1708.03625 (2017). [PDF]
  • Heng, Jeremy, and Pierre E. Jacob. “Unbiased Hamiltonian Monte Carlo with couplings.” arXiv preprint arXiv:1709.00404 (2017). [PDF]

The method has strong links to the approach of the paper below, with essentially the same aims (originally it was a tech report from 1999):

  • Neal, Radford M. “Circularly-coupled Markov chain sampling.” arXiv preprint arXiv:1711.04399 (2017). [PDF]

Coupling from the past provides another early example of creative use of Markov kernels and of couplings for sampling:

  • Propp, James Gary, and David Bruce Wilson. “Exact sampling with coupled Markov chains and applications to statistical mechanics.” Random Structures & Algorithms 9.1‐2 (1996): 223-252. [PDF]

The point is: once you have implemented an MCMC algorithm, you can use it in the traditional way. But there might be more things you could do with it. In this post, I’ve emphasized on the motivation from parallel computing, but you could also revisit the different methods under the prism of multimodal target distributions or of normalizing constant estimation.

Advertisements

2 Responses

Subscribe to comments with RSS.

  1. xi'an said, on 20 July 2018 at 20:43

    Great triptych! I find both regeneration and coupling from the past beautiful concepts, but list faith in their ability to apply in realistic situations…

    • Pierre Jacob said, on 20 July 2018 at 22:57

      I was under the same impression but Anthony Lee pointed me to Section 7.4 in his paper https://arxiv.org/abs/1407.5770 where they demonstrate regeneration in a PMMH case.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: