This post is about different ways of using Markov chain Monte Carlo (MCMC) algorithms for numerical integration or sampling. It can be a hard job to design an MCMC algorithm for a given target distribution. Once it’s finally implemented, it gives a way of sampling a new point X’ given an existing point X. From there, the algorithm can be used in various ways to construct estimators of integrals/distribution of interest. Some ways are more amenable to parallel computing than others. I give some examples with references below.
Traditional MCMC and regeneration
The traditional way of using an MCMC algorithm consists of constructing Markov chains by repeatedly applying the algorithm, for as many iterations as possible. If parallel processors are available, then each processor can produce a chain. A cartoon representation of the construction is in the above diagram, with 3 chains. The estimators generated this way are typically made of some ergodic averages computed on each chain, then averaged across the chains. These estimators are valid as the number of iterations goes to infinity, in an “iterative asymptotic regime”. This was not a concern in the 80s-90s in the “MCMC revolution” when processors’ clock speeds were rapidly increasing. As clock speeds stagnate and available hardware becomes increasingly parallel, these estimators become problematic (“Popular algorithms from the sequential computing era may fade in popularity“). There is a pressing need for estimators that are valid in a “parallel asymptotic regime”, that is, in an asymptotic regime where most of the computations can be done in parallel.
- Rosenthal, Jeffrey S. “Parallel computing and Monte Carlo algorithms.” Far east journal of theoretical statistics 4.2 (2000): 207-236. [PDF]
There have been various attempts at addressing that need. A classic way of producing parallel-friendly estimators is via the regeneration approach. It consists of identifying independent bits (or “tours”) within the trace of a Markov chain, bits that can be recombined across parallel chains. The resulting estimators can then be consistent in the limit of the number of parallel chains. The methods are pretty generically applicable, especially since the second paper below.
- Mykland, Per, Luke Tierney, and Bin Yu. “Regeneration in Markov chain samplers.” Journal of the American Statistical Association 90.429 (1995): 233-241. [JSTOR]
- Brockwell, Anthony E., and Joseph B. Kadane. “Identification of regeneration times in MCMC simulation, with application to adaptive schemes.” Journal of Computational and Graphical Statistics 14.2 (2005): 436-458. [PDF]
Annealed importance sampling and SMC samplers
Annealed importance sampling makes use of MCMC kernels in an importance sampling approach. It requires defining a sequence of target distributions leading to the one of interest, and there are pretty generic ways of doing that based on annealing/tempering. Then a number of (inhomogeneous) Markov chains are generated, evolve according to a sequence of Markov kernels, and the final estimators are asymptotically valid when the number of chains (also called particles in that literature) goes to infinity. Sequential Monte Carlo samplers enjoy the same “parallel asymptotic regime” and add some interactions between the chains through resampling steps. Because of the interactions, some level of communication is necessary between the processors. These methods are thus particularly suited to architectures in which processing units can communicate quickly with one another, such as Graphics Processing Units. A cartoon is below, representing 7 chains interacting over 5 steps.
- Neal, Radford M. “Annealed importance sampling.” Statistics and computing 11.2 (2001): 125-139. [PDF]
- Chopin, Nicolas. “A sequential particle filter method for static models.” Biometrika 89.3 (2002): 539-552. [PDF]
- Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. “Sequential monte carlo samplers.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.3 (2006): 411-436. [PDF]
- Schweizer, Nikolaus. “Non-asymptotic error bounds for sequential MCMC and stability of Feynman-Kac propagators.” arXiv preprint arXiv:1204.2382 (2012). [PDF]
- Schweizer, Nikolaus. “Non-asymptotic error bounds for sequential MCMC methods in multimodal settings.” arXiv preprint arXiv:1205.6733 (2012). [PDF]
- Whiteley, Nick, Anthony Lee, and Kari Heine. “On the role of interaction in sequential Monte Carlo algorithms.” Bernoulli22.1 (2016): 494-529. [PDF]
The unbiased MCMC approach can be understood as yet another way of using MCMC algorithms. There, each processor constructs a pair of chains, for a random number of iterations related to some stopping time. From the pair of chains, an estimator is calculated with the guarantee that it is unbiased for the integral of interest. Then one can average independent copies of the estimators computed in parallel. This yields consistent estimators in the limit of the number of parallel copies, as illustrated by the cartoon below.
- Glynn, Peter W., and Chang-han Rhee. “Exact estimation for Markov chain equilibrium expectations.” Journal of Applied Probability 51.A (2014): 377-389. [PDF]
- Jacob, Pierre E., Fredrik Lindsten, and Thomas B. Schön. “Smoothing with couplings of conditional particle filters.” arXiv preprint arXiv:1701.02002 (2017). [PDF]
- Jacob, Pierre E., John O’Leary, and Yves F. Atchadé. “Unbiased Markov chain Monte Carlo with couplings.” arXiv preprint arXiv:1708.03625 (2017). [PDF]
- Heng, Jeremy, and Pierre E. Jacob. “Unbiased Hamiltonian Monte Carlo with couplings.” arXiv preprint arXiv:1709.00404 (2017). [PDF]
The method has strong links to the approach of the paper below, with essentially the same aims (originally it was a tech report from 1999):
- Neal, Radford M. “Circularly-coupled Markov chain sampling.” arXiv preprint arXiv:1711.04399 (2017). [PDF]
Coupling from the past provides another early example of creative use of Markov kernels and of couplings for sampling:
- Propp, James Gary, and David Bruce Wilson. “Exact sampling with coupled Markov chains and applications to statistical mechanics.” Random Structures & Algorithms 9.1‐2 (1996): 223-252. [PDF]
The point is: once you have implemented an MCMC algorithm, you can use it in the traditional way. But there might be more things you could do with it. In this post, I’ve emphasized on the motivation from parallel computing, but you could also revisit the different methods under the prism of multimodal target distributions or of normalizing constant estimation.