In a recent work on parallel computation for MCMC, and also in another one, and in fact also in an earlier one, my co-authors and I use a simple yet very powerful object that is standard in Probability but not so well-known in Statistics: the maximal coupling. Here I’ll describe what this is and an algorithm to sample from such couplings.
Consider two distributions, and on a general (i.e. discrete or continuous) state space . A coupling refers to a joint distribution, say on , with first marginal and second marginal , i.e.
An independent coupling has a density function . A maximal coupling, on the other hand, is such that pairs have maximal probability of being identical, i.e. is maximal under the marginal constraints and .
At first, it might sound weird that there would be any possibility of and being identical since their distributions are on a continuous state space. But indeed it is possible. Intuitively, imagine that is sampled from : as long as then could also be a sample from , could it not?
The following algorithm provides samples from a maximal coupling of and .
- Sample from .
- Sample a uniform variable . If , then output .
- Otherwise, sample from and , until , and output .
It is clear that, if is produced by the above algorithm, then (from step 1). Checking that takes a few lines of calculus, similar to proving the validity of rejection sampling. It can be found e.g. in the appendix of this article. This scheme must have been known for a long time and is definitely in Thorisson’s book on coupling. I’m not quite sure who first came up with it, any tips welcome!
To understand the algorithm, let’s look at the following figure, where the density functions of and are plotted along with a shaded area under the curve . The algorithm tries to sample from the distribution represented by the shaded area first, and if it does not succeed, it samples from the remainder which has density , up to a normalizing constant.
The graph at the beginning of the article shows pairs generated by the algorithm. The red dots represent samples with . The probability of this event is precisely one minus the total variation between and , and by the coupling inequality (e.g. here, Lemma 4.9), this is the maximum probability of the event under the marginal constraints.