Sampling from a maximal coupling


Sample from a maximal coupling of two Normal distributions, X ~ N(0.5, 0.8^2) and Y ~ N(-0.5, 0.2^2).


In a recent work on parallel computation for MCMC, as well as in two other articles, my co-authors and I use a simple yet very powerful object that is standard in Probability but not so well known in Statistics: the maximal coupling. Here I’ll describe what it is and give an algorithm to sample from such couplings.

Consider two distributions p and q on a general (i.e. discrete or continuous) state space \mathcal{X}. A coupling is a joint distribution, say \bar{\pi} on \mathcal{X}\times \mathcal{X}, with first marginal p and second marginal q, i.e.

\forall x\quad \int \bar\pi(x,y) dy = p(x), \quad\forall y\quad \int \bar\pi(x,y) dx = q(y).

An independent coupling has a density function (x,y)\mapsto \bar\pi(x,y) = p(x)q(y). A maximal coupling, on the other hand, is such that pairs (X,Y) \sim\bar\pi have maximal probability of being identical, i.e. \mathbb{P}(X = Y) is maximal under the marginal constraints X \sim p and Y \sim q.
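In the discrete case the definition can be written out explicitly as a table. Here is a minimal sketch (the Bernoulli parameters 0.5 and 0.3 are illustrative choices, not from the post): place the overlap \min(p(x),q(x)) on the diagonal, then spend the remaining mass off the diagonal in any way that respects the marginals, e.g. by coupling the residuals independently.

```python
# Explicit maximal coupling of two distributions on {0, 1}, as a 2x2 table.
# The Bernoulli parameters (0.5 and 0.3) are arbitrary illustrative choices.
p = [0.5, 0.5]   # p(0), p(1) for Bernoulli(0.5)
q = [0.7, 0.3]   # q(0), q(1) for Bernoulli(0.3)
n = len(p)

# Overlap: mass placed on the diagonal; its total is the maximal P(X = Y),
# which equals 1 - TV(p, q).
overlap = [min(p[i], q[i]) for i in range(n)]
mass = sum(overlap)  # here 0.8, since TV(p, q) = 0.2

# Normalized residuals of p and q; coupling them independently is one valid
# way to distribute the remaining (1 - mass) of probability off the diagonal.
rp = [(p[i] - overlap[i]) / (1 - mass) for i in range(n)]
rq = [(q[j] - overlap[j]) / (1 - mass) for j in range(n)]
joint = [[(overlap[i] if i == j else 0.0) + (1 - mass) * rp[i] * rq[j]
          for j in range(n)] for i in range(n)]

marg_p = [sum(joint[i]) for i in range(n)]                       # recovers p
marg_q = [sum(joint[i][j] for i in range(n)) for j in range(n)]  # recovers q
```

Any other way of arranging the off-diagonal mass gives another maximal coupling; only the diagonal is forced.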

At first, it might sound weird that there would be any possibility of X and Y being identical since their distributions are on a continuous state space. But indeed it is possible. Intuitively, imagine that X is sampled from p: as long as q(X)>0 then X could also be a sample from q, could it not?

The following algorithm provides samples (X,Y) from a maximal coupling of p and q.

  1. Sample X from p.
  2. Sample a uniform variable U\sim\mathcal{U}[0,p(X)]. If U < q(X), then output (X,X).
  3. Otherwise, repeatedly sample Y^\star from q and U^\star\sim\mathcal{U}[0,q(Y^\star)] until U^\star > p(Y^\star), and output (X, Y^\star).
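The three steps above can be sketched in code; a minimal sketch in Python, using the two Normal distributions from the opening line (any p and q that can be evaluated and sampled would do; the function names are mine):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sample_maximal_coupling(sample_p, pdf_p, sample_q, pdf_q):
    """Return (X, Y) with X ~ p, Y ~ q, meeting with maximal probability."""
    # Step 1: sample X from p.
    x = sample_p()
    # Step 2: accept X as a sample from q with probability min(1, q(X)/p(X)).
    if random.uniform(0.0, pdf_p(x)) < pdf_q(x):
        return x, x
    # Step 3: otherwise, rejection-sample Y* from the residual of q,
    # i.e. the part of q not covered by min(p, q).
    while True:
        y = sample_q()
        if random.uniform(0.0, pdf_q(y)) > pdf_p(y):
            return x, y

# The two Normals from the opening line: X ~ N(0.5, 0.8^2), Y ~ N(-0.5, 0.2^2).
pairs = [sample_maximal_coupling(
             lambda: random.gauss(0.5, 0.8), lambda v: normal_pdf(v, 0.5, 0.8),
             lambda: random.gauss(-0.5, 0.2), lambda v: normal_pdf(v, -0.5, 0.2))
         for _ in range(20000)]
meet = sum(x == y for x, y in pairs) / len(pairs)  # empirical P(X = Y)
```

Note that the cost of step 3 is random: the expected number of proposals is finite, but the loop can take long when the residual of q has little mass where q proposes.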

It is clear that, if (X,Y) is produced by the above algorithm, then X\sim p (from step 1). Checking that Y \sim q takes a few lines of calculus, similar to proving the validity of rejection sampling. It can be found e.g. in the appendix of this article. This scheme must have been known for a long time and is definitely in Thorisson’s book on coupling. I’m not quite sure who first came up with it, any tips welcome!

To understand the algorithm, let’s look at the following figure, where the density functions of p and q are plotted along with a shaded area under the curve x \mapsto \min(p(x),q(x)). The algorithm first tries to sample from the distribution represented by the shaded area, and if it does not succeed, it samples Y from the remainder, which has density x \mapsto q(x) - \min(p(x),q(x)) up to a normalizing constant.


The graph at the beginning of the article shows pairs (X,Y) generated by the algorithm. The red dots represent samples with X = Y. The probability of this event is precisely one minus the total variation distance between p and q, and by the coupling inequality (e.g. here, Lemma 4.9), this is the maximum probability of the event \{X = Y\} under the marginal constraints.
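A quick numerical sketch of this identity: for the two Normals above, the meeting probability 1 - TV(p,q) = \int \min(p(x),q(x)) dx can be approximated by simple quadrature, and should match the empirical fraction of red dots.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# 1 - TV(p, q) = integral of min(p, q), here for N(0.5, 0.8^2) and N(-0.5, 0.2^2),
# approximated by the midpoint rule on a fine grid over [-5, 5] (the mass of
# both densities outside this interval is negligible).
num_points = 100_000
width = 10.0 / num_points
grid = [-5.0 + width * (k + 0.5) for k in range(num_points)]
overlap = sum(min(normal_pdf(x, 0.5, 0.8), normal_pdf(x, -0.5, 0.2))
              for x in grid) * width
print(round(overlap, 3))  # maximal P(X = Y) for this pair of Normals
```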

Published by Pierre Jacob

Professor of statistics, ESSEC Business School

12 thoughts on “Sampling from a maximal coupling”

  1. Is there any sense in comparing the orders $(p,q)$ and $(q,p)$? I do not think so but I have been wrong on many occasions already!

    1. Definitely, the procedure is not “symmetric” in p and q. But the expected cost and the probability of {X = Y} are unchanged by swapping p and q. So I’m not sure what would be reasons to prefer one over the other. Perhaps if the costs of sampling/evaluating p and q were asymmetric, and if one wanted to control the tails of the cost instead of the expectation.
