# Sampling from a maximal coupling

Hi,

In a recent work on parallel computation for MCMC, as well as in two other articles, my co-authors and I use a simple yet very powerful object that is standard in probability but not so well known in statistics: the maximal coupling. Here I'll describe what it is and give an algorithm to sample from such couplings.

Consider two distributions $p$ and $q$ on a general (i.e. discrete or continuous) state space $\mathcal{X}$. A coupling is a joint distribution, say $\bar{\pi}$ on $\mathcal{X}\times \mathcal{X}$, with first marginal $p$ and second marginal $q$, i.e.

$\forall x\quad \int \bar\pi(x,y) dy = p(x), \quad\forall y\quad \int \bar\pi(x,y) dx = q(y).$

An independent coupling has a density function $(x,y)\mapsto \bar\pi(x,y) = p(x)q(y)$. A maximal coupling, on the other hand, is such that pairs $(X,Y) \sim\bar\pi$ have maximal probability of being identical, i.e. $\mathbb{P}(X = Y)$ is maximal under the marginal constraints $X \sim p$ and $Y \sim q$.

At first, it might sound strange that $X$ and $Y$ could be identical with positive probability when their distributions live on a continuous state space. But it is indeed possible. Intuitively, imagine that $X$ is sampled from $p$: as long as $q(X)>0$, then $X$ could also be a sample from $q$, could it not?

The following algorithm provides samples $(X,Y)$ from a maximal coupling of $p$ and $q$.

1. Sample $X$ from $p$.
2. Sample a uniform variable $U\sim\mathcal{U}[0,p(X)]$. If $U < q(X)$, then output $(X,X)$.
3. Otherwise, sample $Y^\star$ from $q$ and $U^\star\sim\mathcal{U}[0,q(Y^\star)]$, until $U^\star > p(Y^\star)$, and output $(X, Y^\star)$.
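To make the three steps concrete, here is a sketch in Python; the function names and the choice of two unit-variance Gaussians are mine for illustration, not part of the post.

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_pdf(x, mu):
    # Density of a Normal(mu, 1) distribution.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def maximal_coupling(p_sample, p_pdf, q_sample, q_pdf):
    # Step 1: sample X from p.
    x = p_sample()
    # Step 2: U ~ Uniform[0, p(X)]; if U < q(X), output (X, X).
    if rng.uniform(0.0, p_pdf(x)) < q_pdf(x):
        return x, x
    # Step 3: otherwise, rejection-sample Y* from the residual of q,
    # i.e. repeat until U* ~ Uniform[0, q(Y*)] exceeds p(Y*).
    while True:
        y = q_sample()
        if rng.uniform(0.0, q_pdf(y)) > p_pdf(y):
            return x, y

# Couple p = Normal(0, 1) and q = Normal(1, 1).
pairs = [maximal_coupling(lambda: rng.normal(0.0, 1.0),
                          lambda x: norm_pdf(x, 0.0),
                          lambda: rng.normal(1.0, 1.0),
                          lambda x: norm_pdf(x, 1.0))
         for _ in range(10000)]
```

Each pair then has $X\sim\mathcal{N}(0,1)$ and $Y\sim\mathcal{N}(1,1)$, and a substantial fraction of the pairs satisfy $X = Y$ exactly.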

It is clear that, if $(X,Y)$ is produced by the above algorithm, then $X\sim p$ (from step 1). Checking that $Y \sim q$ takes a few lines of calculus, similar to proving the validity of rejection sampling; it can be found e.g. in the appendix of this article. This scheme must have been known for a long time and is definitely in Thorisson's book on coupling. I'm not quite sure who first came up with it; any tips welcome!

To understand the algorithm, let’s look at the following figure, where the density functions of $p$ and $q$ are plotted along with a shaded area under the curve $x \mapsto \min(p(x),q(x))$. The algorithm tries to sample from the distribution represented by the shaded area first, and if it does not succeed, it samples $Y$ from the remainder which has density $x \mapsto q(x) - \min(p(x),q(x))$, up to a normalizing constant.
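Numerically, for the two-Gaussian example above (my choice, not from the post), one can check that the mass of the shaded overlap and the mass of the residual add up to one:

```python
import numpy as np

# Fine grid covering essentially all of the mass of both densities.
grid = np.linspace(-8.0, 9.0, 200001)
dx = grid[1] - grid[0]

def norm_pdf(x, mu):
    # Density of a Normal(mu, 1) distribution.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

p, q = norm_pdf(grid, 0.0), norm_pdf(grid, 1.0)
overlap = np.minimum(p, q).sum() * dx           # mass of the shaded area
residual = (q - np.minimum(p, q)).sum() * dx    # mass of the remainder of q
# overlap + residual equals 1, since min(p,q) + (q - min(p,q)) = q.
```

The `overlap` quantity is exactly the probability that the algorithm succeeds at step 2.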

The graph at the beginning of the article shows pairs $(X,Y)$ generated by the algorithm. The red dots represent samples with $X = Y$. The probability of this event is precisely one minus the total variation distance between $p$ and $q$, and by the coupling inequality (e.g. here, Lemma 4.9), this is the maximum probability of the event $\{X = Y\}$ under the marginal constraints.
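In case it helps, the coupling inequality can be recovered in one line. For any coupling $(X,Y)$ of $p$ and $q$ and any measurable set $A$, the events $\{X\in A, X=Y\}$ and $\{Y\in A, X=Y\}$ coincide, so

$\mathbb{P}(X \in A) - \mathbb{P}(Y \in A) = \mathbb{P}(X \in A, X \neq Y) - \mathbb{P}(Y \in A, X \neq Y) \leq \mathbb{P}(X \neq Y).$

Taking the supremum over $A$ gives $\|p - q\|_{\mathrm{TV}} \leq \mathbb{P}(X \neq Y)$, i.e. $\mathbb{P}(X = Y) \leq 1 - \|p - q\|_{\mathrm{TV}}$, and the algorithm above attains this bound with equality.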

1. Is there any sense in comparing the orders $(p,q)$ and $(q,p)$? I do not think so, but I have been wrong on many occasions already!