Consider two distributions, and on a general (i.e. discrete or continuous) state space . A coupling refers to a joint distribution, say on , with first marginal and second marginal , i.e.

An independent coupling has a density function . A maximal coupling, on the other hand, is such that pairs have maximal probability of being identical, i.e. is maximal under the marginal constraints and .

At first, it might sound weird that there would be any possibility of and being identical since their distributions are on a continuous state space. But indeed it is possible. Intuitively, imagine that is sampled from : as long as then could also be a sample from , could it not?

The following algorithm provides samples from a maximal coupling of and .

Sample from .

Sample a uniform variable . If , then output .

Otherwise, sample from and , until , and output .

It is clear that, if is produced by the above algorithm, then (from step 1). Checking that takes a few lines of calculus, similar to proving the validity of rejection sampling. It can be found e.g. in the appendix of this article. This scheme must have been known for a long time and is definitely in Thorisson’s book on coupling. I’m not quite sure who first came up with it, any tips welcome!

To understand the algorithm, let’s look at the following figure, where the density functions of and are plotted along with a shaded area under the curve . The algorithm tries to sample from the distribution represented by the shaded area first, and if it does not succeed, it samples from the remainder which has density , up to a normalizing constant.

The graph at the beginning of the article shows pairs generated by the algorithm. The red dots represent samples with . The probability of this event is precisely one minus the total variation between and , and by the coupling inequality (e.g. here, Lemma 4.9), this is the maximum probability of the event under the marginal constraints.

Definitely, the procedure is not “symmetric” in p and q. But the expected cost and the probability of {X = Y} are unchanged by swapping p and q. So I’m not sure what would be reasons to prefer one over the other. Perhaps if the costs of sampling/evaluating p and q were asymmetric, and if one wanted to control the tails of the cost instead of the expectation.

Is there any sense in comparing the orders ($p,q)$ and $(q,p)$? I do not think so but I have been wrong in many occasions already!

Definitely, the procedure is not “symmetric” in p and q. But the expected cost and the probability of {X = Y} are unchanged by swapping p and q. So I’m not sure what would be reasons to prefer one over the other. Perhaps if the costs of sampling/evaluating p and q were asymmetric, and if one wanted to control the tails of the cost instead of the expectation.