# Statisfaction

## Coupling of particle filters: smoothing

Posted in Statistics by Pierre Jacob on 20 July 2016

*Two trajectories made for each other.*

Hi again!

In this post, I’ll explain the new smoother introduced in our paper Coupling of Particle Filters with Fredrik Lindsten and Thomas B. Schön from Uppsala University. Smoothing refers to the task of estimating a latent process $x_{0:T} = (x_0,\ldots, x_T)$ of length $T+1$, given noisy measurements of it, $y_{1:T} = (y_1,\ldots, y_T)$; the smoothing distribution refers to $p(dx_{0:T}|y_{1:T})$. The setting is state-space models (what else?!), with a fixed parameter assumed to have been previously estimated.

Our smoother builds upon two recent innovations: the first is the conditional particle filter of Andrieu, Doucet and Holenstein (2010), and the second is the debiasing technique of Glynn and Rhee (2014). The conditional particle filter (CPF) is a Markov kernel on the space of trajectories. The CPF kernel leaves the smoothing distribution invariant, so it can be iterated to obtain an MCMC sample approximating the smoothing distribution. On the other hand, the debiasing method takes a Markov kernel $K$ and a function $h$ and spits out an unbiased estimator of the integral of $h$ with respect to the invariant distribution of $K$.
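For concreteness, here is a minimal sketch of one application of a bootstrap-flavoured CPF kernel, written for a toy AR(1) state-space model ($x_t = \phi x_{t-1} + \text{noise}$, $y_t = x_t + \text{noise}$). The model, the parameter values and the function name are my illustrative assumptions, not the setup from the paper, and this basic version omits refinements such as ancestor sampling:

```python
import numpy as np

def cpf_kernel(rng, ref, y, N=20, phi=0.9, sx=1.0, sy=1.0):
    # One application of the conditional particle filter (bootstrap flavour)
    # for the toy model x_t = phi * x_{t-1} + N(0, sx^2), y_t = x_t + N(0, sy^2).
    # `ref` is the conditioning trajectory x_{0:T}; the output is a draw from a
    # Markov kernel that leaves p(dx_{0:T} | y_{1:T}) invariant.
    T = len(y)
    X = np.zeros((T + 1, N))                  # particle states
    A = np.zeros((T, N), dtype=int)           # ancestor indices
    X[0] = rng.normal(0.0, sx, size=N)
    X[0, 0] = ref[0]                          # particle 0 carries the reference
    logw = np.zeros(N)                        # uniform weights before seeing y_1
    for t in range(T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        A[t] = rng.choice(N, size=N, p=w)     # multinomial resampling
        A[t, 0] = 0                           # the reference survives resampling
        X[t + 1] = phi * X[t, A[t]] + rng.normal(0.0, sx, size=N)
        X[t + 1, 0] = ref[t + 1]              # ...and keeps its own states
        logw = -0.5 * ((y[t] - X[t + 1]) / sy) ** 2   # bootstrap weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(N, p=w)                    # pick a terminal particle...
    traj = np.empty(T + 1)
    traj[T] = X[T, k]
    for t in range(T - 1, -1, -1):            # ...and trace its ancestral line
        k = A[t, k]
        traj[t] = X[t, k]
    return traj
```

Iterating this kernel, each time conditioning on the previous output, produces the MCMC chain on trajectories mentioned above.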

So here is the algorithm: we start with a pair of arbitrary trajectories, denoted by $X^{(0)}$ and $\tilde{X}^{(0)}$. We first apply a step of CPF to $X^{(0)}$, yielding $X^{(1)}$ (and we do nothing to $\tilde{X}^{(0)}$; bear with me!). Then we apply a “coupled CPF” kernel to the pair $(X^{(1)},\tilde{X}^{(0)})$, yielding a new pair $(X^{(2)},\tilde{X}^{(1)})$. What is a coupled CPF? It’s essentially a CPF kernel applied to both trajectories, using common random numbers and with a fancy resampling scheme as alluded to in the last post.
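The coupled resampling scheme itself is beyond the scope of this post, but the generic ingredient that allows two chains to output exactly the same value with positive probability is a maximal coupling. Here is a minimal sketch for two Gaussians, an illustrative choice on my part rather than the scheme from the paper:

```python
import numpy as np

def norm_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def maximal_coupling(rng, mu_p, mu_q, sigma=1.0):
    # Sample (X, Y) with X ~ N(mu_p, sigma^2) and Y ~ N(mu_q, sigma^2),
    # maximizing the probability that X and Y are exactly equal.
    X = rng.normal(mu_p, sigma)
    if np.log(rng.uniform()) < norm_logpdf(X, mu_q, sigma) - norm_logpdf(X, mu_p, sigma):
        return X, X                           # the two outputs coincide
    while True:
        Y = rng.normal(mu_q, sigma)           # otherwise propose from q...
        if np.log(rng.uniform()) > norm_logpdf(Y, mu_p, sigma) - norm_logpdf(Y, mu_q, sigma):
            return X, Y                       # ...until rejected under p
```

Both outputs have the correct marginal distributions, yet they are identical as often as the overlap of the two densities allows.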

Then we iterate the coupled CPF kernel, yielding pairs $(X^{(3)},\tilde{X}^{(2)})$, $(X^{(4)},\tilde{X}^{(3)})$, etc. It’s just a Markov chain on the space of pairs of trajectories… BUT! at some step $\tau$, the two trajectories become identical: $X^{(\tau)} = \tilde{X}^{(\tau-1)}$. Tadaaaah! See the figure above, where the two trajectories meet in a few steps. And once the two trajectories meet, they stay together forever after (so cute). This is exciting because it means that $X^{(0)} + \sum_{n=1}^{\tau-1} (X^{(n)} - \tilde{X}^{(n-1)}) = X^{(0)} + \sum_{n=1}^\infty (X^{(n)} - \tilde{X}^{(n-1)})$ (we are just adding infinitely many zeros). Now it is easy to see that this infinite sum has expectation precisely $\mathbb{E}[x_{0:T}|y_{1:T}]$. Indeed, provided that we can swap limit and expectation, the expectation of the sum is $\mathbb{E}[X^{(0)}] + \sum_{n=1}^\infty (\mathbb{E}[X^{(n)}] - \mathbb{E}[\tilde{X}^{(n-1)}])$. Since $\mathbb{E}[X^{(n)}] = \mathbb{E}[\tilde{X}^{(n)}]$ by the coupling construction, the sum telescopes and we are left with $\lim_{n\to\infty} \mathbb{E}[X^{(n)}] = \mathbb{E}[x_{0:T}|y_{1:T}]$, simply because the CPF chain itself is ergodic.
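To see the telescoping trick in action outside of the smoothing context, here is a toy version: a two-state Markov chain coupled through common uniform random numbers, with the lagged pair construction and the estimator described above. The chain and the test function $h$ are made-up illustrations, not anything from the paper:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # toy transition matrix; stationary distribution (2/3, 1/3)

def step(x, u):
    # Inverse-CDF move driven by a uniform u; feeding both chains the same u
    # is the "common random numbers" coupling, under which they eventually meet.
    return int(u >= P[x, 0])

def unbiased_estimate(rng, h=lambda x: float(x)):
    X = Xt = 0                       # X^(0) and X~^(0), started together
    est = h(X)                       # h(X^(0))
    X = step(X, rng.uniform())       # advance X one step; X~ waits
    while X != Xt:                   # stop at tau: X^(tau) == X~^(tau-1)
        est += h(X) - h(Xt)          # one term of the telescoping sum
        u = rng.uniform()            # common random number for both chains
        X, Xt = step(X, u), step(Xt, u)
    return est
```

Averaging many independent calls recovers the stationary expectation of $h$, which is $1/3$ for this chain, even though each individual run stops after a random, typically tiny, number of steps.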

What’s the point? Instead of running one long MCMC chain and staring blankly at its trace to decide how many iterations are enough, the above scheme can be run $R$ times completely in parallel. Each run takes a random but typically small number of steps, and produces an unbiased smoothing estimator. We can then average these estimators to get the final result, with accurate confidence intervals provided by our good friend the central limit theorem.
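In code, this final aggregation is nothing more than a sample mean with a CLT-based interval. The estimates below are synthetic stand-ins for the $R$ parallel runs, so only the averaging step is real:

```python
import numpy as np

rng = np.random.default_rng(2016)
R = 10000
# Synthetic stand-ins for R independent unbiased smoothing estimates;
# in practice each entry would come from one coupled-CPF run.
estimates = rng.normal(loc=1.0, scale=2.0, size=R)

mean = estimates.mean()
se = estimates.std(ddof=1) / np.sqrt(R)          # standard error of the mean
low, high = mean - 1.96 * se, mean + 1.96 * se   # 95% confidence interval
```

Because the individual estimators are unbiased and independent, the interval shrinks at the usual $1/\sqrt{R}$ rate as more machines are thrown at the problem.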

Thanks for reading, you deserve a Santana song on smoothing.