# Statisfaction

## Coupling of particle filters: likelihood curves

Posted in Statistics by Pierre Jacob on 19 July 2016

Hi!

In this post, I’ll write about coupling particle filters, as proposed in our recent paper with Fredrik Lindsten and Thomas B. Schön from Uppsala University, available on arXiv, and also in this paper by colleagues at NUS. The paper describes a methodology with multiple direct consequences. In this first post, I’ll focus on correlated likelihood estimators; in a later post, I’ll describe a new smoothing algorithm. Both are described in detail in the article. We’ve been blessed with a mention on xi’an’s og, so glory is just around the corner.

Everybody is interested in estimating the likelihood $p(y|\theta)$, for some parameter $\theta$, in general state space models (right?). These likelihoods are typically approximated by algorithms called particle filters. They take $\theta$ as input and spit out a likelihood estimator $\hat{p}(y|\theta)$. I ran particle filters five times for each of many parameter values $\theta$, and plotted the resulting estimators in the following figure. The red curve indicates the exact log-likelihood.

These estimators are not so great! They all underestimate the log-likelihood by a large amount. Why? The particle filter estimator is unbiased for the likelihood itself, so by Jensen’s inequality its logarithm underestimates the log-likelihood on average, and this bias grows with the variance of the estimator. Here the model has a five-dimensional latent process over $T=1000$ time steps, so that the likelihood is effectively defined by a $5000$-dimensional integral; with only $N=128$ particles in the filter, that variance is large and I obtain poor results. A first solution would be to use more particles, but the algorithmic cost grows linearly with $N$. If the maximum number of particles that I can use is $N = 128$, what can I do?
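To fix ideas, here is a minimal sketch of the kind of estimator plotted above: a bootstrap particle filter returning a log-likelihood estimate. This is a hypothetical toy one-dimensional AR(1)-plus-noise model of my own choosing, not the five-dimensional model of the post, and the function name is made up for illustration.

```python
import numpy as np

def particle_filter_loglik(y, theta, N=128, rng=None):
    # Bootstrap particle filter log-likelihood estimator for a toy model:
    #   x_t = theta * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 1).
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(size=N)                    # particles from the initial law
    loglik = 0.0
    for yt in y:
        x = theta * x + rng.normal(size=N)    # propagate through the dynamics
        logw = -0.5 * (yt - x) ** 2 - 0.5 * np.log(2 * np.pi)  # obs. density
        m = logw.max()                        # stabilize the exponentials
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())        # log of the average weight
        x = x[rng.choice(N, size=N, p=w / w.sum())]  # multinomial resampling
    return loglik
```

Running this repeatedly for a grid of $\theta$ values, with fresh random numbers each time, produces noisy curves like those in the figure.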

Suppose that, in fact, we are interested in comparing two different likelihood values. This is a common task: think for instance of the likelihood ratios appearing in Metropolis-Hastings acceptance probabilities. To make the estimator of a likelihood ratio more precise, we can introduce dependencies between the numerator and the denominator; an old variance reduction trick! More precisely, for two parameters $\theta$ and $\tilde\theta$, we can consider correlated likelihood estimators $\hat{p}(y|\theta)$ and $\hat{p}(y|\tilde\theta)$, such that, if $\hat{p}(y|\theta)$ over/under-estimates $p(y|\theta)$, then $\hat{p}(y|\tilde\theta)$ also over/under-estimates $p(y|\tilde\theta)$, so that overall the ratio can be estimated more accurately. This has been attempted many times in particle filtering, one of the earliest attempts being Mike Pitt’s. The basic idea is to use common random numbers for both particle filters and then to fiddle with the resampling step, which is where the main difficulty lies. Intrinsically, resampling is a discrete operation that introduces discontinuities in the likelihood estimator, as a function of $\theta$.
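The common random numbers trick itself is easy to demonstrate outside of particle filters. Here is a minimal sketch (a generic Monte Carlo example of my own, not the paper’s method) comparing the variance of a difference of two estimators when they share versus do not share their underlying randomness:

```python
import numpy as np

def mc_estimate(theta, z):
    # Crude Monte Carlo estimate of E[exp(theta * Z)] with Z ~ N(0, 1),
    # driven by the supplied normal draws z (true value: exp(theta**2 / 2)).
    return np.exp(theta * z).mean()

rng = np.random.default_rng(42)
diffs_indep, diffs_crn = [], []
for _ in range(1000):
    z1 = rng.normal(size=100)
    z2 = rng.normal(size=100)
    # independent randomness for the two estimators
    diffs_indep.append(mc_estimate(0.55, z1) - mc_estimate(0.5, z2))
    # common random numbers: the same draws feed both estimators
    diffs_crn.append(mc_estimate(0.55, z1) - mc_estimate(0.5, z1))

# Sharing randomness makes the estimated *difference* far less noisy.
print(np.var(diffs_indep), np.var(diffs_crn))
```

The two estimators are each just as noisy as before; it is their strong positive correlation that makes the difference (or ratio) accurate. The difficulty in particle filters is that the resampling step breaks this correlation, which is what the coupled resampling schemes address.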

Building on Mike Pitt’s and related works, we propose new resampling schemes based on optimal transport and maximal coupling ideas. The new schemes yield likelihood estimators as shown in the following figure.
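The maximal coupling idea for resampling can be sketched as follows: draw pairs of ancestor indices whose marginals match the two filters’ weight vectors while making the indices equal as often as possible. This is a simplified sketch under my own naming, covering only the maximal-coupling variant (the paper also considers transport-based schemes):

```python
import numpy as np

def maximal_coupling_resample(w1, w2, N, rng):
    # Draw N ancestor pairs (a, b) with a ~ w1 and b ~ w2 marginally, while
    # maximizing the probability that a == b, so that the two particle systems
    # pick the same ancestors as often as the weights allow.
    overlap = np.minimum(w1, w2)
    alpha = overlap.sum()                      # probability of a coupled draw
    couple = rng.uniform(size=N) < alpha
    a1 = np.empty(N, dtype=int)
    a2 = np.empty(N, dtype=int)
    n_c = int(couple.sum())
    if n_c > 0:                                # coupled draws: common ancestor
        common = rng.choice(len(w1), size=n_c, p=overlap / alpha)
        a1[couple] = common
        a2[couple] = common
    if N - n_c > 0:                            # residual draws: independent
        r1 = (w1 - overlap) / (1.0 - alpha)
        r2 = (w2 - overlap) / (1.0 - alpha)
        a1[~couple] = rng.choice(len(w1), size=N - n_c, p=r1)
        a2[~couple] = rng.choice(len(w2), size=N - n_c, p=r2)
    return a1, a2
```

When the two weight vectors are identical, every pair is coupled and the two filters resample the exact same ancestors; as the weights drift apart, the coupling probability shrinks gracefully instead of breaking down at once.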

Note that we are still performing poorly in absolute terms, but the comparison between two likelihood estimators for nearby values of $\theta$ is now more faithful to the true likelihood. This turns out to have pretty drastic effects on the estimation of the score function, and on Metropolis-Hastings schemes such as the correlated pseudo-marginal algorithm.