Statisfaction

Coupling of particle filters: likelihood curves

Posted in Statistics by Pierre Jacob on 19 July 2016

Hi!

In this post, I’ll write about coupling particle filters, as proposed in our recent paper with Fredrik Lindsten and Thomas B. Schön from Uppsala University, available on arXiv, and also in this paper by colleagues at NUS. The paper describes a methodology with multiple direct consequences. In this first post, I’ll focus on correlated likelihood estimators; in a later post, I’ll describe a new smoothing algorithm. Both are described in detail in the article. We’ve been blessed to have been advertised by xi’an’s og, so glory is just around the corner.

Everybody is interested in estimating the likelihood p(y|\theta), for some parameter \theta, in general state space models (right?). These likelihoods are typically approximated by algorithms called particle filters, which take \theta as input and spit out a likelihood estimator \hat{p}(y|\theta). I ran particle filters five times for many parameter values \theta, and plotted the resulting estimators in the following figure. The red curve indicates the exact log-likelihood.
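To make this concrete, here is a minimal sketch of a bootstrap particle filter returning a log-likelihood estimate. It assumes a toy one-dimensional AR(1)-plus-noise model of my own choosing (not the five-dimensional model used for the figures), with unit variances everywhere:

```python
import numpy as np

def bootstrap_log_likelihood(y, theta, N=128, rng=None):
    """Bootstrap particle filter estimate of log p(y | theta) for the
    illustrative model x_t = theta * x_{t-1} + v_t, y_t = x_t + w_t,
    with v_t, w_t ~ N(0, 1) and x_0 ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(size=N)                       # particles from the initial distribution
    log_lik = 0.0
    for t in range(len(y)):
        x = theta * x + rng.normal(size=N)       # propagate through the dynamics
        # observation density N(y_t; x_t, 1), computed stably on the log scale
        logw = -0.5 * (y[t] - x) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        log_lik += m + np.log(w.mean())          # accumulate the log-likelihood estimate
        ancestors = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling
        x = x[ancestors]
    return log_lik
```

Calling this repeatedly with different seeds and the same \theta gives independent noisy estimates of the log-likelihood, which is exactly how curves like the ones below are produced.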

[Figure: log-likelihood estimates from five independent particle filters across parameter values; the red curve is the exact log-likelihood. (likelihoodprofiles.independent)]

These estimators are not so great! They all underestimate the log-likelihood by a large amount. Why? The particle filter estimator \hat{p}(y|\theta) is unbiased for the likelihood, so by Jensen’s inequality its logarithm underestimates the log-likelihood on average, and the gap grows with the variance of the estimator. Here the model has a five-dimensional latent process over T=1000 time steps, so that the likelihood is effectively defined by a 5000-dimensional integral. Since I’ve used only N=128 particles in the filter, the variance is large and I obtain poor results. A first solution would be to use more particles, but the algorithmic complexity increases (linearly) with N. If the maximum number of particles that I can use is N=128, what can I do?

Suppose that in fact we are interested in comparing two different likelihood values. This is a common task: think for instance of likelihood ratios in Metropolis-Hastings acceptance ratios. To make the estimator of a likelihood ratio more precise, we can introduce dependencies between the numerator and the denominator; an old variance reduction trick! More precisely, for two parameters \theta and \tilde\theta, we can consider correlated likelihood estimators \hat{p}(y|\theta) and \hat{p}(y|\tilde\theta), such that, if \hat{p}(y|\theta) over/under-estimates p(y|\theta), then \hat{p}(y|\tilde\theta) also over/under-estimates p(y|\tilde\theta), so that overall the ratio can be more accurately estimated. This has been attempted many times in particle filtering; one of the first attempts is due to Mike Pitt. The basic idea is to use common random numbers for both particle filters, and then to fiddle with the resampling step, which is where the main difficulty lies. Intrinsically, the resampling step is a discrete operation that introduces discontinuities in the likelihood estimator, as a function of \theta.
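The common-random-numbers trick is easy to demonstrate outside the particle filter setting. Here is a toy sketch (model and variable names are mine, chosen only for illustration): a plain Monte Carlo estimator of p(y|\theta) = E[N(y; X, 1)] with X ~ N(\theta, 1), where we compare the variance of the estimated log-ratio between two nearby parameters with shared versus independent randomness:

```python
import numpy as np

def mc_log_lik(y, theta, z):
    # z: standard normal draws; reusing the same z across parameters
    # implements common random numbers
    x = theta + z
    return np.log(np.mean(np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi)))

rng = np.random.default_rng(1)
y, theta, theta_tilde, N, R = 0.5, 0.0, 0.1, 128, 1000
crn, indep = [], []
for _ in range(R):
    z1 = rng.normal(size=N)
    z2 = rng.normal(size=N)
    # same draws for both parameters vs. fresh draws for the second one
    crn.append(mc_log_lik(y, theta, z1) - mc_log_lik(y, theta_tilde, z1))
    indep.append(mc_log_lik(y, theta, z1) - mc_log_lik(y, theta_tilde, z2))
print(np.var(crn), np.var(indep))  # the common-random-numbers variance is much smaller
```

In a particle filter the same idea applies to the propagation noise, but the resampling step resists it, which is what motivates the coupled resampling schemes below.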

Following Mike Pitt and other works, we propose new resampling schemes related to optimal transport and maximal coupling ideas. The new schemes result in estimators of the likelihood as shown in the following figure.

[Figure: log-likelihood estimates from coupled particle filters over the same parameter values. (likelihoodprofiles.indexcoupled)]

Note that we are still performing poorly in absolute terms, but now the comparison between two likelihood estimators for nearby values of \theta is more faithful to the true likelihood. This turns out to have pretty drastic effects in the estimation of the score function, or in Metropolis-Hastings schemes such as the correlated pseudo-marginal algorithm.
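To give a flavour of the maximal coupling idea mentioned above, here is a sketch (my own minimal version, assuming multinomial resampling) of how ancestor indices for two particle systems with weight vectors w1 and w2 can be drawn jointly so that they agree as often as possible, while preserving the correct marginal for each filter:

```python
import numpy as np

def maximal_coupling_resample(w1, w2, N, rng):
    """Draw N ancestor pairs (a, b) with a ~ Categorical(w1) and
    b ~ Categorical(w2) marginally, maximizing P(a = b)."""
    overlap = np.minimum(w1, w2)
    alpha = overlap.sum()                 # probability that the indices coincide
    if alpha >= 1.0 - 1e-12:              # identical weights: fully coupled
        common = rng.choice(len(w1), size=N, p=overlap / alpha)
        return common, common.copy()
    nu = overlap / alpha                  # common component
    r1 = (w1 - overlap) / (1 - alpha)     # residual for the first system
    r2 = (w2 - overlap) / (1 - alpha)     # residual for the second system
    a = np.empty(N, dtype=int)
    b = np.empty(N, dtype=int)
    coupled = rng.random(N) < alpha
    n_c = int(coupled.sum())
    common = rng.choice(len(w1), size=n_c, p=nu)
    a[coupled] = common                   # same ancestor in both filters
    b[coupled] = common
    a[~coupled] = rng.choice(len(w1), size=N - n_c, p=r1)
    b[~coupled] = rng.choice(len(w2), size=N - n_c, p=r2)
    return a, b
```

When w1 and w2 are close (nearby \theta and \tilde\theta with common random numbers), alpha is near one, most pairs pick the same ancestor, and the two particle systems stay tightly correlated through the resampling step.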

2 Responses

  1. […] So here is the algorithm: we start with a pair of arbitrary trajectories, denoted by and . We first apply a step of CPF to , yielding  (and we do nothing to ; bear with me!). Then we apply a “coupled CPF” kernel to the pair , yielding a new pair . What is a coupled CPF? It’s essentially a CPF kernel applied to both trajectories, using common random numbers and with a fancy resampling scheme as alluded to in the last post. […]
