# Statisfaction

## Path storage in the particle filter

Posted in Statistics by Pierre Jacob on 12 July 2013

Typical ancestry tree generated by a particle filter

Hey particle lovers,

With Lawrence Murray and Sylvain Rubenthaler we looked at how to store the paths in the particle filter, and the related expected memory cost. We just arXived a technical report about it. Would you like to know more?

Consider a particle filter with $N$ particles. At each step of the algorithm, positive weights $w_1, \ldots, w_N$ are given to the sample. The particles with large weights are duplicated and the particles with low weights are killed. One way to perform this resampling step is called “multinomial resampling”, where $N$ indices $a_1, \ldots, a_N$ are drawn from $\{1, \ldots, N\}$ with probabilities $\{\omega_1, \ldots, \omega_N\}$ proportional to the weights. Those indices represent the parent particles for the future generation: the new first particle stems from particle $a_1$, the second from $a_2$, etc. For instance if none of the ancestor indices $(a_i)_{1\leq i \leq N}$ is equal to the index $k$, then the associated particle is killed.

Hence each resampling step kills particles, and looking at the whole genealogy of the particles after $T$ steps you end up with an “ancestry tree” like the one on the figure above. The latest generation on the right has $N$ members, and they all have a common ancestor if you go back in time far enough. We call the part of the tree with only one branch the “trunk“, and the rest is the “crown“. I personally favoured the term “bush” instead of “crown” but I was advised not to use it in the paper. However if you’re getting bored reading this post, please mentally replace “crown” by “bush”. Anyways! If you need to store the whole tree, how costly is it going to be?

Our main result bounds the expected number of nodes in the tree by $T + C N \log N$, where $T$ is the number of time steps, $N$ is the number of particles, and $C>0$ is a constant independent of $T$ and $N$. The first term, $T$, trivially bounds the number of nodes in the trunk, while the second term, $C N \log N$, bounds the number of nodes in the crown. The calculation for the crown turned out to be quite hairy (really sorry for that one). It is actually related to some combinatorial results on random trees arising in population genetics such as described e.g. here. The difference is that we deal with a more general class of trees, but obtain much less precise results.

The document also presents a parallel-friendly algorithm to store the tree recursively during the run of the particle filter. It alternates between two procedures: one to insert the new generation in the tree, and one to “prune” the tree (i.e. to remove the dead wood?). For the numerical section we used Lawrence‘s LibBi, described in this tech report, on which I’ll try to blog about soon!