# Statisfaction

## Update on inference with Wasserstein distances

Posted in Statistics by Pierre Jacob on 15 August 2017

You have to read the arXiv report to understand this figure. There’s no way around it.

Hi again,

As described in an earlier post, Espen Bernton, Mathieu Gerber, Christian P. Robert and I are exploring Wasserstein distances for parameter inference in generative models. Generally, ABC and indirect inference are fun to play with, as they make the user think about useful distances between data sets (i.i.d. or not), a choice that is only implicit in classical likelihood-based approaches. Thinking about distances between data sets can be a helpful and healthy exercise, even if not always necessary for inference. Viewing data sets as empirical distributions leads to considering the Wasserstein distance, and we try to demonstrate in the paper that it leads to an appealing inferential toolbox.

In passing, the first author Espen Bernton will be visiting Marco Cuturi, Christian Robert, Nicolas Chopin and others in Paris from September to January; get in touch with him if you’re over there!

We have just updated the arXiv version of the paper, and the main modifications are as follows.

• We propose a new distance between time series termed “curve-matching”, which turns out to be quite similar to dynamic time warping or Skorokhod distances. This distance might be particularly relevant for models generating non-stationary time series, such as Susceptible-Infected-Recovered models.
• Our theoretical results are generally improved. In particular, for the minimum Wasserstein/Kantorovich estimator and variants of it, the proofs are now based on the notion of epi-convergence, commonly used in optimization, and various results from Rockafellar and Wets (2009), Variational analysis.
• On top of the Hilbert distance, based on the Hilbert space-filling curve, we consider the use of the swapping distance of Puccetti (2017), An algorithm to approximate the optimal expected inner product of two vectors with given marginals. So we have the Hilbert distance, computable in order $n \log(n)$ operations, where $n$ is the number of data points, the swapping distance in order $n^2$ and of course the Wasserstein distance in order $n^3$. Various other distances are discussed in Section 6 of the paper.

• On the asymptotic behavior of ABC posteriors, our results now cover the use of Hilbert and swapping distances. This is thanks to the convenient property that the Hilbert distance is indeed a distance, and is always larger than the Wasserstein distance; we also rely on some of Mathieu’s recent results. And the swapping distance (if initialized with Hilbert sorting) is always sandwiched between Wasserstein and Hilbert.
• The numerical experiments have been revised: there is now a bivariate g-and-k example with comparisons to the actual posterior; the toggle switch example from systems biology is unchanged; and there is a new queueing model example, with comparisons to the actual posterior obtained with particle MCMC (we could have also used the method of Shestopaloff and Neal). Finally, we have a more detailed study of the Lévy-driven stochastic volatility model, with 10,000 observations. There we show how transport distances can be combined with summaries to estimate all model parameters (we previously got only four out of five parameters, using transport distances alone).
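
To make the ordering between the three transport distances above concrete, here is a rough Python sketch on two bivariate samples of equal size. It is a toy illustration under assumptions of my own, not the code used in the paper: all function names are made up for this post, the exact Wasserstein distance is obtained with SciPy's `linear_sum_assignment`, the Hilbert sort uses a textbook integer-grid Hilbert index on a bounding box shared by both samples, and the swapping step is a simplified pairwise-swap sweep in the spirit of Puccetti (2017) rather than a faithful implementation of that algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hilbert_index(x, y, order):
    """Rank of integer cell (x, y) along a 2D Hilbert curve on a 2^order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant to the reference orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(points, lo, span, order):
    """Indices sorting 2D points along the Hilbert curve (grid is an assumption)."""
    cells = np.floor((points - lo) / span * ((1 << order) - 1)).astype(int)
    keys = [hilbert_index(int(cx), int(cy), order) for cx, cy in cells]
    return np.argsort(keys, kind="stable")


def hilbert_matching(y, z, order=10):
    """Match the i-th point of y to the i-th point of z, both in Hilbert order."""
    both = np.vstack([y, z])
    lo = both.min(axis=0)
    span = np.maximum(both.max(axis=0) - lo, 1e-12)
    sy = hilbert_order(y, lo, span, order)
    sz = hilbert_order(z, lo, span, order)
    perm = np.empty(len(y), dtype=int)
    perm[sy] = sz
    return perm


def pairwise_cost(y, z, p=1):
    """Matrix of Euclidean distances between points of y and z, raised to p."""
    return np.linalg.norm(y[:, None, :] - z[None, :, :], axis=2) ** p


def wasserstein_matching(cost):
    """Optimal assignment; gives the exact Wasserstein distance for equal sizes."""
    _, perm = linear_sum_assignment(cost)
    return perm


def swapping_matching(cost, perm, max_sweeps=100):
    """Simplified swapping: keep exchanging pairs while this lowers the cost."""
    perm = perm.copy()
    n = len(perm)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                current = cost[i, perm[i]] + cost[j, perm[j]]
                swapped = cost[i, perm[j]] + cost[j, perm[i]]
                if swapped < current:
                    perm[i], perm[j] = perm[j], perm[i]
                    improved = True
        if not improved:
            break
    return perm


def matching_distance(cost, perm, p=1):
    """Distance induced by matching y_i with z_perm[i]."""
    return np.mean(cost[np.arange(len(perm)), perm]) ** (1.0 / p)


rng = np.random.default_rng(1)
y = rng.normal(size=(100, 2))
z = rng.normal(loc=0.5, size=(100, 2))
cost = pairwise_cost(y, z)
perm_h = hilbert_matching(y, z)
perm_s = swapping_matching(cost, perm_h)
perm_w = wasserstein_matching(cost)
print("Wasserstein:", matching_distance(cost, perm_w))
print("swapping:   ", matching_distance(cost, perm_s))
print("Hilbert:    ", matching_distance(cost, perm_h))
```

Each of the three quantities is the cost of some one-to-one matching between the two samples, and the Wasserstein distance is the cost of the best such matching, so it can never exceed the other two. Since the swapping sweep starts from the Hilbert matching and only accepts swaps that reduce the cost, its output lands between the Wasserstein and Hilbert values, which is the sandwich property mentioned above.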

The supplementary materials for the new version are here (while the supplementary materials for the previous arXiv version are still online here).