Statistical inference on MCMC traces

2020-inferencemcmc-traceplot

Hi everyone,

and Happy New Year! This post is about some statistical inferences that one can do using as “data” the output of MCMC algorithms. Consider the trace plot above. It has been generated by Metropolis–Hastings using a Normal random walk proposal, with a standard deviation “sigma”, on a certain target. Suppose that you are given a function that evaluates the pdf of that target. Can you retrieve the value of sigma used to generate that chain?

As a statistical problem this is a well-defined question. We view the chain as a time series, and, for once, the model is well-specified! But the difficulty comes from the likelihood function being intractable; see that classic paper by Tierney, equation (1), for an expression of the transition kernel of MH. Specifically, the issue occurs whenever two consecutive states in the chain are identical, which indicates that some proposal was rejected during the course of the algorithm. This results in a term in the likelihood equal to the “rejection probability” from that state, namely

r(x) = \int(1 - \alpha(x,x^{\star}))\mathcal{N}(x^{\star};x,\sigma^{2})dx^{\star}

where \alpha(x,x^\star) is the acceptance probability of state x^\star from state x. That term is intractable because of the integral. But we can estimate r(x)!

A naive estimator is obtained by drawing x^\star from the Normal distribution in the integral, and evaluating  1 - \alpha(x,x^\star). The issue with that estimator is that it can be exactly equal to zero, with a non-negligible probability. If many such estimators are multiplied together to estimate the full likelihood, then there is a large chance that at least one of these estimators will be zero, resulting in an overall likelihood estimator equal to zero. This is a bit problematic since we want to compare the likelihood associated with different values of sigma!

There’s a nice trick in “The Alive Particle Filter” by Jasra, Lee, Yau, Zhang which exploits a property of Negative Binomial variables established by Neuts and Zacks in 1967. The estimator is provided by the algorithm below.

2020-algobinomial

The output of that algorithm has expectation r(x) and is guaranteed to never be equal to zero. Equipped with this, we can obtain unbiased, non-negative estimators of the full likelihood of sigma. In combination with some prior information, we can run a pseudo-marginal Metropolis-Hastings algorithm on the sigma space, the output of which is in the figure below.

2020-inferencemcmc-metatraceplot

At this point, a new “meta” problem would be the inference of the standard deviation used in the pseudo-marginal algorithm defined on the sigma space!…

The problem is related to some works on the modeling of animal movements, for instance, “Inference in MCMC step selection models” by Michelot, Blackwell, Chamaillé-Jammes and Matthiopoulos. There, MCMC-type algorithms are used as statistical models for animal movements. Their appeal is to provide simple mechanisms to describe local moves, while being also guaranteed to admit a specified global stationary distribution that might describe where animals roam “on average”.

The code producing the above figures is here: https://github.com/pierrejacob/statisfaction-code/blob/master/2020-01-inferenceMCMC.R

Published by Pierre Jacob

Associate professor, Harvard University

2 thoughts on “Statistical inference on MCMC traces

  1. Thanks for blog post. It is really cool that one can estimate the MH rejection probabilities using the negative Binomial trick – this must be useful for other stuff! I have a question about your implementation: instead of sampling B as a Bernoulli random variable with probability 1-A, wouldn’t it suffice to just set B as an indicator function on the event that 1-A is not equal to 0? If so, this should have lower variance by a “Rao-Blackwellization” argument?

    1. Hey, I might be missing something but if you were to define B the way you suggest, I don’t see why B would be Bernoulli with probability r(x) given x, as required for S to be Negative Binomial (or maybe we don’t need that?). But I agree that it seems wasteful to draw the B’s, which seems to only add noise in the algorithm, and it seems restrictive to force S to be an integer. I’m sure there must be better estimators. I imagine the authors of the “alive particle filter” have thought about these things.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: