## Non-negative unbiased estimators

Hey hey,

With Alexandre Thiéry we’ve been working on non-negative unbiased estimators for a while now. Since I’ve been talking about it at conferences and since we’ve just arXived the second version of the article, it’s time for a blog post. This post is kind of a follow-up of a previous post from July, where I was commenting on Playing Russian Roulette with Intractable Likelihoods by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade.

The setting is the combination of two components.

**1°)** There are techniques to “debias” consistent estimators. Consider a sequence converging to in the sense . Introduce an integer-valued random variable and the survival probabilities . Then the random variable is an unbiased estimator of , i.e. its expectation is . Under additional assumptions it has a finite variance and a finite expected computational time… wow. We’ve just removed the bias off a sequence of biased estimators. We’ve reached the limit, we’ve reached infinity, we’re beyond heaven. That random truncation trick has been invented and reinvented (from Von Neumann and Ulam!) over the years but the most thorough and general study is found in Rhee & Glynn (2013). See for instance Rychlik (1990) for an early example of the same trick.

**2°)** Now, since there’s one way to debias estimators, there might be others. In particular there might be some way to remove the bias *and* to guarantee some positivity constraint. That is, assume now that is in . We might want to have an unbiased estimator of that takes almost surely non-negative values. A motivating example is precisely the Russian Roulette paper mentioned above, and in general the pseudo-marginal methods. With those methods we can perform “exact inference” on a posterior distribution, as long as we have access to non-negative unbiased estimators of its probability density function point-wise evaluations.

Our results identify cases where non-negative unbiased estimators can be obtained, in the following sense. For instance, assume that we have access to a real-valued unbiased estimator of , from which we can draw independent copies. We show that there is no algorithm taking those estimators as input and producing almost surely non-negative unbiased estimators of that . So that it’s impossible to “positivate” an unbiased estimator just like that. To prove such a result we rely on a precise definition of algorithm, which we believe is not restrictive.

More generally we show that if we have unbiased estimators of and want to obtain non-negative unbiased estimators of for some function , well that’s impossible in general. We are sorry.

However if you have an unbiased estimator of taking values in an interval , then it can be possible to have a non-negative unbiased estimator of , depending on the function considered, and in this case the problem is very much related to the Bernoulli Factory problem of Von Neumann (again! Damn you v.N.). In other words, if you have more knowledge on your unbiased estimator used as input (in this case lower and upper bounds), the problem might have a solution. In practice this type of knowledge would be model specific.

When there isn’t any non-negative unbiased estimators available, pseudo-marginal methods cannot be directly applied. Since those methods have proven very successful in some important areas such as hidden Markov models, we believe it’s interesting to characterize the other settings in which they might be applied. In the paper we discuss exact simulation of diffusions, inference for big data, doubly intractable distribution and inference based on reference priors. In those fields (at least the first three) people have tried to come up with general non-negative unbiased estimators, so we hope to save them some time!

Dan Simpsonsaid, on 13 May 2014 at 16:47A somewhat random comment: I’m not sure the assumption that \labmda \in [a,b] (for a known finite interval [a,b]) is very unusual. If you think about probabilistic numerics (like the Monte Carlo estimate of the log-determinant in our paper, or “compressed sensing” ideas where you replace a problem with it’s “sketched” version, which involves sampling a matrix from a discrete distribution), it’s not uncommon for there to be hard bounds on the size of the thing being estimated.

Pierre Jacobsaid, on 13 May 2014 at 18:15True, it’s very model-specific. In the “large dataset” case, it seems very restrictive (especially the lower bound); in other contexts I can believe it’s different. But who isn’t obsessed with large datasets nowadays!

Thanks for your interest, Dan. I’m sadly going to miss you in Oxford since I’m heading for Paris tomorrow. Hope you’ll enjoy your stay.