Non-negative unbiased estimators

Posted in Statistics by Pierre Jacob on 13 May 2014

Benedict has to choose between unbiasedness and non-negativity.

Hey hey,

With Alexandre Thiéry we’ve been working on non-negative unbiased estimators for a while now. Since I’ve been talking about it at conferences and since we’ve just arXived the second version of the article, it’s time for a blog post. This post is a follow-up to a previous post from July, where I commented on Playing Russian Roulette with Intractable Likelihoods by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchadé.

The setting is the combination of two components.

1°) There are techniques to “debias” consistent estimators. Consider a sequence (S_n)_{n\geq 0} of estimators whose expectations converge to \lambda \in\mathbb{R}, in the sense that \lim_{n\to\infty} \mathbb{E}(S_n) = \lambda. Introduce an integer-valued random variable N, independent of the sequence, and the weights w_n = 1/\mathbb{P}(N\geq n). Then the random variable Y = \sum_{n=0}^N w_n (S_n - S_{n-1}), with the convention S_{-1} = 0, is an unbiased estimator of \lambda, i.e. its expectation is exactly \lambda. Under additional assumptions it also has finite variance and finite expected computational cost… wow. We’ve just removed the bias from a sequence of biased estimators. We’ve reached the limit, we’ve reached infinity, we’re beyond heaven. That random truncation trick has been invented and reinvented over the years (going back to von Neumann and Ulam!); see for instance Rychlik (1990) for an early example, while the most thorough and general study is found in Rhee & Glynn (2013).
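To make the trick concrete, here is a minimal sketch in Python. The geometric distribution for N and the toy sequence (partial sums of the series for e, so the “estimators” are deterministic) are illustrative choices of mine, not from the paper; any N with heavy enough tails relative to the increments S_n - S_{n-1} would do.

```python
import math
import random

def debiased_estimate(S, r=0.6):
    """One draw of the random-truncation (Rhee-Glynn-type) estimator.

    S(n): the n-th element of a sequence with E[S_n] -> lambda.
    N is geometric with P(N >= n) = r**n, so the weights are
    w_n = 1 / P(N >= n) = r**(-n).
    """
    # sample N with P(N = n) = (1 - r) * r**n, for n = 0, 1, 2, ...
    N = 0
    while random.random() < r:
        N += 1
    total = 0.0
    prev = 0.0  # convention S_{-1} = 0
    for n in range(N + 1):
        cur = S(n)
        total += (cur - prev) / r**n  # w_n * (S_n - S_{n-1})
        prev = cur
    return total

# toy sequence: partial sums of e = sum_k 1/k!, so S_n -> e with a
# bias at every finite n
S = lambda n: sum(1.0 / math.factorial(k) for k in range(n + 1))

# averaging independent draws: the bias is exactly zero, so the
# running mean converges to e = 2.71828...
est = sum(debiased_estimate(S) for _ in range(100000)) / 100000
```

Note that each draw only evaluates S up to the random index N, so the expected cost is finite even though the target is a limit.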

2°) Now, since there’s one way to debias estimators, there might be others. In particular there might be some way to remove the bias *and* to guarantee a positivity constraint. That is, assume now that \lambda is in \mathbb{R}^+. We might want an unbiased estimator of \lambda that takes almost surely non-negative values. A motivating example is precisely the Russian Roulette paper mentioned above, and more generally pseudo-marginal methods. With those methods we can perform “exact inference” on a posterior distribution, as long as we have access to non-negative unbiased estimators of point-wise evaluations of its probability density function.
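To see where the non-negativity requirement bites, here is a sketch of a pseudo-marginal Metropolis–Hastings chain. The random-walk proposal, the toy noisy target (a standard normal density multiplied by mean-one lognormal noise), and all parameter values are my own illustrative assumptions; the key mechanism is that the noisy estimate of the current state is stored and reused, and a negative estimate would wreck the acceptance ratio.

```python
import math
import random

def pseudo_marginal_mh(pi_hat, theta0, n_iters, step=1.0):
    """Pseudo-marginal Metropolis-Hastings sketch.

    pi_hat(theta): a non-negative unbiased estimate of the
    (unnormalized) target density at theta. If it could be negative,
    the ratio below would stop making sense as a probability.
    """
    theta, z = theta0, pi_hat(theta0)
    chain = []
    for _ in range(n_iters):
        prop = theta + random.gauss(0.0, step)  # random-walk proposal
        z_prop = pi_hat(prop)
        # crucial: reuse the stored estimate z for the current state
        if random.random() < min(1.0, z_prop / z):
            theta, z = prop, z_prop
        chain.append(theta)
    return chain

# toy pi_hat: a strictly positive, unbiased estimate of the N(0,1)
# density (up to a constant), obtained by multiplying by lognormal
# noise with expectation one
def pi_hat(theta):
    noise = random.lognormvariate(-0.125, 0.5)  # E[noise] = 1
    return math.exp(-0.5 * theta * theta) * noise

chain = pseudo_marginal_mh(pi_hat, 0.0, 20000)
# despite the noise in pi_hat, the chain targets N(0, 1) exactly
```

The “exact inference” claim is exactly this: the stationary distribution is the true posterior, not an approximation of it, as long as pi_hat is unbiased and non-negative.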

Our results identify cases where non-negative unbiased estimators can be obtained, in the following sense. For instance, assume that we have access to a real-valued unbiased estimator of \lambda \in \mathbb{R}^+, from which we can draw independent copies. We show that there is no algorithm taking those estimators as input and producing almost surely non-negative unbiased estimators of that \lambda. So it is impossible to “positivate” an unbiased estimator just like that. To prove such a result we rely on a precise definition of algorithm, which we believe is not restrictive.

More generally we show that if we have unbiased estimators of \lambda\in \mathbb{R} and want to obtain non-negative unbiased estimators of f(\lambda) for some function f:\mathbb{R}\to\mathbb{R}^+, well that’s impossible in general. We are sorry.

However, if you have an unbiased estimator of \lambda taking values in a known interval [a,b], then it can be possible to obtain a non-negative unbiased estimator of f(\lambda), depending on the function f considered, and in this case the problem is very much related to the Bernoulli factory problem of von Neumann (again! Damn you v.N.). In other words, if you have more knowledge about the unbiased estimator used as input (in this case lower and upper bounds), the problem might have a solution. In practice this type of knowledge would be model specific.
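For readers who haven’t met Bernoulli factories: given flips of a coin with unknown probability p of heads, the goal is to simulate a coin with probability f(p), using only flips of the original coin. Two classic, exactly correct examples sketched below: von Neumann’s trick for f(p) = 1/2 and the trivial factory for f(p) = p²; function names are mine.

```python
import random

def make_coin(p):
    """A p-coin we can flip, with p unknown to the algorithms below."""
    return lambda: random.random() < p

def von_neumann_fair(coin):
    """von Neumann's factory for f(p) = 1/2: flip pairs until the two
    flips differ, and report the first flip of the discordant pair.
    HT and TH each occur with probability p(1-p), hence a fair coin."""
    while True:
        a, b = coin(), coin()
        if a != b:
            return a

def squared_coin(coin):
    """A trivial Bernoulli factory for f(p) = p**2: heads iff two
    independent flips both come up heads."""
    return coin() and coin()
```

The hard part of the general problem, studied since Keane & O’Brien, is characterizing which f admit such a factory at all; f must avoid touching 0 or 1 too brutally on the relevant interval, which is one reason bounds [a,b] on the input matter.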

When no non-negative unbiased estimator is available, pseudo-marginal methods cannot be directly applied. Since those methods have proven very successful in some important areas such as hidden Markov models, we believe it’s interesting to characterize the other settings in which they might be applied. In the paper we discuss exact simulation of diffusions, inference for big data, doubly intractable distributions and inference based on reference priors. In those fields (at least the first three) people have tried to come up with general non-negative unbiased estimators, so we hope to save them some time!


2 Responses


  1. Dan Simpson said, on 13 May 2014 at 16:47

    A somewhat random comment: I’m not sure the assumption that \lambda \in [a,b] (for a known finite interval [a,b]) is very unusual. If you think about probabilistic numerics (like the Monte Carlo estimate of the log-determinant in our paper, or “compressed sensing” ideas where you replace a problem with its “sketched” version, which involves sampling a matrix from a discrete distribution), it’s not uncommon for there to be hard bounds on the size of the thing being estimated.

    • Pierre Jacob said, on 13 May 2014 at 18:15

      True, it’s very model-specific. In the “large dataset” case, it seems very restrictive (especially the lower bound); in other contexts I can believe it’s different. But who isn’t obsessed with large datasets nowadays!

      Thanks for your interest, Dan. I’m sadly going to miss you in Oxford since I’m heading for Paris tomorrow. Hope you’ll enjoy your stay.
