Intractable likelihoods, unbiased estimators and sign problem
We’re in the Big Data era blablabla, but the advanced computational methods usually don’t scale well enough to match the increasing sizes of datasets. For instance, even in the simple case of i.i.d. data $y_1, \ldots, y_n$ and the associated likelihood function $L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$, the cost of evaluating the likelihood at any parameter $\theta$ typically grows at least linearly with $n$. If you then plug that likelihood into an optimization technique to find the Maximum Likelihood Estimate, or into a sampling technique such as Metropolis–Hastings to sample from the posterior distribution, the computational cost grows accordingly for a fixed number of iterations. However, you can get unbiased estimates of the log-likelihood $\ell(\theta) = \sum_{i=1}^{n} \log f(y_i \mid \theta)$ by drawing $m$ indices $i_1, \ldots, i_m$ uniformly in the index set $\{1, \ldots, n\}$ and computing $\frac{n}{m} \sum_{j=1}^{m} \log f(y_{i_j} \mid \theta)$. This way you sub-sample from the whole dataset, and you can choose $m$ according to your computational budget. But is it possible to perform inference with these estimates instead of the complete log-likelihood?
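As a concrete illustration, here is a minimal Python sketch of this sub-sampling estimator; the Gaussian model, the synthetic data and all the names are my own illustrative choices, not something specific to the papers discussed below:

```python
import numpy as np

def subsampled_loglik(theta, y, m, rng):
    """Unbiased estimate of the full log-likelihood sum_i log f(y_i | theta),
    here for an illustrative N(theta, 1) model, using m indices drawn
    uniformly at random (with replacement) from {0, ..., n-1}."""
    n = len(y)
    idx = rng.integers(0, n, size=m)                           # uniform indices
    logf = -0.5 * (y[idx] - theta) ** 2 - 0.5 * np.log(2 * np.pi)
    return (n / m) * logf.sum()                                # rescale so the estimate is unbiased

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=100_000)               # synthetic data
print(subsampled_loglik(theta=1.0, y=y, m=500, rng=rng))
```

The rescaling by $n/m$ is what makes the estimate unbiased for the full sum, whatever the sub-sample size $m$.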
An approach to this problem is provided in a particularly remarkable paper that studies a Metropolis–Hastings algorithm using unbiased estimates of the target density instead of the target density itself, and shows that the result is *correct* in the sense that you get a Markov chain converging towards the exact target distribution. The problem is that it is not so trivial to get unbiased likelihood estimates. In the i.i.d. case mentioned above, you get unbiased estimates of the log-likelihood, but of course the unbiasedness is lost if you simply take the exponential of your estimates. Abstracting from the statistical context, suppose you have a random variable $X$ with expectation $\lambda$. Then $\exp(X)$ doesn’t have expectation $\exp(\lambda)$, but is it still possible to construct an unbiased estimate of $\exp(\lambda)$ by doing something else? More generally, can you do this for $f(\lambda)$ with any function $f$, and not only for the exponential?
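To make this concrete, here is a rough sketch of such a Metropolis–Hastings algorithm run on unbiased estimates of the target density; the toy target, the lognormal noise and the tuning are assumptions of mine for illustration only. Note that the estimate attached to the current state is recycled rather than refreshed, and that the estimator used here happens to be positive by construction, a point that will matter below.

```python
import numpy as np

def unbiased_density_estimate(theta, rng, noise=0.3):
    """Toy nonnegative unbiased estimator of an (unnormalized) N(0,1) target:
    the true density times an independent positive noise with mean 1."""
    return np.exp(-0.5 * theta ** 2) * rng.lognormal(mean=-0.5 * noise ** 2, sigma=noise)

def mh_with_estimates(n_iters, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings where the target density in the
    acceptance ratio is replaced by an unbiased estimate; the estimate
    attached to the current state is kept until the next acceptance."""
    rng = rng or np.random.default_rng()
    theta = 0.0
    pi_hat = unbiased_density_estimate(theta, rng)
    chain = np.empty(n_iters)
    for t in range(n_iters):
        prop = theta + step * rng.normal()                    # symmetric proposal
        pi_hat_prop = unbiased_density_estimate(prop, rng)    # fresh estimate at the proposal
        if rng.uniform() < pi_hat_prop / pi_hat:              # ratio of estimates, not of densities
            theta, pi_hat = prop, pi_hat_prop
        chain[t] = theta
    return chain

chain = mh_with_estimates(10_000, rng=np.random.default_rng(1))
print(chain.mean(), chain.std())   # roughly 0 and 1 for the N(0,1) target, up to Monte Carlo error
```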
This very relevant article appeared recently on arXiv: Playing Russian Roulette with Intractable Likelihoods, by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson and Yves Atchade; see also Xian’s comments on it. The motivation of the paper is slightly different from the one described above, but the techniques are similar. Let me summarize the main ideas used in that paper. First, develop the function $f$ into a Taylor series, say around $0$:

$$f(\lambda) = \sum_{k=0}^{\infty} a_k \lambda^k, \qquad a_k = \frac{f^{(k)}(0)}{k!},$$
and take an infinite triangular array $(X^{(k)}_j)_{k \geq 1,\, 1 \leq j \leq k}$ of independent copies of $X$. Then the random variable

$$Y = \sum_{k=0}^{\infty} a_k \prod_{j=1}^{k} X^{(k)}_j$$
has expectation $f(\lambda)$: with the convention that the empty product for $k = 0$ equals $1$, independence gives $\mathbb{E}\big[\prod_{j=1}^{k} X^{(k)}_j\big] = \lambda^k$, so the expectations sum up to the Taylor series. Of course it is not practical because the series is infinite, so the second idea is about making the series finite. As my new colleague Alex Thiery remarked, if you want to compute an infinite series

$$S = \sum_{k=0}^{\infty} s_k,$$
you can always draw an integer-valued random variable $N$ with $\mathbb{P}(N \geq k) > 0$ for all $k$, and compute

$$\hat{S} = \sum_{k=0}^{N} \frac{s_k}{\mathbb{P}(N \geq k)},$$

which is an unbiased estimate of $S$, at least when the series is well-behaved enough to swap expectation and summation.
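In code, the random-truncation trick can look like the following sketch, where the geometric truncation variable and the toy series are my own illustrative choices (and where the variance of the resulting estimator depends heavily on how fast the terms decay):

```python
import math
import numpy as np

def truncated_series_estimate(term, rng, p_stop=0.5):
    """Unbiased estimate of sum_{k>=0} term(k): draw a geometric truncation
    level N with P(N >= k) = (1 - p_stop)**k, keep terms 0..N and reweight
    term k by 1 / P(N >= k)."""
    n_trunc = rng.geometric(p_stop) - 1                      # N takes values 0, 1, 2, ...
    return sum(term(k) / (1.0 - p_stop) ** k for k in range(n_trunc + 1))

# Sanity check on a series whose value is known: sum_k 1/k! = e.
rng = np.random.default_rng(2)
draws = [truncated_series_estimate(lambda k: 1.0 / math.factorial(k), rng)
         for _ in range(100_000)]
print(np.mean(draws))   # should be close to e ~ 2.718, up to Monte Carlo error
```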
Combining the Taylor expansion and the random truncation, you can get unbiased estimates of $f(\lambda)$ for any function $f$ admitting such an expansion, given the ability to sample independent copies of a random variable $X$ with expectation $\lambda$. The paper mentioned above deals with more specific functions and different truncation schemes, but essentially it’s the same deal. So is that it? Problem solved?
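Before answering, here is a rough end-to-end sketch of the combined scheme for the exponential function, assuming for illustration that $X \sim \mathcal{N}(1, 1)$, so that $\lambda = 1$ and the target quantity $\exp(\lambda)$ is $e$; the geometric truncation and all names are mine:

```python
import numpy as np

def unbiased_exp_of_mean(sample_x, rng, p_stop=0.5):
    """Unbiased estimate of exp(lambda), lambda = E[X], combining the Taylor
    series exp(lambda) = sum_k lambda^k / k! with a geometric truncation.
    `sample_x(size)` must return independent copies of X; term k uses its own
    batch of k copies (one row of the triangular array), whose product is an
    unbiased estimate of lambda^k."""
    n_trunc = rng.geometric(p_stop) - 1            # truncation level N, P(N >= k) = (1 - p_stop)^k
    total, k_factorial = 0.0, 1.0
    for k in range(n_trunc + 1):
        if k > 0:
            k_factorial *= k
        prod_x = np.prod(sample_x(k)) if k > 0 else 1.0      # empty product is 1 for k = 0
        total += prod_x / (k_factorial * (1.0 - p_stop) ** k)
    return total

# Toy check with X ~ N(1, 1), so exp(lambda) = e ~ 2.718.
rng = np.random.default_rng(3)
sample_x = lambda size: rng.normal(loc=1.0, scale=1.0, size=size)
draws = np.array([unbiased_exp_of_mean(sample_x, rng) for _ in range(100_000)])
print(draws.mean())        # close to e, up to Monte Carlo error
print((draws < 0).mean())  # typically a nonzero fraction of the estimates is negative
```

The last line hints at the issue discussed next: nothing in this construction forces the individual estimates to be non-negative.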
One big remaining issue with this scheme is that you do not control the sign of the resulting estimates: if you are estimating a positive quantity, such as a likelihood in the i.i.d. case mentioned above, the expansion/truncation scheme does not guarantee that your estimates take almost-surely non-negative values. In the Russian Roulette paper this is not a big issue, but it could potentially be one. So let me conclude with this problem, which is still open to the best of my knowledge.
Suppose you have a real-valued random variable $X$ with expectation $\lambda$ and a function $f$ taking positive values. You know how to sample any finite number of independent copies of $X$. Can you construct an almost-surely non-negative random variable $Y$ such that $\mathbb{E}[Y] = f(\lambda)$?
Spoiler alert: the answer is no in general… more about this soon!