Pseudo-Bayes: a quick and awfully incomplete review
A recently arxived paper by Pier Bissiri, Chris Holmes and Steve Walker piqued my curiosity about “pseudo-Bayesian” approaches, that is, statistical approaches based on a pseudo-posterior:
where is some pseudo-likelihood. Pier, Chris and Steve use in particular
where is some empirical risk function. A good example is classification; then could be the proportion of properly classified points:
where is some score function parametrised by , and . (Side note: I find the ML convention for the more convenient than the stats convention.)
It turns out that this particular kind of pseudo-posterior has already been encountered before, but with different motivations:
- Chernozhukov and Hong (JoE, 2003) used it to define new Frequentist estimators based on moment estimation ideas (i.e. take above to be some empirical moment constraint). Focus is on establishing Frequentist properties of say the expectation of the pseudo-posterior. (It seems to me that few people have heard about this this paper in Stats).
- the PAC-Bayesian approach which originates from Machine Learning also relies on this kind of pseudo-posterior. To be more precise, PAC-Bayes usually starts by minimising the upper bound of an oracle inequality within a class of randomised estimators. Then, as a result, you obtain as a possible solution, say, a single draw for the pseudo-posterior defined above. A good introduction is this book by Olivier Catoni.
- Finally, Pier, Chris and Steve’s approach is by far the most Bayesian of these three pseudo-Bayesian approaches, in the sense that they try to maintain an interpretation of the pseudo-posterior as a representation on the uncertainty on . Crudely speaking, they don’t look only at the expectation, like the two approaches aboves, but also at the spread of the pseudo-posterior.
Let me mention briefly that quite a few papers have considered using other types of pseudo-likelihood in a pseudo-posterior, such as empirical likelihood, composite likelihood, and so on, but I will shamefully skip them for now.
To which extent this growing interest in “Pseudo-Bayes” should have an impact on Bayesian computation? For one thing, more problems to throw at our favourite algorithms should be good news. In particular, Chernozhukov and Hong mention the possibility to use MCMC as a big advantage for their approach, because typically the function they consider could be difficult to minimise directly by optimisation algorithms. PAC-Bayesians also seem to recommend MCMC, but I could not find so many PAC-Bayesian papers that go beyond the theory and actually implement it; an exception is this.
On the other hand, these pseudo posteriors might be quite nasty. First, given the way they are defined, they should not have the kind of structure that makes it possible to use Gibbs sampling. Second, many interesting choices for seem to be irregular or multimodal. Again, in the classification example, the 0-1 loss function is typically not continuous. Hopefully the coming years will witness some interesting research on which computational approaches are more fit for pseudo-Bayes computation, but readers will not be surprised if I put my Euros on (some form of) SMC!