Sequential Bayesian inference for time series
I have just arXived a review article, written for ESAIM: Proceedings and Surveys, called Sequential Bayesian inference for implicit hidden Markov models and current limitations. The topic is sequential Bayesian estimation: you want to perform inference (say, parameter inference, or prediction of future observations), taking into account parameter and model uncertainties, using hidden Markov models. I hope that the article can be useful for some people: I have tried to stay at a general level, but there are more than 90 references if you’re interested in learning more (sorry in advance for not having cited your article on the topic!). Below I’ll comment on a few points.
Hidden Markov models are very flexible tools to model time series: the observations are assumed to be noisy measurements of a Markov process. The Markov process can represent the complex dynamics of the underlying phenomenon (in the example of the article, it is a prey-predator model for the growth of plankton populations). The noise in the measurements accounts for the error of the measuring devices, the fact that the underlying process is only partially observed, etc.
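To make the structure concrete, here is a minimal sketch of simulating such a model in Python. I use a toy linear-Gaussian model (an AR(1) latent process observed with Gaussian noise), not the prey-predator model from the article; the parameters are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hmm(T, phi=0.9, sigma_x=1.0, sigma_y=0.5):
    """Simulate a toy hidden Markov model: a latent AR(1)
    Markov process x_t, observed through Gaussian noise as y_t."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_x)
    for t in range(1, T):
        # Markov property: x_t depends only on x_{t-1}
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma_x)
    # measurement noise: y_t is a noisy observation of x_t
    y = x + rng.normal(0.0, sigma_y, size=T)
    return x, y

x, y = simulate_hmm(100)
```

In an "implicit" model (next paragraph), the transition written out explicitly above would instead be a black-box simulator.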
The term “implicit”, introduced in Time series analysis via mechanistic models, refers to models where the latent process is a “black box”: we can simulate it, but that’s it. On the other hand, we assume that we can evaluate the probability density function of the measurement distribution.
Sequential inference refers to the ability to update the estimation as new observations arrive. For instance, the observations might be acquired on a daily basis, and thus we might want to update our predictions every day. If the predictions were obtained using “batch techniques” (e.g. MCMC), we would need to re-run the algorithms “from scratch” every day. With sequential methods (such as SMC), we can assimilate the latest observation, for a hopefully small cost every day. Unfortunately, even the recent techniques reviewed in the article fail to be “truly online”, in the sense that the statistical error will eventually blow up when parameter uncertainty is taken into account. If the parameters are kept as fixed values, then the problem becomes easier and can be dealt with in a truly online way. This is one of the current challenges that I’m discussing in the article.
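The basic assimilation mechanism, with the parameters kept fixed (the easy case mentioned above), is the bootstrap particle filter: propagate particles through the transition, which only requires simulation, then weight them by the measurement density, which we assumed we can evaluate. A minimal sketch for the same toy AR(1)-plus-noise model as before (illustrative parameters, multinomial resampling at every step):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_filter_step(particles, y_t, phi=0.9, sigma_x=1.0, sigma_y=0.5):
    """One assimilation step of a bootstrap particle filter,
    with model parameters treated as fixed."""
    # propagate: only simulation from the transition is needed
    particles = phi * particles + rng.normal(0.0, sigma_x, size=particles.shape)
    # weight by the Gaussian measurement density (up to a constant)
    log_w = -0.5 * ((y_t - particles) / sigma_y) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # multinomial resampling
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# assimilate a stream of observations one at a time
particles = rng.normal(0.0, 1.0, size=1000)
for y_t in [0.2, 0.5, -0.1, 0.8]:
    particles = bootstrap_filter_step(particles, y_t)
```

Each new observation costs one such step, regardless of how many observations came before; it is the extension to unknown parameters that breaks this "truly online" property.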
I also want to comment on model uncertainty: in many scenarios, we have a few possible models to deal with a given time series. People often quote George E. P. Box: “all models are wrong, but some are useful”. Wrong here means that the observations are not actually realizations of any of the models, and this is undoubtedly correct. A common belief is that statistical inference naively assumes that one of the models is true; this is far from correct. A lot of articles have investigated the “misspecified setting” in detail, and many statistical procedures (including MLE, Bayesian inference and model comparison techniques) provide justifiable answers without assuming that the model is true. In the article, I discuss the Bayes factor between two models. It has a perfectly reasonable justification as a prior predictive criterion: in other words, it compares models on the grounds of how likely the observations are under the prior distributions. Thus one does not need to assume anything about the data-generating process in order to use Bayes factors.
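The prior predictive reading of the Bayes factor can be sketched numerically: estimate each model's marginal likelihood p(y) = ∫ p(y | θ) p(θ) dθ by averaging the likelihood over prior draws, then take the ratio. Below is a toy illustration with a Gaussian likelihood and two hypothetical priors of my own choosing (a tight one and a diffuse one); the crude Monte Carlo estimator is for illustration only, and nothing here refers to the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_marginal_likelihood(y, prior_draws, sigma=1.0):
    """Monte Carlo estimate of log p(y): average the Gaussian
    likelihood p(y | theta) over draws from the prior."""
    # log-likelihood of each observation under each prior draw
    ll = (-0.5 * ((y[:, None] - prior_draws[None, :]) / sigma) ** 2
          - np.log(sigma * np.sqrt(2 * np.pi)))
    ll = ll.sum(axis=0)  # sum over observations, one value per draw
    m = ll.max()
    # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(ll - m)))

y = rng.normal(1.0, 1.0, size=20)
theta1 = rng.normal(0.0, 0.5, size=10_000)  # model 1: tight prior
theta2 = rng.normal(0.0, 3.0, size=10_000)  # model 2: diffuse prior
log_bf = log_marginal_likelihood(y, theta1) - log_marginal_likelihood(y, theta2)
```

The sign of `log_bf` simply tells us under which prior the observed sequence was more likely, which is all the Bayes factor claims.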
Another interesting aspect of the Bayes factor, by the way, is Occam’s razor principle. Simpler models are favoured over more complex models until enough data have been gathered to say otherwise. The same principle motivates the AIC and BIC criteria, which can be seen as asymptotic approximations of Bayes factors for particular choices of priors. In the figure, we can see that a “wrong model” is better than the true data-generating model until about 50 to 100 observations are assimilated (in the prior predictive sense of the Bayes factor). The figure shows the estimated Bayes factors for five independent runs of one of the numerical methods reviewed in the article (SMC^2): we see that the runs diverge because the errors accumulate over time, hence the method is not “online”.
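The Occam penalty is easiest to see in the BIC formula itself, -2 log L̂ + k log n: the k log n term grows with the sample size, so a model with extra parameters needs an increasingly large likelihood gain to be preferred. A tiny sketch, with made-up log-likelihood values purely for illustration:

```python
import numpy as np

def bic(log_likelihood_hat, k, n):
    """Schwarz's criterion: -2 log L-hat + k log n (lower is better)."""
    return -2.0 * log_likelihood_hat + k * np.log(n)

# illustrative numbers: the 5-parameter model fits slightly better
# (log L = -68 vs -70) but pays a larger complexity penalty at n = 50
bic_simple = bic(-70.0, k=2, n=50)
bic_complex = bic(-68.0, k=5, n=50)
```

With these illustrative numbers the simpler model wins at n = 50, and would keep winning until the likelihood gap outgrows the penalty gap, (5 - 2) log n.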