Derivative-free estimate of derivatives

Posted in Statistics by Pierre Jacob on 23 April 2013
Two Hessian soldiers having a ball.

Arnaud Doucet, Sylvain Rubenthaler and I have just posted a technical report on arXiv about estimating the first- and second-order derivatives of the log-likelihood (also called the score and the observed information matrix, respectively) in general (intractable) statistical models, and in particular in (non-linear non-Gaussian) state-space models. We call them “derivative-free” estimates because they can be computed even if the user cannot compute any kind of derivatives related to the model (as opposed to e.g. this paper and this paper). Actually in some cases of interest we cannot even evaluate the log-likelihood point-wise (we do not have a formula for it), so forget about explicit derivatives. Would you like to know more?

Our tech report builds heavily upon the Iterated Filtering series of papers (see this first PNAS paper, then this technical Annals of Stats paper). It extends them to second-order derivatives (we also propose an alternative estimate of the score). The main idea can be interpreted in terms of Bayesian asymptotics. Say you have a (univariate, for clarity) parameter \theta, and you want to evaluate the derivatives of the log-likelihood at some point \theta_0; introduce a prior distribution p(\theta; \theta_0, \tau^2) with mean \theta_0 and variance \tau^2. Consider the behaviour of the posterior distribution when 1) the dataset Y is fixed (hence the likelihood \mathcal{L}(\theta; Y) is also fixed) and 2) the prior distribution concentrates, that is \tau \to 0. What happens to the posterior distribution?

As you can imagine it also shrinks: it looks more and more like the prior distribution as \tau decreases. Now, interestingly enough, under mild regularity assumptions and a Gaussian prior distribution, we have the following two inequalities:

\left\lvert \nabla \ell(\theta_0) - \tau^{-2} \left(\mathbb{E}\left[\theta \vert Y \right] - \theta_0\right) \right\rvert \leq C_1 \tau^2


\left\lvert \nabla^2 \ell(\theta_0) - \tau^{-4} \left(\mathbb{V}\left[\theta \vert Y \right] - \tau^2 \right) \right\rvert \leq C_2 \tau^2

for some C_1, C_2 < \infty. This means that the shift from the prior mean to the posterior mean is proportional to the first derivative of the log-likelihood \nabla \ell(\theta_0) (already known from the IF papers), while the shift from the prior variance to the posterior variance is proportional to its second-order derivative \nabla^2 \ell(\theta_0) (new stuff). All of this up to an error term going to zero at rate \tau^2.
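
For intuition, here is a back-of-the-envelope derivation (a heuristic sketch, not the argument of the report): Taylor-expand the log-likelihood to second order around \theta_0 inside the posterior density,

p(\theta \vert Y) \propto \exp\left( \ell(\theta) - \frac{(\theta - \theta_0)^2}{2\tau^2} \right) \approx \exp\left( \nabla \ell(\theta_0) (\theta - \theta_0) + \frac{1}{2} \nabla^2 \ell(\theta_0) (\theta - \theta_0)^2 - \frac{(\theta - \theta_0)^2}{2\tau^2} \right),

which is Gaussian in \theta, with variance \left(\tau^{-2} - \nabla^2 \ell(\theta_0)\right)^{-1} = \tau^2 + \tau^4 \nabla^2 \ell(\theta_0) + O(\tau^6) and mean \theta_0 + \tau^2 \nabla \ell(\theta_0) + O(\tau^4). Dividing the mean shift by \tau^2, and the variance shift by \tau^4, recovers the two approximations displayed above, with the stated O(\tau^2) errors.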

In practical terms, it means that the problem of computing the first two derivatives is turned into a problem of computing posterior expectations, which can be tackled with Monte Carlo methods for a very broad class of models.
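
To make this concrete, here is a minimal numerical sketch (my own toy example, not code from the report): the model is Y_i \sim \mathcal{N}(\theta, 1), chosen because both derivatives are available in closed form, and the posterior moments are estimated by self-normalized importance sampling using the \mathcal{N}(\theta_0, \tau^2) prior as the proposal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy model (not from the report): Y_i ~ N(theta, 1), i = 1..n,
# for which the derivatives at theta0 are known exactly:
#   grad  l(theta0) = sum_i (y_i - theta0)
#   grad2 l(theta0) = -n
n = 50
y = rng.normal(0.3, 1.0, size=n)
theta0 = 0.0
exact_score = float(np.sum(y - theta0))
exact_hess = -float(n)

tau = 0.05    # prior standard deviation; bias of the estimates is O(tau^2)
N = 100_000   # number of Monte Carlo samples

# Posterior expectations via self-normalized importance sampling, with the
# N(theta0, tau^2) prior as proposal: the weight of each prior draw is just
# its likelihood (any constant factor cancels in the normalization).
thetas = rng.normal(theta0, tau, size=N)
logw = -0.5 * np.sum((y[None, :] - thetas[:, None]) ** 2, axis=1)
w = np.exp(logw - logw.max())
w /= w.sum()

post_mean = float(np.sum(w * thetas))
post_var = float(np.sum(w * (thetas - post_mean) ** 2))

# The two approximations from the post: rescaled mean and variance shifts.
score_est = (post_mean - theta0) / tau**2
hess_est = (post_var - tau**2) / tau**4

print(score_est, exact_score)  # agree up to O(tau^2) bias + Monte Carlo noise
print(hess_est, exact_hess)
```

Note the trade-off this sketch makes visible: shrinking \tau reduces the O(\tau^2) bias, but the \tau^{-2} and \tau^{-4} rescalings blow up the Monte Carlo noise in the estimated posterior moments, so \tau has to balance bias against variance.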

