## Estimating convergence of Markov chains

Hi all,

Niloy Biswas (PhD student at Harvard) and I have recently arXived a manuscript on the assessment of MCMC convergence (using couplings!). Here I’ll describe the main result, and some experiments (that are not in the current version of the paper) revisiting a 1996 paper by Mary Kathryn Cowles and Jeff Rosenthal entitled “A simulation approach to convergence rates for Markov chain Monte Carlo algorithms“. Code in R producing the figures of this post is available here.

## Budget constrained simulations

Hi all,

This post is about some results from “Bias Properties of Budget Constrained Simulations“, by Glynn & Heidelberger and published in Operations Research in 1990. I have found these results extremely useful, and our latest manuscript on unbiased MCMC recalls them in detail. Below I go through some of the results and describe the simulations that lead to the above figure.

## Another update on unbiased smoothing

Hi all,

This is a short update on my research on unbiased smoothing with coupled conditional particle filters. In a previous post I naively explained that I was done with the project since the article was accepted for publication in a journal.

However, a bug was found in the code thanks to a very careful reader. So I’ve fixed the code, re-launched all the simulations, and updated the arXiv version straight away; that was on September 5, 2018. The conclusions of the article are unchanged, thankfully; in fact, the text of the article is exactly the same, only the numerical values in the tables and the figures are different. Phew!

Since then the version with the buggy results was retracted from the journal, and hopefully, the updated version will be published soon. In the meantime the arXiv version is up-to-date.

## Final update on unbiased smoothing

Hi,

Two years ago I blogged about couplings of conditional particle filters for smoothing. The paper with Fredrik Lindsten and Thomas Schön has just been accepted for publication at JASA, and the arXiv version and github repository are hopefully in their final forms. Here I’ll mention a few recent developments and follow-up articles by other researchers.

## Couplings of Normal variables

Hi,

Just to play a bit with the gganimate package, and to celebrate National Coupling Day, the above plot shows different couplings of two univariate Normal distributions, Normal(0,1) and Normal(2,1). That is, each point is a pair (x,y) where x follows a Normal(0,1) and y follows a Normal(2,1). Below I’ll recall briefly how each coupling operates, in the Normal case. The code is available at the end of the post.

## Different ways of using MCMC algorithms

Hi,

This post is about different ways of using Markov chain Monte Carlo (MCMC) algorithms for numerical integration or sampling. It can be a hard job to design an MCMC algorithm for a given target distribution. Once it’s finally implemented, it gives a way of sampling a new point X’ given an existing point X. From there, the algorithm can be used in various ways to construct estimators of integrals/distribution of interest. Some ways are more amenable to parallel computing than others. I give some examples with references below.

## Scaling of MCMC with dimension (experiments)

Hi all,

In this post, I’ll go through numerical experiments illustrating the scaling of some MCMC algorithms with respect to the dimension. I will focus on a simple setting, to illustrate some theoretical results developed by Gareth Roberts, Andrew Stuart, Alexandros Beskos, Jeff Rosenthal and many of their co-authors over many years, for instance here for random walk Metropolis-Hastings (RWMH, and see here more recently), here for Modified Adjusted Langevin Algorithm (MALA), here for Hamiltonian Monte Carlo (HMC).

## A big problem in our community

Hi all,

Kristian Lum, who was already one of my Statistics superheroes for her many interesting papers and great talks, bravely wrote the following text about her experience as a young statistician going to conferences:

https://medium.com/@kristianlum/statistics-we-have-a-problem-304638dc5de5

I can’t thank Kristian enough for speaking out. Her experience is both shocking and hardly surprising. Many, many academics report similar stories. This simply can’t go on like that.

I happen to have gone to the conferences mentioned by Kristian, and my experience as a young man was completely different. It was all about meeting interesting people, discussing ideas, being challenged, and having good times. Nobody harassed, touched or assaulted me. There was some flirting, as I guess is natural when hundreds of people are put in sunny places far away from home, but I was never the victim of any misconduct or abuse of power. So instead of driving me out of the field, conferences became important, enriching and rewarding moments of my professional life.

Looking back at those conferences I feel sick, and heartbroken, at the thought that some of my peers were having such a difficult time, because of predators who don’t ever face the consequences of their actions. Meanwhile I was part of the silent majority.

The recent series of revelations about sexual harassment and assaults in other professional environments indicate that this is not specific to our field, nor to academia. But this does not make it any more acceptable. I know for a fact that many leaders of our field take this issue extremely seriously (as Kristian mentions too), but clearly much much more needs to be done. The current situation is just shameful; strong and coordinated actions will be needed to fix it. Thanks again to Kristian for the wake-up call.

## nrow, references and copies

Hi all,

This post deals with a strange phenomenon in R that I have noticed while working on unbiased MCMC. Reducing the problem to a simple form, consider the following code, which iteratively samples a vector ‘x’ and stores it in a row of a large matrix called ‘chain’ (I’ve kept the MCMC terminology).

dimstate = 100 nmcmc = 1e4 chain = matrix(0, nrow = nmcmc, ncol = dimstate) for (imcmc in 1:nmcmc){ if (imcmc == nrow(chain)){ #call to nrow } x = rnorm(dimstate, mean = 0, sd = 1) chain[imcmc,] = x #copying of x in chain }

If you execute this code, you will see that it is surprisingly slow: it takes close to a minute on my computer. Now, consider the next block, which does exactly the same except that the vector ‘x’ is not copied into the matrix ‘chain’.

dimstate = 100 nmcmc = 1e4 chain = matrix(0, nrow = nmcmc, ncol = dimstate) for (imcmc in 1:nmcmc){ if (imcmc == nrow(chain)){ #call to nrow } x = rnorm(dimstate, mean = 0, sd = 1) # chain[imcmc,] = x #no more copying }

This code runs nearly instantaneously. Could it be that just copying a vector in a matrix takes a lot of time? Sounds unlikely. Now consider this third block.

dimstate = 100 nmcmc = 1e4 chain = matrix(0, nrow = nmcmc, ncol = dimstate) for (imcmc in 1:nmcmc){ if (imcmc == nmcmc){ #no call to nrow } x = rnorm(dimstate, mean = 0, sd = 1) chain[imcmc,] = x #copying of x in chain }

This code runs nearly instantaneously as well; this time ‘x’ is copied into ‘chain’, but the call to the nrow function is removed….?! What is nrow doing? It is meant to simply return dim(chain)[1], the first dimension of chain. So consider this fourth block.

dimstate = 100 nmcmc = 1e4 chain = matrix(0, nrow = nmcmc, ncol = dimstate) for (imcmc in 1:nmcmc){ if (imcmc == dim(chain)[1]){ #call to dim instead of nrow } x = rnorm(dimstate, mean = 0, sd = 1) chain[imcmc,] = x #copying of x in chain }

This one also runs instantaneously! So replacing nrow(chain) by dim(chain)[1] solves the problem. Why?

The answer comes from R guru and terrific statistician Louis Aslett. I directly quote from an exchange of emails, since he brilliantly explains the phenomenon.

You probably know R stores everything by reference, so if I do:

x <- matrix(0, nrow=1e5, ncol=100)

y <- xI actually only have one copy of the matrix in memory with two references to it. If I then do:

x[1,1] <- 1

R will first make a copy of the whole matrix, update x to point to that and then change the first element to one. This idea is used when you pass a variable to a standard (i.e. non-core, non-primitive) R function, which nrow is: it creates a reference to the variable you pass so that it doesn’t have to copy and the function call is very fast …. as long as you don’t write to it inside the function, no copy need ever happen. But the “bad design” bit is that R makes a decision whether to copy on write based only on a reference count and crucially that reference count stays increased even after a function returns, irrespective of whether or not the function has touched the variable.

So:

x <- matrix(0, nrow=1e5, ncol=100) # matrix has ref count 1

x[1,1] <- 1 # ref count is 1, so write with no copy

nrow(x) # ref count is 2 even though nothing was touched

x[1,1] <- 1 # ref count still 2, so R copies before writing first element. Now the ref count drops to 1 again

x[2,2] <- 1 # this writes without a copy as ref count got reset on last line

nrow(x) # ref count jumps

x[3,3] <- 1 # copy invoked again! Aaaargh!So by calling nrow in the loop for the first example, the chain matrix is being copied in full on every iteration. In the second example, chain is never written to so there is no negative side effect to the ref count having gone up. In the third example, chain only ever has ref count 1 so there are no copies and each row is written in-place. I did a quick bit of profiling and indeed in the slow example, the R garbage collector allocates and tidies up nearly 9GB of RAM when executing the loop!

The crazy thing is that dim(chain)[1] works full speed even though that is all that nrow is doing under the hood, but the reason is that dim is a so-called “primitive” core R function which is special because it doesn’t affect the reference counter of its arguments. If you want to dig into this yourself, there’s a function refs() in the pryr package which tells you the current reference count to any variable.

Thanks Louis!

## Bayesian model comparison with vague or improper priors

Hi,

With Stephane Shao, Jie Ding and Vahid Tarokh we have just arXived a tech report entitled “Bayesian model comparison with the Hyvärinen score: computation and consistency“. Here I’ll explain the context, that is, scoring rules and Hyvärinen scores (originating in Hyvärinen’s score matching approach to inference), and then what we actually do in the paper.

leave a comment