# Statisfaction

## [Meta-]Blogging as young researchers

Posted in General, Statistics by Pierre Jacob on 11 December 2014

Hello all,

This is an article intended for the ISBA bulletin, jointly written by us all at Statisfaction, Rasmus Bååth from Publishable Stuff, Boris Hejblum from Research side effects, Thiago G. Martins from tgmstat@wordpress, Ewan Cameron from Another Astrostatistics Blog and Gregory Gandenberger from gandenberger.org

Inspired by established blogs, such as the popular Statistical Modeling, Causal Inference, and Social Science or Xi’an’s Og, each of us began blogging as a way to diarize our learning adventures, to share bits of R code or LaTeX tips, and to advertise our own papers and projects. Along the way we’ve come to a new appreciation of the world of academic blogging: a never-ending international seminar, attended by renowned scientists and anonymous users alike. Here we share our experiences by weighing the pros and cons of blogging from the point of view of young researchers.

## momentify R package at BAYSM14

Posted in General, R, Seminar/Conference, Statistics by Julyan Arbel on 20 September 2014

I presented an arxived paper of my postdoc at the big success Young Bayesian Conference in Vienna. The big picture of the talk is simple: there are situations in Bayesian nonparametrics where you don’t know how to sample from the posterior distribution, but you can only compute posterior expectations (so-called marginal methods). So e.g. you cannot provide credible intervals. But sometimes all the moments of the posterior distribution are available as posterior expectations. So morally, you should be able to say more about the posterior distribution than just reporting the posterior mean. To be more specific, we consider a hazard (h) mixture model

$\displaystyle h(t)=\int k(t;y)\mu(dy)$

where $k$ is a kernel, and the mixing distribution $\mu$ is random and discrete (Bayesian nonparametric approach).

We consider the survival function $S$ which is recovered from the hazard rate $h$ by the transform

$\displaystyle S(t)=\exp\Big(-\int_0^t h(s)ds\Big)$

and some possibly censored survival data having survival $S$. Then it turns out that all the posterior moments of the survival curve $S(t)$ evaluated at any time $t$ can be computed.

The nice trick of the paper is to use the representation of a distribution in a [Jacobi polynomial] basis where the coefficients are linear combinations of the moments. So one can sample from [an approximation of] the posterior, and with a posterior sample we can do everything! Including credible intervals.

I’ve wrapped up the few lines of code in an R package called momentify (not on CRAN). With a sequence of moments of a random variable supported on [0,1] as an input, the package does two things:

• evaluates the approximate density
• samples from it

A package example for a mixture of beta and 2 to 7 moments gives that result:

## Non-negative unbiased estimators

Posted in Statistics by Pierre Jacob on 13 May 2014

Benedict has to choose between unbiasedness and non-negativity.

Hey hey,

With Alexandre Thiéry we’ve been working on non-negative unbiased estimators for a while now. Since I’ve been talking about it at conferences and since we’ve just arXived the second version of the article, it’s time for a blog post. This post is kind of a follow-up of a previous post from July, where I was commenting on Playing Russian Roulette with Intractable Likelihoods by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade.

## Parallel resampling in the particle filter

Posted in Statistics by Pierre Jacob on 12 May 2014

Decisions decisions… which resampling to use on the CPU (left), GPU (middle), or between both (right). The y-axis essentially parametrizes the expected variance of the weights (high values = “we observe an outlier” = high variance of the weights).

Hey there,

It’s been a while I haven’t written about parallelization and GPUs. With colleagues Lawrence Murray and Anthony Lee we have just arXived a new version of Parallel resampling in the particle filter. The setting is that, on modern computing architectures such as GPUs, thousands of operations can be performed in parallel (i.e. simultaneously) and therefore the rest of the calculations that cannot be parallelized quickly becomes the bottleneck. In the case of the particle filter (or any sequential Monte Carlo method such as SMC samplers), that bottleneck is the resampling step. The article investigates this issue and numerically compares different resampling schemes.

## Rasmus Bååth’s Bayesian first aid

Posted in Project, R, Statistics by Pierre Jacob on 23 January 2014

Besides having coded a pretty cool MCMC app in Javascript, this guy Rasmus Bååth has started the Bayesian first aid project. The idea is that if there’s an R function called blabla.test performing test “blabla”, there should be a function bayes.blabla.test performing a similar test in a Bayesian framework, and showing the output in a similar way so that the user can easily compare both approaches.This post explains it all. Jags and BEST seem to be the two main workhorses under the hood.

Kudos to Rasmus for this very practical approach, potentially very impactful. Maybe someday people will have to specify if they want a frequentist approach and not the other way around! (I had a dream, etc).

## Pseudo-Bayes: a quick and awfully incomplete review

Posted in General, Statistics by nicolaschopin on 18 September 2013

You called me a pseudo-Bayesian. Prepare to die.

A recently arxived paper by Pier Bissiri, Chris Holmes and Steve Walker piqued my curiosity about “pseudo-Bayesian” approaches, that is, statistical approaches based on a pseudo-posterior:

$\pi(\theta) \propto p(\theta) \hat{L}(\theta)$

where $\hat{L}(\theta)$ is some pseudo-likelihood. Pier, Chris and Steve use in particular

$\hat{L}(\theta) = \exp\{ - \lambda*R_n(\theta,x) \}$

where $R_n(\theta,x)$ is some empirical risk function. A good example is classification; then $R_n(\theta,x)$ could be the proportion of properly classified points:

$R_n(\theta,x) = \sum_{i=1}^n \mathbf{I}(y_i\times f_{\theta}(x_i)\geq 0)$

where $f_{\theta}$ is some score function parametrised by $\theta$, and $y_i\in\{-1,1\}$. (Side note: I find the $-1/1$ ML convention for the $y_i$ more convenient than the $0/1$ stats convention.)

It turns out that this particular kind of pseudo-posterior has already been encountered before, but with different motivations:

•  Chernozhukov and Hong (JoE, 2003)  used it to define new Frequentist estimators based on moment estimation ideas (i.e. take $R_n$ above to be some empirical moment constraint). Focus is on establishing Frequentist properties of say the expectation of the pseudo-posterior. (It seems to me that few people have heard about this this paper in Stats).
• the PAC-Bayesian approach which originates from Machine Learning  also relies on this kind of pseudo-posterior. To be more precise, PAC-Bayes usually starts by minimising the upper bound of an oracle inequality within a class of randomised estimators. Then, as a result, you obtain as a possible solution, say, a single draw for the pseudo-posterior defined above.  A good introduction is this book by Olivier Catoni.
• Finally, Pier, Chris and Steve’s approach is by far the most Bayesian of these three pseudo-Bayesian approaches, in the sense that they try to maintain an interpretation of the pseudo-posterior as a representation on the uncertainty on $\theta$. Crudely speaking,  they don’t look only at the expectation, like the two approaches aboves, but also at the spread of the pseudo-posterior.

Let me mention briefly that quite a few papers have considered using other types of pseudo-likelihood in a pseudo-posterior, such as empirical likelihood, composite likelihood, and so on, but I will shamefully skip them for now.

To which extent this growing interest in “Pseudo-Bayes” should have an impact on Bayesian computation? For one thing, more problems to throw at our favourite algorithms should be good news. In particular, Chernozhukov and Hong mention the possibility to use MCMC as a big advantage for their approach, because typically the $L_n$ function they consider could be difficult to minimise directly by optimisation algorithms. PAC-Bayesians also seem to recommend MCMC, but I could not find so many PAC-Bayesian papers that go beyond the theory and actually implement it; an exception is this.

On the other hand, these pseudo posteriors might be quite nasty. First, given the way they are defined, they should not have the kind of structure that makes it possible to use Gibbs sampling. Second, many interesting choices for $R_n$ seem to be   irregular or multimodal. Again, in the classification example, the 0-1 loss function is typically not continuous. Hopefully the coming years will witness some interesting research on which computational approaches are more fit for pseudo-Bayes computation, but readers will not be surprised if I put my Euros  on (some form of) SMC!

Tagged with:

## A newcomer at Statisfaction

Posted in Statistics by nicolaschopin on 7 September 2013

I am Nicolas Chopin,  a Professor of Statistics at the ENSAE, and my colleagues and good friends that manage Statisfaction kindly agreed that I would join their blog. I work mostly on “Bayesian Computation”, i.e. Monte Carlo and non-Monte Carlo methods to compute Bayesian quantities; a strong focus of my research is on Sequential Monte Carlo (aka particle filters).

I don’t plan to blog very regularly, and only on stuff related to my research, at least in some way. Well, that’s the idea for now. Stay tuned!

Nicolas

## From SVG to probability distributions [with R package]

Posted in R, Statistics by Pierre Jacob on 25 August 2013

Hey,

To illustrate generally complex probability density functions on continuous spaces, researchers always use the same examples, for instance mixtures of Gaussian distributions or a banana shaped distribution defined on $\mathbb{R}^2$ with density function:

$f(x,y) = \exp\left(-\frac{x^2}{200} - \frac{1}{2}(y+Bx^2-100B)^2\right)$

If we draw a sample from this distribution using MCMC we obtain a [scatter]plot like this one:

Fig. 1: a sample from the very lame banana shaped distribution

Clearly it doesn’t really look like a banana, even if you use yellow to colour the dots like here. Actually it looks more like a boomerang, if anything. I was worried about this for a while, until I came up with a more realistic banana shaped distribution:

Fig. 2: a sample from the realistic banana shaped distribution

See how the shape is well defined compared to the first figure? And there’s even the little tail, that proves so convenient when we want to peel off the fruit. More generally we might want to create target density functions based on general shapes. For this you can now try RShapeTarget, which you can install directly from R using devtools:

library(devtools)


The package parses SVG files representing shapes, and creates target densities from them. More precisely, a SVG files contains “paths”, which are sequence of points (for instance the above banana is a single closed path). The associated log density at any point $x$ is defined by $-1/(2\lambda) \times d(x, P)$ where $P$ is the closest path of the shape from $x$ and $d(x,P)$ is the distance between the point and the path. The parameter $\lambda$ specifies the rate at which the density decays when the point goes away from the shape. With this you can define the maple leaf distribution, as a tribute to JSM 2013:

Fig. 3: a sample the “O Canada” probability distribution.

In the package you can get a distribution from a SVG file using the following code:

library(RShapeTarget)
# create target from file
my_shape_target <- create_target_from_shape(my_svg_file_name, lambda =1)
# test the log density function on 25 randomly generated points
my_shape_target$logd(matrix(rnorm(50), ncol = 2), my_shape_target$algo_parameters)


Since characters are just a bunch of paths, you can also define distributions based on words, for instance:

Hodor: Hodor.

which is done as follows (warning you’re only allowed a-z and A-Z, no numbers no space no punctuation for now):

library(RShapeTarget)
word_target <- create_target_from_word("Hodor")


For the words, I defined the target density function as before, except that it’s constant on the letters: so if a point is outside a letter its density is computed based on the distance to the nearest path; if it’s inside a letter it’s just constant, so that the letters are “filled” with some constant density. I thought it’d look better.

Now I’m not worried about the banana shaped distribution any more, but by the fact that the only word I could think of was “Hodor” (with whom you can chat over there).

## Joint Statistical Meeting 2013

Posted in General, Seminar/Conference, Statistics by Pierre Jacob on 23 July 2013

A typical statistical meeting.

Hey,

In a few weeks (August 3-8) I’ll attend the Joint Statistical Meeting in Montréal, Canada. According to Wikipedia it’s been held every year since 1840 and now gathers more than 5,000 participants!

I’ll talk in a session organized by Scott Schmidler, entitled Adaptive Monte Carlo Methods for Bayesian Computation; you can find the session programme here [online program]. I’ll talk about score and Fisher observation matrix estimation in state-space models.

According to the rumour and Christian’s reflections on the past years (2009, 2010, 2011), I should prepare my schedule in advance to really enjoy this giant meeting. So if you want to meet there, please send me an e-mail!

See you in Montréal!

## Path storage in the particle filter

Posted in Statistics by Pierre Jacob on 12 July 2013

Typical ancestry tree generated by a particle filter

Hey particle lovers,

With Lawrence Murray and Sylvain Rubenthaler we looked at how to store the paths in the particle filter, and the related expected memory cost. We just arXived a technical report about it. Would you like to know more?