# Statisfaction

## [Meta-]Blogging as young researchers

Posted in General, Statistics by Pierre Jacob on 11 December 2014

Hello all,

This is an article intended for the ISBA bulletin, jointly written by us all at Statisfaction, Rasmus Bååth from Publishable Stuff, Boris Hejblum from Research side effects, Thiago G. Martins from tgmstat@wordpress, Ewan Cameron from Another Astrostatistics Blog and Gregory Gandenberger from gandenberger.org.

Inspired by established blogs, such as the popular Statistical Modeling, Causal Inference, and Social Science or Xi’an’s Og, each of us began blogging as a way to diarize our learning adventures, to share bits of R code or LaTeX tips, and to advertise our own papers and projects. Along the way we’ve come to a new appreciation of the world of academic blogging: a never-ending international seminar, attended by renowned scientists and anonymous users alike. Here we share our experiences by weighing the pros and cons of blogging from the point of view of young researchers.

## Non-negative unbiased estimators

Posted in Statistics by Pierre Jacob on 13 May 2014

Benedict has to choose between unbiasedness and non-negativity.

Hey hey,

With Alexandre Thiéry we’ve been working on non-negative unbiased estimators for a while now. Since I’ve been talking about it at conferences and since we’ve just arXived the second version of the article, it’s time for a blog post. This post is a follow-up to a previous post from July, where I was commenting on Playing Russian Roulette with Intractable Likelihoods by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson and Yves Atchade.

## Parallel resampling in the particle filter

Posted in Statistics by Pierre Jacob on 12 May 2014

Decisions decisions… which resampling to use on the CPU (left), GPU (middle), or between both (right). The y-axis essentially parametrizes the expected variance of the weights (high values = “we observe an outlier” = high variance of the weights).

Hey there,

It’s been a while since I last wrote about parallelization and GPUs. With colleagues Lawrence Murray and Anthony Lee, we have just arXived a new version of Parallel resampling in the particle filter. The setting is that, on modern computing architectures such as GPUs, thousands of operations can be performed in parallel (i.e. simultaneously), and therefore the remaining calculations that cannot be parallelized quickly become the bottleneck. In the case of the particle filter (or any sequential Monte Carlo method such as SMC samplers), that bottleneck is the resampling step. The article investigates this issue and numerically compares different resampling schemes.
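To see why resampling resists parallelization, here is a minimal sketch of standard multinomial resampling in R. It is only an illustration and not necessarily one of the schemes compared in the article; the function name `multinomial_resample` and the example weights are made up for this post. The point is that normalizing the weights and taking their cumulative sum both involve all the particles at once.

```r
# Multinomial resampling: draw ancestor indices with probabilities
# proportional to the weights (hypothetical helper, for illustration only).
multinomial_resample <- function(weights, u = runif(length(weights))) {
  w <- weights / sum(weights)   # normalization: a sum over all particles
  cum_w <- cumsum(w)            # prefix sum: every entry depends on earlier ones
  cum_w[length(cum_w)] <- 1     # guard against rounding error in the last entry
  # For each u, return the smallest index i such that cum_w[i] >= u.
  findInterval(u, cum_w, left.open = TRUE) + 1
}

# Fixed uniforms so that the result is easy to check by hand.
ancestors <- multinomial_resample(c(0.1, 0.2, 0.7), u = c(0.05, 0.25, 0.95))
print(ancestors)
```

The search over the cumulative weights can be done independently for each particle, but the sum and the cumulative sum are collective operations, which is where the difficulty lies on parallel hardware.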

## Moustache target distribution and Wes Anderson

Posted in Art, Geek, R by Pierre Jacob on 31 March 2014

Today I am going to introduce the moustache target distribution (moustarget distribution for brevity). Load some packages first.

```r
library(wesanderson) # on CRAN
library(RShapeTarget) # available on https://github.com/pierrejacob/RShapeTarget/
library(PAWL) # on CRAN
```


Let’s invoke the moustarget distribution.

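```r
# Build the target from the SVG file shipped with RShapeTarget
shape <- create_target_from_shape(
  file_name = system.file(package = "RShapeTarget", "extdata/moustache.svg"),
  lambda = 5)
# Initial points: standard normal draws in dimension 2
rinit <- function(size) matrix(rnorm(2 * size), ncol = 2)
# Wrap everything in a PAWL target object
moustarget <- target(name = "moustache", dimension = 2,
                     rinit = rinit, logdensity = shape$logd,
                     parameters = shape$algo_parameters)
```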


This defines a target distribution represented by an SVG file using RShapeTarget. The target probability density function is defined on $\mathbb{R}^2$ and is proportional to $1$ on the segments described in the SVG file, and decreases exponentially fast to $0$ away from the segments. The density function of the moustarget is plotted below, a picture being worth a thousand words.
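To make the description concrete, here is a minimal sketch of such a density in base R. It is not the actual RShapeTarget implementation: the helpers `dist_to_segment` and `toy_logdensity` and the two segments are made up, and I am assuming that the decay depends on the Euclidean distance to the nearest segment, at rate `lambda`.

```r
# Hypothetical helper: Euclidean distance from point p to the segment [a, b].
dist_to_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  if (len2 == 0) return(sqrt(sum((p - a)^2)))  # degenerate segment
  # Position of the orthogonal projection along the segment, clamped to [0, 1].
  pos <- min(1, max(0, sum((p - a) * ab) / len2))
  sqrt(sum((p - (a + pos * ab))^2))
}

# Unnormalized log-density: 0 on the segments (density proportional to 1),
# then decreasing linearly in the distance (exponential decay of the density).
# Each row of `segments` is (x_start, y_start, x_end, y_end).
toy_logdensity <- function(p, segments, lambda = 5) {
  d <- min(apply(segments, 1, function(s) dist_to_segment(p, s[1:2], s[3:4])))
  -lambda * d
}

# Two made-up segments forming an "L" shape.
segments <- rbind(c(0, 0, 1, 0),
                  c(1, 0, 1, 1))
toy_logdensity(c(0.5, 0), segments)  # on a segment
toy_logdensity(c(0.5, 2), segments)  # away from both segments
```

A larger `lambda` makes the density concentrate more tightly around the segments.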

## Beautiful Science: Picturing Data, Inspiring Insight

Posted in General by Pierre Jacob on 19 March 2014

Hey,

There’s a nice exhibition open until May 26th at the British Library in London, entitled Beautiful Science: Picturing Data, Inspiring Insight. Various examples of data visualizations are shown, some historical, some very modern, and some made especially for the exhibition. Definitely worth a detour if you happen to be in the area: you can see everything in 15 minutes.

In particular there are nice visualisations of historical climate data, gathered from the logbooks of the English East India Company, whose ships were crossing every possible sea at the beginning of the 19th century. The logbooks contain locations and daily weather reports, handwritten by the captains themselves. It turns out the logbooks are kept at the British Library itself and some of them are on display at the exhibition. More info on that project here: oldweather.org.

## Rasmus Bååth’s Bayesian first aid

Posted in Project, R, Statistics by Pierre Jacob on 23 January 2014

Besides having coded a pretty cool MCMC app in JavaScript, this guy Rasmus Bååth has started the Bayesian first aid project. The idea is that if there’s an R function called blabla.test performing test “blabla”, there should be a function bayes.blabla.test performing a similar test in a Bayesian framework, and showing the output in a similar way so that the user can easily compare both approaches. This post explains it all. JAGS and BEST seem to be the two main workhorses under the hood.

Kudos to Rasmus for this very practical approach, potentially very impactful. Maybe someday people will have to specify if they want a frequentist approach and not the other way around! (I had a dream, etc).

## MCM’Ski lessons

Posted in Seminar/Conference by Pierre Jacob on 16 January 2014

A few days after the MCMSki conference, I am starting to see the main lessons gathered there.

1. I should really read the full program before attending the next MCMSki. The three parallel sessions looked consistently interesting, and I really regret having missed some talks (in particular Dawn Woodard’s and Natesh Pillai’s) and some posters as well (admittedly, due to exhaustion on my part).
2. Compared to the previous instance three years ago (in Utah), the main themes have significantly changed. Scalability, approximate methods, non-asymptotic results, 1/n methods … these keywords are now on everyone’s lips. Can’t wait to see if MCQMC’14 will feel that different from MCQMC’12.
3. The community is rightfully concerned about scaling Monte Carlo methods to big data, with some people pointing out that models should also be rethought in this new context.
4. The place of software developers in the conference, or simply references to software packages in the talks, is much greater than it used to be. It’s a very good sign for reproducible research in our field. There’s still a lot of work to do, in particular in terms of making parallel computing easier to access (time to advertise LibBi a little bit). On a related note, many people now point out whether their proposed algorithms are parallel-friendly or not.
5. Going from the Rockies to the Alps, the food drastically changed from cheeseburgers to just melted cheese. Bread could be found but ground beef and Budweiser were reported missing.
6. It’s fun to have an international conference in your home country, but switching from French to English all the time was confusing.

Back in flooded Oxford now!

## Dennis Lindley (1923-2013)

Posted in General by Pierre Jacob on 16 December 2013

I’ve just heard this sad piece of news. Definitely one of the greatest statisticians of the last 50 years. I wish I had met him in person.

Originally posted on Xi'an's Og:

Dennis Lindley most sadly passed away yesterday at the hospital near his home in Somerset. He was one of the founding fathers of our field (of Bayesian statistics), who contributed to formalise Bayesian statistics in a coherent theory. And to make it one with rational decision-making, a perspective missing in Jeffreys’ vision. (His papers figured prominently in the tutorials we gave yesterday for the opening of O’Bayes 250.) At the age of 90, his interest in the topic had not waned away: as his interview with Tony O’Hagan last Spring showed, his passionate arguing for the rationale of the Bayesian approach was still there and alive! The review he wrote of The Black Swan a few years ago also demonstrated he had preserved his ability to see through bogus arguments. (See his scathing “One hardly advances the respect with which statisticians are held in society by making…


## Singapore –> Oxford

Posted in General by Pierre Jacob on 3 October 2013

A quick post to say that I’m moving from Singapore to Oxford, UK. I will dearly miss my Singaporean colleagues, as well as my morning Laksa and Nasi lemak. I look forward to the skiing season though.

I will work for the next two years as a post-doc with Professors Arnaud Doucet and Yee Whye Teh, on sequential Monte Carlo methods for high-dimensional problems.

Former office mate Alex Thiery is still in Singapore and will start blogging here soon, so we’ll still have two continents covered. Still looking for contributors in the other ones!

## Clone wars inside the uniform random variable

Posted in General by Pierre Jacob on 25 September 2013

Hello,

In a recent post Nicolas discussed some limitations of pseudo-random number generation. On a related note there’s a feature of random variables that I find close to mystical.

In ongoing work with Alex Thiery, we had to precisely define the notion of randomized algorithms at some point, and we essentially followed Keane and O’Brien [1994] (as it happens there’s an article today on arXiv that also is related, maybe, or not). The difficulty comes with the randomness. We can think of a deterministic algorithm as a good old function mapping an input space to an output space, but a random algorithm adds some randomness over a deterministic scheme (in an accept-reject step for instance, or a random stopping criterion), so that given fixed inputs the output might still vary. One way to formalise it consists in defining the algorithm as a deterministic function of inputs and of a source of randomness; that randomness is represented by a single random variable $U$, e.g. following a uniform distribution.

The funny, mystical and disturbing thing is that a single uniform random variable is enough to represent an infinity of them. It sounds like an excerpt from the Vedas, doesn’t it? To see this, write a single uniform realization in binary representation. That is, for $U \in [0,1]$ write

$U = \sum_{k> 0} b_k 2^{-k}$

with $b_k = \lfloor 2^k U \rfloor \mbox{ mod } 2$. The binary representation is $b_1b_2b_3b_4b_5\ldots$

Realization of a uniform random variable in binary representation

It’s easy to see that these zeros and ones are distributed as independent Bernoulli variables with parameter 1/2. Next, we arrange these digits in a particular layout, as follows.

Same zeros and ones ordered in a triangle of increasing size

If we take each column or each row from the grid above, they’re independent and they’re also binary representations of uniform random variables – you could also consider diagonals or more funky patterns. You could say that the random variable contains an infinity of independent clones.
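Here is a small R sketch of the construction, with a finite number of digits. The function names, the fixed value of `u` and the exact filling order of the grid (along successive anti-diagonals) are my own choices for illustration. Note that a double-precision number only carries about 53 binary digits, so on a computer only a small triangle can be filled from a single value.

```r
# Binary digits b_1, ..., b_n of u in [0, 1): b_k = floor(2^k u) mod 2.
binary_digits <- function(u, n) floor(2^(1:n) * u) %% 2

# Rebuild a number in [0, 1) from the first digits of its binary expansion.
from_digits <- function(b) sum(b * 2^(-seq_along(b)))

# Fill a grid along successive anti-diagonals: diagonal d holds d digits,
# so n_diag diagonals use n_diag * (n_diag + 1) / 2 digits in total.
arrange_in_triangle <- function(b, n_diag) {
  stopifnot(length(b) >= n_diag * (n_diag + 1) / 2)
  grid <- matrix(NA_real_, n_diag, n_diag)
  k <- 1
  for (d in 1:n_diag) {
    for (j in 1:d) {
      grid[j, d - j + 1] <- b[k]
      k <- k + 1
    }
  }
  grid
}

u <- 0.3141592653589793           # arbitrary fixed value, for illustration
b <- binary_digits(u, 28)         # 7 diagonals need 28 digits
grid <- arrange_in_triangle(b, 7)
# Each row holds the first digits of the binary expansion of one "clone".
clones <- apply(grid, 1, function(r) from_digits(r[!is.na(r)]))
print(clones)
```

With infinitely many digits, every row would receive infinitely many of them, which is what makes each row a full-fledged uniform random variable.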

This property actually sounds dangerous now, come to think of it. I think it was always well-known but people might not have made the link with Star Wars. In the end I’m happy to stick with harmless pseudo-random numbers, for safety reasons.