This week I’ll start my Bayesian Statistics master’s course at the Collegio Carlo Alberto. I realized that some of last year’s students got PhD positions at prestigious US universities, so I thought that giving this year’s students a first taste of some great Bayesian papers wouldn’t do any harm. The idea is that, in addition to the course, the students will pick a paper from a list and present it (or rather part of it) to the others and to me, which will earn them some extra points towards the final exam mark. It’s in the spirit of Xian’s Reading Classics Seminar (his list here).

I’ve made up the list below, inspired by the reference lists of two textbooks, Xian’s Bayesian Choice and Peter Hoff’s First Course in Bayesian Statistical Methods, and biased by personal taste. See the pdf list and zipped folder for papers. Comments on the list are most welcome!

Julyan

PS: reference n°1 isn’t a joke!

]]>

*Hello all,*

*This is an article intended for the ISBA bulletin, jointly written by us all at Statisfaction, Rasmus Bååth from Publishable Stuff, Boris Hejblum from Research side effects, Thiago G. Martins from tgmstat@wordpress, Ewan Cameron from Another Astrostatistics Blog and Gregory Gandenberger from gandenberger.org.*

Inspired by established blogs, such as the popular Statistical Modeling, Causal Inference, and Social Science or Xi’an’s Og, each of us began blogging as a way to diarize our learning adventures, to share bits of R code or LaTeX tips, and to advertise our own papers and projects. Along the way we’ve come to a new appreciation of the world of academic blogging: a never-ending international seminar, attended by renowned scientists and anonymous users alike. Here we share our experiences by weighing the pros and cons of blogging from the point of view of young researchers.

At least at face value blogging has some notable advantages over traditional academic communication: publication is instantaneous and thus proves efficient in sparking discussions and debates; it allows all sorts of technological sorcery (hyperlinks, animations, applications), while many journals are still adapting to grayscale plots; and it allows for humorous and colourful writing styles, freeing the writer from the constraints of the impersonal academic prose. Last but not least, it is acceptable to blog about almost any topic, from office politics to funding bodies, from complaints about the absurdity of p-values to debates on the net profits of publishing companies, not to mention quarrels about the term “data science”.

For young researchers, some aspects are particularly appealing. Because blogs put academics directly in touch with one another through comments and replies, young researchers are given the opportunity to “talk” directly about technical subjects with some of the most renowned names in their fields (and indeed a surprising number of senior researchers are avid blog readers!). This often proves much more efficient than trying to awkwardly stalk the same professors at conferences. Through such interactions, young academics can show off their many interests and skills, which can do much to fill out the picture painted by their academic CV.

Beyond those low and careerist considerations, we see blogging as a good tool to learn and to share scientific ideas. According to popular belief, only a third of all started research projects end up in a publication; but all of them can at least end up on a blog. So if you indulge in a bit of off-topic study or burn a few hours playing around with a new methodology it need not fuel your performance anxiety: a blog post explaining it will still feel like a delivered product. And you will very likely get some interesting feedback—though rarely to the depth given in journal reviews.

Finally, using blogs to advertise articles and packages seems particularly useful at the early stage of a career, where you might not be invited to that many conferences, or might only be given some dark corner of a giant poster session to talk about your work.

Some cautionary notes now: blogging can be risky! As the adage goes, “better to keep your mouth shut and appear a fool than to open it and remove all doubt”. Beyond the quality of the content being shared, blogs are also sometimes disregarded by academics as a frivolous medium; there is a risk that your colleagues will see your blogging hobby as a pure waste of time.

A second risk is to disclose too much information about promising research leads. There should be some balance between ideas shared and ideas kept secret, so that blogging does not jeopardize publication. Other platforms that formally establish precedence (such as arXiv) might be better suited for the initial presentation of new and exciting work. For this reason it seems wisest to blog a posteriori, even though such blogs are less interesting than they could be as real-time research diaries.

A third risk is genuine time-wasting. For those who have never tried, it can be surprising to discover how many hours are needed to write each post. It can be frustrating in the beginning when reader statistics indicate an audience of just one or two spam-bots and some curious relatives. On the other hand, there is still only a limited number of academic blogs on statistics, so the market is far from saturated: any new blog can quickly garner a decent amount of attention. Of course it can be hard to keep a regular posting schedule, which is necessary to maintain a stable readership.

To conclude, blogging can be a clever way to bypass the hierarchical structure of academia. It gives everyone direct and fast access to everyone else. In some respects it helps to alleviate key problems affecting young researchers, such as the lengthy reviewing process of top journals and the lack of communication space.

]]>

but in the RSS version, it reads

.

Well, that’s a bummer. For now, I recommend reading the arXiv version instead (updated on Monday).

]]>

Almost 10 months since my latest post? I guess bloggin’ ain’t my thing… In my defense, Mathieu Gerber and I were quite busy revising our SQMC paper. I am happy to announce that it has just been accepted as a read paper in JRSSB. If all goes as planned, we should present the paper at the RSS ordinary meeting on Dec 10. Everybody is welcome to attend, and submit an oral or written discussion (or both). More details soon, when the event is officially announced on the RSS web-site.

What is SQMC? It is a QMC (Quasi-Monte Carlo) version of particle filtering. For the same CPU cost, it typically generates much more accurate estimators. Interested? Consider reading the paper here (more recent version coming soon), checking this video where I present SQMC, or, even better, attending our talk in London!

]]>

where $k$ is a kernel, and the mixing distribution $\mu$ is random and discrete (Bayesian nonparametric approach).

We consider the survival function $S$, which is recovered from the hazard rate $h$ by the transform

$$S(t) = \exp\left(-\int_0^t h(s)\,\mathrm{d}s\right),$$

and some possibly censored survival data having survival function $S$. Then it turns out that all the posterior moments of the survival curve $S(t)$ evaluated at any time $t$ can be computed.

The nice trick of the paper is to use the representation of a distribution in a [Jacobi polynomial] basis where the coefficients are linear combinations of the moments. So one can sample from [an approximation of] the posterior, and with a posterior sample we can do everything! Including credible intervals.

I’ve wrapped up the few lines of code in an R package called momentify (not on CRAN). With a sequence of moments of a random variable supported on [0,1] as an input, the package does two things:

- evaluates the approximate density
- samples from it
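
To make the two steps concrete, here is a minimal base-R sketch of the moments-to-density idea. It is not the momentify code: the function names `legendre_coefs`, `density_from_moments` and `sample_from_moments` are made up for illustration, and it assumes the simplest Jacobi basis (shifted Legendre polynomials, orthogonal on [0,1] with respect to the uniform weight) and a crude grid-based sampler.

```r
# Hypothetical helper: coefficients of the shifted Legendre polynomial of
# degree k on [0, 1], ordered by increasing power of x.
legendre_coefs <- function(k) {
  j <- 0:k
  (-1)^(k + j) * choose(k, j) * choose(k + j, j)
}

# Hypothetical helper: approximate density built from the moments
# m_0 = 1, m_1, ..., m_K of a random variable supported on [0, 1].
density_from_moments <- function(moments) {
  K <- length(moments) - 1
  function(x) {
    fx <- numeric(length(x))
    for (k in 0:K) {
      a <- legendre_coefs(k)
      # Coefficient of P_k: (2k + 1) * E[P_k(X)], a linear combination of moments.
      c_k <- (2 * k + 1) * sum(a * moments[1:(k + 1)])
      # Evaluate P_k at each point of x.
      p_k <- vapply(x, function(u) sum(a * u^(0:k)), numeric(1))
      fx <- fx + c_k * p_k
    }
    fx
  }
}

# Hypothetical helper: approximate sampling on a grid, clipping the
# (possibly slightly negative) polynomial approximation at zero.
sample_from_moments <- function(n, moments, grid_size = 1000) {
  grid <- (seq_len(grid_size) - 0.5) / grid_size
  dens <- pmax(density_from_moments(moments)(grid), 0)
  sample(grid, size = n, replace = TRUE, prob = dens)
}

# Moments m_0, ..., m_3 of a Beta(2, 3) distribution.
m <- c(1, 2/5, 1/5, 4/35)
f <- density_from_moments(m)
f(0.3)  # the Beta(2, 3) density is a cubic, so this matches dbeta(0.3, 2, 3)
set.seed(1)
x <- sample_from_moments(1000, m)
```

With more moments the polynomial degree grows, and the approximation can dip below zero for densities that are not polynomials, hence the clipping in the sampler.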

A package example for a mixture of beta distributions, using 2 to 7 moments, gives the following result:

]]>

Hey hey,

With Alexandre Thiéry we’ve been working on non-negative unbiased estimators for a while now. Since I’ve been talking about it at conferences and since we’ve just arXived the second version of the article, it’s time for a blog post. This post is kind of a follow-up to a previous post from July, where I was commenting on Playing Russian Roulette with Intractable Likelihoods by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson and Yves Atchade.

The setting is the combination of two components.

**1°)** There are techniques to “debias” consistent estimators. Consider a sequence $(S_n)_{n \geq 0}$ converging to $\lambda$ in the sense $\lim_{n \to \infty} \mathbb{E}[S_n] = \lambda$. Introduce an integer-valued random variable $N$ and the survival probabilities $\mathbb{P}(N \geq n)$. Then the random variable $Y = \sum_{n=0}^{N} (S_n - S_{n-1}) / \mathbb{P}(N \geq n)$, with the convention $S_{-1} = 0$, is an unbiased estimator of $\lambda$, i.e. its expectation is $\lambda$. Under additional assumptions it has a finite variance and a finite expected computational time… wow. We’ve just removed the bias from a sequence of biased estimators. We’ve reached the limit, we’ve reached infinity, we’re beyond heaven. That random truncation trick has been invented and reinvented (from Von Neumann and Ulam!) over the years but the most thorough and general study is found in Rhee & Glynn (2013). See for instance Rychlik (1990) for an early example of the same trick.
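
Here is a toy R sketch of the random truncation trick, under assumptions chosen purely for illustration: the “biased estimators” are the deterministic partial sums of the series for exp(x), and the truncation variable is geometric. The function name `debiased_exp` is made up. Each draw only ever computes finitely many terms, yet the average targets the infinite sum.

```r
# Hypothetical toy example: unbiased estimator of exp(x) from the partial
# sums S_n = sum_{k=0}^n x^k / k!, each of which is biased for exp(x).
debiased_exp <- function(x, p = 0.5) {
  N <- rgeom(1, prob = p)            # support 0, 1, 2, ...; P(N >= n) = (1 - p)^n
  n <- 0:N
  increments <- x^n / factorial(n)   # S_n - S_{n-1}, with S_{-1} = 0
  sum(increments / (1 - p)^n)        # weight each increment by 1 / P(N >= n)
}

set.seed(1)
estimates <- replicate(1e5, debiased_exp(1))
mean(estimates)  # Monte Carlo average, to be compared with exp(1)
var(estimates)   # finite here, since the weighted increments are summable
```

The choice of the distribution of the truncation variable drives the trade-off between variance and expected cost: a heavier tail means more terms per draw but smaller weights.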

**2°)** Now, since there’s one way to debias estimators, there might be others. In particular there might be some way to remove the bias *and* to guarantee some positivity constraint. That is, assume now that $\lambda$ is in $\mathbb{R}^+$. We might want to have an unbiased estimator of $\lambda$ that takes almost surely non-negative values. A motivating example is precisely the Russian Roulette paper mentioned above, and in general the pseudo-marginal methods. With those methods we can perform “exact inference” on a posterior distribution, as long as we have access to non-negative unbiased estimators of point-wise evaluations of its probability density function.

Our results identify cases where non-negative unbiased estimators can be obtained, in the following sense. For instance, assume that we have access to a real-valued unbiased estimator of $\lambda \in \mathbb{R}^+$, from which we can draw independent copies. We show that there is no algorithm taking those estimators as input and producing almost surely non-negative unbiased estimators of that $\lambda$. In other words, it’s impossible to “positivate” an unbiased estimator just like that. To prove such a result we rely on a precise definition of algorithm, which we believe is not restrictive.

More generally we show that if we have unbiased estimators of $\lambda$ and want to obtain non-negative unbiased estimators of $f(\lambda)$ for some function $f$, well that’s impossible in general. We are sorry.

However if you have an unbiased estimator of $\lambda$ taking values in an interval $[a, b]$, then it can be possible to have a non-negative unbiased estimator of $f(\lambda)$, depending on the function $f$ considered, and in this case the problem is very much related to the Bernoulli Factory problem of Von Neumann (again! Damn you v.N.). In other words, if you have more knowledge on your unbiased estimator used as input (in this case lower and upper bounds), the problem might have a solution. In practice this type of knowledge would be model specific.

When no non-negative unbiased estimator is available, pseudo-marginal methods cannot be directly applied. Since those methods have proven very successful in some important areas such as hidden Markov models, we believe it’s interesting to characterize the other settings in which they might be applied. In the paper we discuss exact simulation of diffusions, inference for big data, doubly intractable distributions and inference based on reference priors. In those fields (at least the first three) people have tried to come up with general non-negative unbiased estimators, so we hope to save them some time!

]]>

Hey there,

It’s been a while since I last wrote about parallelization and GPUs. With colleagues Lawrence Murray and Anthony Lee, we have just arXived a new version of Parallel resampling in the particle filter. The setting is that, on modern computing architectures such as GPUs, thousands of operations can be performed in parallel (i.e. simultaneously), and therefore the remaining calculations that cannot be parallelized quickly become the bottleneck. In the case of the particle filter (or any sequential Monte Carlo method such as SMC samplers), that bottleneck is the resampling step. The article investigates this issue and numerically compares different resampling schemes.

In the resampling step, given a vector of “weights” $w = (w_1, \ldots, w_N)$ (non-negative real numbers), a vector of integers called “offspring counts”, $o = (o_1, \ldots, o_N)$, is drawn such that for all $i$, $\mathbb{E}[o_i] = N w_i / \sum_{j=1}^{N} w_j$. That is, on average a particle has a number of offspring proportional to its normalized weight. Most implementations of the resampling step require a collective operation, such as computing the sum of the weights to normalize them. On top of being a collective operation, computing the sum of the weights is not numerically stable when the weight vector is very long. Numerical results in the article show that in single precision floating point format (as preferred for fast execution on the GPU) and for vectors of size half a million or more, a typical implementation of the resampling step (multinomial, residual, systematic…) exhibits a non-negligible bias due to numerical instability.
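
As a point of reference, here is a small R sketch of the textbook multinomial scheme (not the GPU code from the article; the function name `multinomial_offspring` is made up). It makes the collective operation explicit: every weight is needed before a single offspring count can be drawn. R works in double precision, so the single-precision bias discussed above will not show up here.

```r
# Hypothetical helper: draw offspring counts o_1, ..., o_N from unnormalised
# weights w, so that E[o_i] = N * w_i / sum(w) (multinomial resampling).
multinomial_offspring <- function(w) {
  N <- length(w)
  normalised <- w / sum(w)  # collective operation: needs all the weights
  as.vector(rmultinom(1, size = N, prob = normalised))
}

set.seed(42)
w <- c(0.5, 1.5, 2, 4)
draws <- replicate(2000, multinomial_offspring(w))  # one column per draw
rowMeans(draws)          # Monte Carlo estimate of E[o_i]
length(w) * w / sum(w)   # target values: 0.25 0.75 1.00 2.00
```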

Two resampling strategies come to the rescue: Metropolis and Rejection resampling. These methods, described in detail in the article, rely only on pair-wise weight comparisons and thus 1) are numerically stable and 2) bypass collective operations. Interestingly enough, the Metropolis resampler is theoretically biased but, when numerical stability is taken into account in single precision, proves “less biased” than the traditional resampling strategies (which are theoretically unbiased!), again when using half a million particles or more. It’s not too crazy to imagine that particle filters will soon be commonly run with millions of particles, hence the interest in studying the behaviour of resampling schemes in that regime.

Other practical aspects of resampling implementations are discussed in the article, such as whether the resampling step should be done on the CPU or on the GPU, taking into account the cost of copying the vectors into memory. Decision matrices are given (figure above), giving some indication of which strategy is best in terms of performing resampling on CPU or GPU, and which resampling scheme to use.
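
A serial R sketch of the Metropolis resampler may help to see why no sum is needed. This is my own minimal version for illustration, not the Libbi implementation; the function name `metropolis_resample` and the number of inner steps `B` are choices made here. Each particle runs a short Metropolis chain over particle indices, accepting moves based on a ratio of two weights only, so on a GPU each chain could run in its own thread.

```r
# Hypothetical helper: returns a vector of ancestor indices. Each entry is the
# final state of a B-step Metropolis chain over indices 1..N whose target is
# proportional to w, so the output is only approximately correctly distributed.
metropolis_resample <- function(w, B) {
  N <- length(w)
  ancestors <- integer(N)
  for (i in seq_len(N)) {
    k <- i                              # start the chain at the particle itself
    for (b in seq_len(B)) {
      j <- sample.int(N, 1)             # uniform proposal over all indices
      # Accept with probability min(1, w[j] / w[k]); written without division.
      if (runif(1) * w[k] <= w[j]) k <- j
    }
    ancestors[i] <- k
  }
  ancestors
}

set.seed(3)
w <- rep(c(1, 2, 3, 4), times = 250)    # 1000 unnormalised weights
a <- metropolis_resample(w, B = 50)
offspring <- tabulate(a, nbins = length(w))  # offspring counts from ancestors
mean(w[a])  # should be near sum(w^2) / sum(w) = 3 when B is large enough
```

The bias comes from stopping the chains after `B` steps; larger `B` reduces it at the price of more work per particle.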

All the numerical results of the article can be reproduced using the Resampling package for Libbi.

]]>

library(wesanderson)   # on CRAN
library(RShapeTarget)  # available on https://github.com/pierrejacob/RShapeTarget/
library(PAWL)          # on CRAN
library(ggplot2)       # used for the plots below

Let’s invoke the *moustarget* distribution.

# Build the log-density from the SVG drawing shipped with RShapeTarget
shape <- create_target_from_shape(
  file_name = system.file(package = "RShapeTarget", "extdata/moustache.svg"),
  lambda = 5)
# Initial points of the chains: standard normal draws in dimension 2
rinit <- function(size) matrix(rnorm(2*size), ncol = 2)
moustarget <- target(name = "moustache", dimension = 2,
                     rinit = rinit, logdensity = shape$logd,
                     parameters = shape$algo_parameters)

This defines a target distribution represented by an SVG file using RShapeTarget. The target probability density function is defined on $\mathbb{R}^2$ and is proportional to $1$ on the segments described in the SVG file, and decreases exponentially fast to $0$ away from the segments. The density function of the *moustarget* is plotted below, a picture being worth a thousand words.

# Evaluate the log-density on a 300 x 300 grid covering the bounding box
ranges <- apply(shape$bounding_box, 2, range)
gridx <- seq(from=ranges[1,1], to=ranges[2,1], length.out=300)
gridy <- seq(from=ranges[1,2], to=ranges[2,2], length.out=300)
grid.df <- expand.grid(gridx, gridy)
grid.df$logdensity <- moustarget@logdensity(cbind(grid.df$Var1, grid.df$Var2),
                                            moustarget@parameters)
names(grid.df) <- c("x", "y", "z")
# Raster plot of the density, coloured with a Wes Anderson palette
g2d <- ggplot(grid.df) + geom_raster(aes(x=x, y=y, fill=exp(z))) +
  xlab("X") + ylab("Y")
g2d <- g2d + xlim(ranges[,1]) + ylim(ranges[,2])
pal <- wes.palette(name = "GrandBudapest", type = "continuous")
g2d <- g2d + scale_fill_gradientn(name = "density", colours = pal(50))
g2d <- g2d + theme(legend.position = "bottom",
                   legend.text = element_text(size = 10))
g2d

There are various interesting aspects to note about this distribution. First it is very multi-modal and strongly non-Gaussian, thus providing an interesting toy problem for testing MCMC algorithms. Furthermore, sampling from the *moustarget* can be made arbitrarily difficult by pulling the moustache down, thus separating the moustache mode from the remaining probability mass around the eyes, ears and hat. Finally, note that the colours chosen to represent the density above approximately match the principal colours used in Grand Budapest Hotel by Wes Anderson. This is thanks to the awesome wesanderson package on CRAN. Obviously it is now very tempting to launch the Wang-Landau algorithm on this target with a spatial binning strategy, in order to try out the various palettes provided in wesanderson.

mhparameters <- tuningparameters(nchains = 10, niterations = 10000, storeall = TRUE)
# Bin the state space along the second (vertical) coordinate
getPos <- function(points, logdensity) points[,2]
explore_range <- c(-700,0)
ncuts <- 20
positionbinning <- binning(position = getPos, name = "position",
                           binrange = explore_range, ncuts = ncuts,
                           useLearningRate = TRUE, autobinning = FALSE)
# Run the parallel adaptive Wang-Landau algorithm
pawlresults <- pawl(target = moustarget, binning = positionbinning,
                    AP = mhparameters, verbose = TRUE)
pawlchains <- ConvertResults(pawlresults, verbose = FALSE)
# Label each sampled point with the bin it falls into
locations <- positionbinning@getLocations(pawlresults$finalbins, pawlchains$X2)
pawlchains$locations <- factor(locations)
g <- ggplot(subset(pawlchains),
            aes(x=X1, y = X2, alpha = exp(logdens), size = exp(logdens),
                colour = locations)) +
  geom_point() + theme(legend.position="none") + xlab("X") + ylab("Y")
g <- g + geom_hline(yintercept = pawlresults$finalbins)
# Same plot in three different palettes
pal <- wes.palette(name = "GrandBudapest", type = "continuous")
print(g + scale_color_manual(values = pal(21)) +
        labs(title = "Moustarget in Grand Budapest colours"))
pal <- wes.palette(name = "Darjeeling", type = "continuous")
print(g + scale_color_manual(values = pal(21)) +
        labs(title = "Moustarget in Darjeeling colours"))
pal <- wes.palette(name = "Zissou", type = "continuous")
print(g + scale_color_manual(values = pal(21)) +
        labs(title = "Moustarget in Zissou colours"))

]]>

Hey,

There’s a nice exhibition open until May 26th at the British Library in London, entitled Beautiful Science: Picturing Data, Inspiring Insight. Various examples of data visualizations are shown, either historical or very modern, or even made especially for the exhibition. Definitely worth a detour if you happen to be in the area: you can see everything in 15 minutes.

In particular there are nice visualisations of historical climate data, gathered from the logbooks of the English East India Company, whose ships were crossing every possible sea at the beginning of the 19th century. The logbooks contain locations and daily weather reports, handwritten by the captains themselves. Turns out the logbooks are kept at the British Library itself and some of them are on display at the exhibition. More info on that project here: oldweather.org.

]]>

Besides having coded a pretty cool MCMC app in Javascript, this guy Rasmus Bååth has started the Bayesian first aid project. The idea is that if there’s an R function called **blabla.test** performing test “blabla”, there should be a function **bayes.blabla.test** performing a similar test in a Bayesian framework, and showing the output in a similar way so that the user can easily compare both approaches. This post explains it all. JAGS and BEST seem to be the two main workhorses under the hood.
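
To illustrate the side-by-side idea without relying on the package itself, here is a toy base-R sketch. It is not Bayesian First Aid code: the function `bayes_binom_sketch` is made up, and it assumes a uniform Beta(1, 1) prior on the proportion so that the posterior is available in closed form and no JAGS run is needed.

```r
# Hypothetical Bayesian counterpart of binom.test(x, n), assuming a uniform
# Beta(1, 1) prior: the posterior is Beta(1 + x, 1 + n - x).
bayes_binom_sketch <- function(x, n, cred_mass = 0.95) {
  a <- 1 + x
  b <- 1 + n - x
  tail <- (1 - cred_mass) / 2
  list(post_mean = a / (a + b),
       cred_int = qbeta(c(tail, 1 - tail), a, b))  # equal-tailed interval
}

freq <- binom.test(7, 20)            # frequentist test from base R
bayes <- bayes_binom_sketch(7, 20)   # Bayesian sketch on the same data
freq$conf.int    # 95% confidence interval for the proportion
bayes$cred_int   # 95% credible interval for the proportion
```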

Kudos to Rasmus for this very practical approach, potentially very impactful. Maybe someday people will have to specify if they want a frequentist approach and not the other way around! (I had a dream, etc).

]]>