Googling Bayes’ pictures
I am writing way too many posts in a row on Google tools. I promise I will think about something else soon.
I find it amusing that you can launch a search in Google Images by just dragging a picture into the search box, instead of typing text. I remember that Pierre told me about it a long time ago, but this is the first time I have played with it.
For example (an easy one), it recognizes the (supposed) picture of our celebrated English mathematician. Note that I used a neutral file name, name3.jpg, so that at least the name does not bias the search:


A harder test is a picture made of characters. Submitting a picture of Bayes’ Theorem returns bayes theorem (itself!) as “Best guess for this image”, with its Wikipedia page as first entry. So it gets it all right, as with Bayes’ picture:


Actually, there is a trick here: the image is taken from a web page which links to Wikipedia, so I guess the search relies more on this piece of information than on any character recognition of the formula. When the picture cannot be found on the Internet, e.g. with a formula typeset in LaTeX, the result is not that clear. So it does not really beat the (not convincing) LaTeX search offered by Springer, which looks for academic papers containing some given LaTeX code.
You might also try to recover an R package from CRAN by using a plot that it produces (who knows?). For example, it will tell you that this kernel density plot produced with the diamonds dataset can be found on the ggplot2 page about geom_histogram:

World Tourism Day, and Google Public Data Explorer
Today is World Tourism Day! So let’s talk about some tourism-related datasets – and others.
Among other nice functions, Google offers a Public Data Explorer, in a beta version, which provides a collection of datasets from the OECD, IMF, Eurostat, World Bank, US Census Bureau, etc. (cf. our datasets page as well). These data can be plotted directly online, with the following (limited) chart types: lines, barplots, maps and scatterplots.

The page reads “Data visualizations for a changing world”… nothing less! As someone writes on Andrew’s blog, it is very reminiscent of Hans Rosling’s work with Gapminder’s motion charts: “Unveiling the beauty of statistics for a fact based world view”.
It is really easy to use, and a good opportunity for high school math teachers to show nice graphs to students before they learn how to use R. The pointer to R is straightforward, as it displays the same plots as the googleVis R package. For example, for tourism data, the number of nights spent in European countries looks like this in 2009 (click to get the motion chart version!):
The barplot goes like this:
Density exploration and Wang-Landau algorithms [with R package]

Hey,
Since a new paper that I’ve co-written has appeared on arXiv, here is a quick post summarizing it. The paper is named:
An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration
and describes improvements over the Wang-Landau algorithm as described by Atchadé and Liu, itself a generalization of the work of Wang and Landau (which is commonly used in computational physics and well deserves its Wikipedia article). It is joint work with Luke Bornn from UBC, Arnaud Doucet from Oxford and Pierre Del Moral from INRIA Bordeaux.
We focus on the utility of this kind of “exploratory” algorithm for statistical inference. More precisely, suppose that you want to explore (e.g. by simulating from) a complex probability distribution, complex in the sense that most standard algorithms would fail to sample accurately from it, either because it is defined on a very high-dimensional state space, or because it has many local modes. The general idea of the Wang-Landau algorithm is that the parts of the state space that have already been explored (i.e. where the Markov chain generated by the algorithm has spent some time) are “penalized”, so that they are less likely to be explored further. Hence, the other parts of the space are “favoured”. That goes on until all the parts have (in some sense) been explored enough. A pretty natural idea!
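To make the penalization idea concrete, here is a minimal sketch in Python of a basic Wang-Landau-type sampler on a one-dimensional toy target. It is only an illustration under assumptions that are not from the paper: the target is a mixture of two Gaussians centred at -5 and 5, the bins are fixed by hand (unit intervals between -6 and 6, plus two tails), there is a single chain, and the step size t^(-1/2) is an arbitrary choice. It is neither the adaptive interacting algorithm of the paper nor the code of the R package.

```python
import bisect
import math
import random


def log_target(x):
    """Unnormalised log-density of an equal mixture of N(-5, 1) and N(5, 1)."""
    a = -0.5 * (x + 5.0) ** 2
    b = -0.5 * (x - 5.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def wang_landau(n_iter=20000, step=1.0, seed=1, penalize=True):
    """Random-walk Metropolis with Wang-Landau penalties on fixed bins.

    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    edges = [float(e) for e in range(-6, 7)]  # cut points -6, -5, ..., 6
    n_bins = len(edges) + 1                   # 12 unit bins plus two tails
    log_penalty = [0.0] * n_bins
    visits = [0] * n_bins
    x = -5.0                                  # start in the left mode
    kx = bisect.bisect_right(edges, x)
    for t in range(1, n_iter + 1):
        y = x + rng.gauss(0.0, step)
        ky = bisect.bisect_right(edges, y)
        # Metropolis ratio for the biased target pi(x) / penalty(bin(x)).
        log_ratio = (log_target(y) - log_penalty[ky]) - (
            log_target(x) - log_penalty[kx]
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, kx = y, ky
        visits[kx] += 1
        if penalize:
            # Penalise the bin just visited; the step size shrinks over time.
            log_penalty[kx] += t ** -0.5
    return visits, log_penalty


visits, log_penalty = wang_landau()
left, right = sum(visits[:7]), sum(visits[7:])
print("iterations spent left / right of zero:", left, right)
```

Calling `wang_landau(penalize=False)` switches the penalties off and gives a plain random-walk Metropolis chain from the same starting point, which is a simple way to compare how often each sampler visits the two modes.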
Now defining these “parts” of the space is a bit tricky, and has proved to be a limitation of the method: if the parts are not well designed, the Wang-Landau algorithm does not provide very useful results compared to classical MCMC methods. Part of our work was to design a somewhat “automatic” method to partition the space dynamically during the algorithm, so that the generated chains do indeed explore the whole space of interest. Other improvements include the use of parallel chains and adaptive proposal distributions. On top of algorithmic improvements, we argue for a more general use of exploratory methods, to avoid getting stuck in local modes without even noticing it.
We try our improved Wang-Landau algorithm on four models: a toy example involving a mixture of three bivariate Gaussian distributions, a variable selection problem, a challenging mixture model with lots of well-separated modes (that’s the figure above), and an Ising model applied to image analysis, where the difficulty is to design efficient moves to travel around the very big space. In all cases we compare to adaptive MCMC with parallel chains, matching the computational cost in terms of target density evaluations.
To make things easier, we provide an R package (my very first!) here:
Hence the graphs of the article can be easily reproduced by launching a few files. We hope that it can also be easily adapted to a lot of target distributions, if you want to try the method on your model! We’ll probably submit to CRAN after some clean-up.
I’m also working on more theoretical aspects of the Wang-Landau algorithm with Robin Ryder, so I’ll blog more on that soon.
Google Trends Gadget and French Socialists

(Wikimedia image)
A primary election for selecting the Socialist candidate for the next presidential election will take place on October 9 and 16. There are 6 candidates: Martine Aubry, Jean-Michel Baylet, Arnaud Montebourg, François Hollande, Ségolène Royal and Manuel Valls… yes indeed, Dominique Strauss-Kahn will not enter the competition!
The first TV debate, Des paroles et des actes – Le débat des primaires, is this evening at 20:35 Paris time, broadcast on France 2 and online. And it’s going to be amusing, if not thought-provoking.
The latest poll (BVA) is as follows: 47% Hollande, 29% Aubry, 12% Ségolène Royal, 6% Montebourg, 4% Valls and 1% Baylet (please do not ask for error margins).
To get some insight into the most popular Google searches (among French people), we can use the embeddable Google Trends Gadget. Nice features include daily updates, and a choice of different time periods with one click. But there is no label on the time axis, which is a shame.
It does not work properly in WordPress yet, too bad! Or am I not using it right? Instead of the embeddable version, the classic one still does the job. So here is a quick and dirty copy/paste of the Gadget for the three major candidates:
1 month period

1 year period

“max time” period

It’s curious (suspicious?) to see that the curves are not exactly consistent across time periods. The reason is probably explained in Google Trends math.
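One possible explanation, and this is an assumption about how the gadget works rather than something checked here, is that each curve is rescaled relative to the highest point of the selected period. Under that assumption, the same weeks get different values depending on the window. A minimal sketch in Python, with made-up search volumes and a hypothetical helper `scale_to_peak`:

```python
def scale_to_peak(volumes):
    """Rescale a window of raw volumes so that its largest value is 100."""
    peak = max(volumes)
    return [round(100 * v / peak) for v in volumes]


# Hypothetical weekly search volumes (not real Google data).
raw = [20, 80, 40, 8, 32, 16]

full = scale_to_peak(raw)          # whole history as one window
recent = scale_to_peak(raw[-3:])   # last three weeks only

print(full[-3:])   # last three weeks, scaled within the full window
print(recent)      # the same three weeks, scaled within the short window
```

The last three weeks are scaled against the overall peak in the first case and against their own peak in the second, so the two curves cannot match point by point.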
Aubry outperformed the others over the last few months, but Hollande does seem to be gaining ground these days. Royal vanishes (!) with the “last month” tab. On the “max” tab, she performs well in 2007, as expected. Let’s see what happens in mid-October!
UPDATE: actually WordPress does not allow all HTML tags, for security reasons. Banned tags include embed, frame, iframe, form, input, object and textarea; iframe is the one used by the Google Trends Gadget.
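To check a snippet before pasting it into a post, a small script can list the banned tags it contains. This is a minimal sketch with Python’s standard `html.parser` module. The set `BANNED` simply copies the tags named above, the names `BannedTagFinder` and `banned_tags` are made up for the example, and this is not WordPress’s actual filter.

```python
from html.parser import HTMLParser

# Tags listed in the update above (assumed complete for this sketch).
BANNED = {"embed", "frame", "iframe", "form", "input", "object", "textarea"}


class BannedTagFinder(HTMLParser):
    """Collect every opening tag that belongs to BANNED."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in BANNED:
            self.found.append(tag)


def banned_tags(html):
    """Return the banned tags found in an HTML snippet, in order."""
    finder = BannedTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found


snippet = '<p>Trends</p><iframe src="https://example.com/gadget"></iframe>'
print(banned_tags(snippet))
```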
WordPress Blogs on iPad
Hey, good news, it is possible to follow Statisfaction on iPads! In a nice and non-standard way, I mean. The great blog tool WordPress provides a neat display, where it is possible to customize a Cover Logo and a Launch Screen Image:

Don’t forget to tune this in the Appearance tab if you have a WordPress blog. It displays the posts in an app-like way:

And of course, there is still the mobile version as well, for example on an iPhone:

Enjoy Statisfaction in your pocket!
BISP7 in Madrid

Hey there,
I am currently attending Bayesian Inference in Stochastic Processes 7, hosted by Universidad Carlos III de Madrid.
I am going to talk (for a very short 15 minutes) about SMC^2 (arXiv link, Google Code link) on Saturday. Looking at the conference’s program, I am definitely hoping to interact on closely related topics with the other participants. In particular, there seem to be other fans of pseudo-marginal MCMC approaches, Particle MCMC and random weight particle filters. It seems to me that these are becoming the building blocks of many possible new sampling methods, which are still to be discovered and studied. Pretty exciting!
Hasta luego!
EDIT: The conference is over and it was amazing. Here are my slides, if you’re interested in a very short introduction to SMC^2:

