## The Wang-Landau algorithm reaches the flat histogram in finite time.

*Cross-posted from my personal blog.*

MCMC practitioners may be familiar with the Wang-Landau algorithm, which is widely used in Physics. This algorithm divides the sample space into “boxes”. Given a target distribution, the algorithm then samples proportionally to the target in each box, while aiming at spending a pre-defined proportion of the sample in each box. (Usually these predefined proportions are uniform.)

This strategy can help move faster between modes of a distribution, by forcing the sample to visit often the space between modes.

The most sophisticated versions of this algorithm combine a decreasing stochastic schedule and the so-called flat histogram criterion: whenever the proportions of the sample in each box are close enough to the desired frequencies, the stochastic schedule decreases. A decreasing schedule is necessary for diminishing adaptation to hold.

Until now, it was unknown whether the flat histogram is necessarily reached in finite time, and hence whether the schedule ever starts decreasing.

Pierre and I just submitted and arXived a proof that the flat histogram is reached in finite time under some conditions, and may never be reached in other cases.

## Talk at CREST seminar on language diversification

Below are the slides from my talk earlier today at the PhD students’ seminar (which, for the occasion, I might refer to as the CREST young researchers’ seminar – I am sure our previous speaker will not mind!). The slides, in French, are about my DPhil with Geoff Nicholls on *Phylogenetic Models of Language Diversification*.

Relevant papers:

## TED talks

At the PhD students’ seminar last Thursday, there was a brief discussion about TED talks, which are apparently not as well known as I thought. The TED conferences gather together brilliant people from very different areas, who have 20 minutes to present their work to a lay audience. The talks are often of very high quality and are put online a few months later (under a free Creative Commons licence).

One of the best stats talks has got to be Hans Rosling, who does amazing things with data visualization:

Another enjoyable visualization talk is David McCandless showing graphs from Information is Beautiful:

Peter Donnelly also gave a TED talk on how difficult it is to interpret even simple statistics, but it is not the best talk. He made similar points when I followed his Introduction to Statistics lecture in Oxford, but they were much more convincing when he had more time to expand on them.

## Toponymy and localisation

*Cross-posted from my personal blog.*

Since this is my first post here, a quick introduction is in order: I am a postdoc at CREST and CEREMADE (Paris Dauphine). I have a personal blog, but will be cross-posting here as well.

As Julyan already mentioned, Master’s level students at ENSAE are required to work on an Applied Statistics project. Céline Duval, Pierre Jacob and myself are proposing a topic on correlation between toponymy (place names) and geographic localisation of French towns and cities (“communes”, of which there are 36,000). The idea is to detect which characteristics of a place name are informative of where in France that place is.

For example, some endings are very informative: on this map where every cyan dot represents one commune, place names ending in “-y” are mostly found in the North-East and place names ending in “-ac” in the South-West.

The following map shows that a “w” in a place indicates it lies in the North-East, and that a “k” indicates that it lies in the North-East or Brittany.

Just for fun, here is a map of place names which include one of the major rivers: Loire, Seine, Rhône, Saône, Garonne and Marne. Unsurprisingly, they align almost perfectly with the rivers.

Students interested in the project can contact Céline, Pierre or myself. People interested in the project might enjoy Keith Brigg’s pages on English place names.

1comment