A few days after the MCMSki conference, I start to see the main lessons gathered there.
- I should really read the full program before attending the next MCMSki. The three parallel sessions looked consistently interesting, and I really regret having missed some talks (in particular Dawn Woodard’s and Natesh Pillai’s) and some posters as well (admittedly, due to exhaustion on my part).
- Compared to the previous instance three years ago (in Utah), the main themes have significantly changed. Scalability, approximate methods, non-asymptotic results, 1/n methods … these keywords are now on everyone’s lips. Can’t wait to see if MCQMC’14 will feel that different from MCQMC’12.
- The community is rightfully concerned about scaling Monte Carlo methods to big data, with some people pointing out that models should also be rethought in this new context.
- The place of software developers in the conference, or simply references to software packages in the talks, is much greater than it used to be. It’s a very good sign towards reproducible research in our field. There’s still a lot of work to do, in particular in terms of making parallel computing easier to access (time to advertise LibBi a little bit). On a related note, many people now point out whether their proposed algorithms are parallel-friendly or not.
- Going from the Rockies to the Alps, the food drastically changed from cheeseburgers to just melted cheese. Bread could be found but ground beef and Budweiser were reported missing.
- It’s fun to have an international conference in your home country, but switching from French to English all the time was confusing.
Back in flooded Oxford now!
Happy new year to everyone, and perhaps see you at MCMski 4 in Chamonix next week, which I expect to be a very friendly and exciting event, even if I’m not much into skiing. :-)
I will talk for the first time about SQMC, a QMC (Quasi Monte Carlo) variant of particle filtering (PF) that Mathieu Gerber and I developed in recent months. We are quite excited about it for a variety of reasons, but I will give more details shortly on this blog.
I thought that my talk would clash with a session on PMCMC, which was quite unfortunate as I suspect that session would target much the same audience, but looking at the program, I see it’s no longer the case. Thanks to the powers that be!
I also organise a session on “Bayesian computation in Neurosciences” at MCMski 4. Feel free to come if you have an interest in the subject. Myself, I think it’s a particularly cool area of application, about which I know very, very little… which is why I organise a session to learn more about it! :-) I also co-organise (with Simon Barthelmé and Adam Johansen) a workshop at Warwick on the same subject, more details soon.
In case you have missed the new round of misdeeds by Elsevier, here is an excellent summary (plus a good overview of the current debate on open access and so on):
Many reactions seem to focus on Academia.edu, which is a private company, so perhaps that case is not so black and white. However, I found the story (also mentioned by the WP paper) of our colleague Daniel Povey much more infuriating: Daniel put a legitimate copy of one of his papers on his web site, some robot wrongly detected this copy as the version owned by Elsevier and sent a DMCA takedown notice to Google, and boom, Google automatically shut down Daniel’s Google web page entirely. Welcome to the brave new world of robots enacting the Law.
I was talking with an Economist the other day. He told me that big corporations very rarely innovate, because they have invested so much in a particular, currently lucrative, business model, even if that model is doomed in the medium term. He gave me the example of Kodak: they developed the first digital camera before anyone else, yet they never managed to turn around their business model to make the transition to digital photography. They filed for bankruptcy last year. I think the same applies to Elsevier: even if it does not make sense for them in the long run, this company is going to fight ugly to defend its current business model (the “treasure chest behind a pay wall”, the treasure being our papers) rather than trying to transition to a new business model compatible with open access. So I guess it falls on us to consider sending our papers to new players in academic publishing.
In other news, I have heard that many French Universities are going to lose all access to Elsevier journals as of 1st Jan 2014, because of failed negotiations between Elsevier and these Universities, but I found little detail on the interweb on this particular story.
I’ve just heard this sad piece of news. Definitely one of the greatest statisticians of the last 50 years. I wish I had met him in person.
Originally posted on Xi'an's Og:
Dennis Lindley most sadly passed away yesterday at the hospital near his home in Somerset. He was one of the founding fathers of our field (of Bayesian statistics), who contributed to formalise Bayesian statistics in a coherent theory. And to make it one with rational decision-making, a perspective missing in Jeffreys’ vision. (His papers figured prominently in the tutorials we gave yesterday for the opening of O’Bayes 250.) At the age of 90, his interest in the topic had not waned away: as his interview with Tony O’Hagan last Spring showed, his passionate arguing for the rationale of the Bayesian approach was still there and alive! The review he wrote of The Black Swan a few years ago also demonstrated he had preserved his ability to see through bogus arguments. (See his scathing “One hardly advances the respect with which statisticians are held in society by making…
Please note this is a very early, preliminary, non-official announcement, but I understand that our lab might be able to fund a post-doc position next academic year (starting around September 2014). The successful candidate would be expected to interact with a (non-empty!) subset of our Stats group (Arnak Dalalyan, Eric Gautier, Judith Rousseau, Alexandre Tsybakov, and me). In particular, I’d be interested to hear from anyone who would like to apply in order to interact with me (and maybe other lab members) on things related to Bayesian computation (Sequential Monte Carlo, MCMC, fast approximations, etc.), at least partially. I have various projects in mind, but I’m quite flexible and open to discussion. I think that the selection process might occur some time in May-June of next year, but again I don’t have exact details for now.
Former office mate Alex Thiery is still in Singapore and will start blogging here soon, so we’ll still have two continents covered. Still looking for contributors in the other ones!
In a recent post Nicolas discussed some limitations of pseudo-random number generation. On a related note there’s a feature of random variables that I find close to mystical.
In an on-going work with Alex Thiery, we had to precisely define the notion of randomized algorithms at some point, and we essentially followed Keane and O’Brien (as it happens there’s an article today on arXiv that also is related, maybe, or not). The difficulty comes with the randomness. We can think of a deterministic algorithm as a good old function mapping an input space to an output space, but a random algorithm adds some randomness over a deterministic scheme (in an accept-reject step for instance, or a random stopping criterion), so that given fixed inputs the output might still vary. One way to formalise it consists in defining the algorithm as a deterministic function of inputs and of a source of randomness; that randomness is represented by a single random variable, e.g. following a uniform distribution.
The funny, mystical and disturbing thing is that a single uniform random variable is enough to represent an infinity of them. It sounds like an excerpt of the Vedas, doesn’t it? To see this, write a single uniform realization in binary representation. That is, for $U \sim \mathcal{U}[0,1]$ write
$U = \sum_{k=1}^{\infty} b_k 2^{-k}$
with $b_k \in \{0,1\}$. The binary representation is
$U = 0.b_1 b_2 b_3 \ldots$
Now it’s easy to see that these zeros and ones are distributed as independent Bernoulli variables with parameter 1/2. We then put these digits in a particular position, as follows.
If we take each column or each row from the grid above, they’re independent and they’re also binary representations of uniform random variables – you could also consider diagonals or more funky patterns. You could say that the random variable contains an infinity of independent clones.
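To make the construction concrete, here is one layout that fits the description, stated as an assumption rather than as the exact grid intended above: writing $b_1, b_2, b_3, \ldots$ for the binary digits of the single uniform, fill a two-dimensional array along its anti-diagonals,

$\begin{array}{ccccc} b_1 & b_2 & b_4 & b_7 & \cdots \\ b_3 & b_5 & b_8 & \cdots & \\ b_6 & b_9 & \cdots & & \\ b_{10} & \cdots & & & \\ \vdots & & & & \end{array}$

so that every row (and every column) receives infinitely many digits, and no digit is used twice. The Python sketch below does this for a finite list of digits; the function names `split_digits` and `digits_to_float` are hypothetical, and the digits are drawn from a pseudo-random generator purely as a stand-in for a "true" uniform.

```python
import random


def split_digits(bits, n_rows):
    """Distribute the binary digits of one number over the rows of a grid.

    Digits are placed along anti-diagonals: digit 1 on diagonal 0,
    digits 2-3 on diagonal 1, digits 4-6 on diagonal 2, and so on.
    Only the first n_rows rows are kept.
    """
    rows = [[] for _ in range(n_rows)]
    k = 0          # index of the next digit to place
    diagonal = 0   # index of the current anti-diagonal
    while k < len(bits):
        for i in range(diagonal + 1):   # row index along this anti-diagonal
            if k >= len(bits):
                break
            if i < n_rows:
                rows[i].append(bits[k])
            k += 1
        diagonal += 1
    return rows


def digits_to_float(digits):
    """Read a list of binary digits d_1, d_2, ... as the number 0.d_1 d_2 ..."""
    return sum(d * 2.0 ** -(j + 1) for j, d in enumerate(digits))


# Stand-in for the digits of a single uniform (finite, pseudo-random).
rng = random.Random(1)
bits = [rng.getrandbits(1) for _ in range(2000)]

rows = split_digits(bits, n_rows=3)
print([digits_to_float(r) for r in rows])  # three numbers built from one digit stream
```

With a finite number of digits the rows are of course truncated; the "infinity of clones" only appears in the limit of an infinite expansion.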
This property actually sounds dangerous now, come to think of it. I think it was always well-known but people might not have made the link with Star Wars. In the end I’m happy to stick with harmless pseudo-random numbers, for safety reasons.
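A recently arXived paper by Pier Bissiri, Chris Holmes and Steve Walker piqued my curiosity about “pseudo-Bayesian” approaches, that is, statistical approaches based on a pseudo-posterior:
$\pi(\theta \mid x) \propto p(\theta)\, \hat{L}(\theta)$
where $p(\theta)$ is the prior and $\hat{L}(\theta)$ is some pseudo-likelihood. Pier, Chris and Steve use in particular
$\hat{L}(\theta) = \exp\{-\lambda R_n(\theta)\}$
where $R_n(\theta)$ is some empirical risk function and $\lambda > 0$ is a scaling constant. A good example is classification; then $R_n(\theta)$ could be the proportion of misclassified points:
$R_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i f(x_i; \theta) < 0\}$
where $f(\cdot\,; \theta)$ is some score function parametrised by $\theta$, and $y_i \in \{-1, 1\}$. (Side note: I find the ML convention for the $y_i$ more convenient than the stats convention.)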
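As a minimal sketch of what such a pseudo-posterior looks like in code, the snippet below evaluates the unnormalised log pseudo-posterior for the classification example, using the 0-1 empirical risk (proportion of misclassified points) and labels in $\{-1, 1\}$. Several choices here are assumptions made for illustration only: the linear score $f(x; \theta) = \langle \theta, x \rangle$, the zero-mean Gaussian prior, and the names `empirical_risk`, `log_pseudo_posterior`, `lam` and `prior_sd`.

```python
def empirical_risk(theta, xs, ys):
    """Proportion of misclassified points under the linear score <theta, x>.

    xs is a list of feature tuples, ys a list of labels in {-1, +1}.
    A point is counted as an error when y * score < 0.
    """
    errors = 0
    for x, y in zip(xs, ys):
        score = sum(t * xj for t, xj in zip(theta, x))
        if y * score < 0:
            errors += 1
    return errors / len(xs)


def log_pseudo_posterior(theta, xs, ys, lam=10.0, prior_sd=5.0):
    """Unnormalised log pseudo-posterior: log prior minus lam times the risk.

    The Gaussian prior and the default values are illustrative assumptions.
    """
    log_prior = -0.5 * sum(t * t for t in theta) / prior_sd ** 2
    return log_prior - lam * empirical_risk(theta, xs, ys)


# Tiny made-up data set, one feature per point.
xs = [(1.0,), (-1.0,), (2.0,)]
ys = [1, -1, -1]
print(empirical_risk((1.0,), xs, ys))
print(log_pseudo_posterior((1.0,), xs, ys, lam=3.0, prior_sd=1.0))
```

Note that the risk only changes when $\theta$ crosses a point where some observation switches sides, so the pseudo-posterior is piecewise smooth rather than smooth.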
It turns out that this particular kind of pseudo-posterior has already been encountered before, but with different motivations:
- Chernozhukov and Hong (JoE, 2003) used it to define new Frequentist estimators based on moment estimation ideas (i.e. take the empirical risk above to be some empirical moment constraint). Focus is on establishing Frequentist properties of, say, the expectation of the pseudo-posterior. (It seems to me that few people have heard about this paper in Stats.)
- the PAC-Bayesian approach, which originates from Machine Learning, also relies on this kind of pseudo-posterior. To be more precise, PAC-Bayes usually starts by minimising the upper bound of an oracle inequality within a class of randomised estimators. Then, as a result, you obtain as a possible solution, say, a single draw from the pseudo-posterior defined above. A good introduction is this book by Olivier Catoni.
- Finally, Pier, Chris and Steve’s approach is by far the most Bayesian of these three pseudo-Bayesian approaches, in the sense that they try to maintain an interpretation of the pseudo-posterior as a representation of the uncertainty on the parameter $\theta$. Crudely speaking, they don’t look only at the expectation, like the two approaches above, but also at the spread of the pseudo-posterior.
Let me mention briefly that quite a few papers have considered using other types of pseudo-likelihood in a pseudo-posterior, such as empirical likelihood, composite likelihood, and so on, but I will shamefully skip them for now.
To what extent should this growing interest in “Pseudo-Bayes” have an impact on Bayesian computation? For one thing, more problems to throw at our favourite algorithms should be good news. In particular, Chernozhukov and Hong mention the possibility of using MCMC as a big advantage of their approach, because typically the function they consider could be difficult to minimise directly by optimisation algorithms. PAC-Bayesians also seem to recommend MCMC, but I could not find many PAC-Bayesian papers that go beyond the theory and actually implement it; an exception is this.
On the other hand, these pseudo-posteriors might be quite nasty. First, given the way they are defined, they should not have the kind of structure that makes it possible to use Gibbs sampling. Second, many interesting choices for the empirical risk seem to be irregular or multimodal. Again, in the classification example, the 0-1 loss function is typically not continuous. Hopefully the coming years will witness some interesting research on which computational approaches are more fit for pseudo-Bayes computation, but readers will not be surprised if I put my Euros on (some form of) SMC!
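To give a feel for the computational side, here is a toy sketch of a random-walk Metropolis sampler targeting a pseudo-posterior built from the 0-1 classification risk. It is only a plain MCMC baseline, not SMC and not the method of any of the papers mentioned above. Everything specific in it is an assumption for illustration: the one-dimensional threshold score $f(x; \theta) = x - \theta$, the Gaussian prior, the made-up data, and the tuning values `step`, `lam` and `prior_sd`.

```python
import math
import random


def zero_one_risk(theta, data):
    """Proportion of points (x, y), y in {-1, +1}, with y * (x - theta) < 0."""
    return sum(1 for x, y in data if y * (x - theta) < 0) / len(data)


def log_target(theta, data, lam, prior_sd):
    """Unnormalised log pseudo-posterior: Gaussian log prior minus lam * risk."""
    return -0.5 * (theta / prior_sd) ** 2 - lam * zero_one_risk(theta, data)


def rw_metropolis(data, n_iter=5000, step=0.5, lam=50.0, prior_sd=10.0, seed=0):
    """Random-walk Metropolis with Gaussian proposals, started at theta = 0."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_target(theta, data, lam, prior_sd)
    chain = []
    for _ in range(n_iter):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        candidate = log_target(proposal, data, lam, prior_sd)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain


# Made-up separable data: negatives below 1, positives above 1.
data = [(x, -1) for x in (0.0, 0.2, 0.4, 0.6, 0.8)]
data += [(x, 1) for x in (1.2, 1.4, 1.6, 1.8, 2.0)]

chain = rw_metropolis(data)
second_half = chain[len(chain) // 2:]
print(sum(second_half) / len(second_half))  # rough estimate of the pseudo-posterior mean
```

Because the 0-1 risk is piecewise constant, the target is flat between data points and jumps at them, so there is no gradient to exploit; this is one concrete way in which such targets are irregular.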
I start a new course at ENSAE this year, on “Monte Carlo and simulation methods”. I intend to cover pseudo-random generators at the beginning, so I’m thinking about how to teach this material which I’m not so familiar with.
One very naive remark: in a “truly random world”, when I flip a coin $n$ times, I obtain one out of $2^n$ possible outcomes, with probability $2^{-n}$. In the real world, if I use a computer to toss coins, the number of possible outcomes (for these $n$ successive tosses) is bounded by $2^{32}$. This is because a stream of pseudo-random numbers is completely determined by the seed (the starting point of the stream), and most generators are based on 32-bit seeds.
Compare $2^{32}$ with $2^n$ when $n$ is large, and you see that a PRNG is quite a crude approximation of randomness. Of course, it’s not so bad in practice, because usually you are not interested in the exact value of a vector of $n$ successive coin tosses, but rather in some summary of dimension much smaller than $n$. Still, the pseudo-random world is much smaller than the random world it is supposed to mimic.
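The counting argument can be checked on a toy scale. The sketch below restricts the seed to 8 bits (an artificial assumption, chosen so that all seeds can be enumerated quickly; Python’s own generator accepts much larger seeds) and counts how many distinct sequences of $n = 16$ coin tosses can actually be produced. The helper names `toss_sequence` and `reachable_outcomes` are hypothetical.

```python
import random


def toss_sequence(seed, n):
    """Return the n coin tosses (0/1) produced by a generator seeded with `seed`."""
    rng = random.Random(seed)
    return tuple(rng.getrandbits(1) for _ in range(n))


def reachable_outcomes(seed_bits, n):
    """Set of all toss sequences reachable when the seed has `seed_bits` bits."""
    return {toss_sequence(seed, n) for seed in range(2 ** seed_bits)}


seed_bits, n = 8, 16
outcomes = reachable_outcomes(seed_bits, n)

# At most 2**seed_bits sequences are reachable, out of 2**n possible ones.
print(len(outcomes), "reachable sequences out of", 2 ** n)
print("fraction reachable:", len(outcomes) / 2 ** n)
```

The same bound applies with 32-bit seeds: whatever $n$ is, at most $2^{32} = 4294967296$ of the $2^n$ sequences can ever appear.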
I found this remark quite scary, and I think I’ll use it to impress on my students the limitations of PRNG. By the way, if you like horror stories on PRNG, you might find the slides of Régis Lebrun (for a talk at BigMC he gave a few years back) quite entertaining. It was really funny to see the faces of my colleagues turning white as Régis was giving more and more evidence that we are often too confident in PRN generators and oblivious of their limitations. I suspect my own face was very much the same colour.
Hi Statisfied readers,
I am Nicolas Chopin, a Professor of Statistics at the ENSAE, and my colleagues and good friends that manage Statisfaction kindly agreed that I would join their blog. I work mostly on “Bayesian Computation”, i.e. Monte Carlo and non-Monte Carlo methods to compute Bayesian quantities; a strong focus of my research is on Sequential Monte Carlo (aka particle filters).
I don’t plan to blog very regularly, and only on stuff related to my research, at least in some way. Well, that’s the idea for now. Stay tuned!