There’s a nice exhibition open until May 26th at the British Library in London, entitled Beautiful Science: Picturing Data, Inspiring Insight. Various examples of data visualizations are shown, some historical, some very modern, and some made especially for the exhibition. Definitely worth a detour if you happen to be in the area: you can see everything in 15 minutes.
In particular there are nice visualisations of historical climate data, gathered from the logbooks of the English East India Company, whose ships were crossing every possible sea at the beginning of the 19th century. The logbooks contain locations and daily weather reports, handwritten by the captains themselves. It turns out the logbooks are kept at the British Library itself and some of them are on display at the exhibition. More info on that project here: oldweather.org.
Kudos to Rasmus for this very practical approach, potentially very impactful. Maybe someday people will have to specify if they want a frequentist approach and not the other way around! (I had a dream, etc).
I’m Joseph Dureau, I have been an avid reader of this blog for a while now, and I’m very glad Pierre invited me to share a few things. Until a few months ago, I used to work on Bayesian inference methods for stochastic processes, with applications to epidemiology. Along with colleagues from this past life, I have now taken the startup path, founding Standard Analytics. We’re looking into how web technologies can be used to enhance the browsability, transparency and impact of scientific publications. Here’s a start on what we’ve been up to so far.
Let me just make it clear that everything I’m presenting is fully open source, and available here. I hope you’ll find it interesting, and we’re very excited to hear from you! Here it goes..
To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically.
Berners-Lee et al, 2001
Since this sentence was written, twelve years ago, ambitious and collective initiatives have been undertaken to revolutionize what machines can do for us on the web. When I make a purchase online, my email service is able to understand it from the purchase confirmation email, communicate with the online store service, authenticate, obtain information on the delivery, and provide me with a real-time representation of where the item is located. Machines now have the means to process data in a smarter way, and to communicate over it!
However, when it comes to exchanging quantitative arguments, be it in a blog post or in a scientific article, web technology does not bring us much further than what can be done with pen and paper.
A few days after the MCMSki conference, I start to see the main lessons gathered there.
- I should really read the full program before attending the next MCMSki. The three parallel sessions looked consistently interesting, and I really regret having missed some talks (in particular Dawn Woodard’s and Natesh Pillai’s) and some posters as well (admittedly, due to exhaustion on my part).
- Compared to the previous instance three years ago (in Utah), the main themes have significantly changed. Scalability, approximate methods, non-asymptotic results, 1/n methods … these keywords are now on everyone’s lips. Can’t wait to see if MCQMC’14 will feel that different from MCQMC’12.
- The community is rightfully concerned about scaling Monte Carlo methods to big data, with some people pointing out that models should also be rethought in this new context.
- The place of software developers in the conference, or simply references to software packages in the talks, is much greater than it used to be. It’s a very good sign towards reproducible research in our field. There’s still a lot of work to do, in particular in terms of making parallel computing easier to access (time to advertise LibBi a little bit). On a related note, many people now point out whether their proposed algorithms are parallel-friendly or not.
- Going from the Rockies to the Alps, the food drastically changed from cheeseburgers to just melted cheese. Bread could be found but ground beef and Budweiser were reported missing.
- It’s fun to have an international conference in your home country, but switching from French to English all the time was confusing.
Back in flooded Oxford now!
Happy new year to everyone, and perhaps see you at MCMSki 4 in Chamonix next week, which I expect to be a very friendly and exciting event, even if I’m not much into skiing. :-)
I will talk for the first time about SQMC, a QMC (Quasi Monte Carlo) variant of particle filtering (PF) that Mathieu Gerber and I developed in recent months. We are quite excited about it for a variety of reasons, but I will give more details shortly on this blog.
I thought that my talk would clash with a session on PMCMC, which was quite unfortunate as I suspect that session would target much the same audience, but looking at the program, I see it’s no longer the case. Thank the powers that be!
I am also organising a session on “Bayesian computation in Neurosciences” at MCMSki 4. Feel free to come if you are interested in the subject. Myself, I think it’s a particularly cool area of application, about which I know very, very little… which is why I’m organising a session to learn more about it! :-) I am also co-organising (with Simon Barthelmé and Adam Johansen) a workshop at Warwick on the same subject; more details soon.
In case you have missed the new round of misdeeds by Elsevier, here is an excellent summary (plus a good overview of the current debate on open access and so on):
Many reactions seem to focus on Academia.edu, which is a private company, so perhaps that case is not so black and white. However, I found the story (also mentioned by the WP paper) of our colleague Daniel Povey much more infuriating: Daniel put a legit copy of one of his papers on his web site, some robot wrongly detected this copy as the version owned by Elsevier, sent a DMCA takedown notice to Google, and boom, Google automatically shut down Daniel’s Google web page entirely. Welcome to the brave new world of robots enacting the Law.
I was talking with an economist the other day. He told me that big corporations very rarely innovate, because they have invested so much in a particular, currently lucrative, business model, even if that model is doomed in the medium term. He gave me the example of Kodak: they developed the first digital camera before anyone else, yet they never managed to turn around their business model to make the transition to digital photography. They filed for bankruptcy last year. I think the same applies to Elsevier: even if it does not make sense for them in the long run, this company is going to fight ugly to defend its current business model (the “treasure chest behind a pay wall”, the treasure being our papers) rather than trying to transition to a new business model compatible with open access. So I guess it falls on us to consider sending our papers to new players in academic publishing.
In other news, I have heard many French universities are going to lose all access to Elsevier journals as of 1st Jan 2014, because of failed negotiations between Elsevier and these universities, but I found little detail on the interweb on this particular story.
I’ve just heard this sad piece of news. Definitely one of the greatest statisticians of the last 50 years. I wish I had met him in person.
Originally posted on Xi'an's Og:
Dennis Lindley most sadly passed away yesterday at the hospital near his home in Somerset. He was one of the founding fathers of our field (of Bayesian statistics), who contributed to formalise Bayesian statistics in a coherent theory. And to make it one with rational decision-making, a perspective missing in Jeffreys’ vision. (His papers figured prominently in the tutorials we gave yesterday for the opening of O’Bayes 250.) At the age of 90, his interest in the topic had not waned away: as his interview with Tony O’Hagan last Spring showed, his passionate arguing for the rationale of the Bayesian approach was still there and alive! The review he wrote of The Black Swan a few years ago also demonstrated he had preserved his ability to see through bogus arguments. (See his scathing “One hardly advances the respect with which statisticians are held in society by making…
Please note this is a very early, preliminary, non-official announcement, but I understand that our lab might be able to fund a post-doc position next academic year (starting around September 2014). The successful candidate would be expected to interact with a (non-empty!) subset of our Stats group (Arnak Dalalyan, Eric Gautier, Judith Rousseau, Alexandre Tsybakov, and me). In particular, I’d be interested to hear from anyone who would like to apply in order to interact with me (and maybe other lab members) on things related to Bayesian computation (Sequential Monte Carlo, MCMC, fast approximations, etc), at least partially. I have various projects in mind, but I’m quite flexible and open to discussion. I think that the selection process might occur some time in May-June of next year, but again I don’t have exact details for now.
Former office mate Alex Thiery is still in Singapore and will start blogging here soon, so we’ll still have two continents covered. Still looking for contributors in the other ones!
In a recent post Nicolas discussed some limitations of pseudo-random number generation. On a related note there’s a feature of random variables that I find close to mystical.
In ongoing work with Alex Thiery, we had to precisely define the notion of randomized algorithms at some point, and we essentially followed Keane and O’Brien (as it happens there’s an article today on arXiv that is also related, maybe, or not). The difficulty comes with the randomness. We can think of a deterministic algorithm as a good old function mapping an input space to an output space, but a random algorithm adds some randomness over a deterministic scheme (in an accept-reject step for instance, or a random stopping criterion), so that given fixed inputs the output might still vary. One way to formalise it consists in defining the algorithm as a deterministic function of inputs and of a source of randomness; that randomness is represented by a single random variable, e.g. following a uniform distribution.
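To make this concrete, here is a minimal sketch (not the formalism of Keane and O’Brien, just an illustration) of an accept-reject sampler written as a deterministic function of its source of randomness. The names `uniform_stream` and `sample_linear_density` are hypothetical, the target density f(x) = 2x on [0, 1] is an arbitrary choice, and for simplicity the randomness is passed as a stream of uniforms rather than as a single one.

```python
import random
from typing import Iterator


def uniform_stream(seed: int) -> Iterator[float]:
    """Hypothetical source of randomness: an endless stream of uniforms."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def sample_linear_density(randomness: Iterator[float]) -> float:
    """Accept-reject sampler for the density f(x) = 2x on [0, 1].

    Proposals are uniform on [0, 1] and the envelope constant is M = 2,
    so a proposal x is accepted with probability f(x) / M = x.
    All randomness comes from the `randomness` argument, so the function
    is deterministic once that argument is fixed.
    """
    while True:
        proposal = next(randomness)  # uniform proposal
        u = next(randomness)         # uniform used for the accept step
        if u <= proposal:
            return proposal


# Same source of randomness, same output: the algorithm is a plain function.
first = sample_linear_density(uniform_stream(seed=42))
second = sample_linear_density(uniform_stream(seed=42))
```

Feeding the function a hand-written sequence such as `iter([0.3, 0.9, 0.8, 0.5])` makes the point even more plainly: the first proposal is rejected, the second is accepted, and nothing random is left.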
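The funny, mystical and disturbing thing is that a single uniform random variable is enough to represent an infinity of them. It sounds like an excerpt from the Vedas, doesn’t it? To see this, write a single uniform realization in binary representation. That is, for $U \sim \mathcal{U}[0,1]$ write
$U = \sum_{k=1}^{\infty} b_k 2^{-k}$ with $b_k \in \{0,1\}$. The binary representation is $U = 0.b_1 b_2 b_3 \ldots$
Now it’s easy to see that these zeros and ones are distributed as independent Bernoulli(1/2) variables. Now we put these digits in a particular position, as follows, for instance filling an infinite grid diagonal by diagonal.
$$\begin{array}{ccccc} b_1 & b_2 & b_4 & b_7 & \cdots \\ b_3 & b_5 & b_8 & \cdots & \\ b_6 & b_9 & \cdots & & \\ b_{10} & \cdots & & & \end{array}$$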
If we take each column or each row from the grid above, they’re independent and they’re also binary representations of uniform random variables – you could also consider diagonals or more funky patterns. You could say that the random variable contains an infinity of independent clones.
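As a rough illustration of the splitting idea, here is a small sketch under simplifying assumptions: the uniform is truncated to a finite number of bits, and the bits are dealt round-robin into a finite number of streams instead of being laid out on an infinite grid. The function names (`draw_bits`, `split_bits`, `bits_to_fraction`) are made up for this example, and exact fractions are used because a floating-point number only carries 53 bits.

```python
import random
from fractions import Fraction
from typing import List


def draw_bits(n_bits: int, seed: int) -> List[int]:
    """First n_bits binary digits of a (pseudo-random) uniform draw."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n_bits)]


def split_bits(bits: List[int], n_streams: int) -> List[List[int]]:
    """Deal the digits round-robin: digit k goes to stream k mod n_streams."""
    return [bits[i::n_streams] for i in range(n_streams)]


def bits_to_fraction(bits: List[int]) -> Fraction:
    """Exact value of the binary expansion 0.b1 b2 b3 ... (truncated)."""
    return sum((Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits)),
               Fraction(0))


# One truncated uniform, carrying 300 binary digits...
bits = draw_bits(300, seed=1)
u = bits_to_fraction(bits)

# ...yields three "clones", each built from a disjoint set of 100 digits.
clones = [bits_to_fraction(stream) for stream in split_bits(bits, 3)]
```

Because the streams use disjoint digits, and the digits are independent, the clones are independent of each other; each one is a uniform truncated to 100 digits.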
This property actually sounds dangerous now, come to think of it. I think it was always well-known but people might not have made the link with Star Wars. In the end I’m happy to stick with harmless pseudo-random numbers, for safety reasons.