Kudos to Rasmus for this very practical approach, potentially very impactful. Maybe someday people will have to specify if they want a frequentist approach and not the other way around! (I had a dream, etc).
A few days after the MCMSki conference, I start to see the main lessons gathered there.
- I should really read the full program before attending the next MCMSki. The three parallel sessions looked consistently interesting, and I really regret having missed some talks (in particular Dawn Woodard’s and Natesh Pillai’s) and some posters as well (admittedly, due to exhaustion on my part).
- Compared to the previous instance three years ago (in Utah), the main themes have significantly changed. Scalability, approximate methods, non-asymptotic results, 1/n methods … these keywords are now on everyone’s lips. Can’t wait to see if MCQMC’14 will feel that different from MCQMC’12.
- The community is rightfully concerned about scaling Monte Carlo methods to big data, with some people pointing out that models should also be rethought in this new context.
- The place of software developers in the conference, or simply references to software packages in the talks, is much greater than it used to be. It’s a very good sign towards reproducible research in our field. There’s still a lot of work to do, in particular in terms of making parallel computing easier to access (time to advertise LibBi a little bit). On a related note, many people now point out whether their proposed algorithms are parallel-friendly or not.
- Going from the Rockies to the Alps, the food drastically changed from cheeseburgers to just melted cheese. Bread could be found but ground beef and Budweiser were reported missing.
- It’s fun to have an international conference in your home country, but switching from French to English all the time was confusing.
Back in flooded Oxford now!
I’ve just heard this sad piece of news. Definitely one of the greatest statisticians of the last 50 years. I wish I had met him in person.
Originally posted on Xi'an's Og:
Dennis Lindley most sadly passed away yesterday at the hospital near his home in Somerset. He was one of the founding fathers of our field (of Bayesian statistics), who contributed to formalise Bayesian statistics in a coherent theory. And to make it one with rational decision-making, a perspective missing in Jeffreys’ vision. (His papers figured prominently in the tutorials we gave yesterday for the opening of O’Bayes 250.) At the age of 90, his interest in the topic had not waned away: as his interview with Tony O’Hagan last Spring showed, his passionate arguing for the rationale of the Bayesian approach was still there and alive! The review he wrote of The Black Swan a few years ago also demonstrated he had preserved his ability to see through bogus arguments. (See his scathing “One hardly advances the respect with which statisticians are held in society by making such declarations” in his ripping discussion of Aitkin’s 1991 Posterior Bayes factors.) He also started this interesting discussion last year about the five standard deviations “needed” for the Higgs boson… My personal email contacts with Dennis over the re-reading of Jeffreys’ book were a fantastic experience as he kindly contributed by expanding on how the book was received at the time and correcting some of my misunderstanding. It is a pity I can no longer send him the (soon to come?) final version of my Jeffreys-Lindley paradox paper as I intended to do. The email email@example.com will no longer answer our queries… I figure there will be many testimonies and shared memories of his contributions and life at the Bayes-250 conference tomorrow. Farewell, Dennis, and I hope you now explore the paths of a more coherent world than ours!
Former office mate Alex Thiery is still in Singapore and will start blogging here soon, so we’ll still have two continents covered. Still looking for contributors in the other ones!
In a recent post Nicolas discussed some limitations of pseudo-random number generation. On a related note, there’s a feature of random variables that I find close to mystical.
In ongoing work with Alex Thiery, we had to define precisely the notion of a randomized algorithm, and we essentially followed Keane and O’Brien (as it happens, there’s an article on arXiv today that may or may not be related). The difficulty comes with the randomness. We can think of a deterministic algorithm as a good old function mapping an input space to an output space, but a randomized algorithm adds some randomness on top of a deterministic scheme (in an accept-reject step for instance, or a random stopping criterion), so that given fixed inputs the output might still vary. One way to formalise it consists in defining the algorithm as a deterministic function of the inputs and of a source of randomness; that randomness is represented by a single random variable, e.g. following a uniform distribution.
The funny, mystical and disturbing thing is that a single uniform random variable is enough to represent an infinity of them. It sounds like an excerpt of the Vedas, doesn’t it? To see this, write a single uniform realization in binary representation. That is, for $U \sim \mathcal{U}[0,1]$ write
$U = \sum_{k=1}^{\infty} b_k 2^{-k}$
with $b_k \in \{0, 1\}$. The binary representation is
$U = 0.b_1 b_2 b_3 \ldots$
Now it’s easy to see that these zeros and ones are distributed as independent Bernoulli variables with parameter 1/2. Now we put these digits in a particular position, as follows.
[grid of the digits $b_1, b_2, b_3, \ldots$ arranged in rows and columns]
If we take each column or each row from the grid above, they’re independent and they’re also binary representations of uniform random variables – you could also consider diagonals or more funky patterns. You could say that the random variable contains an infinity of independent clones.
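As a rough illustration of the clones idea, here is a minimal sketch in R that deals out the binary digits of one number in [0, 1) to several "clones" in round-robin order. The function names (`binary_digits`, `from_digits`, `split_uniform`) and the round-robin arrangement are assumptions made for this example, not necessarily the grid used in the post. Also note that a number stored on a computer carries only finitely many random bits, so each clone gets a coarse approximation of a uniform rather than the real thing.

```r
# Extract the first n_bits binary digits of u in [0, 1).
binary_digits <- function(u, n_bits) {
  bits <- integer(n_bits)
  for (k in seq_len(n_bits)) {
    u <- 2 * u
    bits[k] <- as.integer(u >= 1)
    u <- u - bits[k]
  }
  bits
}

# Rebuild a number in [0, 1) from a vector of binary digits.
from_digits <- function(bits) {
  sum(bits * 2^(-seq_along(bits)))
}

# Split one number into n_clones numbers by dealing out its digits
# round-robin: digit k goes to clone ((k - 1) mod n_clones) + 1.
# This arrangement is an assumption chosen for illustration.
split_uniform <- function(u, n_clones = 2, n_bits = 52) {
  bits <- binary_digits(u, n_bits)
  owner <- (seq_len(n_bits) - 1) %% n_clones + 1
  sapply(seq_len(n_clones), function(j) from_digits(bits[owner == j]))
}

set.seed(1)
u <- runif(1)
clones <- split_uniform(u, n_clones = 2)
```

For instance, 0.625 is 0.1010 in binary, so with four digits and two clones the first clone receives the digits (1, 1) and the second receives (0, 0).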
This property actually sounds dangerous now, come to think of it. I think it was always well-known but people might not have made the link with Star Wars. In the end I’m happy to stick with harmless pseudo-random numbers, for safety reasons.
To illustrate generally complex probability density functions on continuous spaces, researchers always use the same examples, for instance mixtures of Gaussian distributions or a banana-shaped distribution defined on $\mathbb{R}^2$ with density function:
If we draw a sample from this distribution using MCMC we obtain a [scatter]plot like this one:
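For readers who want to reproduce this kind of scatterplot, here is a minimal sketch of a random-walk Metropolis-Hastings sampler in R. The density used below is a commonly used "twisted Gaussian" form of the banana; that form, the value `b = 0.1`, the step size and the function names are all assumptions made for this example, and may differ from what was used for the figure.

```r
# Log density (up to an additive constant) of a "twisted Gaussian" banana.
# The functional form and b = 0.1 are assumptions made for illustration.
banana_logd <- function(x, b = 0.1) {
  -x[1]^2 / 200 - 0.5 * (x[2] + b * x[1]^2 - 100 * b)^2
}

# Random-walk Metropolis-Hastings with independent Gaussian proposals.
rw_metropolis <- function(logd, init, n_iter, step = 1) {
  d <- length(init)
  chain <- matrix(NA_real_, nrow = n_iter, ncol = d)
  current <- init
  current_logd <- logd(current)
  for (i in seq_len(n_iter)) {
    proposal <- current + step * rnorm(d)
    proposal_logd <- logd(proposal)
    # Accept with probability min(1, ratio of densities).
    if (log(runif(1)) < proposal_logd - current_logd) {
      current <- proposal
      current_logd <- proposal_logd
    }
    chain[i, ] <- current
  }
  chain
}

set.seed(42)
chain <- rw_metropolis(banana_logd, init = c(0, 10), n_iter = 5000, step = 2)
# plot(chain, col = "yellow", pch = 20)  # uncomment to draw the scatterplot
```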
Clearly it doesn’t really look like a banana, even if you use yellow to colour the dots like here. Actually it looks more like a boomerang, if anything. I was worried about this for a while, until I came up with a more realistic banana-shaped distribution:
See how the shape is well defined compared to the first figure? And there’s even the little tail, that proves so convenient when we want to peel off the fruit. More generally we might want to create target density functions based on general shapes. For this you can now try RShapeTarget, which you can install directly from R using devtools:
library(devtools)
# recent versions of devtools take the repository as "username/repo"
install_github("pierrejacob/RShapeTarget")
The package parses SVG files representing shapes, and creates target densities from them. More precisely, an SVG file contains “paths”, which are sequences of points (for instance the above banana is a single closed path). The associated log density at any point $x$ is defined as a function of $d(x, P)$ and of a parameter $\lambda$, where $P$ is the closest path of the shape from $x$ and $d(x, P)$ is the distance between the point and the path. The parameter $\lambda$ specifies the rate at which the density decays when the point goes away from the shape. With this you can define the maple leaf distribution, as a tribute to JSM 2013:
In the package you can get a distribution from a SVG file using the following code:
library(RShapeTarget)
# create target from file
my_shape_target <- create_target_from_shape(my_svg_file_name, lambda = 1)
# test the log density function on 25 randomly generated points
my_shape_target$logd(matrix(rnorm(50), ncol = 2), my_shape_target$algo_parameters)
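To give an idea of how a density can be built from the distance to a shape, here is a minimal base R sketch that does not use the package at all. Everything in it is hypothetical: the function names (`dist_to_segment`, `dist_to_path`, `shape_logd`) are made up for this example, paths are plain matrices of vertices rather than parsed SVG, and the log density is assumed to be minus `lambda` times the distance to the closest path, which may not be the exact formula used by RShapeTarget.

```r
# Distance from point p to the segment [a, b] (numeric vectors of length 2).
dist_to_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  # Projection parameter clamped to [0, 1]; a degenerate segment collapses to a.
  t <- if (len2 == 0) 0 else max(0, min(1, sum((p - a) * ab) / len2))
  sqrt(sum((p - (a + t * ab))^2))
}

# Distance from p to a path stored as a matrix with one vertex per row.
dist_to_path <- function(p, path) {
  n <- nrow(path)
  min(sapply(seq_len(n - 1), function(i) {
    dist_to_segment(p, path[i, ], path[i + 1, ])
  }))
}

# Hypothetical log density: minus lambda times the distance to the closest path.
shape_logd <- function(p, paths, lambda = 1) {
  d <- min(sapply(paths, function(path) dist_to_path(p, path)))
  -lambda * d
}

# A unit square as a single closed path (first vertex repeated at the end).
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
shape_logd(c(0.5, 2), list(square))
```

With this choice the log density is zero on the path itself and decreases linearly with the distance to it, so larger values of `lambda` concentrate the mass more tightly around the shape.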
Since characters are just a bunch of paths, you can also define distributions based on words, for instance:
which is done as follows (warning: for now you’re only allowed a-z and A-Z; no numbers, no spaces, no punctuation):
library(RShapeTarget)
word_target <- create_target_from_word("Hodor")
For the words, I defined the target density function as before, except that it’s constant on the letters: so if a point is outside a letter its density is computed based on the distance to the nearest path; if it’s inside a letter it’s just constant, so that the letters are “filled” with some constant density. I thought it’d look better.
Now I’m no longer worried about the banana-shaped distribution, but rather about the fact that the only word I could think of was “Hodor” (with whom you can chat over there).
I’ll talk in a session organized by Scott Schmidler, entitled Adaptive Monte Carlo Methods for Bayesian Computation; you can find the session programme in the online program. I’ll talk about the estimation of the score and of the observed information matrix in state-space models.
According to the rumour and Christian’s reflections on the past years (2009, 2010, 2011), I should prepare my schedule in advance to really enjoy this giant meeting. So if you want to meet there, please send me an e-mail!
See you in Montréal!
We’re in the Big Data era blablabla, but advanced computational methods usually don’t scale well enough to match the increasing sizes of datasets. For instance, even in the simple case of i.i.d. data $y_1, \ldots, y_n$ and an associated likelihood function $L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$, the cost of evaluating the likelihood function at any parameter $\theta$ typically grows at least linearly with $n$. If you then plug that likelihood into an optimization technique to find the Maximum Likelihood Estimate, or into a sampling technique such as Metropolis-Hastings to sample from the posterior distribution, the computational cost grows accordingly for a fixed number of iterations. However you can get unbiased estimates of the log-likelihood by drawing $m$ points $i_1, \ldots, i_m$ uniformly in the index set $\{1, \ldots, n\}$ and by computing $\frac{n}{m} \sum_{k=1}^{m} \log f(y_{i_k} \mid \theta)$. This way you sub-sample from the whole dataset, and you can choose $m$ according to your computational budget. But is it possible to perform inference with these estimates instead of the complete log-likelihood?
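To make the sub-sampling idea concrete, here is a minimal sketch in R. It assumes a toy model of i.i.d. Normal observations with unknown mean and unit variance, draws the subsample without replacement, and rescales the subsample sum by the ratio of the dataset size to the subsample size. The model, the function names (`full_loglik`, `subsampled_loglik`) and the sizes are choices made for this example only.

```r
# Full log-likelihood of i.i.d. N(theta, 1) data: cost grows linearly with n.
full_loglik <- function(theta, y) {
  sum(dnorm(y, mean = theta, sd = 1, log = TRUE))
}

# Estimate based on m indices drawn uniformly without replacement,
# rescaled by n / m so that its expectation is the full log-likelihood.
subsampled_loglik <- function(theta, y, m) {
  n <- length(y)
  idx <- sample.int(n, size = m)
  (n / m) * sum(dnorm(y[idx], mean = theta, sd = 1, log = TRUE))
}

set.seed(2013)
y <- rnorm(10000, mean = 1)
exact <- full_loglik(0.8, y)
estimates <- replicate(2000, subsampled_loglik(0.8, y, m = 100))
# Compare the exact value with the average and spread of the estimates.
c(exact = exact, mean_estimate = mean(estimates), sd_estimate = sd(estimates))
```

The average of the estimates should be close to the exact value, but the spread of a single estimate is large when the subsample is small compared to the dataset, which is precisely what makes the question of inference with such estimates non-trivial.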
This blog started as a collaborative blog written by then PhD students at CREST. Now some of us have left the lab but still feel like blogging from time to time so we use this blog. Going further, I don’t see any reason not to broaden our perspective by letting other people participate, in order to maintain a decent activity. The target would be at least a post per week. I am sure that many junior researchers out there feel like they could write a post or two, so this could be the place to share your views!
If you’re interested, either in a one-time blog post or on a more regular basis, please feel free to contact us by e-mail or in the comments below. You can just browse through the blog if you’re not sure what the scope is… actually the scope is pretty ill-defined, but includes tips and tricks in R, LaTeX, conferences in Statistics, mostly Bayesian or computational, random datasets, recent articles, reports on unusual use of statistical methods…