## Reading Bayesian classics — presentations

The students did a great job in presenting some Bayesian classics. I enjoyed reading the papers (pdfs can be found here), most of which I hadn’t read before, and enjoyed also the students’ talks. I share here some of the best ones, as well as some demonstrative excerpts from the papers. In chronological order (presentations on slideshare below):

- W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications.
*Biometrika*, 57(1):97–109, 1970.

In this paper, we shall consider Markov chain methods of sampling that are generalizations of a method proposed by Metropolis et al. (1953), which has been used extensively for numerical problems in statistical mechanics.

- Dennis V. Lindley and Adrian F.M. Smith. Bayes estimates for the linear model.
*Journal of the Royal Statistical Society: Series B (Statistical Methodology),*with discussion, 1–41, 1972.

From Prof. B. de Finetti discussion (note the *valliant* collaborator Smith!):

I think that the main point to stress about this interesting and important paper is its significance for the philosophical questions underlying the acceptance of the Bayesian standpoint as the true foundation for inductive reasoning, and in particular for statistical inference. So far as I can remember, the present paper is the first to emphasize the role of the Bayesian standpoint as a logical framework for the analysis of intricate statistical situation. […] I would like to express my warmest congratulations to my friend Lindley and his valiant collaborator Smith.

## Statistics journals network

Xian blogged recently on the incoming RSS read paper: Statistical Modelling of Citation Exchange Between Statistics Journals, by Cristiano Varin, Manuela Cattelan and David Firth. Following the last JRSS B read paper by one of us! The data that are used in the paper (and can be downloaded here) are quite *fascinating* for us, *academics fascinated by academic rankings, for better or for worse *(ironic here). They consist in cross citations counts for 47 statistics journals (see list and abbreviations page 5): is the number of citations from articles published in journal in 2010 to papers published in journal in the 2001-2010 decade. The choice of the list of journals is discussed in the paper. Major journals missing include *Bayesian Analysis* (published from 2006), *The Annals of Applied Statistics* (published from 2007).

I looked at the ratio of Total Citations Received by Total Citations made. This is a super simple descriptive statistic which happen to look rather similar to Figure 4 which plots Export Scores from Stigler model (can’t say more about it, I haven’t read in detail). The top five is the same modulo the swap between *Annals of Statistics* and *Biometrika*. Of course a big difference is that the Cited/Citation ratio isn’t endowed with a measure of uncertainty (below, left is my making, right is Fig. 4 in the paper).

I was surprised not to see a graph / network representation of the data in the paper. As it happens I wanted to try the gephi software for drawing graphs, used for instance by François Caron and Emily Fox in their sparse graphs paper. I got the above graph, where:

- for the data, I used the citations matrix renormalized by the total number of citations made, which I denote by . This is a way to account for the size (number of papers published) of the journal. This is just a proxy though since the actual number of papers published by the journal is not available in the data. Without that correction,
*CSDA*is way ahead of all the others. - the node size represents the Cited/Citing ratio
- the edge width represents the renormalized . I’m unsure of what gephi does here, since it converts my directed graph into an undirected graph. I suppose that it displays only the largest of the two edges and .
- for a better visibility I kept only the first decile of heaviest edges.
- the clusters identified by four colors are modularity classes obtained by the Louvain method.

**Some remarks**

The two software journals included in the dataset are quite outliers:

- the
*Journal of Statistical Software (JSS)*is disconnected from the others, meaning it has no normalized citations in the first decile. Except from its self citations which are quite big and make it the 4th Impact Factor from the total list in 2010 (and apparently the first in 2015). - the largest is the self citations of the
*STATA Journal (StataJ).*

Centrality:

*CSDA*is the most central journal in the sense of the highest (unweighted) degree.

**Some further thoughts**

All that is just for the fun of it. As mentioned by the authors, citation counts are heavy-tailed, meaning that just a few papers account for much of the citations of a journal while most of the papers account for few citations. As a matter of fact, the total of citations received is mostly driven by a few super-cited papers, and also is the Cited/Citations matrix that I use throughout for building the graph. A reason one could put forward about why JRSS B makes it so well is the read papers: for instance, Spiegelhalter et al. (2002), DIC, received alone 11.9% of all JRSS B citations in 2010. Who’d bet the number of citation this new read paper (JRSS A though) will receive?

## [Meta-]Blogging as young researchers

*Hello all,*

*This is an article intended for the ISBA bulletin, jointly written by us all at Statisfaction, Rasmus Bååth from Publishable Stuff, Boris Hejblum from Research side effects, Thiago G. Martins from tgmstat@wordpress, Ewan Cameron from Another Astrostatistics Blog and Gregory Gandenberger from gandenberger.org. *

Inspired by established blogs, such as the popular Statistical Modeling, Causal Inference, and Social Science or Xi’an’s Og, each of us began blogging as a way to diarize our learning adventures, to share bits of R code or LaTeX tips, and to advertise our own papers and projects. Along the way we’ve come to a new appreciation of the world of academic blogging: a never-ending international seminar, attended by renowned scientists and anonymous users alike. Here we share our experiences by weighing the pros and cons of blogging from the point of view of young researchers.

## Unfortunate typos in read paper

Mathieu and I have just realised that the version of our SQMC paper made available on the RSS web site contains several unfortunate typos. In particular, the symbol for “small o” has been replaced by a “big O” by editors. For instance, Theorem 9 should state the QMC beats standard SMC; i.e. the MSE (mean square error) of an SQMC estimator is

but in the RSS version, it reads

.

Well, that’s a bummer. For now, I recommend anyone to read instead the arxiv version (updated on Monday).

## SQMC read paper

Almost 10 months since my latest post? I guess bloggin’ ain’t my thing… In my defense, Mathieu Gerber and I were quite busy revising our SQMC paper. I am happy to announce that it has just been accepted as a read paper in JRSSB. If all goes as planned, we should present the paper at the RSS ordinary meeting on Dec 10. Everybody is welcome to attend, and submit an oral or written discussion (or both). More details soon, when the event is officially announced on the RSS web-site.

What is SQMC? It is a QMC (Quasi-Monte Carlo) version of particle filtering. For the same CPU cost, it typically generates much more accurate estimators. Interested? consider reading the paper here (more recent version coming soon), checking this video where I present SQMC, or, even better, attending our talk in London!

## momentify R package at BAYSM14

I presented an arxived paper of my postdoc at the big success Young Bayesian Conference in Vienna. The big picture of the talk is simple: there are situations in Bayesian nonparametrics where you don’t know how to sample from the posterior distribution, but you can only compute posterior expectations (so-called *marginal methods*). So e.g. you cannot provide credible intervals. But sometimes all the moments of the posterior distribution are available as posterior expectations. So morally, you should be able to say more about the posterior distribution than just reporting the posterior mean. To be more specific, we consider a hazard (h) mixture model

where is a kernel, and the mixing distribution is random and discrete (Bayesian nonparametric approach).

We consider the survival function which is recovered from the hazard rate by the transform

and some possibly censored survival data having survival . Then it turns out that all the posterior moments of the survival curve evaluated at any time can be computed.

The nice trick of the paper is to use the representation of a distribution in a [Jacobi polynomial] basis where the coefficients are linear combinations of the moments. So one can sample from [an approximation of] the posterior, and with a posterior sample we can do everything! Including credible intervals.

I’ve wrapped up the few lines of code in an R package called momentify (not on CRAN). With a sequence of moments of a random variable supported on [0,1] as an input, the package does two things:

- evaluates the approximate density
- samples from it

A package example for a mixture of beta and 2 to 7 moments gives that result:

## Beautiful Science: Picturing Data, Inspiring Insight

Hey,

There’s a nice exhibition open until May 26th at the British Library in London, entitled Beautiful Science: Picturing Data, Inspiring Insight. Various examples of data visualizations are shown, either historical or very modern, or even made especially for the exhibition. Definitely worth a detour if you happen to be in the area, you can see everything in 15 minutes.

In particular there are nice visualisations of historical climate data, gathered from the logbooks of the English East India company, whose ships were crossing every possible sea in the beginning of the 19th century. The logbooks contain locations and daily weather reports, handwritten by the captains themselves. Turns out the logbooks are kept at the British Library itself and some of them are on display at the exhibition. More info on that project here: oldweather.org.

## Quantitative arguments as hypermedia

Hey readers,

I’m Joseph Dureau, I have been an avid reader of this blog for while now, and I’m very glad Pierre proposed me to share a few things. Until a few months ago, I used to work on Bayesian inference methods for stochastic processes, with applications to epidemiology. Along with fellow colleagues from this past life, I have now taken the startup path, founding Standard Analytics. We’re looking into how web technologies can be used to enhance browsability, transparency and impact of scientific publications. Here’s a start on what we’ve been up to so far.

Let me just make it clear that everything I’m presenting is fully open source, and available here. I hope you’ll find it interesting, and we’re very excited to hear from you! Here it goes..

To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically.

Berners-Lee et al, 2001

Since this sentence was written, twelve years ago, ambitious and collective initiatives have been undertaken to revolutionize what machines can do for us on the web. When I make a purchase online, my email service is able to understand it from the purchase confirmation email, communicate to the online store service, authenticate, obtain information on the delivery, and provide me with a real-time representation of where the item is located. Machines now have the means to process data in a smarter way, and to communicate over it!

However, when it comes to exchanging quantitative arguments, be it in a blog post or in a scientific article, web technology does not bring us much further than what can be done with pen and paper. (more…)

## See you in Chamonix

Happy new year to everyone, and perhaps see you at MCMski 4 in Chamonix next week, which I expect to be a very friendly and exciting even if I’m not much into skiing. :-)

I will talk for the first time about SQMC, a QMC (Quasi Monte Carlo) variant of particle filtering (PF) that Mathieu Gerber and I developed in the recent months. We are quite excited about it for a variety of reasons, but I will give more details shortly on this blog.

I thought that my talk would clash with a session on PMCMC, which was quite unfortunate as I suspect that session would target perhaps the same audience, but looking at the program, I see it’s no longer the case. Thanks the power that be!

I also organise a session on “Bayesian computation in Neurosciences” in MCMski 4. Feel free to come if you have interest in the subject. Myself, I think it’s a particular cool area of application, about which I know very very little… which is why I organise a session to learn more about it! :-) I also co-organise (with Simon Barthelmé and Adam Johansen) a workshop at Warwick on the same subject, more details soon.

## another nice one from Elsevier

In case you have missed the new round of misdeeds by Elsevier, here is an excellent summary (plus a good overview of the current debate on open access an so on):

Many reactions seem to focus on Academia.edu, which is private company, so perhaps that case is no so black and white. However, I found the story (also mentioned by the WP paper) of our colleague Daniel Povey much more infuriating: Daniel put a legit copy of one of his paper on his web site, some robot wrongly detected this copy as the version owned by Elsevier, sent a DCMA take down note to Google, and boom, Google automatically shut downs Daniel’s google web page entirely. Welcome to the brave new world of robots enacting the Law.

I was talking with an Economist the other day. He told me that big corporations very rarely innovate, because they invested so much in a particular, currently lucrative, business model, even that model is doomed in the medium term. He gave me the example of Kodak: they developed the first digital camera before anyone else, yet they never managed to turn around their business model to make the transition to digital photography. They filed for bankruptcy last year. I think the same applies to Elsevier: even if it does not even make sense for them in the long run, this company is going to fight ugly to defend its current business model (the “treasure chest behind a pay wall”, the treasure being our papers) rather that trying to transition to a new business model compatible with open access. So I guess it falls on us to consider sending our paper to new players in academic publishing.

In other news, I have heard many French Universities are going to lose any access to Elsevier journals as of 1st Jan 2014, because of failed negociations between Elsevier and these Universities, but I found little detail on the interweb on this particular story.

leave a comment