Statisfaction

Moustache target distribution and Wes Anderson

Posted in Art, Geek, R by Pierre Jacob on 31 March 2014

Today I am going to introduce the moustache target distribution (moustarget distribution for brevity). First, load some packages.

library(wesanderson) # on CRAN
library(RShapeTarget) # available on https://github.com/pierrejacob/RShapeTarget/
library(PAWL) # on CRAN


Let’s invoke the moustarget distribution.

shape <- create_target_from_shape(
  file_name = system.file(package = "RShapeTarget", "extdata/moustache.svg"),
  lambda = 5)
rinit <- function(size) matrix(rnorm(2 * size), ncol = 2)
moustarget <- target(name = "moustache", dimension = 2,
                     rinit = rinit, logdensity = shape$logd,
                     parameters = shape$algo_parameters)


This defines a target distribution from an SVG file using RShapeTarget. The target probability density function is defined on $\mathbb{R}^2$; it is proportional to $1$ on the segments described in the SVG file and decreases exponentially fast to $0$ away from them. The density function of the moustarget is plotted below, a picture being worth a thousand words.
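In case the image does not render here, this kind of plot can be sketched in base R by evaluating the log-density on a grid. The sketch below uses a stand-in Gaussian log-density so it is self-contained; with RShapeTarget installed you would use shape$logd (and pass shape$algo_parameters) instead:

```r
# Evaluate a 2-D log-density on a grid and display it with image().
# A standard Gaussian stands in for shape$logd here; swap it in
# (with parameters = shape$algo_parameters) to see the actual moustache.
logd <- function(x, parameters) -0.5 * rowSums(x^2)
xs <- seq(-4, 4, length.out = 200)
ys <- seq(-2, 2, length.out = 100)
grid <- as.matrix(expand.grid(xs, ys))
z <- matrix(exp(logd(grid, NULL)), nrow = length(xs))
image(xs, ys, z, xlab = "", ylab = "", col = heat.colors(50))
```

The wesanderson palettes loaded above can of course replace heat.colors, for a more cinematographic look.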

Using R in LaTeX with knitr and RStudio

Posted in Geek, LaTeX, R by Julyan Arbel on 28 February 2013

Hi,

I presented today at the INSEE R user group (FL$\tau$R) how to use knitr (the evolution of Sweave) for writing $\LaTeX$ documents that are self-contained with respect to the source code: your data changed? No big deal, just compile your .Rnw file again [Ctrl+Shift+I in RStudio] and you are done with an updated version of your paper. Some benefits with respect to having two separate .R and .tex files: everything is integrated in a single piece of software (RStudio), and you can refer to variables in your text with the \Sexpr{} command. Slow compilation is no longer a real issue, as one can put cache=TRUE in the chunk options so that unchanged chunks are not reevaluated, which speeds things up.
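For those who have never opened a .Rnw file, here is a minimal sketch of the workflow just described (the chunk label and the toy computation are made up for illustration):

```latex
\documentclass{article}
\begin{document}

% an R chunk: cache=TRUE so it is only reevaluated when its code changes
<<summary, cache=TRUE, echo=FALSE>>=
x <- rnorm(100)
m <- mean(x)
@

The sample mean is \Sexpr{round(m, 2)}: recompile the .Rnw
and this number updates along with the data.

\end{document}
```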

I share the (brief) slides below. They won’t help much those who already use knitr, but they give the first steps for those who would like to give it a try.

A good tool for researchers?

Posted in Geek by JB Salomond on 17 January 2013

Hi there!

Like Pierre a while ago, I got fed up with printing articles, annotating them, losing them, re-printing them, and so on. Moreover, I also wanted to be able to carry more than one or two books in my bag without ruining my back. E-ink readers seemed good, but at some point I changed my mind.

After the ISBA conference in Kyoto, where I saw bazillions of iPads, I thought that tablets were really worth a shot. I am fine with reading on an LCD screen, I probably won’t read scientific articles/books outside in the sun, and I like the idea of a light device that can replace my laptop at conferences. Furthermore, there is now a large choice of apps for annotating PDFs, which is crucial for me.

The device I chose runs on Android (mainly because there is no memory extension on Apple devices), combined with a good capacitive pen and an annotation app such as eZreader that gets your PDFs directly from Dropbox (which is simply awesome). You can even use LaTeX (without fancy packages…), which may come in handy.

I hope that I will not experience the same disappointment as Pierre did with his reader, but for the moment a tablet seems just what I needed!

Dropbox Space Race

Posted in Geek, General by Julyan Arbel on 1 December 2012

Hi,

In addition to the referral program (you refer a new user, you win an extra 0.5 GB), the Dropbox Space Race will give you 3 GB of extra space (for 2 years) if you register with your email address from a competing university. The best schools will get more space. Here are the top 100 schools. Come on, there is no French school in the top 100!

Thanks Nicolas for the info.


E-ink

Posted in Geek by Pierre Jacob on 29 March 2012

Hey,

A few weeks ago I got fed up with printing, carrying around, losing, and reprinting papers, and decided to go for a 10″ e-reader.  I like to be able to write annotations on papers, so I went for one with a stylus.

I’m enthusiastic about it so far: reading works pretty well; 12 or 13 inches would have been better, but 10 inches is big enough to read scientific articles. The device automatically removes the margins so an entire page fits on the screen, so there’s no need to scroll all the time (at least for most papers). You can even click on hyperlinks with the stylus, search the text, add bookmarks, etc., so there are no particular issues when handling long documents like books. The only disappointing feature is the web browser, which I find close to unusable, except for briefly connecting to arXiv to see the most recent entries.

One thing that you don’t get with an e-reader is the satisfaction of seeing your papers getting old, torn, coffee-stained, etc. If you’re really nostalgic you can still add coffee stains to your documents using this LaTeX package.

On a more geeky note, the device I chose runs linux and the manufacturer seems to support open source software; and hence there is a SDK to add more applications. The device seems even capable of running python! I’m eager to see what’ll come out of this.

Pierre

GPUs in Computational Statistics

Posted in Geek, General, Seminar/Conference by Pierre Jacob on 17 January 2012

Hey,

Next week the Centre for Research in Statistical Methodology (CRiSM, in Warwick, UK) will be hosting a workshop on the use of graphics processing units in statistics, a quickly expanding area that I’ve blogged about here. Xian and I are going to talk about Parallel IMH and Parallel Wang-Landau. We’ll be able to interact with top researchers in methodological statistics and early adopters of GPUs like Chris Holmes, whose talk at Valencia 9 was quite influential in the field (and for my PhD!), and Christophe Andrieu, who is one of the to-be-praised Particle MCMC guys (see e.g. here).

The programme for the workshop is here: http://www2.warwick.ac.uk/fac/sci/statistics/crism/workshops/gpu/programme

…and the registration fee is only £50, i.e. not even half a GPU!

http://www.louisaslett.com/Talks/GPU_Programming_Basics_Getting_Started.html

should help you a lot if you want to use plain CUDA C code within R.

As for me, thanks to Anthony Lee I will spend a week in Warwick prior to the meeting, working at CRiSM and enjoying the West Midlands.

Cheers!

Google Fusion Tables

Posted in Dataset, Geek by Julyan Arbel on 8 December 2011

A quick post about another Google service I discovered recently, called Fusion Tables: you can store, share and visualize data up to 250 MB, of course in the cloud. Along with Google Docs, Google Trends and Google Public Data Explorer, it is another example of Google’s efforts to gain ground in data management. Has anyone tried it out?


A Statisfaction page on Google+

Posted in Geek, General by Pierre Jacob on 8 November 2011

Hey,

Just to try it out, I’ve launched a Statisfaction page on Google+. You should be able to see the page without a Google+ account. What can we do with a G+ page? I have no idea.

PAWL package on CRAN

Posted in Geek, R by Pierre Jacob on 26 October 2011

The PAWL package (which I talked about there, and which implements the parallel adaptive Wang-Landau algorithm, with adaptive Metropolis-Hastings for comparison) is now on CRAN!

http://cran.r-project.org/web/packages/PAWL/index.html

which means that within R you can easily install it by typing

install.packages("PAWL")

Isn’t that amazing? It’s just amazing. Kudos to the CRAN team for their quickness and their help.
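As a quick smoke test after installing, one can build a toy target with the same target() interface used in the moustache post above (argument names copied from that post; the bivariate Gaussian log-density is my own stand-in, not something shipped with PAWL):

```r
library(PAWL)  # install.packages("PAWL") first

# a standard bivariate Gaussian as a toy target;
# logdensity takes (points, parameters) as in the moustache example
rinit <- function(size) matrix(rnorm(2 * size), ncol = 2)
logd <- function(x, parameters) -0.5 * rowSums(x^2)
gaussiantarget <- target(name = "gaussian", dimension = 2,
                         rinit = rinit, logdensity = logd,
                         parameters = list())
```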

Calling Google Maps API from R

Posted in Geek, R by Pierre Jacob on 5 October 2011

Hi,

Related to Julyan’s previous post, I want to share an easy way to access the Google Maps API from R. And then we’ll stop talking about Google, otherwise it’ll look like we’re just looking for jobs.

My problem was the following: I have a database (from priceofweed.com), with locations written as “city, region, country”. What I wanted was the precise location (latitude, longitude) of each city. After some browsing, it is possible to grab a list of cities for each country from some local geographical institute and merge it with the database. The problem is that the format often differs from country to country, and the files are full of information that is unnecessary for the problem at hand (and hence unnecessarily large). For example the information for the US is there somewhere (and it’s amazingly detailed, by the way), whereas for other countries it’s there.

So instead, a “lazy” method consists of calling Google Maps to find the location of each city; since Google Maps has pretty good worldwide coverage of geographic names, it should work! The R function is described there, and I copy-paste it here:

library(XML)

getDocNodeVal <- function(doc, path) {
  sapply(getNodeSet(doc, path), xmlValue)
}

gGeoCode <- function(str) {
  str <- gsub(' ', '%20', str)
  # query the Google Geocoding web service, asking for XML output
  u <- paste0("http://maps.google.com/maps/api/geocode/xml?address=",
              str, "&sensor=false")
  doc <- xmlTreeParse(u, useInternal = TRUE)
  lat <- getDocNodeVal(doc, "/GeocodeResponse/result/geometry/location/lat")
  lng <- getDocNodeVal(doc, "/GeocodeResponse/result/geometry/location/lng")
  list(lat = lat, lng = lng)
}

gGeoCode("Malakoff, France")


There are limitations though: it’s free up to 2,500 requests per day, and beyond that you’re locked out for 24 hours. Otherwise… you have to pay! See the terms here. Pretty convenient nonetheless!
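To geocode a whole database without hammering the service, one can simply pause between requests; a minimal sketch (geocode_all and the pause value are my own, and gGeoCode is the function above):

```r
# apply gGeoCode to a vector of "city, region, country" strings,
# sleeping between calls to space out the requests
geocode_all <- function(places, pause = 1) {
  lapply(places, function(p) {
    Sys.sleep(pause)
    gGeoCode(p)
  })
}
```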

EDIT: a more detailed post about Google geocoding, and its use on the Missouri Sex Offender Registry: