Cloud services for statisticians, part II

Posted in Geek by Pierre Jacob on 26 January 2011

On July 21st 2010, I blogged about cloud services that could be useful for statistics researchers. Among other services I mentioned Dropbox, free sync software from a startup that is apparently doing quite well, and which saved my life two weeks ago when my laptop died.

I also mentioned a cloud service for performing statistical and data analysis online. It wasn’t free, but it let you run statistical computations on a remote server, store the results, access them from anywhere, etc. The website is dead now, so I guess they’re either reworking it before reopening, or they didn’t make enough money to survive… sorry for them! Their idea was neat, and someone will probably be able to make money out of it.

In fact this someone might already exist. Amazon offers a cloud computing service called Amazon Elastic Compute Cloud (Amazon EC2). You can pay either by the hour or sign up for a one-year plan. Once you’ve registered, it seems you can remotely start the operating system of your choice with scientific software already installed. For instance, this guy explains how to set up R on Amazon EC2.

This service seems to be particularly easy to use with StarCluster. It’s a free program written in Python: you give it your Amazon credentials and it sets up a cluster on Amazon EC2 running Ubuntu, Python, NumPy and the appropriate packages for parallel computation (namely Open MPI). Since the hourly fee starts at $0.17 for a “medium – High-CPU instance”, I’m tempted to give it a try.
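To get a feel for what a run might cost at that rate, here is a back-of-the-envelope sketch in Python. The only number taken from above is the $0.17 hourly fee; the function name `cluster_cost`, the cluster size and the run length are made up for illustration, and I am assuming that every started hour is billed in full for each instance, which you should check against Amazon's current pricing.

```python
import math

# Hourly rate quoted above for one "medium - High-CPU" instance (USD).
HOURLY_RATE_USD = 0.17


def cluster_cost(n_instances, hours, hourly_rate=HOURLY_RATE_USD):
    """Rough cost in USD of running n_instances for the given duration.

    Assumption (not from Amazon's documentation): each started hour
    is billed as a full hour, for every instance in the cluster.
    """
    billed_hours = math.ceil(hours)
    return n_instances * billed_hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical example: 8 instances running for two and a half hours.
    print(f"Estimated cost: ${cluster_cost(8, 2.5):.2f}")
```

Under those assumptions a small cluster running for an afternoon stays in the range of a few dollars, which is what makes the pay-by-the-hour option tempting.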

Update: following a comment, let’s make a little advertisement for Project Dirigible, another cloud service that could be useful for statisticians. It looks like an online spreadsheet in the style of Google Docs, except that you can use Python / NumPy programs to compute the values in the cells, using Amazon servers. So suppose you want to make a cool Java applet to show how well your method works, but the method involves a lot of computation: this looks like the way to go. Check out their introductory video. Thanks Giles, very interesting!


  1. Giles Thomas said, on 27 January 2011 at 14:39

    I hope this doesn’t sound spammy, but have you taken a look at Project Dirigible? We’re calling it a “programmable cloud spreadsheet” — it’s basically a mashup of a spreadsheet and a Python programming environment, and lets you build networks of interconnected spreadsheet models that can run in parallel across large numbers of computers (on EC2, of course). We’ve only recently started the beta, but there are a few statisticians trying it out already, and we’re really keen to get more — in particular, people who’d be interested in trying out interconnection with R, which is something we aim to have working soon.

  2. Pierre Jacob said, on 27 January 2011 at 21:41

    Cheers Giles, I’ve updated the post. Your project looks very interesting, I’ll try to spread the word.

