Statisfaction

Finding datasets on the Internet

Posted in Dataset by Pierre Jacob on 10 September 2010

For teaching purposes it can be cool to have small free datasets on various topics, like music, movies, anything. So I browsed for general purpose dataset websites.

The first website I found is the Comprehensive Knowledge Archiving Network. The people behind it are the Open Knowledge Foundation, which has a blog (a good way to stay aware of the “power of open data”, to quote them). I like! I also came across Infochimps, which is a data market, so some datasets are free and some are not. It still looks like a good source of data: a lot of the datasets are free and come under various licenses, including free licenses like Creative Commons.

With these two sites you can find funny things, like the list of tweets with GOOOAL spelled with a ridiculous number of O’s during the 2010 World Cup, and I’m sure, other very useful datasets.

Apart from these websites, Robin showed me this blog from the Guardian website, the “Datablog”. It’s not actually only data, there are also visualizations and comments. There are always links to online spreadsheets. That’s the spirit! I wish all articles presenting graphs and tables would give links to the corresponding spreadsheets. By the way some of the visuals of the Datablog actually come from Information is Beautiful, which is an amazing blog, even for non-statisticians.

There are of course plenty of other database websites, but those are the ones I would go to for a first quick search. If you’re aware of other good resources in the same spirit (that is, free and open, and dealing with various topics), I’d be glad to have the links in the comments. Actually I think there would be room for a “Funny Datasets Archiving Network”, or something like that, with only funny and stupid datasets for teaching purposes.

6 Responses

  1. arthur said, on 13 September 2010 at 03:16

    I like that idea of “Funny Datasets Archiving Network”… I have several datasets to upload there. For instance, in one of my lectures, students are usually interested when I introduce temporal issues in econometric models with Playboy’s playmates.
    Nice blog by the way

    • pierrejacob said, on 13 September 2010 at 05:44

      Yeah it would actually be great to have a website like that. I’ll see with the others, maybe we can host it here, as one of the “pages”? It’d be a start.

      Thanks for your support!

  2. Amanda said, on 15 September 2010 at 08:52

    This is extremely helpful! Thanks so much. I am also interested in stats, and did not know about most of these sites, so it will be very helpful for the future.

    It’s also good that you emphasized showing that these are free and open, because I think a lot of people starting out, including myself, wouldn’t always think of that at first. I also really like the idea of funny datasets. I was very surprised at how difficult those are to find!

    I am actually working on a statistics blog based on fun/random data collected. So far I have data on the number of licks to get to the tootsie-roll center of a tootsie-pop. I will be getting more soon though, don’t worry. The link is akesting.personal.asu.edu/wordpress if you are interested.

    Other ideas for fun data to collect would be much appreciated as well!

    • pierrejacob said, on 15 September 2010 at 09:40

      Hey Amanda,

      Thanks for your interest. I went on your blog, it was fun, keep it up! I was familiar with the Monty Hall problem but certainly not with tootsie-pop licking analysis.

      Allow me to put a comment here: you mention that 13 people have “collected the data”, and you could probably have some info on them: their gender, heights, age, etc, but you don’t use them in your study. Obviously data on their tongue lengths would be really useful here but not easy to collect. So the consistency of your results is kind of hard to assess: how do I know if I’d get the same result, for instance? Maybe all the 13 people were little girls, or all were older than 30, etc.

      So, thanks for the link!

  3. Mario said, on 24 February 2012 at 12:08

    I just googled “funny data sets” and came right to this blog. Thank you very much.

  4. jpatel3 said, on 31 July 2014 at 23:46

    Have you checked https://tuvalabs.com/datasets/ for datasets and activities around datasets?

