Finding datasets on the Internet
For teaching purposes it can be cool to have small, free datasets on various topics, like music, movies, anything. So I went looking for general-purpose dataset websites.
The first website I found is the Comprehensive Knowledge Archiving Network (CKAN). The people behind it are the Open Knowledge Foundation, which has a blog (a good way to stay aware of the “power of open data”, to quote them). I like! I also came across Infochimps, which is a data market, so some datasets are free and some are not. It still looks like a good source of data: a lot of the datasets are free, and they come under various licenses, including free licenses like Creative Commons.
With these two sites you can find funny things, like the list of tweets with GOOOAL spelled with a ridiculous number of O’s during the 2010 World Cup, and, I’m sure, other very useful datasets.
Apart from these websites, Robin showed me a blog on the Guardian website, the “Datablog”. It’s not only data: there are also visualizations and comments. There are always links to online spreadsheets. That’s the spirit! I wish all articles presenting graphs and tables would give links to the corresponding spreadsheets. By the way, some of the visuals on the Datablog actually come from Information is Beautiful, which is an amazing blog, even for non-statisticians.
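Since those articles link to spreadsheets, getting such a table into a classroom example can be quite short. Here is a minimal sketch in Python, assuming the spreadsheet has been exported as CSV. The column names and the three rows below are made up for illustration and do not come from any real dataset; `load_rows` and `column_mean` are hypothetical helper names.

```python
import csv
import io
from statistics import mean

# Toy CSV text standing in for a spreadsheet export (made-up values).
SAMPLE_CSV = """country,goals
A,3
B,1
C,2
"""

def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def column_mean(rows, column):
    """Average of a numeric column, skipping empty cells."""
    values = [float(r[column]) for r in rows if (r[column] or "").strip()]
    return mean(values)

rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; mean goals =", column_mean(rows, "goals"))
```

For a real spreadsheet, one could read the exported file from disk, or download it with `urllib.request`, and pass the text to `load_rows` in the same way.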
There are of course plenty of other dataset websites, but those are the ones I would go to for a first quick search. If you’re aware of other good resources in the same spirit (that is, free and open, and dealing with various topics), I’d be glad to have the links in the comments. Actually I think there would be room for a “Funny Datasets Archiving Network”, or something like that, with only funny and stupid datasets for teaching purposes.