Datasets

This page links to websites where you can find free datasets. We use some of these datasets ourselves, for testing methods and for teaching.

Organizations

United Nations

World Bank

French public datasets (French government, INSEE)

Google Public Data Explorer

General

Comprehensive Knowledge Archiving Network

Infochimps (and this blog post on Infochimps API)

DataMarket (and this blog post about rdatamarket)

datamob.org

KDnuggets: large datasets for data mining projects

Amazon’s cloud

BuzzData

reddit datasets

Rdatasets: an archive of datasets distributed with R

Geodata

Data and Maps at GeoCommons

COW: Correlates of War

Country codes: package on CRAN

Country files

France communes polygons (IGN) and this post

Population of France communes since 1962

osmar-package: geographic elements of OpenStreetMap via its API

London transport data for this kind of map with ggplot2

Social network

twitteR: Twitter API within R

Unbiased samples of Facebook users

Funny

Eurovision

priceofweed

Global Terrorism Database (OK, it’s not funny)

Sports

Football

Tennis

All time athletics

Datasport.com

Ipitos: general races, running, triathlon…

Orienteering

Blogs

Guardian

Information is Beautiful

Open Knowledge Foundation

Washington Post

Prediction competitions

Kaggle

Scientific collaborations

DBLP Bibliography

R packages for dealing with data

Stack Exchange: Data APIs/feeds available as packages in R

This interesting blog post about various methods to get data directly from R

And this one too

RGoogleTrends, RGoogleDocs and this blog post: How to use a Google Spreadsheet as data in R

New York Times: RNYTimes R package
