This page links to websites where you can find free datasets. We use some of these datasets ourselves, for testing methods and for teaching.
Organizations
French public datasets (French government, INSEE)
General
Comprehensive Knowledge Archiving Network
Infochimps (and this blog post on Infochimps API)
DataMarket (and this blog post about rdatamarket)
KDnuggets: large datasets for data mining projects
Rdatasets: an archive of datasets distributed with R
Geodata
Country codes: package on CRAN
Polygons of French communes (IGN) and this post
Population of France communes since 1062
osmar package: geographic elements of OpenStreetMap via its API
London transport data for this kind of map with ggplot2
Social network
Unbiased samples of Facebook users
Funny
Global Terrorism Database (OK, it’s not funny)
Sports
Ipitos: general races, running, triathlon…
Blogs
Prediction competitions
Scientific collaborations
R packages for dealing with data
Stack Exchange: data APIs/feeds available as packages in R
This interesting blog post about various methods to get data directly from R
RGoogleTrends, RGoogleDocs and this blog post: How to use a Google Spreadsheet as data in R
New York Times: RNYTimes R package
There is a small error in the link to the INSEE website: https://statisfaction.wordpress.com/datasets/www.insee.fr
Thanks Jérémy, it’s fixed!