Statisfaction

Triathlon data with ggplot2

Posted in Dataset, Sport by Julyan Arbel on 11 October 2011

As Jérôme and I like so much to play with triathlon data, it is a pleasure to see that we are not alone. Christophe Ladroue, from the university of Bristol, wrote this post yesterday: An exercise in plyr and ggplot2 using triathlon results, followed by part II, way better than ours, here and here. For example, the time distributions by age, “faceted” by discipline (swim, cycle, run and total), look like this

As the number of participants to the Stratford triathlon (400 or so) is a bit small for this number of age categories, it would be nice to compare with the Paris triathlon results (about 4000).

Here is the rank for the 3 disciplines and for the total time, “colored” by the final quartile (check the full part II post for colors by quartile in the 3 disciplines):

We see that the rank at the swim lag is not much informative for the final performance, all 4 colors being pretty mixed at that stage, and that it is tidied by the cycle lag. It is the longer one, and as such, the more predictive for the final perf. It is nice to see that some of the poor swimmers  finally reach the first quartile (in orange). Check those ones whit sawtooth patterns: first quartile at swimming, last cycling, first running, and last at the end!

An interesting thing to do with that kind of sports databases would be to build panel data. As most race websites provide historical data with the participants names and age, identification is possible. It is the case for Ipitos, or for Paris 20 km race, with data from 2004 to 2010 (and soon 2011). Remains to check if enough people compete in all the races in a row, my guess is that the answer is yes. The next steps would be to study the impact of the age on the progress, and on the way ones manages the effort from the beginning to the end of the race (thanks to intermediate times in running races, or discipline times in triathlon). Well, maybe in a later post.

About these ads
Tagged with: , ,

One Response

Subscribe to comments with RSS.

  1. Random art on the web « Statisfaction said, on 15 October 2011 at 21:42

    [...] and Christophe Ladroue’s representation of individual rankings across lags in a triathlon: [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 53 other followers

%d bloggers like this: