Triathlon in three colors

Posted in Dataset, Sport by Julyan Arbel on 23 November 2010

Jérôme Lê and I are planning to swim, bike and run the Paris triathlon next July. Before beginning the training, we want to know where to concentrate our efforts. Let us look at some data.

The race distance is known as Intermediate, Standard, or Olympic distance: a 1.5 km swim, a 40 km ride and a 10 km run. Data for the 2010 Open race (i.e. not the Elite race) can be found on Ipitos, a site of running race results, after free registration. They consist of 1412 finisher times for each of the three parts of the race. Gender is also available. Histograms of the times in minutes, normalized as probabilities, are as follows:



Swimming times are shorter than those of the two other parts (about 30, 70 and 50 minutes on average for swimming, cycling and running respectively). The largest standard deviation is for cycling (4, 8 and 7 minutes respectively), so the largest time differences are made in this part of the race.
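Summaries like these take only a few lines to compute. Here is a minimal Python sketch, assuming the times are already loaded as one list of minutes per leg. The `summarize` helper and the toy numbers are made up for illustration and are not the Ipitos data.

```python
from statistics import mean, stdev

def summarize(times_by_leg):
    """Return {leg: (mean, sample standard deviation)} for times in minutes."""
    return {leg: (mean(ts), stdev(ts)) for leg, ts in times_by_leg.items()}

# Made-up times for five finishers (placeholder values, not the race data)
toy = {
    "swim": [26.0, 29.0, 30.0, 31.0, 34.0],
    "bike": [62.0, 66.0, 69.0, 73.0, 85.0],
    "run": [42.0, 47.0, 50.0, 52.0, 59.0],
}

summary = summarize(toy)
for leg, (m, s) in summary.items():
    print(f"{leg}: mean {m:.1f} min, sd {s:.1f} min")
```

The leg with the largest standard deviation is the one where finishers are most spread out in absolute time, which is the comparison made above.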

The skew is positive for the three parts of the race, which is usual for this kind of event: it is open to everyone, and most newcomers swell the right tail. The cycling histogram is the most skewed (skewness of 0.5, 1.3 and 0.9 for swimming, cycling and running respectively). We can see this with boxplots and density estimates, drawn on centered data.
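The post does not say which skewness estimator was used. A common choice, assumed here, is the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below also checks that centering the data (subtracting the mean) leaves the skewness unchanged. The helper names and toy times are hypothetical.

```python
def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5 (assumed estimator)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5  # fails with a division by zero on constant data

def center(xs):
    """Subtract the mean so that the values are centered at zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

# Made-up bike times in minutes with a long right tail (not the race data)
bike = [62.0, 64.0, 65.0, 66.0, 67.0, 69.0, 72.0, 80.0, 95.0]

print(f"skewness: {skewness(bike):.2f}")
print(f"after centering: {skewness(center(bike)):.2f}")
```

A positive value means the right tail (slow finishers) is longer than the left one.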

As expected, no outlier is found on the left of the distributions: this is the “no-superman” effect. By contrast, outliers crowd the other side of the boxes: this is the “newcomer” effect.
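To count outliers on each side, one can apply the usual boxplot convention, assumed here, that a point is an outlier when it lies more than 1.5 interquartile ranges beyond the quartiles. The `tukey_outliers` helper and the toy times are hypothetical.

```python
from statistics import quantiles

def tukey_outliers(xs, k=1.5):
    """Return (low, high) outliers lying beyond k * IQR from the quartiles."""
    q1, _, q3 = quantiles(xs, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low = [x for x in xs if x < q1 - k * iqr]
    high = [x for x in xs if x > q3 + k * iqr]
    return low, high

# Made-up bike times in minutes (placeholder values, not the race data)
bike = [60, 62, 63, 64, 65, 66, 67, 68, 70, 95, 110]

low, high = tukey_outliers(bike)
print("left outliers:", low)
print("right outliers:", high)
```

An empty left list with a populated right list is the pattern described above.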

As an aside, I have plotted the normalized three-dimensional data in a square array, each square having a color defined by the data in the RGB model. Sampling 1024 of the 1412 finishers gives this (pointless) Richter-like plot.
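For the curious, here is a rough Python sketch of how such a color grid could be built. It rests on several assumptions that the post does not spell out: each leg is rescaled to [0, 1] by min-max normalization, and swim, bike and run map to the red, green and blue channels respectively. The function names and the simulated times are made up.

```python
import random

def minmax(xs):
    """Rescale values linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]  # needs two distinct values

def rgb_grid(swim, bike, run, side, seed=0):
    """Sample side*side finishers and lay out their (R, G, B) triples in a square."""
    pixels = list(zip(minmax(swim), minmax(bike), minmax(run)))
    chosen = random.Random(seed).sample(pixels, side * side)
    return [chosen[r * side:(r + 1) * side] for r in range(side)]

# Simulated times for 20 finishers (placeholder values, not the race data)
rng = random.Random(1)
swim = [rng.gauss(30, 4) for _ in range(20)]
bike = [rng.gauss(70, 8) for _ in range(20)]
run = [rng.gauss(50, 7) for _ in range(20)]

grid = rgb_grid(swim, bike, run, side=4)  # the post uses 32 x 32 = 1024
print(len(grid), "rows of", len(grid[0]), "RGB squares")
```

The nested list has the rows x columns x 3 shape that image functions such as matplotlib's `imshow` accept for RGB values between 0 and 1.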


The following triangle is obtained as in this post:

The fact that the point cloud is on the left illustrates the massive skewness of cycling. The few points outside the cloud correspond to poor performers in the corresponding sport, with swimming at the bottom left, cycling at the bottom right, and running at the top. For example, the three light green points are lousy bikers who are rather good at swimming and running.
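For readers unfamiliar with triangle (ternary) plots, the sketch below shows the standard mapping from three non-negative values to a point in an equilateral triangle. It assumes the three normalized values of a finisher are first turned into shares that sum to one, which may differ from what was actually done for the figure. The corner layout follows the description above: swimming at the bottom left, cycling at the bottom right, running at the top. The `ternary_xy` name is hypothetical.

```python
from math import sqrt

def ternary_xy(swim, bike, run):
    """Map three non-negative values to (x, y) inside a unit-side triangle.

    Corners: swim at (0, 0), bike at (1, 0), run at (0.5, sqrt(3) / 2).
    """
    total = swim + bike + run
    if total <= 0:
        raise ValueError("the three values must have a positive sum")
    b = bike / total  # share pulling the point toward the bottom right
    c = run / total   # share pulling the point toward the top
    return (b + c / 2, c * sqrt(3) / 2)

print(ternary_xy(1, 0, 0))  # pure swim share: bottom-left corner
print(ternary_xy(0, 1, 0))  # pure bike share: bottom-right corner
print(ternary_xy(1, 1, 1))  # equal shares: center of the triangle
```

A finisher whose normalized bike time dominates the two others lands near the bottom-right corner, away from the main cloud.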


16 Responses


  1. pierrejacob said, on 23 November 2010 at 19:40

    Man, you just invented the Richter plot!! Amazing! Totally pointless as you said, but I love it. I’ll Richter-plot everything from now on.

  2. xi'an said, on 25 November 2010 at 07:54

    Duh?! Congrats for trying a triathlon (as I would never ever do, as my swimming capacities are those of an anvil and I hate fighting the wind when biking), but I do not fully agree with your conclusions: to me the strong skewness in the biking time means a kind of wall below which no-one gets, while the small left-hand tail for running means some room for improvement… I also love the triangle plot (but would not advise trying to sell the Richter plot!!!)

    • Julyan said, on 25 November 2010 at 12:46

      Hey, I should try swimming training hooked to an anvil… OK for the interpretation of the cycling and running histograms, but I still cannot fully explain the difference in left tails. As you say, a few runners outperform the rest of the competitors, while the gap between them shortens on the bike.

      • Darren said, on 25 November 2010 at 13:26

        In the picture the swim course looks like it is the Seine?? If it is you are very brave and I hope that you will be in good health afterwards … 🙂
        From my understanding of triathlon tactics the idea is to be conservative on the bike in order not to die in the run … Maybe this could help to explain some of the sharp skewness (wall) for the cycling. I don’t know whether this tactic is pursued by newcomers though …
        I like the plots! 🙂

      • Julyan said, on 25 November 2010 at 18:19

        Yes, the Seine… do you know the French adjective “saine”, Darren? It means clean, healthy!! As a friend told me, the VIP registration includes the ringworm option!
        You are right: the top bikers follow a very precise timing plan that they will not exceed by a single second. By contrast, they finish with the run with no other plan than crossing the finish line first. Another point is that the times are not independent, especially for cycling, where riding in packs is common…

      • Darren said, on 26 November 2010 at 14:41

        I think ringworm would be just one of the many “presents” that you would receive after finishing the swim leg. 🙂 In this triathlon are you allowed to “draft” (ride directly behind someone) on the bike leg? In professional races (or at least for the section for the pro’s) often it is not allowed.

      • Julyan said, on 26 November 2010 at 18:21

        Yes Darren, I was told you are allowed to draft. Actually, the Elite race is part of a team tournament held throughout the year, all around France, and teams are allowed to ride in packs to maximize the team results. Here, the data come from the Open race, without such team strategies, but still with no restriction on drafting.

  3. xi'an said, on 26 November 2010 at 07:54

    Biking by packs: I was wondering about this as it makes an enormous difference! I once (and only once!) did 38k in less than one hour by simply standing right behind my almost-professional-cyclist brother-in-law… So that could explain very nicely the sharp wall on the left. Nothing of the kind in running nor swimming of course. (Even though having a coach on a bike next to you is also a huge help in half-marathons.)

  4. Firas said, on 29 November 2010 at 22:40

    My interpretation is that the skewness of the cycling histogram is due to physics and physiology.

    It seems reasonable to assume that the distribution of power output, i.e. the mechanical work that a cyclist can do per unit time, measured in watts and a function of training and genetics, follows a much more symmetric distribution, close in shape to what we see for the running times. Indeed, for running, the relation between speed and mechanical power is almost perfectly linear, so the running speed distribution directly reflects the wattages that the triathletes were capable of during that leg.

    For cycling, at least on a flat course, things are more complicated: the force of air resistance faced by a cyclist increases with the *square* of the speed; equivalently, the power output required to maintain a velocity V increases roughly with the cube of V. This may account for the wall: just a change of variable on the power distribution.

    Not sure about the rules of this triathlon, but in North American amateur tri, drafting, i.e. riding in packs, is typically prohibited.

    Hope this wasn’t totally off!

  5. Graph gallery in R « Statisfaction said, on 6 January 2011 at 15:34

    […] To finish with, and because I’m keen on ternary plots: […]

  6. Andrew said, on 11 April 2011 at 02:30

    Truly impressive. This is great, Julyan! It visually proves a concept that many beginner triathletes may not be aware of. I have a question: in this model, is it possible to exclude from the data athletes who punctured or crashed in the bike leg (puncture, fall, accident)?

    As for the green points that you describe as “lousy bikers”, it is difficult to explain how a racer can do very poorly on the bike but then well on the run. More likely they were delayed by a mechanical problem or a crash.

  7. Julyan Arbel said, on 15 April 2011 at 02:06

    Thank you Andrew. Unfortunately the data were collected by chips, and do not record any mishap or accident. You are right, that would correspond to the green outliers. The upper (blue) part, for the run leg, is more diffuse. But this is OK because some people finish the bike leg exhausted.

  8. […] an appetizer for Paris triathlon, Jérôme and I ran as a team last week end an adventure racing in Champagne region (it mainly […]

  9. […] Juju mentions it in his previous post, the detailed results for this kind of competition are available and downloadable on […]

  10. […] few entries on sports here and there, I was wondering what kind of law follow the running records with respect to the […]

  11. […] post yesterday: An exercise in plyr and ggplot2 using triathlon results, way better than ours, here and here. For example, the time distributions by age, “faceted” by discipline (swim, […]
