Statisfaction

Power-laws: choose your x and y variables carefully

Posted in R, Sport by Julyan Arbel on 16 November 2011

This is a follow-up to the post Power of running world records.

As suggested by Andrew, plotting running world records could benefit from a change of variables. More precisely, using different variables sheds light on a [now] well-known [to me] sports result reported in a 2000 Nature paper by Sandra Savaglio and Vincenzo Carbone (thanks Ken): the dependence between time and distance on a log-log scale is not linear over the whole range of races, but piecewise linear. There is one break-point at a time of around 2’33’’ (or equivalently a distance of around 1100 m). As mentioned in the article, this threshold corresponds to a critical physiological change in the athlete’s energy expenditure: in short races (less than 1000 m) the effort relies on anaerobic metabolism, whereas it switches to aerobic metabolism for middle and long distances (or longer…). Interestingly, energy is consumed more efficiently in the second regime than in the first: the decay in speed slows down for endurance races.
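To make the idea of a piecewise-linear log-log relation concrete, here is a minimal sketch of a break-point search. It does not use real records: the points are synthetic, generated from a continuous two-regime power law with the exponents 1.20 and 1.08 and a kink at 1100 m, and the distance grid, the time units and the helper names (`synthetic_log_time`, `ols`, `best_split`) are all assumptions made for illustration. This is not the method of the Nature paper, just one simple way to look for a single break.

```python
import math

# Exponents and break distance quoted in the post; everything else is made up.
ALPHA_SHORT, ALPHA_LONG, BREAK_M = 1.20, 1.08, 1100.0
# Hypothetical distance grid in metres (NOT a table of real records).
distances = [100, 200, 400, 800, 1000, 1500, 2000, 3000, 5000, 10000, 20000, 42195]


def synthetic_log_time(d):
    """log T (arbitrary units) for a continuous two-regime power law."""
    x, xb = math.log(d), math.log(BREAK_M)
    if d <= BREAK_M:
        return ALPHA_SHORT * x
    return ALPHA_SHORT * xb + ALPHA_LONG * (x - xb)


def ols(xs, ys):
    """Simple least-squares line: returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_split(xs, ys, min_points=3):
    """Try every split into two contiguous groups; keep the one with the smallest total SSE."""
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = ols(xs[:k], ys[:k])
        right = ols(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left[0], right[0])
    return best


log_d = [math.log(d) for d in distances]
log_t = [synthetic_log_time(d) for d in distances]
sse, k, slope_short, slope_long = best_split(log_d, log_t)

print(f"break between {distances[k - 1]} m and {distances[k]} m")
print(f"slopes: {slope_short:.2f} (short races), {slope_long:.2f} (long races)")
```

On real records the fit would of course be noisy, and the two fitted lines are not forced to meet at the break here; a proper analysis would want a continuity constraint and some measure of uncertainty on the break-point.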

The reason for this visual difference is simple. Denote distance, time and speed by D, T and S, with S=D/T. I had plotted the log T ~ log D relation, which gave T\propto D^{\alpha} with \alpha=1.11. When using the speed S as one of the variables, the relations are S\propto D^{\gamma} and S\propto T^{\beta} with \gamma=1-\alpha and \beta=\frac{1}{\alpha}-1\approx 1-\alpha to first order, because \alpha is close to 1. With the Nature paper's findings (with the opposite sign convention), the two \betas are \beta_{\text{an}}=-0.165 (anaerobic) and \beta_{\text{ac}}=-0.072 (aerobic), i.e. \alpha_{\text{an}}=1.20 and \alpha_{\text{ac}}=1.08. My improper \alpha=1.11 is indeed in between. The slope ratio is much larger (larger than 2) on a plot involving the speed, which clearly shows the two regimes, than on my original plot (where the two slopes differ by only about 10%), which is why the latter appears almost linear (although with hindsight, and with good goggles, two lines might have been detected).
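The arithmetic behind these exponents is quick to check. The sketch below only uses the two \beta values quoted above and the relations \alpha=1/(1+\beta) and \gamma=1-\alpha, which follow from S=D/T; the variable names are mine.

```python
# Speed-versus-time exponents quoted in the post (anaerobic, aerobic).
beta = {"anaerobic": -0.165, "aerobic": -0.072}

# From S = D / T and T ∝ D^alpha: beta = 1/alpha - 1, hence alpha = 1 / (1 + beta).
alpha = {name: 1.0 / (1.0 + b) for name, b in beta.items()}
# Speed-versus-distance exponent: gamma = 1 - alpha.
gamma = {name: 1.0 - a for name, a in alpha.items()}

# Ratio of the two slopes on each kind of plot.
beta_ratio = beta["anaerobic"] / beta["aerobic"]
gamma_ratio = gamma["anaerobic"] / gamma["aerobic"]
alpha_ratio = alpha["anaerobic"] / alpha["aerobic"]

for name in beta:
    print(f"{name}: beta={beta[name]:.3f}, alpha={alpha[name]:.2f}, gamma={gamma[name]:.3f}")
print(f"slope ratio on S ~ T plot:  {beta_ratio:.2f}")
print(f"slope ratio on S ~ D plot:  {gamma_ratio:.2f}")
print(f"slope ratio on T ~ D plot:  {alpha_ratio:.2f}")
```

The two \alphas round to 1.20 and 1.08, and their ratio stays close to 1, whereas the ratios of the speed exponents exceed 2: the same two regimes, but far easier to see once speed is on one of the axes.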

Below is the S ~ log D relation, on which the 100 m and 100 km races clearly appear as two outliers. It takes time to make up for the time lost at the start of the race (100 m and 200 m are run at the same speed…), whereas the 100 km suffers from a lack of interest among athletes. Achim Zeileis also provides an extended world records table and R code in his comment.

As an aside, Andrew and Cosma Shalizi also comment and resolve an ambiguity of mine: one usually speaks about power-laws without being precise about the context, but there are mainly two separate families of power-law models. Either power-law regressions, where you plot y~x for two different variables (this is the case here); or power-law distributions, i.e. the probability distribution of a single variable x is p(x)\propto x^{-a}, or extensions of that (with lots of natural examples, ranging from the size of cities to the number of deaths in attacks in wars).

One Response

  1. Cosma Shalizi said, on 17 November 2011 at 02:27

    About that “lot of natural examples”…

