HOWTO: do a PhD in stats, in France
Hey there,
Following questions from master's students interested in doing a PhD in France, I thought I’d share some answers here. I’ll try to explain how French universities work for PhD students (it’s a bit peculiar). It’s gonna get boring for any other reader, so if you don’t want to apply for a PhD in France, skip this post (and sorry). Basically, three things are kept separate:
- the university to which the PhD student is affiliated. Every student has to be affiliated to a university (or more exactly to an “Ecole doctorale” = “doctoral school”). For instance CREST is a research centre, but there’s no doctoral school attached to it (yet), hence a PhD student at CREST has to have another affiliation. One more thing about doctoral schools: the PhD student is affiliated to the same doctoral school as the supervisor. So once you have chosen the supervisor, you don’t choose the doctoral school: you get the same one as your supervisor.
- where the money comes from: you have to get funding, and it can come from various places: the university itself, the research labs (like CREST), some big private firms (e.g. EDF, the main French energy supplier, funds a lot of research on stats and economics), some private funding programs (for instance I’m funded by AXA Research), etc. There are many different institutions funding PhDs, so at first it’s hard to know where to search.
- where you actually work: that is, the place where you actually have an office (and possibly even a computer!). It could be in a third institution, in the university, in a research lab, etc. It is not difficult to find once you have funding, because the institution that agrees to fund you for a few years will at least also agree to give you a seat.
So for example I’m funded by AXA Research, my “doctoral school” is Université Paris-Dauphine (where my supervisor is affiliated) and my office is at CREST.
All this seems complicated but in practice it is not so difficult if you apply with the right professor, that is, the professor who knows the system and where to get the money from. So the hard part is to find the professor and to convince him or her that you would be worth the effort. Obviously, when you contact a professor, it’s better to show that you’re motivated by having some understanding of the educational system, so that he / she doesn’t feel like he / she’s going to do all the administrative work alone. I can’t say much more about how you can convince a professor to take you for a PhD; that’d be for professors to answer (professors, if you read this…).
Some practical information: PhDs usually start in the fall, so the application deadlines are usually in the preceding spring (but it is not centralized, so some institutions put deadlines in January whereas others put them in June; the best is to contact professors quite early during the final year of your master's). It does not cost money to apply for a PhD. The PhDs funded by most universities and CREST are totally research-based: there are no classes to attend (you can attend courses if you want but you don’t have to), and you can go to seminars (but you don’t have to either). Funding usually lasts three years and pays decently well: the CREST funding is around 1400€ per month I think, so you can live with that, even in Paris, though you won’t go to strip clubs every night with that kind of money (of course it’s pure speculation, I never went), and you would need to share a flat if you wanted to live in the city centre. You can earn additional income by giving classes (usually practical lessons for bachelor's or master's students), which can pay pretty well (up to 50€ per hour). Some funding contracts will force you to teach, while others will not. This is an important difference, both in terms of time and money. The best is obviously to have the possibility to teach without having to (CREST’s funding is of that kind, at least for now).
Many students take a bit more than 3 years to finish their PhD, so there are usually possibilities to find additional funding for the end of your PhD: be prepared to spend 4 years instead of 3. It’s rare that people need more than 4 years in statistics.
Finally, compared to other countries, a PhD in France is really unrelated to a master's, so if you want to do a PhD at university U, you don’t necessarily need a master's from university U. As long as your master's is research-oriented, you can apply for a PhD in the same field at any university.
Last and final on Richter’s painting
For a quick recap, Pierre and I supervised a team project at Ensae last year, on a statistical critique of the abstract painting 1024 Colours by painter Gerhard Richter. The four students, Clémence Bonniot, Anne Degrave, Guillaume Roussellet and Astrid Tricaud, did an outstanding job. Here is a selection of graphs and results they produced.
1. As a preliminary descriptive study, the following scatter plots complement the triangle plot. An R 3D scatter plot function, from the package of the same name, displays the pixels with their coordinates in the RGB cube. It shows that, as a joint law, the triplets are somewhat concentrated along the black-white diagonal of the cube.

The same occurs when the points are projected on the sides of the cube. Here is a comparison with uniform simulations.

2. It is interesting to see what happens in other color representations. HSL and HSV are two cylindrical models, succinctly described by this Wikimedia picture:

The points parameterized in these models were projected on the sides as well; here, the sides of the cylinder are to be seen as the circular top (or bottom), the lateral side, and the section of the cylinder by a half-plane from its axis. It shows that some colors in the hue coordinate (rainbow-like colors) are missing, for instance green or purple.
For the HSL model,

and the HSV model.

The histograms complete this first analysis. For HSL,

and HSV.

3. The students built a few ad-hoc tests for uniformity, either following our perspective or on their own. They used a Kolmogorov-Smirnov test, a second test, and some entropy-based tests.
4. We eventually turned to testing for spatial autocorrelation. In other words, is the color of one cell related to the color of its neighbors (in which case you can predict the neighbors’ colors given one cell), or is it “non informative”? A graphical way to check this is to plot the average level of a color coordinate over the neighbors of a pixel against the pixel's own coordinate, and then to fit a simple regression on this cloud: if the slope of the regression line is nonzero, then there is some correlation, with the sign of the slope. We tried various combinations of coordinates, and different radii for the definition of the neighborhood, and found no significant correlation. A (so-called Moran) test quantified this assessment. Here is, for example, the plot of the average level of red of the eight closest neighbors of each pixel against its own level of red.
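The neighbor-average regression described above can be sketched in a few lines. This is only an illustration, not the students' code: the function names, the 32×32 grid size and the simulated "red levels" are all assumptions made for the example, and border cells simply average over the neighbors they have.

```python
import random


def neighbor_means(grid):
    """Mean of the (up to) eight surrounding cells, for every cell of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    means = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            row.append(sum(vals) / len(vals))
        means.append(row)
    return means


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def neighbor_slope(grid):
    """Regress each cell's neighbor average on the cell's own value."""
    means = neighbor_means(grid)
    own = [v for row in grid for v in row]
    nbr = [m for row in means for m in row]
    return ols_slope(own, nbr)


# Hypothetical data: simulated levels, NOT the painting's pixels.
rng = random.Random(0)
random_grid = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
gradient_grid = [[8 * c for c in range(32)] for _ in range(32)]

slope_random = neighbor_slope(random_grid)      # close to 0: no spatial structure
slope_gradient = neighbor_slope(gradient_grid)  # close to 1: strong spatial structure
```

Because each neighbor average uses equal weights summing to one, this slope coincides with Moran's I computed with row-normalized weights, which is one way to connect the plot to the test.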
Kernel smoothing with break(s) in SAS
Here is a handy little SAS macro for anyone who wants to do kernel smoothing. On top of that, it handles variables that exhibit breaks (one or two at most), a case well known to econometricians who run regression discontinuity designs.
Let us first recall the principle of kernel smoothing (“lissage par noyau” in French, which sounds so much less classy). You have two variables X and Y (continuous or discrete) and you want a first idea of the shape of the relationship between them, without imposing any particular functional form. Or, as econometricians would say, “a nonparametric estimate” of Y=f(X) (oh yeah). To do so, one computes the weighted mean of Y over sliding windows of X. The kernel simply gives more weight to the points close to the value of X at a given iteration.
If that made no sense, have a look at this little drawing taken from Wikipedia:

Or go straight to Wikipedia!
When your relationship has one (or two) discontinuities, smoothing will tend to hide the reality of the break. To give you an example, suppose you study the training rate (Y) as a function of the size of a firm (number of employees, X). You know that in general, the more employees there are, the more the firm tends to train (economies of scale, ease of replacing absent employees, etc.). But you also know that unions tend to push firms to train. Now, at the thresholds of 20 and 50 employees, firms have obligations regarding the election of staff representatives and works councils. You therefore expect the training rate to “jump” discontinuously at these thresholds.
In that case, standard smoothing would give you something like this (real data):

In fact, although the discontinuities at X=20 and X=50 are noticeable, they tend to be smoothed out because the mean training rate (Y) is computed by mixing points above and below the discontinuities. To solve the problem, the kernel is slid along X but stopped around the threshold(s). That way, for example, at X=49.5 only the observations with X<50 enter the computation of the mean (and vice versa).
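The idea of stopping the kernel at the thresholds can be sketched in Python. This is not the SAS macro: the Gaussian kernel, the bandwidth, the function names and the step-shaped toy data are all assumptions chosen for the illustration. Each evaluation point only uses the observations lying on the same side of every break.

```python
import math


def segment_index(x, breaks):
    """Number of break points at or below x, i.e. which side of each threshold x is on."""
    return sum(1 for b in breaks if x >= b)


def kernel_smooth(xs, ys, grid, bandwidth, breaks=()):
    """Kernel-weighted mean of ys at each grid point (Gaussian kernel, an assumption).

    Observations on the other side of a break get zero weight.
    """
    smoothed = []
    for g in grid:
        seg = segment_index(g, breaks)
        num = den = 0.0
        for x, y in zip(xs, ys):
            if segment_index(x, breaks) != seg:
                continue  # never mix points across a threshold
            w = math.exp(-0.5 * ((x - g) / bandwidth) ** 2)
            num += w * y
            den += w
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Toy data (made up): Y jumps from 1 to 2 at X = 50.
xs = list(range(30, 71))
ys = [1.0 if x < 50 else 2.0 for x in xs]

plain = kernel_smooth(xs, ys, [49.5, 50.5], bandwidth=2.0)
with_break = kernel_smooth(xs, ys, [49.5, 50.5], bandwidth=2.0, breaks=(50,))
```

On this toy example the plain smoother returns values around 1.5 on both sides of the threshold, whereas the version with `breaks=(50,)` keeps the two levels apart.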

In that case, we get the following nice plot:

The program comes as a small SAS macro that needs to be parameterized. A user guide is also provided.
To download: the Kernel Smooth macro and its user guide.
Power of running world records
Following a few entries on sports here and there, I was wondering what kind of law the running records follow with respect to the distance. The data are available on Wikipedia, or here for a tidied version. It collects 18 distances, from 100 meters to 100 kilometers. A log-log scale is in order:

It is nice to find a clear power law: the relation between the logarithms of time T and of distance D is linear. Its slope α (given by the lm function) is the power in a relation of the form T ∝ D^α.
Another type of race consists in running backwards (or retrorunning). The linear link is similar

with a slightly larger power
So it gets harder to run longer distances backwards than forwards…
It would be interesting to compare the powers for other sports like swimming and cycling.
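For readers who want to redo this kind of fit outside R, here is a minimal sketch of the log-log regression in Python. The helper name `fit_power_law` is made up, and the "times" below are synthetic: they are generated from an arbitrary power law (power 1.1, constant 0.05) only to check that the fit recovers it. They are not the actual records, and 1.1 is not the power estimated in this post.

```python
import math


def fit_power_law(distances, times):
    """Least-squares fit of log(T) = log(c) + alpha * log(D); returns (alpha, c)."""
    lx = [math.log(d) for d in distances]
    ly = [math.log(t) for t in times]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    alpha = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum(
        (x - mx) ** 2 for x in lx
    )
    c = math.exp(my - alpha * mx)
    return alpha, c


# Synthetic times from T = 0.05 * D**1.1 (illustrative values, NOT real records).
distances = [100, 200, 400, 800, 1500, 5000, 10000, 42195, 100000]
times = [0.05 * d ** 1.1 for d in distances]

alpha, c = fit_power_law(distances, times)
print(f"estimated power: {alpha:.3f}, constant: {c:.3f}")
```

The same function could be applied to any other sport's record table to compare the fitted powers.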

