Follow-up on my previous post on covid deaths in France

In my previous post, I compared two sources of data regarding death counts (one from SPF, Santé Publique France, for covid deaths in hospitals; one from INSEE, for all-cause deaths), in order to get a better idea of the actual death toll of covid-19 in France.

In this post, I would like to do the following:

  1. describe briefly a new, richer data-set recently published by INSEE (and do some graphs);
  2. use the updated data (from both sources) to repeat my analysis, with some variants (weekly aggregates, separating men and women);
  3. reply to a few comments I got on LinkedIn and elsewhere;
  4. provide a few pointers regarding death counts in other countries (particularly the UK).

New INSEE data

INSEE now provides every Friday an exhaustive data-set that records, for each death that has occurred since 01-01-2018, the following variables: date of birth, date of death, sex, département of death, and so on. Neat. Let’s take this opportunity to do a few plots, such as this one:

Number of deaths in France each day in 2018, 2019, 2020, until 13th Apr 2020 (as recorded in the INSEE dataset described above)

(it’s nice to observe this sharp drop) or that one:

Death counts during weeks 13, 14 and 15 per age, for years 2018-20 (same INSEE dataset)

The latter plot covers the same period (weeks 13 to 15, 23rd March to 12th April) as in the analysis below. As expected, over-mortality seems to affect mostly people above 60.
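For readers who want to reproduce this kind of plot, here is a minimal sketch of the counting step in Python. It assumes the dates of death have already been parsed from the INSEE file into `datetime.date` objects; the function name `daily_counts` and the toy dates are made up for illustration, and this is not necessarily how the plots above were produced.

```python
from collections import Counter
from datetime import date


def daily_counts(death_dates):
    """Count deaths per calendar day, keyed by (year, day of year)."""
    counts = Counter()
    for d in death_dates:
        counts[(d.year, d.timetuple().tm_yday)] += 1
    return counts


# Toy dates (made up, NOT real INSEE records):
# three deaths on 1 Jan 2020, one on 2 Jan 2020, two on 1 Jan 2019.
toy = [date(2020, 1, 1)] * 3 + [date(2020, 1, 2)] + [date(2019, 1, 1)] * 2
counts = daily_counts(toy)

# One curve per year: x = day of year, y = number of deaths that day.
for (year, day), n in sorted(counts.items()):
    print(year, day, n)
```

Keying on the day of the year (rather than the full date) is what lets the curves for 2018, 2019 and 2020 be overlaid on the same horizontal axis.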

Updated analysis

Ok, now let’s repeat my previous analysis, based on merging the SPF data (daily covid death counts in hospitals, in each département and each sex) and the aforementioned INSEE data (all-cause deaths). Except this time:

  1. The overlap between the two datasets now covers more than three weeks (18th March, first date in SPF dataset, to 12th April, latest date in INSEE dataset), so I decided to consider weekly aggregates, for two reasons: they are more stable than daily aggregates, and they are less affected by artifacts such as reporting delays (e.g. a death occurring during a week-end may only be reported on the following Monday).
  2. I also separated men and women.
  3. I am going to simplify the model a bit, and simply regress excess deaths (number of deaths in 2020 minus the average over 2018 and 2019) on hospital deaths.
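To make the definition of excess deaths concrete, here is a rough Python sketch of the aggregation per week, département and sex. It is only an illustration under assumptions: the input layout (tuples of date, département, sex), the function name `weekly_excess`, and the use of ISO week numbers to match weeks across years are my choices here, and the toy records are made up.

```python
from collections import defaultdict
from datetime import date


def weekly_excess(deaths, target_year=2020, baseline_years=(2018, 2019)):
    """deaths: iterable of (date, departement, sex) tuples.

    Returns {(week, departement, sex): excess}, where excess is the
    target-year count minus the mean count over the baseline years.
    """
    counts = defaultdict(int)
    for d, dept, sex in deaths:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week, dept, sex)] += 1

    # Cells (week, departement, sex) observed in the target year.
    cells = {(w, dept, sex) for (y, w, dept, sex) in counts if y == target_year}

    excess = {}
    for w, dept, sex in cells:
        baseline = sum(
            counts.get((y, w, dept, sex), 0) for y in baseline_years
        ) / len(baseline_years)
        excess[(w, dept, sex)] = counts[(target_year, w, dept, sex)] - baseline
    return excess


# Toy records (made up): ISO week 13 in each year, one departement, one sex.
toy = (
    [(date(2020, 3, 23), "75", "F")] * 5
    + [(date(2019, 3, 25), "75", "F")] * 2
    + [(date(2018, 3, 26), "75", "F")] * 4
)
excess = weekly_excess(toy)
print(excess)  # 5 deaths in 2020 against a baseline of (2 + 4) / 2 = 3
```

Note that ISO week 13 does not cover exactly the same calendar dates in each year; matching on calendar dates instead would be an equally defensible choice.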

First, a joint plot:

So, to recap, each point in this plot corresponds to a pair of death counts (hospital covid deaths, excess deaths) for one département in France, one week between 13 and 15, and one sex. The corresponding linear regression (without an intercept) gives a slope estimate of 1.79 (95% confidence interval: [1.73, 1.85]). The basic interpretation would be: in each département, when 100 covid deaths occur in hospitals, the number of covid-related (see below) deaths should be approximately 179. The current total number of covid deaths reported by SPF is 22,614, which is about 60% above the number of covid deaths in hospitals (14,050). So this estimate suggests the actual death toll might be a tad larger. More about the interpretation below.
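For reference, a regression through the origin has a closed-form slope, so the estimate and its confidence interval can be sketched in a few lines of plain Python. This is a generic textbook version, not necessarily the exact routine used for the numbers above: the function name is hypothetical, the interval uses a normal quantile rather than a Student t quantile (a reasonable approximation with several hundred points, less so for small samples), and the data are toy values.

```python
import math
from statistics import NormalDist


def slope_through_origin(x, y, level=0.95):
    """Least-squares fit of y = beta * x (no intercept).

    Returns (beta, (lower, upper)), with a normal-approximation
    confidence interval at the requested level.
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    beta = sxy / sxx
    # Residual sum of squares; one parameter estimated, so n - 1 degrees of freedom.
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 1) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, (beta - z * se, beta + z * se)


# Toy data (made up): x = hospital deaths, y = excess deaths.
x = [10, 20, 30, 40]
y = [18, 41, 59, 83]
beta, (lo, hi) = slope_through_origin(x, y)
print(f"slope = {beta:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```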

Now for something more interesting: let’s redo the previous plot, but with a different colour for each sex:

Clearly the two linear trends are different; see below the OLS estimates.

sex | slope estimate | slope 95% confidence interval | R^2
F   | 2.40           | [2.30, 2.50]                  | 89%
M   | 1.56           | [1.50, 1.62]                  | 90%

What is going on? Well, women tend to live longer than men, and the proportion of women in EHPADs (French retirement homes) is 74%. Since the main reason behind the discrepancy between hospital deaths and excess deaths is covid deaths occurring in pension homes, these results make sense.
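As an illustration of how per-sex slopes and R^2 values of this kind can be obtained, here is a small Python sketch that fits one no-intercept regression per group. The function name and the toy rows are made up, and the R^2 computed here is the uncentered one (1 - RSS / sum of y^2), which is the usual convention for a regression without intercept; I am assuming, not asserting, that it matches the convention behind the figures reported above.

```python
from collections import defaultdict


def fit_by_group(rows):
    """rows: iterable of (group, x, y).

    Returns {group: (slope, uncentered R^2)} for the model y = slope * x.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum x^2, sum x*y, sum y^2
    for g, x, y in rows:
        s = sums[g]
        s[0] += x * x
        s[1] += x * y
        s[2] += y * y

    out = {}
    for g, (sxx, sxy, syy) in sums.items():
        beta = sxy / sxx
        # For the least-squares beta, sum((y - beta*x)^2) = syy - beta * sxy.
        rss = syy - beta * sxy
        out[g] = (beta, 1.0 - rss / syy)
    return out


# Toy rows (made up): (sex, hospital deaths, excess deaths).
rows = [("F", 10, 24), ("F", 20, 48), ("M", 10, 15), ("M", 20, 32)]
fits = fit_by_group(rows)
for sex, (slope, r2) in sorted(fits.items()):
    print(sex, round(slope, 2), round(r2, 4))
```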

Reply to previous comments

What’s the point?

Fair enough: since 4th April, SPF does include in its total estimate both hospital deaths and pension home deaths, and the proportion of the latter is not too far from my estimate. Note, however, that:

  • It’s really hard to estimate properly the number of covid deaths occurring in pension homes. Apparently several pension homes did not provide any data, while others marked as “covid” all the deaths that occurred after the first covid death.
  • My estimate might measure other direct or indirect effects of the pandemic, such as people dying at home, people not receiving proper care because the health system is at capacity and so on.
  • The fact that data from two different institutions may be compared, and seem to be somehow consistent, is, in my opinion, a good piece of news which deserves to be reported!

Accounting for car accidents

Boy, that one was popular. Please have a look at the plot on the front page of ONISR (click on “tués”): yes, the number of car-related deaths dropped sharply thanks to the lock-down… But in March of last year, this number was around 250, that is, about 1% of the current covid death count. “Fun” fact: this point would have been quite relevant in the 70s! In those years, the number of car-related deaths was about five times larger (18,034 deaths in 1972).

Better predictive model

The idea of comparing the 2020 deaths to the average of the two previous years is a bit crude, and demographers have better models to predict death counts based on the age distribution and so on. That said, the notion of “excess deaths” seems quite popular in various countries, as I explain below, so I guess that my approach is not so daft after all.

In other countries

To be honest, I was hoping to apply the same approach to the UK, a country where the official estimate is still limited to hospital deaths, and thus clearly quite biased; see e.g. this Guardian article. Sadly, Public Health England only reports daily hospital death counts … per nation (nation = England, Scotland, Wales, or Northern Ireland). On the other hand, the Office for National Statistics reports every week the number of “excess deaths” (relative to the five-year average), and the proportion of these deaths where the word “covid” is mentioned on the death certificate.

Interestingly, the Guardian article I mentioned above first complains that the UK only reports hospital deaths, and then claims erroneously that the UK is still behind France in terms of covid mortality. It’s not, if you compare in terms of hospital deaths (UK: 20,319 on Saturday, while France: 14,050). The fact that even journalists reporting on this issue may get it wrong seems indicative of how confusing covid death data are.

More generally, my impression is that looking at “excess deaths” makes far more sense for most countries at the moment: it’s easier to measure (albeit with a delay of course), and easier to interpret. This is also more or less the point made by this NYT article. (Notice how their plot for France only covers January to April; for the complete plot, see my first plot above!)

Published by Nicolas Chopin

Professor of Statistics at ENSAE, IPP

5 thoughts on “Follow-up on my previous post on covid deaths in France”

  1. Thanks for these great posts !

    Did you try with an intercept? It’s entirely possible that the non-covid mortality is lower than in previous years, not only because of the small car accident effect, but mostly because of fewer circulatory disease deaths, a consequence of the reduction in work hours that has been observed in various economic recessions. Visually, it’s difficult to guess from your graph if such an intercept would be significantly negative.

    By the same token, you distinguish men’s and women’s excess mortality; is it possible with age and place of death categories?

    1. Hi,
      * intercept: not statistically significant.
      * age: possible at “région” level, but not at “département” level, unfortunately. (The “département” aggregates provided by SpF are for all ages; but “région” aggregates for each age class are also available).
      * death category: not sure what you mean? In the INSEE data-set, there is a “location” variable (which may be set to “hospital”, “home”, “pension home”, etc.), but there are a lot of missing values, unfortunately.

      1. Thanks for the answers ! Yes, I meant the location variable. It seems difficult to use, but might flesh out some of the story : if excess death at home is not really correlated to covid death, that could point to a “fear of hospitals” effect, for instance.

        1. Believe me, I tried… 🙂 But there is too much missingness for this variable to get a clear picture.

  2. Dear Nicholas,
    just a quick comment: the link to the first dataset is not working anymore. Just to let you know…
