However, case counts per country are not very reliable, given that countries have very different policies regarding testing and so on; see e.g. Nate Silver’s opinion on case counts here.
You would think that death counts are far more reliable. In France, however,
Santé Publique France got criticized for reporting only COVID deaths that occurred in hospitals. Very recently, they started to include also deaths that occurred in retirement homes. However, they do so only at the national level (current count as of April 12th: 13832; 66% from hospitals). At a finer level (i.e. “régions” or “départements”), the data they provide (here) remains restricted to hospitals.
INSEE (French institute of official Statistics) decided to publish at the same time daily death counts at the département level. Note that INSEE is not a public health institute; the death counts they report are for all deaths, whatever the cause. See also this authoritative post (in French) explaining the challenges behind death counts reporting. In case, you wonder, a “département” is a regional unit (we have about 100 of those), see this wikipedia article.
I decided to compare both datasets using a very, very simple methodology. First, I merged both datasets, so as to obtain, for each département, and each day with a certain period:
- the number of covid deaths reported in a hospital, call it (where is the département, is the day);
- the total number of deaths (whatever the cause) on the same day , in département ;
- the total number of death , , on the same day, respectively one year ago (in 2019), and two years ago (in 2018).
SPF data starts on the 18 of March, and INSEE publishes its data every Friday with a one week delay, so my merged dataset currently covers the period: 18 to 30 of March (13 days). And we have about 90 départements in the dataset; the sample size is 1200.
The model I have in mind is simply: .
The first term is a basic predictor of 2020 counts, in case they were no pandemic. It is pretty basic, but counts deaths are quite stable over the years. Granted, there is some variation in winter, due to the flu, but this seems to affect mostly February. For the record, here is a plot of the daily number of deaths in France in 2018, 2019 and 2020, for the period covered by the data:
The coefficient of course measure under-reporting.
Thus, I fitted a linear regression model to predict 2020 deaths as a function of the 2018 and 2019 deaths, and the CH deaths (no intercept). Here are the results:
Look in particular at the estimate of : 1.596 (95% confidence interval: [1.51, 1.68] ). In other words, on average, one should add something between 50% and 70% to the reported number of covid deaths in hospitals to get an estimate of all covid deaths.
I tried other models; for instance by forcing the coefficients of the two years to be exactly equal to one half (how to do this is left as a simple exercise!). I got similar results. I’d like to repeat the analysis on weekly aggregated data. We don’t have yet two full weeks of data, so it’s too early for that. The usual caveats regarding linear regression apply; e.g. there should be some heteroscedasticity, given that the size of département vary significantly.
I will update these results as I get more data. I find it interesting that merging these two datasets already gives results that are reasonable and easy to interpret. In particular, I got similar results using only the first six days that were available one week ago. The secret here is we compensate the small number of days by a large number of “départements”.
I am not an expert on public health data, so I do not want to comment on why SPF reports only hospital data; I guess it is much harder to determine that a death is covid-related outside of a hospital, but again I am out of my depth here.
On the other hand, I think it is commendable that INSEE decided to make their own reporting. Of course, both institutions report different things. But the fact that we are able to compare and combine two sources of data potentially gives a clearer picture.
Comments more welcome. I would be curious in particular to know whether other countries provide this kind of double reporting.