Hi all,

With Leah South from QUT we are organizing an online workshop on the topic of “Measuring the quality of MCMC output”. The event website is here with more info:

https://bayescomp-isba.github.io/measuringquality.html

This is part of the ISBA BayesComp section’s efforts to organize activities while waiting for the next “big” in-person meeting, hopefully in 2023. The event benefits from the generous support of the QUT Centre for Data Science. The event’s website will be regularly updated between now and the event in October 2021, with three live sessions:

- 11am-2pm UTC on Wednesday 6th October,
- 1pm-4pm UTC on Thursday 14th October,
- 3pm-6pm UTC on Friday 22nd October.

Registration is free but compulsory (form here), as we want to make sure the live sessions remain convivial and focused; hence the rather specific theme. But it is an exciting topic with many open questions, which we hope will attract both practitioners and methodologists. Meanwhile, some material will be available on the website to everyone, including video recordings of presentations and posters, so that the workshop hopefully benefits the wider community.

If you have suggestions for this event, or would like to organize a similar event in the future, on another “BayesComp” topic, do not hesitate to get in touch. Our contact details are on the workshop’s website.

This post is about estimating the parameter of a Bernoulli distribution from observations, in the “Dempster” or “Dempster–Shafer” way, which is a generalization of Bayesian inference. I’ll recall what this approach is about, and describe a Gibbs sampler to perform the computation. Intriguingly the associated Markov chain happens to be equivalent to the so-called “donkey walk” (not this one), as pointed out by Guanyang Wang and Persi Diaconis.

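Denote the observations, or “coin flips”, by $x_1, \ldots, x_N \in \{0, 1\}$. The model stipulates that $x_n = \mathbb{1}(u_n < \theta)$, where $u_1, \ldots, u_N$ are independent Uniform(0,1) variables, and $\theta \in [0, 1]$ is the parameter to be estimated. That is, $x_n = 1$ if the uniform $u_n$ lands below $\theta$, which indeed occurs with probability $\theta$; otherwise $x_n = 0$. We’ll call the uniform variables “auxiliary”, and denote by $N_0$ and $N_1$ the counts of “0” and “1”, with $N_0 + N_1 = N$.

In a Bayesian approach, we would specify a prior distribution on the parameter; for example a Beta prior would lead to a Beta posterior on $\theta$. The auxiliary variables would play no role, except perhaps in Approximate Bayesian Computation. In Dempster’s approach, we can avoid the specification of a prior, and instead “transfer” the randomness from the auxiliary variables to a distribution of subsets of parameters; see ref [1] below. Let’s see how this works.

Given observations $x_{1:N}$, some configurations of the auxiliary variables $u_{1:N}$ are compatible with the observations, in the sense that there exists some $\theta$ such that $x_n = \mathbb{1}(u_n < \theta)$ for all $n$; other configurations of $u_{1:N}$ are not compatible. If we denote by $\mathcal{I}_1$ the indices corresponding to an observed $x_n = 1$, and likewise $\mathcal{I}_0$ for $x_n = 0$, we can see that there exists some “feasible” $\theta$ only when $\max_{n \in \mathcal{I}_1} u_n < \min_{n \in \mathcal{I}_0} u_n$. In that case the feasible $\theta$ are in the interval $\left(\max_{n \in \mathcal{I}_1} u_n,\ \min_{n \in \mathcal{I}_0} u_n\right]$. The following diagram illustrates this on a small example.

How do we obtain the distribution of these intervals, under the Uniform distribution of $u_{1:N}$ and conditioning on $\max_{n \in \mathcal{I}_1} u_n < \min_{n \in \mathcal{I}_0} u_n$? We could draw $N$ uniforms, sort them in increasing order, and report the interval between the $N_1$-th and the $(N_1+1)$-th values (Section 4 in [1]). But that would be no fun, so let us consider a Gibbs sampler instead (taken from [4]). We will sample the auxiliary variables uniformly, conditional upon $\max_{n \in \mathcal{I}_1} u_n < \min_{n \in \mathcal{I}_0} u_n$, and we will proceed by sampling the variables indexed by $\mathcal{I}_0$ given the variables indexed by $\mathcal{I}_1$, and vice versa. The joint distribution of all the variables has density proportional to

$$\prod_{n=1}^{N} \mathbb{1}(0 < u_n < 1) \times \mathbb{1}\left(\max_{n \in \mathcal{I}_1} u_n < \min_{n \in \mathcal{I}_0} u_n\right).$$

From this joint density we can work out the conditionals: given $u_{\mathcal{I}_1}$, the variables $u_n$, $n \in \mathcal{I}_0$, are independent and Uniform on $\left(\max_{n \in \mathcal{I}_1} u_n, 1\right)$; given $u_{\mathcal{I}_0}$, the variables $u_n$, $n \in \mathcal{I}_1$, are independent and Uniform on $\left(0, \min_{n \in \mathcal{I}_0} u_n\right)$. We can then express the Gibbs updates in terms of the endpoints of the interval. Specifically, writing the endpoints at iteration $t$ as $a^{(t)} = \max_{n \in \mathcal{I}_1} u_n^{(t)}$ and $b^{(t)} = \min_{n \in \mathcal{I}_0} u_n^{(t)}$, the Gibbs sampler is equivalent to:

- Sampling $b^{(t)} = a^{(t-1)} + \left(1 - a^{(t-1)}\right) W_t$, with $W_t \sim \text{Beta}(1, N_0)$.
- Sampling $a^{(t)} = b^{(t)} V_t$, with $V_t \sim \text{Beta}(N_1, 1)$.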
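To make these constructions concrete, here is a minimal NumPy sketch (all function names are hypothetical, not taken from [4] or from any library). It checks compatibility of the auxiliary variables directly, draws the random interval via sorted uniforms, and runs the Gibbs sampler on the endpoints $a = \max_{x_n = 1} u_n$ and $b = \min_{x_n = 0} u_n$, using the fact that the minimum of $N_0$ Uniform$(a, 1)$ variables is $a + (1 - a)\,\text{Beta}(1, N_0)$ and the maximum of $N_1$ Uniform$(0, b)$ variables is $b\,\text{Beta}(N_1, 1)$. As a side remark, unconstrained uniforms are compatible with probability $1/\binom{N}{N_1}$ (every choice of which $N_1$ uniforms are the smallest is equally likely), so naive rejection quickly becomes wasteful.

```python
import numpy as np


def feasible_interval(x, u):
    """Return the feasible interval (lo, hi] for theta, or None if u is incompatible.

    lo = max of u over indices with x_n = 1, hi = min of u over indices with x_n = 0.
    """
    x, u = np.asarray(x), np.asarray(u)
    lo = float(u[x == 1].max(initial=0.0))  # 0.0 if there is no "1"
    hi = float(u[x == 0].min(initial=1.0))  # 1.0 if there is no "0"
    return (lo, hi) if lo < hi else None


def sample_intervals_direct(n0, n1, m, rng):
    """m independent intervals: sort N uniforms, keep the N1-th and (N1+1)-th values."""
    u = np.sort(rng.uniform(size=(m, n0 + n1)), axis=1)
    return np.column_stack([u[:, n1 - 1], u[:, n1]])


def donkey_walk(n0, n1, n_iter, rng, a=0.5):
    """Gibbs sampler written on the endpoints (a, b); assumes n0 >= 1 and n1 >= 1."""
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Given a, the u_n with x_n = 0 are iid Uniform(a, 1):
        # their minimum is a + (1 - a) * Beta(1, n0).
        b = a + (1.0 - a) * rng.beta(1, n0)
        # Given b, the u_n with x_n = 1 are iid Uniform(0, b):
        # their maximum is b * Beta(n1, 1).
        a = b * rng.beta(n1, 1)
        chain[t] = a, b
    return chain


rng = np.random.default_rng(1)
n0, n1 = 3, 7  # made-up counts, for illustration only
gibbs = donkey_walk(n0, n1, 50_000, rng)[1_000:]  # drop a short burn-in
direct = sample_intervals_direct(n0, n1, 50_000, rng)
print(gibbs.mean(axis=0), direct.mean(axis=0))
```

Both samplers should target the same law: the endpoints are the $N_1$-th and $(N_1+1)$-th order statistics of $N$ uniforms, with means $N_1/(N+1)$ and $(N_1+1)/(N+1)$, that is $7/11$ and $8/11$ for these made-up counts.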

This is exactly the model of Buridan’s donkey in refs [2,3] below. The idea is that the donkey, being both hungry and thirsty but not being able to choose between the water and the hay, takes a step in either direction alternately.

The donkey walk has been generalized to higher dimensions in [3], and in a sense our Gibbs sampler in [4] is also a generalization to higher dimensions… it’s not clear whether these two generalizations are the same or not. So I’ll leave that discussion for another day.

A few remarks to wrap up.

- It’s a feature of Dempster’s approach that it yields random subsets of parameters rather than singletons (as in standard Bayesian analysis). Dempster’s approach is a generalization of Bayes: if we specify a standard prior and apply “Dempster’s rule of combination”, we retrieve standard Bayes.
- What do we do with these random intervals once we obtain them? We can compute the proportion of them that intersect, or are contained in, a set of interest $A \subset [0, 1]$, and these proportions are transformed into measures of agreement, disagreement or indeterminacy regarding $A$, as opposed to posterior probabilities in standard Bayes.
- Dempster’s estimates depend on the choice of sampling mechanism and associated auxiliary variables, which is a topic of much discussion in that literature.
- In a previous post I described an equivalence between the sampling mechanism considered in [1] when there are more than two categories, and the Gumbel-max trick… it seems that Dempster’s approach has various intriguing connections.
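A minimal sketch of these proportions, for a hypothetical assertion $\theta \le c$, i.e. $A = [0, c]$: $p$ is the proportion of random intervals contained in $A$ (agreement), $q$ the proportion contained in its complement (disagreement), and $r = 1 - p - q$ the proportion straddling $c$ (indeterminacy). The counts $N_0 = 3$, $N_1 = 7$ and the threshold $c = 1/2$ are made up for illustration, and the function name is hypothetical. Since the intervals have endpoints given by the $N_1$-th and $(N_1+1)$-th order statistics of $N$ uniforms, the exact values here are binomial tail probabilities: $p = 56/1024$, $q = 848/1024$ and $r = 120/1024$, which the simulation should approximate.

```python
import numpy as np


def dempster_pqr(intervals, c):
    """Proportions (p, q, r) for the assertion theta <= c, i.e. A = [0, c].

    intervals: array of shape (m, 2) with rows (a, b), a <= b.
    p: intervals contained in A (b <= c): agreement
    q: intervals contained in the complement of A (a > c): disagreement
    r: the rest, i.e. intervals straddling c: indeterminacy
    """
    a, b = intervals[:, 0], intervals[:, 1]
    p = float(np.mean(b <= c))
    q = float(np.mean(a > c))
    return p, q, 1.0 - p - q


# Random intervals for made-up counts N0 = 3, N1 = 7, drawn with the
# sorted-uniforms construction (N1-th and (N1+1)-th order statistics).
rng = np.random.default_rng(2)
n0, n1, m = 3, 7, 100_000
u = np.sort(rng.uniform(size=(m, n0 + n1)), axis=1)
intervals = np.column_stack([u[:, n1 - 1], u[:, n1]])
p, q, r = dempster_pqr(intervals, c=0.5)
print(f"p = {p:.3f}, q = {q:.3f}, r = {r:.3f}")
```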

**References**:

- [1] Arthur P. Dempster, New Methods for Reasoning Towards Posterior Distributions Based on Sample Data, 1966.
- [2] Jordan Stoyanov & Christo Pirinsky, Random motions, classes of ergodic Markov chains and beta distributions, 2000.
- [3] Gérard Letac, Donkey walk and Dirichlet distributions, 2002.
- [4] Pierre E. Jacob, Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster, A Gibbs sampler for a class of random convex polytopes, 2021.

This module implements various variance estimators that may be computed from a single run of an SMC algorithm, à la Chan and Lai (2013) and Lee and Whiteley (2018). For more details, see this notebook.

This module makes it easier to load the datasets included in the package. Here is a quick example:

```python
from particles import datasets as dts

dataset = dts.Pima()
help(dataset)             # basic info on dataset
help(dataset.preprocess)  # how data was pre-processed
data = dataset.data       # typically a numpy array
```

The library makes it possible to run several SMC algorithms in parallel, using the multiprocessing module. Hai-Dang Dau noticed there was a performance issue with the previous implementation (a few cores could stay idle) and fixed it.

While testing the new version, I noticed that function distinct_seeds (module utils), which, as the name suggests, generates distinct random seeds for the processes run in parallel, could be very slow in certain cases. I changed the way the seeds are generated to fix the problem (using stratified resampling). I will discuss this in more detail in a separate blog post.
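The post does not show the new code, so here is only a hypothetical sketch of the general idea of using stratification to obtain distinct seeds (the name stratified_seeds is made up; this is not the implementation of distinct_seeds in particles). The seed range is cut into $n$ strata of equal integer width and one seed is drawn uniformly in each stratum, so collisions are impossible by construction and no retry loop is needed.

```python
import numpy as np


def stratified_seeds(n, max_seed=2**32, rng=None):
    """Return n distinct integer seeds in [0, max_seed) (hypothetical helper).

    [0, max_seed) is cut into n consecutive strata of equal integer width,
    and one seed is drawn uniformly within each stratum; seeds from
    different strata cannot coincide.
    """
    rng = np.random.default_rng() if rng is None else rng
    width = max_seed // n
    if width == 0:
        raise ValueError("cannot draw more than max_seed distinct seeds")
    seeds = np.arange(n, dtype=np.int64) * width + rng.integers(0, width, size=n)
    # Shuffle, so that a seed's position says nothing about its stratum.
    return rng.permutation(seeds).tolist()


seeds = stratified_seeds(8, rng=np.random.default_rng(0))
print(seeds)
```

Using an integer stratum width matters here: with a fractional width, rounding down could map two seeds from adjacent strata to the same integer.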

Development of this library is partly driven by interactions with users. For instance, the next version will have a more general MvNormal distribution (allowing for a covariance matrix that varies across particles), because one colleague got in touch and needed that feature.

So don’t be shy, if you don’t see how to do something with particles, please get in touch. It’s likely our interaction will help me to either improve the documentation or add new, useful features. Of course, I also welcome direct contributions (through pull requests)!

Otherwise, I have several ideas for future releases, but, for the next one, it is likely I will focus on the following two areas.

My priority #1 is to implement waste-free SMC in the package, following our recent paper with Dang. (Dang has already released his own implementation, which is built on top of particles, but, given that waste-free SMC seems to offer better performance than standard SMC samplers, it seems important to have it available in particles.)

When this is done, I plan to add several important applications of SMC samplers, such as:

- the computation of orthant probabilities (Ridgway, 2014);
- variable selection (Schäfer and Chopin, 2012);
- ABC, perhaps using the rare-event approach of Prangle et al (2018).

I also plan to document SMC samplers a bit better.

Python libraries such as TensorFlow, PyTorch or JAX are all the rage in machine learning. They offer access to very fancy stuff, such as auto-differentiation, and computation on the GPU.

I have started to play a bit with PyTorch, and even have a working implementation of a particle filter that runs entirely on the GPU. The idea is to make the core parts of particles completely independent of numpy. In that way, one may use PyTorch tensors to store the particles and their weights. This is really work in progress.

When I’m asked by students whether they should accept some referee invitation (be it for a stat journal or a machine learning conference), I almost invariably say yes. I think that there is a lot to be learnt when refereeing papers and that this is worth the time spent in the process. I’ll detail in this post why I think so.

First, this post is not about tips on *how* to write a referee report, but rather on *why*. It is instructive to consult tips on the *how*s, and good posts can be found out there. Note that some journals will also have specific guidelines.

Before diving into the benefits of refereeing, let me first say that a referee invitation can also be declined for many good reasons: in case of a conflict of interest (CoI), and/or if some of the authors are too close to you in some sense (although in some fields with a tiny community, this almost inevitably happens); if you do not feel qualified enough; or sometimes, if you feel qualified but the refereeing task seems overwhelming due to the length or technicality of the paper. Do not feel obliged to accept invitations from journals you do not know, and of course ignore those coming from predatory journals or publishers (use this checklist). In any case, be aware that it is OK to decline an invitation. Keep in mind that the associate editor in charge will very much appreciate pointers to alternative referees.

Now, what are the benefits of refereeing? It is a legitimate question, given that refereeing work is usually time-consuming, done on a voluntary basis, without implying any direct or instant reward. So it is important to understand what you can gain out of it.

**Learning about editorial process**

In the early stages of an academic career, refereeing papers is an opportunity to learn by doing about the ins and outs of the editorial process. You do not get the chance to practice replying to referee reports every other day when you are a student. But when reviewing papers, you may also get to see replies by the authors, and reports from other referees (e.g. in revision rounds). This helps you build habits for when your turn comes to reply to referees!

**Opening research interests**

We are usually asked to referee papers in our own area of expertise, but agreeing to review papers slightly outside of one’s research interests can be rewarding. Be curious! There is a chance that reading submitted papers will trigger new research directions of yours. This happened to me at least twice: I started to work on Bayesian deep learning after refereeing an ICLR paper dealing with the behaviour of neural networks in the infinitely wide limit; and (dis)proving a conjecture stated in a COLT submission stimulated a new line of research of mine on the sub-Gaussian property of random variables. Note that, in order to legitimately start working on ideas from a submitted paper, you should make sure that the paper is also available as a preprint on some open repository like arXiv.

**Prompting new opportunities**

Refereeing papers surely increases your visibility. It is also a preliminary step towards becoming an associate editor (AE). I’m AE for several stat journals, and managing papers is a task that I find enjoyable, with a social side that consists of writing referee invitation messages to colleagues. This helps connect, or stay in touch, with colleagues we do not have occasions to meet at conferences these days!

Andras Fulop, Jeremy Heng (both ESSEC), and I (Nicolas Chopin, ENSAE, IPP) are currently advertising a post-doc position to work on developing SMC methods for challenging models found in Finance and Econometrics. If you are interested, click here for more details, and get in touch with us.

Ever wanted to learn more about particle filters, sequential Monte Carlo, state-space/hidden Markov models, PMCMC (particle MCMC), SMC samplers, and related topics?

In that case, you might want to check the following book by Omiros Papaspiliopoulos and me, which has just been released by Springer:

and which may be ordered from their web-site, or from your favourite book store.

The aim of the book is to cover the many facets of SMC: the algorithms, their practical uses in different areas, the underlying theory, how they may be implemented in practice, etc. Each chapter contains a “Python corner” which discusses the practical implementation of the covered methods in Python, a set of exercises, and bibliographical notes. Speaking of chapters, here is the table of contents:

- Introduction
- Introduction to state-space models
- Beyond state-space models
- Introduction to Markov processes
- Feynman-Kac models: definition, properties and recursions
- Finite state-spaces and hidden Markov models
- Linear-Gaussian state-space models
- Importance sampling
- Importance resampling
- Particle filtering
- Convergence and stability of particle filters
- Particle smoothing
- Sequential quasi-Monte Carlo
- Maximum likelihood estimation of state-space models
- Markov chain Monte Carlo
- Bayesian estimation of state-space models and particle MCMC
- SMC samplers
- SMC^2, sequential inference in state-space models
- Advanced topics and open problems

And here is one fancy plot taken from the book. (For some explanation, you will have to read it!)

A big thanks to all the colleagues who took the time to read draft versions and send feedback (see the introduction for a list of names). Also, don’t write books, folks. Seriously, it takes WAY too much time…

Hi all,

This post is about a way of sampling from a Categorical distribution, which appears in Arthur Dempster’s approach to inference as a generalization of Bayesian inference (see Figure 1 in “A Generalization of Bayesian Inference”, 1968), under the name “structure of the second kind”. It’s the starting point of my ongoing work with Ruobin Gong and Paul Edlefsen, which I’ll write about another day. This sampling mechanism turns out to be strictly equivalent to the “Gumbel-max” trick that got some attention in machine learning; see e.g. this blog post by Francis Bach.

Let’s look at the figure above: the encompassing triangle is the “simplex” with 3 vertices ($K$ vertices more generally). Any point $\theta$ within the triangle is a convex combination $\theta = \sum_{k=1}^{K} \theta_k v_k$ of the vertices, where $\theta_1, \ldots, \theta_K$ are non-negative “weights” summing to one, and $v_1, \ldots, v_K$ are the vertices. The weights are the “barycentric coordinates” of the point. Any point $\theta$ in the triangle induces a partition of the triangle into $K$ sets $\Delta_1, \ldots, \Delta_K$. Each “sub-simplex” $\Delta_k$ can be obtained by considering the entire simplex and replacing vertex $v_k$ by $\theta$. It has a volume equal to $\theta_k$ relative to the volume of the entire simplex. Can you see why? If not, it’s OK; great scientific endeavors require a certain degree of trust and optimism.
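For readers who would rather not rely on trust and optimism, here is one standard way to see it (a supplementary determinant argument, not taken from Dempster’s papers). Place the vertices $v_1, \ldots, v_K$ in $\mathbb{R}^{K-1}$, write the point as $\theta = \sum_j \theta_j v_j$ with non-negative $\theta_j$ summing to one, and let $\Delta_k$ be the simplex obtained by replacing $v_k$ with $\theta$. The volume of a simplex is the absolute determinant of its vertices, augmented with a row of ones, divided by $(K-1)!$:

$$
\begin{aligned}
\mathrm{vol}(\Delta) &= \frac{1}{(K-1)!}\left|\det\begin{pmatrix} v_1 & \cdots & v_k & \cdots & v_K \\ 1 & \cdots & 1 & \cdots & 1 \end{pmatrix}\right|, \qquad
\begin{pmatrix} \theta \\ 1 \end{pmatrix} = \sum_{j=1}^{K} \theta_j \begin{pmatrix} v_j \\ 1 \end{pmatrix} \ \ \text{(since } \textstyle\sum_j \theta_j = 1\text{)}, \\
\det\begin{pmatrix} v_1 & \cdots & \theta & \cdots & v_K \\ 1 & \cdots & 1 & \cdots & 1 \end{pmatrix}
&= \sum_{j=1}^{K} \theta_j \det\begin{pmatrix} v_1 & \cdots & v_j & \cdots & v_K \\ 1 & \cdots & 1 & \cdots & 1 \end{pmatrix}
= \theta_k \det\begin{pmatrix} v_1 & \cdots & v_k & \cdots & v_K \\ 1 & \cdots & 1 & \cdots & 1 \end{pmatrix}.
\end{aligned}
$$

The first equality on the last line uses linearity of the determinant in the $k$-th column; for $j \neq k$ the matrix has two identical columns, so its determinant vanishes. Hence $\mathrm{vol}(\Delta_k) = \theta_k\,\mathrm{vol}(\Delta)$.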

Since the volume of each sub-simplex $\Delta_k$ is $\theta_k$ (relative to the whole simplex), if we sample a point uniformly within the encompassing simplex, it will land within $\Delta_k$ with probability $\theta_k$. In other words, we can sample from a Categorical distribution with probabilities $\theta_1, \ldots, \theta_K$ by sampling uniformly within the simplex, and by identifying which index $k$ is such that the point lands in $\Delta_k$. This appears in various places in Arthur Dempster’s articles (see references below), because Categorical distributions provide a pedagogical setting for new methods of statistical inference, and because this sampling mechanism does not rely on any arbitrary ordering of the categories (unlike “inverse transform sampling”).

How does this relate to the Gumbel-max trick? One way of sampling uniformly within the simplex is to sample $K$ independent Exponential(1) variables $E_1, \ldots, E_K$ and to define weights $w_k = E_k / \sum_{j=1}^{K} E_j$. Furthermore, a point with barycentric coordinates $(w_1, \ldots, w_K)$ is within $\Delta_k$ for a given $k$ if and only if $w_k / \theta_k \le w_j / \theta_j$ for all $j$. The next figure illustrates such inequalities: the points with coordinates satisfying $w_k / \theta_k \le w_j / \theta_j$ lie on one side of a line that originates from the vertex opposite the segment $[v_k, v_j]$ and goes through $\theta$.

An Exponential(1) is also minus the logarithm of a Uniform(0,1): $E_k = -\log U_k$. Putting all these pieces together, a Uniform point in the simplex is within $\Delta_k$ if and only if, for all $j$,

$$\log \theta_k - \log(-\log U_k) \ \ge\ \log \theta_j - \log(-\log U_j).$$

Since $G_k = -\log(-\log U_k)$ is a Gumbel variable, the above mechanism is equivalent to $k = \arg\max_{j} \left(\log \theta_j + G_j\right)$, where $G_1, \ldots, G_K$ are independent Gumbel variables. It’s the Gumbel-max trick!
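The equivalence can be checked numerically with a short NumPy sketch (function names are hypothetical, and the category probabilities are made up). Both functions consume the same uniforms, one through the simplex/Exponential construction and one through the Gumbel-max trick, so they should return the same category for every row, with empirical frequencies close to the category probabilities.

```python
import numpy as np


def simplex_mechanism(theta, u):
    """Dempster-style mechanism: uniform point in the simplex, then its sub-simplex.

    Each row of u holds K Uniform(0,1) draws. E = -log(u) are Exponential(1), so
    w = E / E.sum() is uniform on the simplex; the point lies in Delta_k iff
    w_k / theta_k <= w_j / theta_j for all j, i.e. k = argmin_j w_j / theta_j.
    """
    e = -np.log(u)
    w = e / e.sum(axis=1, keepdims=True)
    return np.argmin(w / theta, axis=1)


def gumbel_max(theta, u):
    """Gumbel-max trick on the same uniforms: G = -log(-log(u)) is Gumbel."""
    g = -np.log(-np.log(u))
    return np.argmax(np.log(theta) + g, axis=1)


rng = np.random.default_rng(3)
theta = np.array([0.2, 0.5, 0.3])  # made-up category probabilities
u = rng.uniform(size=(100_000, theta.size))
k_simplex = simplex_mechanism(theta, u)
k_gumbel = gumbel_max(theta, u)
print(np.mean(k_simplex == k_gumbel))                                # agreement rate
print(np.bincount(k_simplex, minlength=theta.size) / len(k_simplex))  # close to theta
```

Note that the simplex version never needs the normalization $\sum_j E_j$ to decide the category, since it cancels in the comparisons; it is kept above only to mirror the geometric description.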

- It’s hard to trace back the first instance of this sampling mechanism, but it appears in various of Arthur Dempster’s articles, e.g. “New methods for reasoning towards posterior distributions based on sample data”, 1966, and it is discussed at length in “A class of random convex polytopes”, 1972.
- The connection occurred to me while reading Xi’an’s blog post, which points to this interesting article on Emil Gumbel, academic in Heidelberg up to his exile in 1932, “pioneer of modern data journalism” and active opponent of the Nazis. Quoting from the article, “His fate was sealed when, at a speech in memory of the 700,000 who had perished of hunger in the winter of 1916/17, he remarked that a rutabaga would certainly be a better memorial than a scantily clad virgin with a palm frond”.
- The Gumbel-max trick is interesting for many reasons: it amounts to viewing sampling as an optimization problem, it can be “relaxed” in various useful ways, etc. In Art Dempster’s work, that sampling mechanism is appealing because of its invariance under relabeling of the categories (“category 2” is not between “category 1” and “category 3”). This matters when performing inference with Categorical distributions (i.e. with count data) using Art Dempster’s approach, because the estimation depends on the choice of sampling mechanism and not simply on the likelihood function.

Hi everyone,

This short post is just to point to a course on “Couplings and Monte Carlo”, available here https://sites.google.com/site/pierrejacob/cmclectures. Versions of the course were given at Université Paris-Dauphine in February 2020 (thanks Robin Ryder and Christian P. Robert), at the University of Bristol in March 2020 (thanks Anthony Lee) and at the University of Torino for the M.Sc. in Stochastics and Data Science in May 2020 (thanks Matteo Ruggiero). I am grateful to these colleagues and their institutions for supporting this course. The course website points to about 100 pages of lecture notes, and 16 videos are available on YouTube. It is intended for advanced undergraduate students or graduate students, with some previous exposure to Monte Carlo methods. This is work in progress, and as I am hoping to develop the course over the coming years, feedback would be most welcome.

In this post, I would like to do the following:

- describe briefly a new, richer data-set recently published by INSEE (and do some graphs);
- use the updated data (from both sources) to repeat my analysis, with some variants (weekly aggregates, separating men and women);
- reply to a few comments I got on LinkedIn and elsewhere;
- provide a few pointers regarding death counts in other countries (particularly the UK).

INSEE now provides every Friday an exhaustive data-set that records, for each death that has occurred since 01-01-2018, the following variables: date of birth, date of death, sex, département of death, and so on. Neat. Let’s take this opportunity to do a few plots, such as this one:

(it’s nice to observe this sharp drop) or that one:

The latter plot covers the same period (weeks 13 to 15, 23rd March to 12th April) as in the analysis below. As expected, over-mortality seems to affect mostly people above 60.

Ok, now let’s repeat my previous analysis, based on merging the SPF data (daily covid death counts in hospitals, in each département and each sex) and the aforementioned INSEE data (all-cause deaths). Except this time:

- The overlap between the two datasets now covers more than three weeks (18th March, first date in the SPF dataset, to 12th April, latest date in the INSEE dataset), so I decided to consider **weekly aggregates**, for two reasons: they are more stable than daily aggregates, and less affected by artifacts such as reporting delays (e.g. a death occurring during a weekend is reported on the following Monday).
- I also separated **men and women**.
- I am going to simplify the model a bit, and simply regress **excess deaths** (number of deaths in 2020 minus the average over 2018 and 2019) on **hospital deaths**.

First, a joint plot:

So, to recap, each point in this plot corresponds to a pair of death counts (hospital covid deaths, excess deaths) for one département in France, one week between 13 and 15, and one sex. The corresponding linear regression (without an intercept) gives a slope estimate of 1.79 (95% confidence interval: [1.73, 1.85]). The basic interpretation would be: in each département, when 100 covid deaths occur in hospitals, the total number of covid-related (see below) deaths is approximately 179. The current total number of covid deaths reported by SPF is 22,614, which is about 60% above the number of covid deaths in hospitals (14,050). So this estimate suggests the actual death toll might be a tad larger. More about the interpretation below.
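For readers who want to try this kind of fit, here is a minimal NumPy sketch of a regression through the origin with a normal-approximation 95% confidence interval. The actual SPF/INSEE data are not included: the rows below are simulated, with a made-up true slope of 1.8, and all variable names are hypothetical. Fitting the same function separately on the rows for women and for men would give per-sex slopes.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least squares for y = beta * x + noise (no intercept), with a 95% CI.

    The CI uses the normal quantile 1.96, a reasonable approximation to the
    Student t quantile when there are hundreds of observations.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sxx = np.sum(x * x)
    beta = np.sum(x * y) / sxx
    resid = y - beta * x
    sigma2 = np.sum(resid**2) / (len(x) - 1)  # one estimated parameter
    se = np.sqrt(sigma2 / sxx)
    return beta, (beta - 1.96 * se, beta + 1.96 * se)


# Simulated stand-in for the merged table: one row per (département, week, sex).
# All numbers below are made up for illustration.
rng = np.random.default_rng(4)
m = 600
hosp = rng.poisson(20, size=m)                # weekly covid deaths in hospital
d2018, d2019 = rng.poisson(150, size=(2, m))  # all-cause deaths, same week
d2020 = rng.poisson(150 + 1.8 * hosp)         # all-cause deaths in 2020
excess = d2020 - (d2018 + d2019) / 2          # excess deaths
beta, ci = slope_through_origin(hosp, excess)
print(f"slope = {beta:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```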

Now for something more interesting: let’s redo the previous plot, but with a different colour for each sex:

Clearly the two linear trends are different; see below the OLS estimates.

| Sex | Slope estimate | 95% confidence interval (slope) | R² |
|---|---|---|---|
| F | 2.40 | [2.30, 2.50] | 89% |
| M | 1.56 | [1.50, 1.62] | 90% |

What is going on? Well, women tend to live longer than men, and the proportion of women in EHPADs (French retirement homes) is 74%. Since the main reason behind the discrepancy between hospital deaths and excess deaths is covid deaths occurring in retirement homes, these results make sense.

Fair enough, since 4th April, SPF does include in its total estimate both hospital deaths and retirement home deaths, and the proportion of the latter is not too far from my estimate. Note, however, that:

- It’s really hard to properly estimate the number of covid deaths occurring in retirement homes. Apparently several retirement homes did not provide any data, while others marked as “covid” all the deaths that occurred after the first covid death.
- My estimate might measure other direct or indirect effects of the pandemic, such as people dying at home, people not receiving proper care because the health system is at capacity and so on.
- The fact that data from two different institutions may be compared, and seem to be somehow consistent, is, in my opinion, a good piece of news which deserves to be reported!

Boy, that one was popular (the remark that the lock-down also reduces road deaths). Please have a look at the plot on the front page of ONISR (click on “tués”): yes, the number of car-related deaths dropped sharply thanks to the lock-down… But in March of last year, this number was around 250, that is, about 1% of the current covid death count. “Fun” fact: this point would have been quite relevant in the 70s! In those years, the number of car-related deaths was about five times larger (18,034 deaths in 1972).

The idea of comparing the 2020 deaths to the average of the two previous years is a bit crude, and demographers have better models to predict death counts based on age distribution and so on. That said, the notion of “excess deaths” seems quite popular in various countries, as I explain below, so I guess that my approach is not so daft after all.

To be honest, I was hoping to apply the same approach to the UK, a country where the official estimate is still limited to hospital deaths, and thus clearly quite biased; see e.g. this Guardian article. Sadly, Public Health England only reports daily hospital death counts … per nation (nation = England, Scotland, Wales, or Northern Ireland). On the other hand, the Office for National Statistics reports every week the number of “excess deaths” (relative to the five-year average), and the proportion of these deaths where the word “covid” is mentioned on the death certificate.

Interestingly, the Guardian article I mentioned above first complains that the UK only reports hospital deaths, and then erroneously claims that the UK is still behind France in terms of covid mortality. It’s not, if you compare in terms of hospital deaths (UK: 20,319 on Saturday; France: 14,050). The fact that even journalists reporting on this issue may get it wrong seems indicative of how confusing covid death data are.

More generally, my impression is that looking at “excess deaths” makes far more sense for most countries at the moment: it’s easier to measure (albeit with a delay, of course), and easier to interpret. This is also more or less the point made by this NYT article. (Notice how their plot for France only covers January to April; for the complete plot, see my first plot above!)

However, case counts per country are not very reliable, given that countries have very different policies regarding testing and so on; see e.g. Nate Silver’s opinion on case counts here.

You would think that death counts are far more reliable. In France, however, Santé Publique France (SPF) got criticized for reporting only COVID deaths that occurred in hospitals. Very recently, they started to also include deaths that occurred in retirement homes. However, they do so only at the national level (current count as of April 12th: 13,832; 66% from hospitals). At a finer level (i.e. “régions” or “départements”), the data they provide (here) remain restricted to hospitals.

INSEE (the French national institute of statistics) decided to publish, at the same time, daily death counts at the département level. Note that INSEE is not a public health institute; the death counts they report are for *all* deaths, whatever the cause. See also this authoritative post (in French) explaining the challenges behind death count reporting. In case you wonder, a “département” is a regional unit (we have about 100 of those); see this wikipedia article.

I decided to compare both datasets using a very, very simple methodology. First, I merged both datasets, so as to obtain, for each département and each day within a certain period:

- the number of covid deaths reported in a hospital, call it $CH_{d,t}$ (where $d$ is the département, $t$ is the day);
- the total number of deaths (whatever the cause) on the same day $t$, in département $d$, call it $D^{2020}_{d,t}$;
- the total numbers of deaths, $D^{2019}_{d,t}$ and $D^{2018}_{d,t}$, on the same day, respectively one year ago (in 2019) and two years ago (in 2018).

SPF data start on the 18th of March, and INSEE publishes its data every Friday with a one-week delay, so my merged dataset currently covers the period from the 18th to the 30th of March (13 days). And we have about 90 départements in the dataset, so the sample size is about 1,200.

The model I have in mind is simply:

$$D^{2020}_{d,t} = \frac{D^{2018}_{d,t} + D^{2019}_{d,t}}{2} + \beta\, CH_{d,t} + \varepsilon_{d,t}.$$

The first term is a basic predictor of 2020 counts, had there been no pandemic. It is pretty basic, but death counts are quite stable over the years. Granted, there is some variation in winter, due to the flu, but this seems to affect mostly February. For the record, here is a plot of the daily number of deaths in France in 2018, 2019 and 2020, for the period covered by the data:

The coefficient $\beta$ of course measures under-reporting: $\beta = 1$ would mean that hospital counts account for all the excess deaths.

Thus, I fitted a linear regression model to predict 2020 deaths as a function of the 2018 and 2019 deaths, and the $CH$ deaths (no intercept). Here are the results:

Look in particular at the estimate of $\beta$: 1.596 (95% confidence interval: [1.51, 1.68]). In other words, on average, one should add something between 50% and 70% to the reported number of covid deaths in hospitals to get an estimate of all covid deaths.

I tried other models; for instance by forcing the coefficients of the two years to be exactly equal to one half (how to do this is left as a simple exercise!). I got similar results. I’d like to repeat the analysis on weekly aggregated data, but we don’t have two full weeks of data yet, so it’s too early for that. The usual caveats regarding linear regression apply; e.g. there should be some heteroscedasticity, given that the sizes of départements vary significantly.
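One possible way to do the exercise, sketched in NumPy on simulated data (function and variable names are hypothetical, and all numbers are made up). The free model is an ordinary no-intercept least-squares fit on the 2018 counts, the 2019 counts and the hospital counts. Forcing both year coefficients to one half amounts to moving half the sum of the 2018 and 2019 counts to the left-hand side as a fixed offset, and regressing what remains on the hospital counts alone.

```python
import numpy as np


def fit_free(d2018, d2019, ch, d2020):
    """No-intercept least squares: D2020 ~ a1 * D2018 + a2 * D2019 + beta * CH."""
    X = np.column_stack([d2018, d2019, ch]).astype(float)
    coef, *_ = np.linalg.lstsq(X, np.asarray(d2020, dtype=float), rcond=None)
    return coef  # array([a1, a2, beta])


def fit_half(d2018, d2019, ch, d2020):
    """Same model with a1 = a2 = 1/2 imposed: subtract that term (an "offset"),
    then regress what remains on CH alone, still without intercept."""
    y = np.asarray(d2020, dtype=float) - 0.5 * (np.asarray(d2018) + np.asarray(d2019))
    x = np.asarray(ch, dtype=float)
    return np.sum(x * y) / np.sum(x * x)


rng = np.random.default_rng(5)
n = 90 * 13                                   # about 90 départements x 13 days
lam = rng.uniform(5, 100, size=n)             # made-up baseline daily deaths per row
d2018, d2019 = rng.poisson(lam, size=(2, n))  # all-cause deaths in 2018 and 2019
ch = rng.poisson(0.1 * lam)                   # covid deaths in hospital (2020)
d2020 = rng.poisson(lam + 1.6 * ch)           # all-cause deaths in 2020
print("free fit (a1, a2, beta):", fit_free(d2018, d2019, ch, d2020))
print("beta with a1 = a2 = 1/2:", fit_half(d2018, d2019, ch, d2020))
```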

I will update these results as I get more data. I find it interesting that merging these two datasets already gives results that are reasonable and easy to interpret. In particular, I got similar results using only the first *six* days that were available one week ago. The secret here is that we compensate for the small number of days with a large number of “départements”.

I am not an expert on public health data, so I do not want to comment on why SPF reports only hospital data; I guess it is much harder to determine that a death is covid-related outside of a hospital, but again I am out of my depth here.

On the other hand, I think it is commendable that INSEE decided to make their own reporting. Of course, both institutions report different things. But the fact that we are able to compare and combine two sources of data potentially gives a clearer picture.

Comments are most welcome. I would be curious in particular to know whether other countries provide this kind of double reporting.
