# Statisfaction

## Bayesian demography

Posted in General by Julyan Arbel on 26 May 2016

“For about two centuries, Bayesian demography remained largely dormant. Only in recent decades has there been a revival of demographers’ interest in Bayesian methods, following the methodological and computational developments of Bayesian statistics. The area is currently growing fast, especially with the United Nations (UN) population projections becoming probabilistic—and Bayesian.”    Bijak and Bryant (2016)

It is interesting to see that Bayesian statistics has been infiltrating demography in recent years. The review paper “Bayesian demography 250 years after Bayes” by Bijak and Bryant (Population Studies, 2016) stresses that promising areas of application include demographic forecasts, problems with limited data, and highly structured and complex models. As an indication of this growing interest, the ISBA meeting to be held next June will showcase a course and a session devoted to the field (given and organized by Adrian Raftery).

With Vianney Costemalle from INSEE, we recently made a modest contribution to the field by proposing a Bayesian model (paper in French) that helps reconcile apparently inconsistent population datasets. The aim is to estimate annual migration flows to France. Note that the work covers the period 2004-2011 (the publication process was long) and therefore does not take recent migration events into account. We follow the United Nations (UN) definition of a long-term migrant: someone who settles in a foreign country for at least one year. At least two datasets can be used for this purpose: 1) the population census $C$, annual since 2004, and 2) data from residence permits $R$.

For every migration year $n$, both datasets provide longitudinal data for subsequent years $n+i$:

$C_{i,n}$ = #(migrants who entered in year $n$ and were counted in the census of year $n+i$),
$R_{i,n}$ = #(migrants who entered in year $n$ and obtained their first permit in year $n+i$).

The two datasets appear to contradict each other: the raw numbers give as many as 50% more permits than census counts per year. But $C$ and $R$ do not actually cover the same populations. We interpret the longitudinal counts $C_{i,n}$ and $R_{i,n}$ as fractions of the unknown total number of migrants who entered in year $n$, denoted by $M_n$:

$C_{i,n}\sim$ Binomial $(\theta_i,M_n)$,
$R_{i,n}\sim$ Binomial $(\alpha_i,M_n)$.
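
To make the generative reading of these two lines concrete, here is a minimal simulation sketch in Python. It reads Binomial $(\theta_i,M_n)$ as $M_n$ trials with success probability $\theta_i$. The function names, the numerical values, the indexing of lags from $i=0$, and the independence of draws across $i$ are all assumptions made for illustration, not details taken from the paper.

```python
import random


def binomial_draw(rng, n_trials, prob):
    """Draw once from Binomial(n_trials, prob) by summing Bernoulli trials."""
    return sum(1 for _ in range(n_trials) if rng.random() < prob)


def simulate_counts(totals, theta, alpha, seed=0):
    """Simulate census counts C[(i, n)] and permit counts R[(i, n)].

    totals: dict mapping entry year n to the total number of migrants M_n
    theta:  theta[i] = probability of being counted in the census of year n + i
    alpha:  alpha[i] = probability of getting a first permit in year n + i
    """
    rng = random.Random(seed)
    census, permits = {}, {}
    for n, m_n in totals.items():
        # Both datasets share the same M_n for a given entry year n.
        for i, theta_i in enumerate(theta):
            census[(i, n)] = binomial_draw(rng, m_n, theta_i)
        for i, alpha_i in enumerate(alpha):
            permits[(i, n)] = binomial_draw(rng, m_n, alpha_i)
    return census, permits


# Hypothetical values, chosen only for illustration (not estimates from the paper).
totals = {2004: 2000, 2005: 2500}
theta = [0.3, 0.6, 0.7]
alpha = [0.5, 0.2, 0.1]  # sums to 0.8

census, permits = simulate_counts(totals, theta, alpha, seed=42)
print(census[(0, 2004)], permits[(0, 2004)])
```

Simulating from the model this way shows how permit counts can exceed census counts for the same entry year even though both derive from the same $M_n$.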

Note that we assume that the probabilities $\theta_i$ and $\alpha_i$ are stationary over time (which would be too strong an assumption for recent data, but was reasonable over the period 2004-2011). These two parameters are interpreted as the probability of being present in the census $i$ year(s) after entry for $\theta_i$, and the proportion of permits delivered after $i$ year(s) for $\alpha_i$. By definition, the $\alpha_i$ must sum to some $\tau$ less than or equal to 1. We observed that leaving $\tau$ free led to non-convergence of the chains in posterior sampling, which we interpreted as non-identifiability of the parameters in this case. We therefore estimated the model under different scenarios corresponding to fixed values of $\tau$. The interpretation is that a proportion $1-\tau$ of the migrants do not get a residence permit at all during their stay.
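
One heuristic way to see why a free $\tau$ is hard to identify: under the binomial model the expected counts are $\theta_i M_n$ and $\alpha_i M_n$, so multiplying $M_n$ by a constant and dividing all probabilities by the same constant leaves the means unchanged. The sketch below checks this numerically with hypothetical values. It compares means only; the binomial variances do differ between the two settings, so this is an illustration of weak identifiability rather than a proof.

```python
import math


def expected_counts(m_n, theta, alpha):
    """Return (E[C_{i,n}], E[R_{i,n}]) for all lags i under the binomial model."""
    return [t * m_n for t in theta], [a * m_n for a in alpha]


# Hypothetical parameter set with tau = sum(alpha) = 0.8.
m_n, theta, alpha = 1000, [0.3, 0.6], [0.6, 0.2]

# Rescaled set: twice as many migrants, all probabilities halved (tau = 0.4).
scale = 2
m_scaled = scale * m_n
theta_scaled = [t / scale for t in theta]
alpha_scaled = [a / scale for a in alpha]

original = expected_counts(m_n, theta, alpha)
rescaled = expected_counts(m_scaled, theta_scaled, alpha_scaled)

# Compare the mean counts predicted by the two parameter sets, lag by lag.
same_means = all(
    math.isclose(x, y)
    for xs, ys in zip(original, rescaled)
    for x, y in zip(xs, ys)
)
print(same_means)
```

Fixing $\tau$ pins down the scale of the $\alpha_i$, and through the shared $M_n$ that of the $\theta_i$ as well, which is consistent with the scenario-based approach described above.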

We have used noninformative priors on the parameters, and have implemented this model in JAGS (I did not know about Stan at that time).
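
The JAGS code is not reproduced here. As a rough sketch of what the sampler targets, the log-likelihood of the binomial model can be written in a few lines of Python. The function names and data layout are hypothetical, the counts are treated as independent across cells $(i,n)$ (an assumption), and priors are left out because this post does not say which ones were used.

```python
import math


def log_binomial_pmf(k, n_trials, prob):
    """Log of the Binomial(n_trials, prob) probability of k, for 0 < prob < 1."""
    if k < 0 or k > n_trials:
        return float("-inf")
    log_coeff = (
        math.lgamma(n_trials + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n_trials - k + 1)
    )
    return log_coeff + k * math.log(prob) + (n_trials - k) * math.log1p(-prob)


def log_likelihood(census, permits, totals, theta, alpha):
    """Sum the binomial log-probabilities of all census and permit counts.

    census, permits: dicts mapping (i, n) to observed counts
    totals:          dict mapping entry year n to a candidate value of M_n
    theta, alpha:    lists of probabilities indexed by lag i
    """
    total = 0.0
    for (i, n), count in census.items():
        total += log_binomial_pmf(count, totals[n], theta[i])
    for (i, n), count in permits.items():
        total += log_binomial_pmf(count, totals[n], alpha[i])
    return total


# Hypothetical counts and parameter values, for illustration only.
census = {(0, 2004): 310, (1, 2004): 590}
permits = {(0, 2004): 520, (1, 2004): 190}
print(log_likelihood(census, permits, {2004: 1000}, [0.3, 0.6], [0.5, 0.2]))
```

A candidate $M_n$ smaller than any observed count gets log-likelihood $-\infty$, which reflects the constraint that no cell can contain more migrants than entered in that year.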

This interplay of the two datasets through the common (unknown) total number of migrants $M_n$ is key to borrowing strength between the datasets and to estimating the probabilities $\theta_i$ and $\alpha_i$. The JAGS implementation readily accommodates disaggregating the data by age, gender and nationality, which allows the probabilities $\theta_i$ and $\alpha_i$ to be estimated by category of age, gender and nationality: