Turing revisited in Turin, and Oxford

Posted in General by Julyan Arbel on 18 June 2015
Are you paying attention? Good. If you are not listening carefully, you will miss things. Important things.

With colleagues Stefano Favaro and Bernardo Nipoti from Turin and Yee Whye Teh from Oxford, we have just arXived an article on discovery probabilities. If you are looking for some info on a space shuttle, a cycling team or a TV channel, it's the wrong place. Instead, discovery probabilities are central to ecology, biology and genomics, where data can be seen as a population of individuals belonging to an (ideally) infinite number of species. Given a sample of size n, the l-discovery probability D_{n}(l) is the probability that the next individual observed matches a species with frequency l in the n-sample. For instance, the probability of observing a new species, D_{n}(0), is key for devising sampling experiments.
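To make these quantities concrete, here is a minimal Python sketch (on a hypothetical toy sample, purely for illustration) computing the empirical ingredients that all the estimators below are built from: the sample size n, the number of observed species, and how many species have each frequency l.

```python
from collections import Counter

def sample_summary(sample):
    """Summarize an n-sample: n, the number of observed species k_n,
    and m, where m[l] is the number of species with frequency l."""
    species_counts = Counter(sample)        # frequency of each species
    m = Counter(species_counts.values())    # the "frequency of frequencies"
    return len(sample), len(species_counts), dict(m)

# Hypothetical toy sample: individuals labelled by their species
sample = ["a", "a", "b", "c", "c", "c", "d"]
n, k_n, m = sample_summary(sample)
print(n, k_n, m)  # 7 4 {2: 1, 1: 2, 3: 1}
```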

By the way, why Alan Turing? Because, together with Irving John Good, his fellow researcher at Bletchley Park (also portrayed in The Imitation Game), Turing is known for the so-called Good-Turing estimator of the discovery probability

\check{\mathcal{D}}_{n}(l)=(l+1)\frac{m_{l+1,n}}{n},

which involves m_{l+1,n}, the number of species with frequency l+1 in the sample (that is, the frequency of frequencies, if you follow me). As it happens, this estimator, introduced in Good's 1953 Biometrika paper, has been wildly popular in the ecology, biology and genomics communities ever since, at least in the small circles where wild popularity and probability aren't mutually exclusive.
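As a quick illustration (a Python sketch on a hypothetical toy sample, not the paper's code), the Good-Turing estimate (l+1) m_{l+1,n}/n can be computed directly from the sample:

```python
from collections import Counter

def good_turing(sample, l):
    """Good-Turing estimate of the l-discovery probability:
    (l + 1) * m_{l+1,n} / n, where m_{l+1,n} is the number of
    species observed exactly l + 1 times in the n-sample."""
    n = len(sample)
    m = Counter(Counter(sample).values())   # frequency of frequencies
    return (l + 1) * m[l + 1] / n

sample = ["a", "a", "b", "c", "c", "c", "d"]
print(good_turing(sample, 0))  # two singletons out of n = 7 -> 2/7
```

Note that this plug-in form returns 0 whenever no species happens to have frequency l+1, one well-known fragility of the raw estimator.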

Simple explicit estimators \hat{\mathcal{D}}_{n}(l) of discovery probabilities in the Bayesian nonparametric (BNP) framework of Gibbs-type priors were given by Lijoi, Mena and Prünster in a 2007 Biometrika paper. The main difference between the two estimators of D_{n}(l) is that Good-Turing involves n and m_{l+1,n} only, while the BNP estimator involves n, m_{l,n} (instead of m_{l+1,n}), and k_n, the total number of observed species. It has been shown in the literature that the BNP estimators are more reliable than the Good-Turing estimators.
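For concreteness, in the two-parameter Poisson-Dirichlet special case (parameters \sigma and \theta), these BNP estimators take the closed form \hat{\mathcal{D}}_{n}(0)=(\theta+\sigma k_n)/(\theta+n) and \hat{\mathcal{D}}_{n}(l)=(l-\sigma)m_{l,n}/(\theta+n), the familiar Pitman-Yor predictive weights. A minimal Python sketch (toy sample and parameter values are hypothetical):

```python
from collections import Counter

def bnp_discovery(sample, l, sigma, theta):
    """BNP estimate of the l-discovery probability under a
    two-parameter Poisson-Dirichlet prior with parameters
    (sigma, theta): a new species has predictive probability
    (theta + sigma * k_n) / (theta + n), and a species with
    frequency l has probability (l - sigma) * m_{l,n} / (theta + n)."""
    n = len(sample)
    species_counts = Counter(sample)
    k_n = len(species_counts)                 # number of observed species
    m = Counter(species_counts.values())      # frequency of frequencies
    if l == 0:
        return (theta + sigma * k_n) / (theta + n)
    return (l - sigma) * m[l] / (theta + n)

sample = ["a", "a", "b", "c", "c", "c", "d"]
print(bnp_discovery(sample, 0, sigma=0.5, theta=1.0))  # (1 + 0.5*4)/(1 + 7) = 0.375
```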

How do we contribute? (i) We describe the posterior distribution of the discovery probabilities in the BNP model, which is pretty useful for deriving exact credible intervals for the estimates, and (ii) we investigate the large-n asymptotic behavior of the estimators.

We are not aware of any non-asymptotic method for deriving credible intervals for the BNP estimators. We fill this gap by describing the posterior distribution of D_{n}(l). More specifically, we derive all posterior moments of D_{n}(l). Since this random variable has compact support [0,1], it is characterized by its moments, so one can use a moment-based technique for sampling from it, see e.g. our momentify R package written for another article. We also show that the posterior distribution is explicit in two special cases of Gibbs-type priors known as the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior. The posterior distribution is in fact shamelessly simple (once you know it) since it essentially amounts to [[a random] fraction of] a Beta distribution [with random coefficients].

As for the large-n asymptotic behavior of the estimators, we prove the following asymptotic equivalences, denoted by \simeq,

\widehat{\mathcal{D}}_{n}(0) \simeq\frac{\sigma k_n}{n}  and  \widehat{\mathcal{D}}_{n}(l)\simeq (l-\sigma)\frac{m_{l,n}}{n} for l\geq 1,

where \sigma\in(0,1) is a parameter of the Gibbs-type prior. These can serve as approximations. In the cases of the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior, we also provide a second-order term in the asymptotic expansion of the estimators

\widehat{\mathcal{D}}_{n}(0)=\frac{\sigma k_n}{n}+\frac{S_n}{n}+o\big(\frac{1}{n}\big)  and  \widehat{\mathcal{D}}_{n}(l)=(l-\sigma)\frac{m_{l,n}}{n}\big(1-\frac{S_n}{n}+o\big(\frac{1}{n}\big)\big) for l\geq 1,

where the second-order term S_n is either a constant or a quantity that converges almost surely to a random variable. In both cases, we show that it involves the second (and last) parameter of the prior, whereas the first-order asymptotic equivalence given before involves only \sigma. Whether similar asymptotic expansions also hold in the whole Gibbs-type class remains an open problem!
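A quick numerical sanity check, in Python: simulate species counts from a two-parameter Poisson-Dirichlet (Pitman-Yor) Chinese restaurant process and compare the exact estimator \hat{\mathcal{D}}_{n}(0)=(\theta+\sigma k_n)/(\theta+n) with the first-order approximation \sigma k_n/n and the second-order one obtained by taking S_n=\theta, the constant suggested by expanding the closed form when k_n=o(n) (an illustration under that assumption, not the paper's derivation).

```python
import random

def pitman_yor_counts(n, sigma, theta, seed=42):
    """Simulate species frequencies from a Pitman-Yor Chinese
    restaurant process with parameters (sigma, theta)."""
    rng = random.Random(seed)
    counts = []                                   # counts[j] = frequency of species j
    for i in range(n):                            # i = current sample size
        k = len(counts)
        if rng.random() < (theta + sigma * k) / (theta + i):
            counts.append(1)                      # a new species is discovered
        else:
            # join species j with probability (counts[j] - sigma)/(theta + i)
            r = rng.uniform(0, i - sigma * k)
            acc = 0.0
            for j, c in enumerate(counts):
                acc += c - sigma
                if r < acc:
                    counts[j] += 1
                    break
            else:
                counts[-1] += 1                   # guard against float round-off
    return counts

n, sigma, theta = 20000, 0.5, 1.0
k_n = len(pitman_yor_counts(n, sigma, theta))
exact = (theta + sigma * k_n) / (theta + n)       # closed-form BNP estimate of D_n(0)
first = sigma * k_n / n                           # first-order approximation
second = first + theta / n                        # second order, with S_n = theta
print(abs(exact - second) < abs(exact - first))   # True: second order is closer
```

Since k_n grows like n^\sigma here, both approximations converge to the exact value, but the second-order one closes the gap at a strictly faster rate.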

If you have read this far, then you may also be interested in listening to Stefano Favaro talk about it at the 10th Conference on Bayesian Nonparametrics next week in Raleigh, NC 🙂


