# Statisfaction

## Final update on unbiased smoothing

Posted in Statistics by Pierre Jacob on 27 August 2018

Two coupled chains, marginally following a conditional particle filter algorithm with ancestor sampling, and meeting in 20 iterations. Script here.

Hi,

Two years ago I blogged about couplings of conditional particle filters for smoothing.  The paper with Fredrik Lindsten and Thomas Schön has just been accepted for publication at JASA, and the arXiv version and github repository are hopefully in their final forms. Here I’ll mention a few recent developments and follow-up articles by other researchers.

## Couplings of Normal variables

Posted in R, Statistics by Pierre Jacob on 24 August 2018

Hi,

Just to play a bit with the gganimate package, and to celebrate National Coupling Day, the above plot shows different couplings of two univariate Normal distributions, Normal(0,1) and Normal(2,1). That is, each point is a pair (x,y) where x follows a Normal(0,1) and y follows a Normal(2,1). Below I’ll recall briefly how each coupling operates, in the Normal case. The code is available at the end of the post.

## Different ways of using MCMC algorithms

Posted in General, Statistics by Pierre Jacob on 19 July 2018

Hi,

This post is about different ways of using Markov chain Monte Carlo (MCMC) algorithms for numerical integration or sampling. It can be a hard job to design an MCMC algorithm for a given target distribution. Once it’s finally implemented, it gives a way of sampling a new point X’ given an existing point X. From there, the algorithm can be used in various ways to construct estimators of integrals/distribution of interest. Some ways are more amenable to parallel computing than others. I give some examples with references below.

## Bayesian workshop in Grenoble, September 6-7

Posted in General, Seminar/Conference by Julyan Arbel on 23 May 2018

We are organising a two-day Bayesian workshop in Grenoble in September 6-7, 2018. It will be the second edition of the Italian-French statistics seminar (link to first edition), titled this year: Bayesian learning theory for complex data modeling. The workshop will give to young statisticians the opportunity to learn from and interact with highly qualified senior researchers in probability, theoretical and applied statistics, with a particular focus on Bayesian methods.

Anyone interested in this field is welcome. There will be two junior sessions and a poster session with a call for abstract open until June 30. A particular focus will be given to researchers in the early stage of their career, or currently studying for a PhD, MSc or BSc. The junior session is supported by ISBA through travel awards.

There will be a social dinner on September 6, and a hike organised in the mountains on September 8.

Confirmed invited speakers

• Simon Barthelmé, Gipsa-lab, Grenoble, France
• Arnoldo Frigessi, University of Oslo, Norway
• Benjamin Guedj, Inria Lille – Nord Europe, France
• Alessandra Guglielmi, Politecnico di Milano, Italy
• Antonio Lijoi, University Bocconi, Milan, Italy
• Bernardo Nipoti, Trinity College Dublin, Ireland
• Sonia Petrone, University Bocconi, Milan, Italy

Important Dates:

• June 30, 2018: Abstract submission closes
• July 20, 2018: Notification on abstract acceptance
• August 25, 2018: Registration closes

More details and how to register: https://sites.google.com/view/bigworkshop

We look forward to seeing you in Grenoble.

Best,

Julyan

Tagged with: , ,

## Scaling of MCMC with dimension (experiments)

Posted in Statistics by Pierre Jacob on 15 May 2018

Taken from Wikipedia’s article on dimension.

Hi all,

In this post, I’ll go through numerical experiments illustrating the scaling of some MCMC algorithms with respect to the dimension. I will focus on a simple setting, to illustrate some theoretical results developed by Gareth Roberts, Andrew Stuart, Alexandros Beskos, Jeff Rosenthal and many of their co-authors over many years, for instance here for random walk Metropolis-Hastings (RWMH, and see here more recently), here for Modified Adjusted Langevin Algorithm (MALA), here for Hamiltonian Monte Carlo (HMC).

## A healthy dose of foundational crisis

Posted in General by Rémi Bardenet on 26 March 2018

I finally took the time to read about axiomatic foundations of Bayesian statistics. I like axioms, I like Bayesians stats, so this was definitely going to be a pleasant opportunity to read some books, comfortably seated on the sofa I just added to my office. Moreover, my team in Lille includes supporters of Dempster-Shafer belief functions, another framework for uncertainty modelling and decision-making, so being precise on my own axioms was the best way to discuss more constructively with my colleagues.

Long story short, I took the red pill: there is a significant gap between axiomatic constructions of the Bayesian paradigm and current Bayesian practice. None of this is new, but I had never been told. It’s not keeping me awake at night, but it’s bothering my office mates who cannot stop me blabbering about it over coffee. The good side is that smart people have thought about this in the past. Also, reading about this helped me understand some of the philosophical nuances between the thought processes of different Bayesians, say de Finettians vs. Jeffreysians. I will not attempt a survey in a blog post, nor do I feel to be knowledgeable enough for this, but I thought I could spare my office mates today and annoy Statisfaction’s readers for once.

Take Savage’s axioms, for instance. I’ve always heard that they were the current justification behind the saying “being Bayesian is being a coherent decision-maker”. To be precise, let $\Theta$ be the set of states of the world, that is, everything useful to make your decision. To fix ideas, in a statistical experiment, your decision might be a “credible” interval on some real parameter, so $\Theta$ should at least be the product of $\mathbb{R}$ times whatever space your data live in. Now an action $a$ is defined to be a map from $\Theta$ to some set of outcomes $\mathcal{Z}$. For the interval problem, an action $a_I$ corresponds to the choice of a particular interval $I$ and the outcomes $\mathcal{Z}$ should contain whatever you need to assess the performance of your action, say, the indicator of the parameter actually belonging to your interval $I$, and the length of $I$. Outcomes are judged by utility, that is, we consider functions $u:\mathcal{Z}\rightarrow\mathbb{R}_+$ that map outcomes to nonnegative rewards. In our example, this could be a weighted sum of the indicator and the interval length. The weights translate your preference for an interval that actually captures the value of the parameter of interest over a short interval. Now, the axioms give the equivalence between the two following bullets:

• (being Bayesian) There is a unique pair $(u,\pi)$, made of a utility function and a finitely additive probability measure $\pi$ defined on all subsets of the set $\Theta$ of states of the world, such that you choose your actions by maximizing an expected utility criterion:

$\displaystyle a^\star \in \arg\max_a \mathbb{E}_\pi u(a(\theta)),$

• (being coherent) Ranking actions according to a preference relation that satisfies a few abstract properties that make intuitive sense for most applications, such as transitivity: if you prefer $a_I$ to $a_J$ and $a_J$ to $a_G$, then you prefer $a_I$ to $a_G$. Add to this a few structural axioms that impose constraints on $\Theta$ on $\mathcal{Z}$.

Furthermore, there is a natural notion of conditional preference among actions that follows from Savage’s axioms. Taken together, these axioms give an operational definition of our “beliefs” that seems to match Bayesian practice. In particular, 1) our beliefs take the form of a probability measure –which depends on our utility–, 2) we should update these beliefs by conditioning probabilities, and 3) make decisions using expected utility with respect to our belief. This is undeniably beautiful. Not only does Savage avoid shaky arguments or interpretations by using your propensity to act to define your beliefs, but he also avoids using “extraneous probabilities”. By the latter I mean any axiom that artificially brings mathematical probability structures into the picture, such as “there exists an ideal Bernoulli coin”.

But the devil is in the details. For instance, some of the less intuitive of Savage’s axioms require the set of states of the world to be uncountable and the utility bounded. Also, the measure $\pi$ is actually only required to be finitely additive, and it has to be defined on all subsets of the set of states of the world. Now-traditional notions like Lebesgue integration, $\sigma$-additivity, or $\sigma$-algebras do not appear. In particular, if you want to put a prior on the mean of a Gaussian that lives in $\Theta=\mathbb{R}$, Savage says your prior should weight all subsets of the real line, so forget about using any probability measure that has a density with respect to the Lebesgue measure! Or, to paraphrase de Finetti, $\sigma$-additive probability does not exist. Man, before reading about axioms I thought “Haha, let’s see whether someone has actually worked out the technical details to justify Bayesian nonparametrics with expected utility, this must be technically tricky”; now I don’t even know how to fit the mean of a Gaussian anymore. Thank you, Morpheus-Savage.

There are axiomatic ways around these shortcomings. From what I’ve read they all either include extraneous probabilities or rather artificial mathematical constructions. Extraneous probabilities lead to philosophically beautiful axioms and interpretations, see e.g. Chapter 2 of Bernardo and Smith (2000), and they can get you finite and countably finite sets of states of the world, for instance, whereas Savage’s axioms do not. Stronger versions also give you $\sigma$-additivity, see below. Loosely speaking, I understand extraneous probabilities as measuring uncertainty with respect to an ideal coin, similarly to measuring heat in degrees Celsius by comparing a physical system to freezing or boiling water. However, I find extraneous probability axioms harder to swallow than (most of) Savage’s axioms, and they involve accepting a more general notion of probability than personal propensity to act.

If you want to bypass extraneous probability and still recover $\sigma$-additivity, you could follow Villegas (1964), and try to complete the state space $\Theta$ so that well-behaved measures $\pi$ extend uniquely to $\sigma$-additive measures on a $\sigma$-algebra on this bigger set of states $\hat\Theta$. Defining the extended $\hat\Theta$ involves sophisticated functional analysis, and requires to add potentially hard-to-intepret states of the world, so losing some of the interpretability of Savage’s construction. Authors of reference books seem reluctant to go in that direction: De Groot (1970), for instance, solves the issue by using a strong extraneous probability axiom that allows working in the original set $\Theta$ with $\sigma$-additive beliefs. Bernardo & Smith use extraneous probabilities, but keep their measures finitely additive until the end of Chapter 2. Then they admit departing from axioms for practical purposes and define “generalized beliefs” in Chapter 3, defined on a $\sigma$-algebra of the original $\Theta$. Others seem to more readily accept the gap between axioms and practice, and look for a more pragmatic justification of the combined use of expected utility and countably additive probabilities. For instance, Robert (2007) introduces posterior expected utility, and then argues that it has desirable properties among decision-making frameworks, such as respecting the likelihood principle. This is unlike Savage’s approach, for whom the (or rather, a finitely additive version of the) likelihood principle is a consequence of the axioms. I think this is an interesting subtlelty.

To conclude, I just wanted to share my excitement for having read some fascinating works on decision-theoretic axioms for Bayesian statistics. There still is some unresolved tension between having both an applicable and axiomatized Bayesian theory of belief. I would love this post to generate discussions, and help me understand the different thought processes behind each Bayesian being Bayesian (and each non-Bayesian being non-Bayesian). For instance, I had not realised how conceptually different the points of view in the reference books of Robert and Bernardo & Smith were. This definitely helped me understand (Xi’an) Robert’s short three answers to this post.

If this has raised your interest, I will mention here a few complementary sources that I have found useful, ping me if you want more. Chapters 2 and 3 of Bernardo and Smith (2000) contain a detailed description of their set of axioms with extraneous probability, and they give great lists of pointers on thorny issues at the end of each chapter. A lighter read is Parmigiani and Inoue (2009), which I think is a great starting point, with emphasis on the main ideas of de Finetti, Ramsey, Savage, and Anscombe and Aumann, how they apply, and how they relate to each other, rather than the technical details. Technical details and exhaustive reviews of sets of axioms for subjective probability can be found in their references to Fishburn’s work, which I have found to be beautifully clear, rigorous and complete, although like many papers involving low-level set manipulations, the proofs sometimes feel like they are written for robots. But after all, a normative theory of rationality is maybe only meant for robots.

## AI in Grenoble, 2nd to 6th July 2018

Posted in General, Seminar/Conference by Julyan Arbel on 22 March 2018

This is an advertisement for on conference on AI organised at Inria Grenoble by Thoth team and Naver labs : https://project.inria.fr/paiss/. This AI summer school comprises lectures and practical sessions conducted by renowned experts in different areas of artificial intelligence.

This event is the revival of a past series of very successful summer schools which took place in Grenoble and Paris. The latest edition of this series was held in 2013. While originally focusing on computer vision, the summer school now targets a broader AI audience, and will also include presentations about machine learning, natural language processing, robotics, and cognitive science.

Note that NAVER LABS is funding a number of students to attend PAISS. Apply before 4th April. (more…)

Tagged with: , , ,

## Do you really need to work 60+ hours per week to be a good researcher ?

Posted in General, publishing by JB Salomond on 6 February 2018

There is a controversy these days on social media about academics claiming that “if you do not feel like working 60+ hours per week including weekends and evenings you should probably find another job”. This is utterly frustrating. As an academic myself, I clearly do not feel like working days and nights, even if I am neck deep in a project. Does it makes me a poor assistant prof ?

It is no secret that academic research is not, in general, a 9 to 5 job (not saying that it cannot be). I myself usually work on weekends, when commuting or during holidays. I always carry a few papers that I did not had time to read or relentlessly write equations in my notebook when an idea strikes me during the morning commute. I do think about how I could improve my next day lecture and make some changes in my slides late in the evening. That is partly because I am disorganised, also because the job sort of requires it.  We lack time to do all the things we want to do while at the lab. Conferences, seminars and meetings wreck your schedule if you had any. So you might end up seeing any time off as a lost opportunity to do more research.

This situation is clearly not good, and many academics, including me, have or had a hard time dealing with this. In particular when you are still a PhD/postdoc/tenure track (you name it) and need to stick out of the pack to get a position in an extremely competitive environment. And when senior full professors are telling you that you need to work even harder if you want to be considered as worthy, that is clearly not helping.

My view is that even if we all want to shoot for the moon (and hit the damn thing), it is totally fine to simply do some good research. Each of your paper does not have to be ground breaking, as long as it contribute to the field, it should be good enough and acknowledge as such. If your goal is to have that-many JRSS/Biometrika/Annals of Stats papers and a sky-rocketing h-index no matter what, you are probably doing it wrong. Competition between academics can be a good boost from time to time, but it should not be the end of it. More importantly, it should not be what drives an academic careerer. The negative externalities of this system are depressed junior researchers and limited scientific research squeezed out of our brains to extend our publication record.

So what should we do about this ? Well first, stop bragging about how many hours to work per week, if you feel like working 24/7, good for you, but it does not have to be that way for everyone. Secondly, stop judging academics (especially junior ones) on some dumb metrics such as h-index or so, if you need to evaluate a candidate, read their research. In short, cut the competition and be supportive. Make academia fun again !

## Sub-Gaussian property for the Beta distribution (part 3, final)

Posted in General, R by Julyan Arbel on 26 December 2017

When a Beta random variable wants to act like a Bernoulli: convergence of optimal proxy variance.

In this third and last post about the Sub-Gaussian property for the Beta distribution [1] (post 1 and post 2), I would like to show the interplay with the Bernoulli distribution as well as some connexions with optimal transport (OT is a hot topic in general, and also on this blog with Pierre’s posts on Wasserstein ABC). (more…)

## Sub-Gaussian property for the Beta distribution (part 2)

Posted in R by Julyan Arbel on 20 December 2017

Left: What makes the Beta optimal proxy variance (red) so special? Right: The difference function has a double zero (black dot).

As a follow-up on my previous post on the sub-Gaussian property for the Beta distribution [1], I’ll give here a visual illustration of the proof.

A random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:

$\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.$

We focus on X being a Beta$(\alpha,\beta)$ random variable. Its moment generating function $\mathbb{E}[\exp(\lambda X)]$ is known as the Kummer function, or confluent hypergeometric function $_1F_1(\alpha,\alpha+\beta,\lambda)$. So is $\sigma^2$-sub-Gaussian as soon as the difference function

$u_\sigma(\lambda)=\exp\left(\frac{\alpha}{\alpha+\beta}\lambda+\frac{\sigma^2}{2}\lambda^2\right)-_1F_1(\alpha,\alpha+\beta,\lambda)$

remains positive on $\mathbb{R}$. This difference function $u_\sigma(\cdot)$ is plotted on the right panel above for parameters $(\alpha,\beta)=(1,1.3)$. In the plot, $\sigma^2$ is varying from green for the variance $\text{Var}[X]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ (which is a lower bound to the optimal proxy variance) to blue for the value $\frac{1}{4(\alpha+\beta+1)}$, a simple upper bound given by Elder (2016), [2]. The idea of the proof is simple: the optimal proxy-variance corresponds to the value of $\sigma^2$ for which $u_\sigma(\cdot)$ admits a double zero, as illustrated with the red curve (black dot). The left panel shows the curves with $\mu = \frac{\alpha}{\alpha+\beta}$ varying, interpolating from green for $\text{Var}[X]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ to blue for $\frac{1}{4(\alpha+\beta+1)}$, with only one curve qualifying as the optimal proxy variance in red.

#### References

[1] Marchal and Arbel (2017), On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1–14, 2017. Code on GitHub.
[2] Elder (2016), Bayesian Adaptive Data Analysis Guarantees from Subgaussianity, https://arxiv.org/abs/1611.00065

Tagged with: