# Statisfaction

## Do you really need to work 60+ hours per week to be a good researcher ?

Posted in General, publishing by JB Salomond on 6 February 2018

There is a controversy these days on social media about academics claiming that “if you do not feel like working 60+ hours per week including weekends and evenings you should probably find another job”. This is utterly frustrating. As an academic myself, I clearly do not feel like working days and nights, even if I am neck deep in a project. Does it makes me a poor assistant prof ?

It is no secret that academic research is not, in general, a 9 to 5 job (not saying that it cannot be). I myself usually work on weekends, when commuting or during holidays. I always carry a few papers that I did not had time to read or relentlessly write equations in my notebook when an idea strikes me during the morning commute. I do think about how I could improve my next day lecture and make some changes in my slides late in the evening. That is partly because I am disorganised, also because the job sort of requires it.  We lack time to do all the things we want to do while at the lab. Conferences, seminars and meetings wreck your schedule if you had any. So you might end up seeing any time off as a lost opportunity to do more research.

This situation is clearly not good, and many academics, including me, have or had a hard time dealing with this. In particular when you are still a PhD/postdoc/tenure track (you name it) and need to stick out of the pack to get a position in an extremely competitive environment. And when senior full professors are telling you that you need to work even harder if you want to be considered as worthy, that is clearly not helping.

My view is that even if we all want to shoot for the moon (and hit the damn thing), it is totally fine to simply do some good research. Each of your paper does not have to be ground breaking, as long as it contribute to the field, it should be good enough and acknowledge as such. If your goal is to have that-many JRSS/Biometrika/Annals of Stats papers and a sky-rocketing h-index no matter what, you are probably doing it wrong. Competition between academics can be a good boost from time to time, but it should not be the end of it. More importantly, it should not be what drives an academic careerer. The negative externalities of this system are depressed junior researchers and limited scientific research squeezed out of our brains to extend our publication record.

So what should we do about this ? Well first, stop bragging about how many hours to work per week, if you feel like working 24/7, good for you, but it does not have to be that way for everyone. Secondly, stop judging academics (especially junior ones) on some dumb metrics such as h-index or so, if you need to evaluate a candidate, read their research. In short, cut the competition and be supportive. Make academia fun again !

## Sub-Gaussian property for the Beta distribution (part 3, final)

Posted in General, R by Julyan Arbel on 26 December 2017

When a Beta random variable wants to act like a Bernoulli: convergence of optimal proxy variance.

In this third and last post about the Sub-Gaussian property for the Beta distribution [1] (post 1 and post 2), I would like to show the interplay with the Bernoulli distribution as well as some connexions with optimal transport (OT is a hot topic in general, and also on this blog with Pierre’s posts on Wasserstein ABC). (more…)

## Sub-Gaussian property for the Beta distribution (part 2)

Posted in R by Julyan Arbel on 20 December 2017

Left: What makes the Beta optimal proxy variance (red) so special? Right: The difference function has a double zero (black dot).

As a follow-up on my previous post on the sub-Gaussian property for the Beta distribution [1], I’ll give here a visual illustration of the proof.

A random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that:

$\mathbb{E}[\exp(\lambda (X-\mu))]\le\exp\left(\frac{\lambda^2\sigma^2}{2}\right)\,\,\text{for all } \lambda\in\mathbb{R}.$

We focus on X being a Beta$(\alpha,\beta)$ random variable. Its moment generating function $\mathbb{E}[\exp(\lambda X)]$ is known as the Kummer function, or confluent hypergeometric function $_1F_1(\alpha,\alpha+\beta,\lambda)$. So is $\sigma^2$-sub-Gaussian as soon as the difference function

$u_\sigma(\lambda)=\exp\left(\frac{\alpha}{\alpha+\beta}\lambda+\frac{\sigma^2}{2}\lambda^2\right)-_1F_1(\alpha,\alpha+\beta,\lambda)$

remains positive on $\mathbb{R}$. This difference function $u_\sigma(\cdot)$ is plotted on the right panel above for parameters $(\alpha,\beta)=(1,1.3)$. In the plot, $\sigma^2$ is varying from green for the variance $\text{Var}[X]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ (which is a lower bound to the optimal proxy variance) to blue for the value $\frac{1}{4(\alpha+\beta+1)}$, a simple upper bound given by Elder (2016), [2]. The idea of the proof is simple: the optimal proxy-variance corresponds to the value of $\sigma^2$ for which $u_\sigma(\cdot)$ admits a double zero, as illustrated with the red curve (black dot). The left panel shows the curves with $\mu = \frac{\alpha}{\alpha+\beta}$ varying, interpolating from green for $\text{Var}[X]=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ to blue for $\frac{1}{4(\alpha+\beta+1)}$, with only one curve qualifying as the optimal proxy variance in red.

#### References

[1] Marchal and Arbel (2017), On the sub-Gaussianity of the Beta and Dirichlet distributions. Electronic Communications in Probability, 22:1–14, 2017. Code on GitHub.
[2] Elder (2016), Bayesian Adaptive Data Analysis Guarantees from Subgaussianity, https://arxiv.org/abs/1611.00065

Tagged with:

## A big problem in our community

Posted in General, Seminar/Conference, Statistics by Pierre Jacob on 14 December 2017

“Tout va très bien”, meaning “all is well”, by Franquin.

Hi all,

Kristian Lum, who was already one of my Statistics superheroes for her many interesting papers and great talks, bravely wrote the following text about her experience as a young statistician going to conferences:

https://medium.com/@kristianlum/statistics-we-have-a-problem-304638dc5de5

I can’t thank Kristian enough for speaking out. Her experience is both shocking and hardly surprising. Many, many academics report similar stories. This simply can’t go on like that.

I happen to have gone to the conferences mentioned by Kristian, and my experience as a young man was completely different. It was all about meeting interesting people, discussing ideas, being challenged, and having good times. Nobody harassed, touched or assaulted me. There was some flirting, as I guess is natural when hundreds of people are put in sunny places far away from home, but I was never the victim of any misconduct or abuse of power. So instead of driving me out of the field, conferences became important, enriching and rewarding moments of my professional life.

Looking back at those conferences I feel sick, and heartbroken, at the thought that some of my peers were having such a difficult time, because of predators who don’t ever  face the consequences of their actions. Meanwhile I was part of the silent majority.

The recent series of revelations about sexual harassment and assaults in other professional environments indicate that this is not specific to our field, nor to academia. But this does not make it any more acceptable. I know for a fact that many leaders of our field take this issue extremely seriously (as Kristian mentions too),  but clearly much much more needs to be done. The current situation is just shameful; strong and  coordinated actions will be needed to fix it. Thanks again to Kristian for the wake-up call.

## nrow, references and copies

Posted in R by Pierre Jacob on 10 December 2017

Claude Monet’s paintings have nothing to do with the topic of this post.

Hi all,

This post deals with a strange phenomenon in R that I have noticed while working on unbiased MCMC. Reducing the problem to a simple form, consider the following code, which iteratively samples a vector ‘x’ and stores it in a row of a large matrix called ‘chain’ (I’ve kept the MCMC terminology).

dimstate = 100
nmcmc = 1e4
chain = matrix(0, nrow = nmcmc, ncol = dimstate)
for (imcmc in 1:nmcmc){
if (imcmc == nrow(chain)){ #call to nrow
}
x = rnorm(dimstate, mean = 0, sd = 1)
chain[imcmc,] = x #copying of x in chain
}


If you execute this code, you will see that it is surprisingly slow: it takes close to a minute on my computer. Now, consider the next block, which does exactly the same except that the vector ‘x’ is not copied into the matrix ‘chain’.

dimstate = 100
nmcmc = 1e4
chain = matrix(0, nrow = nmcmc, ncol = dimstate)
for (imcmc in 1:nmcmc){
if (imcmc == nrow(chain)){ #call to nrow
}
x = rnorm(dimstate, mean = 0, sd = 1)
# chain[imcmc,] = x #no more copying
}


This code runs nearly instantaneously. Could it be that just copying a vector in a matrix takes a lot of time? Sounds unlikely. Now consider this third block.

dimstate = 100
nmcmc = 1e4
chain = matrix(0, nrow = nmcmc, ncol = dimstate)
for (imcmc in 1:nmcmc){
if (imcmc == nmcmc){ #no call to nrow
}
x = rnorm(dimstate, mean = 0, sd = 1)
chain[imcmc,] = x #copying of x in chain
}


This code runs nearly instantaneously as well; this time ‘x’ is copied into ‘chain’, but the call to the nrow function is removed….?! What is nrow doing? It is meant to simply return dim(chain)[1], the first dimension of chain. So consider this fourth block.

dimstate = 100
nmcmc = 1e4
chain = matrix(0, nrow = nmcmc, ncol = dimstate)
for (imcmc in 1:nmcmc){
if (imcmc == dim(chain)[1]){ #call to dim instead of nrow
}
x = rnorm(dimstate, mean = 0, sd = 1)
chain[imcmc,] = x #copying of x in chain
}


This one also runs instantaneously! So replacing nrow(chain) by dim(chain)[1] solves the problem. Why?

The answer comes from R guru and terrific statistician Louis Aslett. I directly quote from an exchange of emails, since he brilliantly explains the phenomenon.

You probably know R stores everything by reference, so if I do:

x <- matrix(0, nrow=1e5, ncol=100)
y <- x

I actually only have one copy of the matrix in memory with two references to it. If I then do:

x[1,1] <- 1

R will first make a copy of the whole matrix, update x to point to that and then change the first element to one. This idea is used when you pass a variable to a standard (i.e. non-core, non-primitive) R function, which nrow is: it creates a reference to the variable you pass so that it doesn’t have to copy and the function call is very fast …. as long as you don’t write to it inside the function, no copy need ever happen. But the “bad design” bit is that R makes a decision whether to copy on write based only on a reference count and crucially that reference count stays increased even after a function returns, irrespective of whether or not the function has touched the variable.

So:

x <- matrix(0, nrow=1e5, ncol=100) # matrix has ref count 1
x[1,1] <- 1 # ref count is 1, so write with no copy
nrow(x) # ref count is 2 even though nothing was touched
x[1,1] <- 1 # ref count still 2, so R copies before writing first element. Now the ref count drops to 1 again
x[2,2] <- 1 # this writes without a copy as ref count got reset on last line
nrow(x) # ref count jumps
x[3,3] <- 1 # copy invoked again! Aaaargh!

So by calling nrow in the loop for the first example, the chain matrix is being copied in full on every iteration. In the second example, chain is never written to so there is no negative side effect to the ref count having gone up. In the third example, chain only ever has ref count 1 so there are no copies and each row is written in-place. I did a quick bit of profiling and indeed in the slow example, the R garbage collector allocates and tidies up nearly 9GB of RAM when executing the loop!

The crazy thing is that dim(chain)[1] works full speed even though that is all that nrow is doing under the hood, but the reason is that dim is a so-called “primitive” core R function which is special because it doesn’t affect the reference counter of its arguments. If you want to dig into this yourself, there’s a function refs() in the pryr package which tells you the current reference count to any variable.

Thanks Louis!

## Bayesian model comparison with vague or improper priors

Posted in Statistics by Pierre Jacob on 6 November 2017

Synthetic data from a Lévy-driven stochastic volatility model (top), log-Bayes factor between two such models (middle) and “Hyvärinen factor” (proposed approach, bottom). Each line represents a different Monte Carlo estimate, obtained sequentially over time.

Hi,

With Stephane Shao, Jie Ding and Vahid Tarokh we have just arXived a tech report entitled “Bayesian model comparison with the Hyvärinen score: computation and consistency“. Here I’ll explain the context, that is, scoring rules and Hyvärinen scores (originating in Hyvärinen’s score matching approach to inference), and then what we actually do in the paper.

## Dynamic publication list for research webpage using arXiv, HAL, or bibtex2html

Posted in General by Julyan Arbel on 23 October 2017

Well of course, dynamic is conditional upon some manual feeding. If you put your papers on arXiv or HAL, then those two propose dynamic widgets. If you maintain a .bib file of your papers, you can use tools like bibtex2html. This is not dynamic at all, but it allows for finer tuning of url links you might want to add than with arXiv or HAL options. I review below those three options. (more…)

## ISBA elections, let’s go voting

Posted in General by Julyan Arbel on 16 October 2017

So it seems even Thomas B. went voting.

The International Society for Bayesian Analysis (ISBA), is running elections until November, 15. This year, two contributors on this blog, Nicolas Chopin and myself, are running for an ISBA Section office. The sections of the society, nine in number as of today, gather researchers with common research interests: Computation, Objective Bayes, Nonparametrics, etc.

Here are our candidate statements:

Tagged with: ,

## Approximating the cut distribution

Posted in Statistics by Pierre Jacob on 1 October 2017

The cut distribution can be summoned by adding diodes to the graph representing relations between random variables. Just like that.

Hi,

This post is about computational issues with the cut distribution for Bayesian inference in misspecified models. Some motivation was given in a previous post about a recent paper on modular Bayesian inference. The cut distribution, or variants of it, might play an important role in combining statistical models, especially in settings where one wants to propagate uncertainty while preventing misspecification from damaging estimation. The cut distribution can also be seen as a probabilistic analog of two-step point estimators. So the cut distribution is more than just a trick! And it raises interesting computational issues which I’ll describe here along with a solution via unbiased MCMC.

## Unbiased Hamiltonian Monte Carlo with couplings

Posted in Statistics by Pierre Jacob on 17 September 2017

Two Hamiltonian Monte Carlo chains, casually exploring a target distribution while contracting.

With Jeremy Heng we have recently arXived a paper describing how to remove the burn-in bias of Hamiltonian Monte Carlo (HMC). This follows a recent work on unbiased MCMC estimators in general on which I blogged here. The case of HMC requires a specific yet very simple coupling. A direct consequence of this work is that Hamiltonian Monte Carlo can be massively parallelized: instead of running one chain for many iterations, one can run short coupled chains independently in parallel. The proposed estimators are consistent in the limit of the number of parallel replicates. This is appealing as the number of available processors increases much faster than clock speed, over recent years and for the years to come, for a number of reasons explained e.g. here.