Statisfaction

particles

Posted in General, Statistics by nicolaschopin on 4 June 2019

I have released a few months ago a Python package for particle filtering, called particles; you can find it on Github here. You may want to have a look first at the documentation, in particular the tutorials here.

This package has been developed to support our (with Omiros Papaspiliopoulos) forthcoming book called (tentatively): an introduction to Sequential Monte Carlo. It implements all the algorithms discussed in the book; e.g.

  • bootstrap, guided and auxiliary particle filters
  • all standard resampling schemes
  • most particle smoothing algorithms
  • sequential quasi-Monte Carlo
  • PMCMC (PMMH, Particle Gibbs), SMC^2
  • SMC samplers

It also contains all the scripts that were used to perform the numerical experiments discussed in the book.

A random plot taken from the forthcoming book. Can you guess what it represents exactly?

This package is hopefully useful to people with different expectations and level of expertise. For instance, if you just want to run a particle filter for a basic state-space model, you may describe that model as follows:

import particles
from particles import state_space_models as ssm

class ToySSM(ssm.StateSpaceModel):
    def PX0(self):  # Distribution of X_0 
        return dists.Normal()  # X_0 ~ N(0, 1)
    def PX(self, t, xp):  # Distribution of X_t given X_{t-1}
        return dists.Normal(loc=xp)  # X_t ~ N( X_{t-1}, 1)
    def PY(self, t, xp, x):  # Distribution of Y_t given X_t (and X_{t-1}) 
        return dists.Normal(loc=x, scale=self.sigma)  # Y_t ~ N(X_t, sigma^2)

And then simulate data, and run the corresponding bootstrap filter, as follows:

my_model = ToySSM(sigma=0.2)
x, y = my_model.simulate(200)  # sample size is 200

alg = particles.SMC(fk=ssm.Bootstrap(ssm=my_model, data=y), N=200)
alg.run()

On the other hand, if you are an SMC expert, you may re-use only the parts you need; e.g. a resampling scheme:

from particles import resampling 

A = resampling.systematic(W)

Up to now, this package has been tested mostly by my PhD students, and the students of my M2 course on particle filtering at the ENSAE; many thanks to all of them. Since no computer screen has been smashed in the process, I guess I can publicize it a bit more. Please let me know if you have any questions, comments, or feature request. (You may report a bug by raising an issue on the Github page.)

Based on your feedback, I’m planning to write a few more posts in the coming weeks about particles and more generally numerical computation in Python. Stay tuned!

Advertisements

Gaussian variates truncated to a finite interval

Posted in General by nicolaschopin on 29 December 2016

Alan Rogers, an anthropologist at University of Utah, got in touch with me about my paper on the simulation of truncated Gaussian distributions (journal version, arxiv version). The method I proposed in this paper works for either a finite interval [a,b], or a semi-finite one [a,+inf[, but my C code implements only the latter, and Alan needed the former.

Alan thus decided to re-implement my method and several others (including Christian Robert’s accept-reject algorithm proposed in this paper) in C; see here:

https://github.com/alanrogers/dtnorm

Alan also sent me this interesting plot that compares the different methods. The color of a dot at position (a,b) corresponds to the fastest method for simulating N(0,1) truncated to [a,b];

dtnorm

CPU time comparison of truncated Normal samplers (c=chopin, r=robert, g=N(0,1) proposal, e=exponential proposal)

A few personal remarks:

  • My method is an accept-reject algorithm, where the proposal is a mixture of uniform distributions on rectangles. The point is to have a large probability that the number of basic operations (multiplication, division) needed to return the draw is small. However, the improvement brought by such a method might be be observable only in compiled languages. In an interpreted language such as R, Matlab and Python, loops over basic operations come with a certain overhead, which might cancel any improvement. This was the experience of a colleague who tried to implement it in Julia.
  • Even in C, this comparison might depend on several factors (computer, compiler, libraries, and so on). If I remember correctly, the choice of the random generator in particular may have a significant impact. (I used the GSL library which makes it easy to try different generators for the same piece of code.)
  • Also bear in mind that some progress has been made for computing the inverse CDF of a unit Gaussian distribution. Hence the basic inverse CDF method, while not being the fastest approach, works reasonably well these days, especially (again) in interpreted languages. (Update: Alan tells me the inverse CDF methods remains 10 times slower for his C implementation, based on the GSL library.)

 

post-doc positions at ENSAE

Posted in General by nicolaschopin on 27 November 2016

postdoc

Hi,

interested in doing a post-doc with me on anything related to Bayesian Computation? Please let me know, as there is currently a call for post-doc grants at the ENSAE, see below.

Nicolas Chopin

The Labex ECODEC is a research consortium in Economics and Decision Sciences common to three leading French higher education institutions based in the larger Paris area: École polytechnique, ENSAE and HEC Paris. The Labex Ecodec offers:

  • One-year postdoctoral fellowships for 2017-2018

  • Two-year postdoctoral fellowships for 2017-2019

The monthly gross salary of postdoctoral fellowships is 3 000 €.

Candidates are invited to contact as soon as possible members of the research group (see below) with whom they intend to work.

Research groups concerned by the call:

Area 1: Secure Careers in a Global Economy

Area 2: Financial Market Failures and Regulation

Area 3: Product Market Regulation and Consumer Decision-Making

Area 4: Evaluating the Impact of Public Policies and Firms’ Decisions

Area 5: New Challenges for New Data

Details of axis can be found on the website:

http://labex-ecodec.fr/

Deadlines for application:

31st December 2016

Screening of applications and decisions can be made earlier for srong candidates who need an early decision.

The application should be sent to application@labex-ecodec.fr in PDF. Please mention the area number on which you apply in the subject.

The application package includes:

  1. A cover letter with the name of a potential supervisor among the group;
  2. A research statement;
  3. A letter from the potential supervisor in support of the project;
  4. A Curriculum vita (with the address of the candidate, phone and e-mail contact).
  5. The Ph.D. dissertation or papers/preprint;
  6. Reference letters, including one from the PhD advisor. A letter from a member of the research group with whom the candidate is willing to interact will be appreciated.

 

Please note that HEC, Genes, and X PhD students are not eligible to apply for this call.

Selection will be based on excellence and a research project matching the group’s research agenda.

Area 1 “Secure careers in a Global Economy”: Pierre Cahuc (ENSAE), Dominique Rouziès (HEC), Isabelle Méjean (École polytechnique)

Area 2: “Financial Market Failures and Regulation”: François Derrien (HEC), Jean-David Fermanian, (ENSAE) Edouard Challe (École polytechnique)

Area 3: “Decision-Making and Market Regulation”: Nicolas Vieille (HEC), Philippe Choné (ENSAE), Marie-Laure Allain (École polytechnique)

Area 4: “Evaluating the Impact of Public Policies and Firm’s Decisions”: Bruno Crépon (ENSAE), Yukio Koriyama (École polytechnique), Daniel Halbheer (HEC)

Area 5:  “New Challenges for New Data”: Anna Simoni (ENSAE), Gilles Stoltz (HEC)

 

Leave the Pima indians alone: the R package

Posted in General by nicolaschopin on 4 September 2015

Hi there,

while everyone was away in July, James Ridgway and I posted our “leave (the) pima paper alone” paper on arxiv, in which we discuss to which extent probit/logit regression and not too big datasets (such as the now famous Pima Indians dataset) constitute a relevant benchmark for Bayesian computation.

The actual title of the paper is “Leave Pima Indians alone…”, but xian changed it to “Leave *the* Pima Indians alone…” when discussing it on his blog. Any opinion on whether it does sound better with “the”?

Man in the maze, the official symbol of the Pima tribe

Man in the maze, the official symbol of the Pima tribe; perhaps a metaphor for slow convergence of certain MCMC schemes

On a different note, one of our findings is that Expectation-Propagation works wonderfully for such models; yes it is an approximate method, but it is very fast, and the approximation error is consistently negligible on all the datasets we looked at.

James has just posted on CRAN the EPGLM package, which computes an EP approximation of the posterior of a logit or probit model. The documentation is a bit terse at the moment, but it is very straightforward to use.

Comments on the package, the paper, its grammar or Pima Indians are most welcome!

slides of SMC2015

Posted in General by nicolaschopin on 31 August 2015

Hi,

Adam Johansen, Thomas Schön and me co-organised SMC2015, a workshop on Sequential Monte Carlo method that took place at ENSAE last week. In case you missed it, I’ve just uploaded the slides of most talks here. Enjoy!

Unfortunate typos in read paper

Posted in General by nicolaschopin on 3 December 2014

Mathieu and I have just realised that the version of our SQMC paper made available on the RSS web site contains several unfortunate typos. In particular, the symbol for “small o” has been replaced by a “big O” by editors. For instance, Theorem 9 should state the QMC beats standard SMC; i.e. the MSE (mean square error) of an SQMC estimator is

\mathrm{MSE} = o(N^{-1})

but in the RSS version, it reads

\mathrm{MSE} = O(N^{-1}).

Well, that’s a bummer. For now, I recommend anyone to read instead the arxiv version (updated on Monday).

SQMC read paper

Posted in General by nicolaschopin on 29 October 2014

the typical RSS meeting

Another way to generate random particles

Almost 10 months since my latest post? I guess bloggin’ ain’t my thing… In my defense, Mathieu Gerber and I were quite busy revising our SQMC paper. I am happy to announce that it has just been accepted as a read paper in JRSSB. If all goes as planned, we should present the paper at  the RSS ordinary meeting on Dec 10. Everybody is welcome to attend, and submit an oral or written discussion (or both). More details soon, when the event is officially announced on the RSS web-site.

What is SQMC? It is a QMC (Quasi-Monte Carlo) version of particle filtering. For the same CPU cost, it typically generates much more accurate estimators. Interested? consider reading the paper here (more recent version coming soon), checking this video where I present SQMC, or, even better, attending our talk in London!

 

See you in Chamonix

Posted in General by nicolaschopin on 3 January 2014

LogoFinal2

Happy new year to everyone, and perhaps see you at MCMski 4 in Chamonix next week, which I expect to be a very friendly and exciting even if I’m not much into skiing. 🙂

I will talk for the first time about SQMC, a QMC (Quasi Monte Carlo) variant of particle filtering (PF) that Mathieu Gerber and I developed in the recent months. We are quite excited about it for a variety of reasons, but I will give more details shortly on this blog.

I thought that my talk would clash with a session on PMCMC, which was quite unfortunate as I suspect that session would target perhaps the same audience, but looking at the program, I see it’s no longer the case. Thanks the power that be!

I also organise a session on “Bayesian computation in Neurosciences” in MCMski 4. Feel free to come if you have interest in the subject. Myself, I think it’s a particular cool area of application, about which I know very very little… which is why I organise a session to learn more about it! 🙂 I also co-organise (with Simon Barthelmé and Adam Johansen) a workshop at Warwick on the same subject, more details soon.

another nice one from Elsevier

Posted in General by nicolaschopin on 23 December 2013

In case you have missed the new round of misdeeds by Elsevier, here is an excellent summary (plus a good overview of the current debate on open access an so on):

http://www.washingtonpost.com/blogs/the-switch/wp/2013/12/19/how-one-publisher-is-stopping-academics-from-sharing-their-research/?0

Many reactions seem to focus on Academia.edu, which is private company, so perhaps that case is no so black and white. However, I found the story (also mentioned by the WP paper) of our colleague Daniel Povey much more infuriating: Daniel put a legit copy of one of his paper on his web site, some robot wrongly detected this copy as the version owned by Elsevier, sent a DCMA take down note to Google, and boom, Google automatically shut downs Daniel’s google web page entirely. Welcome to the brave new world of robots enacting the Law.

I was talking with an Economist the other day. He told me that big corporations very rarely innovate, because they invested so much in a particular, currently lucrative, business model, even that model is doomed in the medium term. He gave me the example of Kodak: they developed the first digital camera before anyone else, yet they never managed to turn around their business model to make the transition to digital photography. They filed for bankruptcy last year.  I think the same applies to Elsevier: even if it does not even make sense for them in the long run, this company is going to fight ugly to defend its current business model (the “treasure chest behind a pay wall”, the treasure being our papers) rather that trying to transition to a new business model compatible with open access. So I guess it falls on us to consider sending our paper to new players in academic publishing.

In other news, I have heard many French Universities are going to lose any access to Elsevier journals as of 1st Jan 2014, because of failed negociations between Elsevier and these Universities, but I found little detail on the interweb on this particular story.

 

Post-doc with me?

Posted in General by nicolaschopin on 18 October 2013

a typical post-doc at CREST

Please note this is a very early, preliminary, non-official announcement, but I understand that our lab might be able to fund a post-doc position next academic year (starting around September 2014). The successful candidate would be expected to interact with a (non-empty!) subset of our Stats group (Arnak Dalayan, Eric Gautier, Judith Rousseau, Alexandre Tsybakov, and me).  In particular, I’d be  interested to hear from anyone who would like to apply in order to interact with me (and maybe other lab members) on things related to Bayesian computation (Sequential Monte Carlo, MCMC, fast approximations, etc), at least partially. I have various projects in mind, but I’m quite flexible and open to discussion. I think that the selection process might occur some time in May-June of next year, but again I don’t have exact details for now.

%d bloggers like this: