Using graphic cards to perform statistical computation

Posted in Geek by Pierre Jacob on 16 July 2010


For my first post I’m going to blog about a trendy topic in computational statistics: the increasingly common use of graphic cards to perform any kind of scientific computation (see here) and in particular statistical computation.

The fun thing about graphic cards (fun in a weird and definitely nerd way) is that they were built to display pictures (obviously), mainly for movies and video games. These cards became more and more powerful in the 90s and until now, helping the games to achieve an always higher level of photorealism. But I’m not sure their early developers realized how much they could be useful in scientific computation.

In the last years it has become clear that, for a limited amount of money, say 300$, you can get much more computational power if you buy a GPU (a graphic card processor) than a CPU (a standard processor). This gain can be of order 100 or even more!… meaning a program that takes several hours on a standard CPU can be run in less than a minute on a GPU. Of course it doesn’t work for any computation (otherwise GPU would simply replace CPU and there would be nothing to blog about).

Compared to a CPU, a GPU is extremely good at doing parallel computation. It means that it can do hundreds of little tasks at the same time, where a standard CPU with two cores can do… two tasks, basically. Algorithms that need to do a lot of independent little tasks can therefore be parallelized, other algorithms cannot.

  • For example, if you have a vector (x_0, x_1, \ldots x_n) and want to take the log of each component of this vector, you can do it in parallel, because you don’t need to know the result of log(x_i) in order to compute log(x_j), for any i \neq j. Similarly, evaluating a likelihood of a vector of independent variables can also be parallelized.
  • On the contrary, suppose you want to compute Fibonacci numbers, which are defined by F_0 = F_1 = 1 and  the recurrence relation:  F_n = F_{n-1} + F_{n-2}. Then you obviously need to compute the n-th term before computing the n+1-th term. The program doing this computation would therefore be sequential. Any algorithm requiring a recurrence is thus hard to parallelize, and that includes for instance the Metropolis-Hastings algorithm. Though I’m working on using parallel computation to improve Metropolis-Hastings based estimation… maybe more about this in a future post.

My PhD focuses on Sequential Monte Carlo methods (SMC), also known as Particle Filters, which happen to be very easy to parallelize. Hence my interest in the topic! Instead of waiting hours for my computations to finish, I could have the results in (near) real-time. But then I would have no excuse for playing table football

If you are interested, there are plenty of links to check. Owning a NVIDIA device, I use CUDA and its python counterpart, PyCUDA. Apparently you can also use CUDA in R with this. There is also a MATLAB plugin. An alternative is the open standard called OpenCL, that allows you to program NVIDIA and ATI devices (those two being the main graphic card designers), so that’s probably the best solution, or at least will be in the future. In the statistical literature, there has been recent articles, like this one in the SMC context. This website from the University of Oxford gives lots of links as well, and this one too from Duke University. Programming on a GPU already interests a lot of people, namely those who have intensive computations to perform (financial engineers, physicists, biologists…), I hope you’ll find it useful too.

Stay tuned for more exciting posts!…


7 Responses

Subscribe to comments with RSS.

  1. Julyan said, on 16 July 2010 at 12:15

    I like the surreptitious link to table football!

  2. Aledine said, on 3 September 2010 at 03:50

    Interesting and many thanks for the links. I am wondering how much computational gain can be achieved by running several video-cards in SLI or Crossfire mode and overclocking them.

    • pierrejacob said, on 3 September 2010 at 19:09

      Good question, I don’t know! Theoretically two GPUs would allow twice as many cores, and twice as much GPU memory. However the bandwidth between the “normal” memory and the GPUs is probably unchanged so filling the whole GPU memory would take twice as long. Anyway multiple GPUs are totally supported by CUDA C library so it should provide a significant gain.

  3. […] Metropolis-Hastings (IMH) algorithm that can highly benefit from parallel processing units (see this previous post for an introduction on parallel […]

  4. […] it’s my temporary hobby (see here for instance), I’ve coded the Box-Muller transform with pyCUDA to take advantage of the GPU. The result is […]

  5. […] Using graphic cards to perform statistical computation […]

  6. […] of graphics processing units in statistics, a quickly expanding area that I’ve blogged about here. Xian and I are going to talk about Parallel IMH and Parallel Wang Landau. We’ll be able to […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: