Using graphic cards to perform statistical computation

Hi!
For my first post I’m going to blog about a trendy topic in computational statistics: the increasingly common use of graphic cards to perform any kind of scientific computation (see here) and in particular statistical computation.
The fun thing about graphic cards (fun in a weird and definitely nerd way) is that they were built to display pictures (obviously), mainly for movies and video games. These cards became more and more powerful in the 90s and until now, helping the games to achieve an always higher level of photorealism. But I’m not sure their early developers realized how much they could be useful in scientific computation.
In the last years it has become clear that, for a limited amount of money, say 300$, you can get much more computational power if you buy a GPU (a graphic card processor) than a CPU (a standard processor). This gain can be of order 100 or even more!… meaning a program that takes several hours on a standard CPU can be run in less than a minute on a GPU. Of course it doesn’t work for any computation (otherwise GPU would simply replace CPU and there would be nothing to blog about).
Compared to a CPU, a GPU is extremely good at doing parallel computation. It means that it can do hundreds of little tasks at the same time, where a standard CPU with two cores can do… two tasks, basically. Algorithms that need to do a lot of independent little tasks can therefore be parallelized, other algorithms cannot.
- For example, if you have a vector
and want to take the log of each component of this vector, you can do it in parallel, because you don’t need to know the result of
in order to compute
, for any
. Similarly, evaluating a likelihood of a vector of independent variables can also be parallelized.
- On the contrary, suppose you want to compute Fibonacci numbers, which are defined by
and the recurrence relation:
. Then you obviously need to compute the n-th term before computing the n+1-th term. The program doing this computation would therefore be sequential. Any algorithm requiring a recurrence is thus hard to parallelize, and that includes for instance the Metropolis-Hastings algorithm. Though I’m working on using parallel computation to improve Metropolis-Hastings based estimation… maybe more about this in a future post.
My PhD focuses on Sequential Monte Carlo methods (SMC), also known as Particle Filters, which happen to be very easy to parallelize. Hence my interest in the topic! Instead of waiting hours for my computations to finish, I could have the results in (near) real-time. But then I would have no excuse for playing table football…
If you are interested, there are plenty of links to check. Owning a NVIDIA device, I use CUDA and its python counterpart, PyCUDA. Apparently you can also use CUDA in R with this. There is also a MATLAB plugin. An alternative is the open standard called OpenCL, that allows you to program NVIDIA and ATI devices (those two being the main graphic card designers), so that’s probably the best solution, or at least will be in the future. In the statistical literature, there has been recent articles, like this one in the SMC context. This website from the University of Oxford gives lots of links as well, and this one too from Duke University. Programming on a GPU already interests a lot of people, namely those who have intensive computations to perform (financial engineers, physicists, biologists…), I hope you’ll find it useful too.
Stay tuned for more exciting posts!…
I like the surreptitious link to table football!
Interesting and many thanks for the links. I am wondering how much computational gain can be achieved by running several video-cards in SLI or Crossfire mode and overclocking them.
Good question, I don’t know! Theoretically two GPUs would allow twice as many cores, and twice as much GPU memory. However the bandwidth between the “normal” memory and the GPUs is probably unchanged so filling the whole GPU memory would take twice as long. Anyway multiple GPUs are totally supported by CUDA C library so it should provide a significant gain.
[...] Metropolis-Hastings (IMH) algorithm that can highly benefit from parallel processing units (see this previous post for an introduction on parallel [...]
[...] it’s my temporary hobby (see here for instance), I’ve coded the Box-Muller transform with pyCUDA to take advantage of the GPU. The result is [...]
[...] Using graphic cards to perform statistical computation [...]
[...] of graphics processing units in statistics, a quickly expanding area that I’ve blogged about here. Xian and I are going to talk about Parallel IMH and Parallel Wang Landau. We’ll be able to [...]