Parallel R


Published in: Technology


  1. Parallel Bayesian computation in R ≥ 2.14 using the packages foreach and parallel
     Matt Moores, Cathy Hargrave
     Bayesian Research & Applications Group
     Queensland University of Technology, Brisbane, Australia
     CRICOS provider no. 00213J
     Thursday September 27, 2012
     Sections: Parallel MCMC, Random Number Generators, Summary
     BRAG Sept. 27 Parallel MCMC in R
  2. Outline
     1 Parallel MCMC
       - Introduction
       - R packages
     2 Random Number Generators
       - RNG and parallel MCMC
       - RNGs available in R
  3. Motivation
     Why parallel?
     - large datasets
     - many MCMC iterations
     - multiple CPU cores now commonplace
       - e.g. Intel Core i5 and i7
       - even mobile phones have multicore CPUs
  4. Parallel MCMC
     2 kinds of parallelism:
     - concurrent MCMC chains
       - always applicable
       - straightforward to implement
     - concurrent updates within an iteration
       - only useful for a very large parameter space
       - ideally in a compiled language (e.g. Rcpp with OpenMP)
     also implicit parallelism, e.g. with Intel Math Kernel Library
  5. Concurrent Chains
  6. Simple Network Of Workstations
     - R package snow by Luke Tierney, et al.
     - spawns multiple copies of R
     - provides several options for inter-process communication:
       - TCP sockets, available on any platform, including Microsoft Windows
       - Message Passing Interface (via the package Rmpi)
       - Parallel Virtual Machine (via the package rpvm)
       - NetWorkSpaces (via the package nws)
     - can either run on a local machine or a cluster (e.g. Lyra)
  7. multicore
     - R package by Simon Urbanek
     - implemented using the POSIX fork system call
       - available on Linux and Mac OS X
     - clones the R instance (functions + data)
       - takes advantage of copy-on-write
     - will fork as many processes as there are available CPU cores, unless told otherwise
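The fork behaviour described on this slide can be sketched with mclapply from the parallel package, which took over the multicore interface. This is a minimal illustration, not code from the talk: the names big.data, n.cores and chunk.means are made up here, and the single-core fallback on Windows is an assumption based on fork being unavailable there.

```r
library(parallel)

# Hypothetical data that exists only in the parent R process
big.data <- rnorm(1e5)

# fork is not available on Windows, so fall back to a single core there
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each forked child sees big.data without it being exported explicitly,
# because the child is a clone of the parent process
chunk.means <- mclapply(1:4, function(i) {
  idx <- seq(i, length(big.data), by = 4)
  mean(big.data[idx])
}, mc.cores = n.cores)
```

Passing mc.cores explicitly is the "unless told otherwise" part: it caps the number of child processes regardless of how many cores are present.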
  8. parallel
     - R package parallel included in the core R distribution
       - available in versions ≥ 2.14.0
     - incorporates subsets of snow, multicore, and rlecuyer
     - sensible default behaviour
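For contrast with forking, the snow-style half of parallel might be used as in the sketch below. The variable scale.factor and the choice of 2 workers are assumptions for illustration. Socket workers are fresh R sessions, so anything they need has to be exported to them.

```r
library(parallel)

# Hypothetical value that the workers need
scale.factor <- 10

# Start 2 worker processes that communicate over local sockets
cl <- makePSOCKcluster(2)

# Unlike forked children, socket workers start empty: copy the variable over
clusterExport(cl, "scale.factor", envir = environment())

res <- parLapply(cl, 1:4, function(i) i * scale.factor)
stopCluster(cl)

unlist(res)  # 10 20 30 40
```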
  9. foreach
     "syntactic sugar"
         library(foreach)
         library(parallel)
         library(doParallel)
         # will automatically use a SOCK cluster on Windows
         # (otherwise uses multicore)
         registerDoParallel(cores = detectCores())
         foreach(i = 1:getDoParWorkers()) %dopar% {
           # this code will be executed concurrently
           ...
         }
  10. foreach with SNOW
          library(foreach)
          library(parallel)
          library(doParallel)
          # setup local SOCK cluster for 4 CPU cores
          cl <- makePSOCKcluster(4)
          registerDoParallel(cl)
          foreach(i = 1:getDoParWorkers()) %dopar% {
            # this code will be executed concurrently
            ...
          }
          stopCluster(cl)
  11. foreach with multicore
          library(foreach)
          library(parallel)
          library(doParallel)
          # fork one child process for each CPU core
          # (makeForkCluster is not available on Windows)
          cl <- makeForkCluster(detectCores())
          registerDoParallel(cl)
          foreach(i = 1:getDoParWorkers()) %dopar% {
            # this code will be executed concurrently
            ...
          }
          # shut down the child processes when finished
          stopCluster(cl)
  12. foreach with CODA
      If your Gibbs sampler returns an mcmc object, the chains can be combined into an mcmc.list:
          library(coda)
          samples.list <- foreach(i = 1:getDoParWorkers(),
                                  .combine = mcmc.list,
                                  .multicombine = TRUE) %dopar% {
            # this code will be executed concurrently
            ...
          }
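To make the pattern concrete, here is a small end-to-end sketch. The sampler is not from the talk: toy.sampler is a hypothetical random-walk Metropolis sampler for a standard normal target, standing in for a real Gibbs sampler, and the 2 workers, the seed and the starting values in inits are arbitrary choices.

```r
library(foreach)
library(doParallel)
library(coda)

# Hypothetical stand-in for a real sampler: random-walk Metropolis
# targeting a standard normal, returned as a coda mcmc object
toy.sampler <- function(n.iter, init) {
  x <- numeric(n.iter)
  cur <- init
  for (t in seq_len(n.iter)) {
    prop <- cur + rnorm(1)
    log.ratio <- dnorm(prop, log = TRUE) - dnorm(cur, log = TRUE)
    if (log(runif(1)) < log.ratio) cur <- prop
    x[t] <- cur
  }
  coda::mcmc(x)
}

cl <- makePSOCKcluster(2)
registerDoParallel(cl)
clusterSetRNGStream(cl, iseed = 2012)  # a separate RNG stream per worker

inits <- c(-5, 5)  # a different starting value for each chain
samples.list <- foreach(i = seq_along(inits),
                        .combine = mcmc.list,
                        .multicombine = TRUE,
                        .packages = "coda") %dopar% {
  toy.sampler(1000, inits[i])
}
stopCluster(cl)

summary(samples.list)
```

The combined object can then be passed to the usual coda diagnostics and plots, which expect one mcmc object per chain.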
  13. foreach with other libraries
      You need to declare any libraries that are used inside the child process. For example:
          library(mvtnorm)
          library(coda)
          foreach(i = 1:getDoParWorkers(),
                  .packages = c("mvtnorm", "coda")) %dopar% {
            # this code uses mcmc(...) and rmvnorm(...)
            ...
          }
  14. Random Number Generators for parallel MCMC
      The chains of our Gibbs sampler run independently, but:
      - if the same RNG is seeded with the same value, all of the chains will generate the same random numbers in the same sequence: they will be identical!
      - we either need to use:
        - different seeds, or
        - different random number generators
        for each chain (preferably both)
      - it is also advisable to choose (or generate) different initial values in each chain of our Gibbs sampler
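The identical-chains problem is easy to reproduce in a single R session. In this sketch run.chain is a hypothetical stand-in for a sampler (a Gaussian random walk), and the seeds 1 and 2 are arbitrary.

```r
# Hypothetical "chain": a Gaussian random walk driven by the seeded RNG
run.chain <- function(seed, n.iter = 5) {
  set.seed(seed)
  cumsum(rnorm(n.iter))
}

same.seed  <- lapply(c(1, 1), run.chain)  # two chains, same seed
diff.seeds <- lapply(c(1, 2), run.chain)  # two chains, different seeds

identical(same.seed[[1]], same.seed[[2]])    # TRUE: the chains are copies
identical(diff.seeds[[1]], diff.seeds[[2]])  # FALSE
```

Two chains that are copies of each other carry no more information than one, and convergence diagnostics that compare chains would be misleading.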
  15. Mersenne Twister
      - The default RNG in R
      - pseudo-random sequence with 32-bit precision
      - periodicity of 2^19937 − 1
      - takes 0.4 seconds to generate 10^7 random numbers on an Intel Core i5 running R 2.15.1 and Windows 7
      - open-source implementation available at: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
      Matsumoto & Nishimura (1998) TOMACS 8: 3–30.
  16. Other RNGs in the base package
      - Wichmann-Hill (1982) Applied Statistics 31, 188–190.
      - Marsaglia-Multicarry (Usenet newsgroup sci.stat.math, 1997)
      - Super-Duper (Reeds, J., Hubert, S. and Abrahams, M., 1982–4)
      For JAGS with up to 4 concurrent chains:
          library(rjags)  # provides parallel.seeds()
          rngInits <- parallel.seeds("base::BaseRNG", 4)
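In plain R, the generators listed here can be selected through RNGkind or the kind argument of set.seed. The sketch below seeds each generator with the same arbitrary value (2012) to show that the resulting streams differ. The names kinds, old.kind and draws are illustrative only.

```r
kinds <- c("Mersenne-Twister", "Wichmann-Hill",
           "Marsaglia-Multicarry", "Super-Duper")

old.kind <- RNGkind()  # remember the current generator

# Same seed, different generator: one column of uniform draws per generator
draws <- sapply(kinds, function(k) {
  set.seed(2012, kind = k)
  runif(3)
})

RNGkind(kind = old.kind[1])  # restore the original generator

draws
```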
  17. L’Ecuyer
      - Available via R libraries rlecuyer or parallel
      - Multiple independent streams of random numbers
      - Periodicity ≈ 2^191 (each stream is a subsequence of length 2^127)
      - 0.6 seconds to generate 10^7 random numbers via runif
      To initialize each child process in a SNOW cluster with an independent stream:
          cl <- makeCluster(4)
          clusterSetRNGStream(cl)
          registerDoParallel(cl)
      L’Ecuyer, et al. (2002) Operations Research, 50(6): 1073–1075.
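The streams can also be inspected without starting a cluster. This sketch uses nextRNGStream from the parallel package to derive one seed per chain. The seed 2012, the count of 4 chains and the helper draw.from are assumptions for illustration, and it should be run at the top level of an R session.

```r
library(parallel)

old.kind <- RNGkind()  # remember the current generator
RNGkind("L'Ecuyer-CMRG")
set.seed(2012)

# Derive one stream seed per chain, each a jump ahead of the previous one
n.chains <- 4
streams <- vector("list", n.chains)
streams[[1]] <- .Random.seed
for (i in seq_len(n.chains)[-1]) {
  streams[[i]] <- nextRNGStream(streams[[i - 1]])
}

# Hypothetical helper: install a stream seed, then draw from that stream
draw.from <- function(seed, n = 3) {
  assign(".Random.seed", seed, envir = .GlobalEnv)
  runif(n)
}
draws <- lapply(streams, draw.from)

RNGkind(kind = old.kind[1])  # restore the original generator
```

clusterSetRNGStream distributes seeds to the workers in much the same way, and its iseed argument can be set to make the assignment reproducible.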
  18. Summary
      - Most MCMC algorithms are "embarrassingly parallel"
        - chains run independently (as long as the RNG is set up correctly)
      - The R packages foreach and doParallel make parallelism easy, on any computing platform
      Related topics (not covered in this presentation):
      - Running R on a supercomputer (e.g. lyra.qut.edu.au)
      - Cloud computing with Apache Hadoop
      - GPU programming in R (nVidia CUDA)
  19. Appendix: For Further Reading
      - Norman Matloff. The Art of R Programming. No Starch Press, 2011.
      - M. Schmidberger, M. Morgan, D. Eddelbuettel, H. Yu, L. Tierney & U. Mansmann. State of the Art in Parallel Computing with R. Journal of Statistical Software, 31(1), 2009.
      - P. L’Ecuyer, R. Simard, E.J. Chen & W.D. Kelton. An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6): 1073–1075, 2002.
      - M. Matsumoto & T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulation, 8: 3–30, 1998.
