Parallel Computing in R

An introduction to using the parallel computing package "snow".

  1. Parallel Computing in R (2010/05/09, Tsukuba.R#7, id:mickey24)
  2. id: mickey24 (@mickey24)
  3. Tsukuba.R
  4. Tsukuba.R#4 • Brainf*ck in R
      > hello <- "+++++++++[>++++++++>+++++++++++>+++++<<<-]>.>++.+++++++..+++.>-.------------.<++++++++.--------.+++.------.--------.>+."
      > brainfxxk(hello)
      [1] "Hello, world!"
      http://www.slideshare.net/mickey24/rbrainfck-1085191
  5. Tsukuba.R#5 • Animation with R • library(animation) • http://d.hatena.ne.jp/mickey24/20090614
  6. Tsukuba.R#6 • Extend R with C!!! • C, R (C, OpenCV) • http://d.hatena.ne.jp/mickey24/20091123/r_de_extension
  7. Tsukuba.R#7
  8. snow
  9.
  10. CPU [Figure: several CPUs]
  11. 1 CPU: sapply(1:8, function(x){ x^2 }) evaluates
      (function(x){x^2})(1), (function(x){x^2})(2), ..., (function(x){x^2})(8)
      one after another on a single CPU.
      [1] 1 4 9 16 25 36 49 64
  12. 4 CPUs: the same sapply(1:8, function(x){ x^2 }), with the eight calls
      (function(x){x^2})(1) through (function(x){x^2})(8) divided among four CPUs.
      [1] 1 4 9 16 25 36 49 64
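The two slides above can be reproduced with a short script. This is a minimal sketch, not the talk's own code: it uses the `parallel` package, which has shipped with R since version 2.14 (after this talk) and incorporates snow's `makeCluster`, `parSapply` and `stopCluster`; a comment shows the equivalent snow call. The function name `f` and the choice of four workers are illustrative.

```r
library(parallel)  # bundled with R >= 2.14; contains snow's cluster API

f <- function(x) x^2

# 1 CPU: sapply evaluates f(1), f(2), ..., f(8) one after another.
serial <- sapply(1:8, f)

# 4 CPUs: start four worker R processes on this machine.
# With snow itself: library(snow); cl <- makeCluster(rep("localhost", 4), type = "SOCK")
cl <- makeCluster(4)

# parSapply splits 1:8 into one chunk per worker, each worker applies f to
# its chunk, and the master collects the chunks back in their original order.
par_res <- parSapply(cl, 1:8, f)

stopCluster(cl)  # shut the workers down when finished

print(serial)   # 1 4 9 16 25 36 49 64
print(par_res)  # same values
```

For a function as cheap as x^2, starting the workers and shipping data costs far more than the computation itself, so in practice the parallel version is slower here; the gain only appears when each call does substantial work.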
  13. 1CPU 1 → 100CPU → 10
  14. snow: a parallel computing package for R
      • http://cran.r-project.org/web/packages/snow/index.html
      • Parallel versions of R's apply functions
      • Communicates over Socket, PVM, or MPI
  15. snow
  16. snow • 2CPU • CPU
  17. Install snow:
      > install.packages("snow")
  18. matprod.R: product of two 1000 × 1000 matrices
      n <- 1000
      A <- matrix(rnorm(n^2), n)
      B <- matrix(rnorm(n^2), n)
      C <- A %*% B
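  19. clmatprod.R
      library(snow)
      n <- 1000
      A <- matrix(rnorm(n^2), n)
      B <- matrix(rnorm(n^2), n)
      cpu <- 2                                 # number of worker processes
      hosts <- rep("localhost", cpu)           # run every worker on this machine
      cl <- makeCluster(hosts, type = "SOCK")  # start socket-based workers
      C <- parMM(cl, A, B)                     # parallel version of C <- A %*% B
      stopCluster(cl)                          # shut the workers down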
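Why parMM(cl, A, B) can return the same matrix as A %*% B: a matrix product can be computed one block of rows at a time. Assuming parMM cuts A into m horizontal blocks, one per worker (my reading of how snow implements it, not something the slides state), each worker needs only its own block A_k plus all of B:

```latex
A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{pmatrix}
\quad\Longrightarrow\quad
C = AB = \begin{pmatrix} A_1 B \\ A_2 B \\ \vdots \\ A_m B \end{pmatrix},
\qquad A_k \in \mathbb{R}^{n_k \times n},\quad \sum_{k=1}^{m} n_k = n
```

The m products A_k B are independent of one another, so they can run on different CPUs at the same time; the master then stacks the partial results in order.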
  20. Run it and look at the result:
      > source("clmatprod.R")
      > head(C)
  21. • Matrix product: parMM(cl, A, B)
      • Parallel versions of apply:
        parApply(cl, X, MARGIN, fun, ...)
        parLapply(cl, X, fun, ...)
        parSapply(cl, X, fun, ..., simplify=TRUE, USE.NAMES=TRUE)
        etc.
      • Other cluster functions:
        clusterMap(cl, fun, ..., MoreArgs = NULL, RECYCLE = TRUE)
        clusterCall(cl, fun, ...)
        etc.
      • Details in the help pages (?parApply, ?clusterMap)
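A short sketch exercising some of the functions listed above. It is written against the `parallel` package bundled with R 2.14 and later, which exposes these functions with the same leading arguments as snow. It also uses `clusterExport`, which the slide does not list but which both snow and `parallel` provide. The names `offset`, `pids`, `shifted` and `sums` are illustrative only.

```r
library(parallel)  # snow's cluster API as bundled with R >= 2.14

cl <- makeCluster(2)  # with snow: makeCluster(rep("localhost", 2), type = "SOCK")

# clusterCall runs the same function once on every worker;
# here each worker reports its own process ID.
pids <- clusterCall(cl, Sys.getpid)

# Workers are separate R processes: a global object defined on the master
# has to be shipped to them before worker code can refer to it.
offset <- 100
clusterExport(cl, "offset")

# parLapply: like lapply, but the elements are spread across the workers.
shifted <- parLapply(cl, 1:4, function(x) x + offset)

# clusterMap: a parallel mapply over several argument vectors.
sums <- clusterMap(cl, function(a, b) a + b, 1:3, 4:6)

stopCluster(cl)

str(pids)         # a list with 2 different process IDs
unlist(shifted)   # 101 102 103 104
unlist(sums)      # 5 7 9
```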
  22. snow
  23. snow • m CPUs, n × n matrices • O(n^3)
      [Figure: diagram with CPUs]
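Why adding CPUs pays off more for larger matrices can be seen from a rough cost model. It assumes that parMM sends each of the m workers its row block of A together with a full copy of B and then collects the blocks of C, and it ignores latency and load imbalance; α (time per multiply-add) and β (time per transferred number) are hypothetical constants:

```latex
T_1(n) \approx \alpha\, n^3
\qquad
T_m(n) \approx \alpha\, \frac{n^3}{m} + \beta\, (m + 2)\, n^2
\qquad
S_m(n) = \frac{T_1(n)}{T_m(n)} \approx \frac{m}{1 + \dfrac{\beta\, m\, (m + 2)}{\alpha\, n}}
```

The (m + 2) n^2 term counts the blocks of A (n^2 numbers in total), m copies of B, and the returned blocks of C. Computation shrinks like 1/m while communication grows with m, so for a fixed n the speedup S_m peaks near m = \sqrt{\alpha n / \beta} and then falls; a larger n moves that peak to more CPUs.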
  24. • 1 CPU (serial): system.time(A %*% B)
      • m CPUs (parallel): system.time(parMM(cl, A, B))
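A sketch of the timing comparison on a single machine, under stated assumptions: it uses the `parallel` package, and because `parallel` has no `parMM`, it defines a hypothetical `par_mm` that splits A into row blocks with `splitIndices` and multiplies them on the workers with `clusterApply`. The matrix size (500) and worker count (2) are arbitrary.

```r
library(parallel)  # ships with R >= 2.14 and includes snow's cluster functions

# Hypothetical stand-in for snow's parMM: split A into one row block per
# worker, send each block together with B, and stack the partial products.
par_mm <- function(cl, A, B) {
  idx <- splitIndices(nrow(A), length(cl))
  blocks <- lapply(idx, function(i) A[i, , drop = FALSE])
  do.call(rbind, clusterApply(cl, blocks, function(Ai, B) Ai %*% B, B))
}

set.seed(42)
n <- 500
A <- matrix(rnorm(n^2), n)
B <- matrix(rnorm(n^2), n)

workers <- 2L
cl <- makeCluster(workers)  # with snow: makeCluster(rep("localhost", 2), type = "SOCK")
t_serial   <- system.time(C1 <- A %*% B)[["elapsed"]]
t_parallel <- system.time(C2 <- par_mm(cl, A, B))[["elapsed"]]
stopCluster(cl)

# Both versions must give the same matrix (up to floating-point rounding).
stopifnot(isTRUE(all.equal(C1, C2)))
cat(sprintf("1 CPU: %.2f s, %d workers: %.2f s, speedup: %.2f\n",
            t_serial, workers, t_parallel, t_serial / t_parallel))
```

Timings depend heavily on the machine and on the BLAS that R is linked against: with a multithreaded BLAS the serial A %*% B may already use several cores, and for small n the cost of sending B to every worker can make `par_mm` slower than the serial product.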
  25. @DBCLS • Sun Grid Engine, OpenMPI
  26. Cluster at @DBCLS: 8 nodes with 8 CPUs and 16 GB each (64 CPUs, 128 GB in total)
      [Figure: eight nodes]
  27. [Figure: 1000 × 1000 matrices; x-axis: number of CPUs (1, 4, 8, 16); y-axis: 0 to 3.00, "Faster!" toward the bottom; annotation: 4CPU 1.14]
  28. [Figure: 3000 × 3000 matrices; x-axis: number of CPUs (1, 4, 8, 16); y-axis: 0 to 40.00, "Faster!" toward the bottom; annotation: 8CPU 2.70]
  29. snow • C • C, R (40 50) www • http://d.hatena.ne.jp/syou6162/20090117/1232120983
  30. snow
      • snow Simplified: http://www.sfu.ca/~sblay/R/snow.html
      • RjpWiki - Parallel computing in R: http://www.okada.jp.org/RWiki/?R%A4%C7%CA%C2%CE%F3%B7%D7%BB%BB
      • RjpWiki - Cluster computing with L. Tierney's snow package: http://www.okada.jp.org/RWiki/?L.%20Tierney%BB%E1%A4%CEsnow%A5%D1%A5%C3%A5%B1%A1%BC%A5%B8%A4%C7%A5%AF%A5%E9%A5%B9%A5%BF%B7%D7%BB%BB%A4%F2%B9%D4%A4%A6#scce80a1
      • Rmpi + snow + Sun Grid Engine: Scheduled Parallel Computing with R: R + Rmpi + OpenMPI + Sun Grid Engine (SGE), http://blog.nguyenvq.com/2010/01/20/scheduled-parallel-computing-with-r-r-rmpi-openmpi-sun-grid-engine-sge/
  31. R + Sun Grid Engine + Open MPI + snow
  32. @DBCLS • Sun Grid Engine • OpenMPI • R is launched through mpirun • snow uses OpenMPI through Rmpi
  33. Install Rmpi (built against OpenMPI), then snow:
      > install.packages("Rmpi", configure.args = "--with-mpi=/path/to/mpidir")
      > install.packages("snow")
  34. gmatprod.R
      library(Rmpi)
      library(snow)
      n <- 1000
      A <- matrix(rnorm(n^2), n)
      B <- matrix(rnorm(n^2), n)
      cl <- makeCluster()    # under RMPISNOW, picks up the MPI cluster started by mpirun
      C <- parMM(cl, A, B)   # C <- A %*% B
      stopCluster(cl)
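  35. gmatprod.sh
      #!/bin/bash
      # Run the job with /bin/bash
      #$ -S /bin/bash
      # Merge stderr into stdout, so both go to the -o file
      #$ -j y
      # File that receives stdout and stderr
      #$ -o gmatprod.log
      # Request 8 slots from the parallel environment named "openmpi"
      #$ -pe openmpi 8
      export PATH=[path to R & snow_install_dir & mpirun]:$PATH
      export LD_LIBRARY_PATH=/path/to/mpidir/lib:$LD_LIBRARY_PATH
      MPIRUN_PATH=/path/to/mpirun
      # NSLOTS and TMPDIR are set by Grid Engine for each job
      MPIRUN_OPTS="-np ${NSLOTS} -machinefile ${TMPDIR}/machines"
      RSCRIPT=gmatprod.R
      # RMPISNOW is the launcher script in snow's install directory
      RPATH=/path/to/snow_install_dir/RMPISNOW
      CMD="${RPATH} CMD BATCH --no-save ${RSCRIPT}"
      ${MPIRUN_PATH} ${MPIRUN_OPTS} ${CMD}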
  36. Submit the job to Grid Engine with qsub:
      $ qsub gmatprod.sh
  37. • Scheduled Parallel Computing with R: R + Rmpi + OpenMPI + Sun Grid Engine (SGE), http://blog.nguyenvq.com/2010/01/20/scheduled-parallel-computing-with-r-r-rmpi-openmpi-sun-grid-engine-sge/
      • Sun Grid Engine, MPI, R