
Speeding up R with Parallel Programming in the Cloud


Presented to eRum (Budapest), May 2018

There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe the doAzureParallel package, a backend to the "foreach" package that automates the process of spawning a cluster of virtual machines in the Azure cloud to process iterations in parallel. This will include an example of optimizing hyperparameters for a predictive model using the "caret" package.



  1. 1. Speeding up R with Parallel Programming in the Cloud David M Smith Developer Advocate, Microsoft @revodavid
  2. 2. Hi. I’m David. • Cloud Developer Advocate at Microsoft – cda.ms/7f • Board Member, R Consortium • Editor, Revolutions blog – cda.ms/7g • Twitter: @revodavid
  3. 3. Speeding up your R code • Vectorize • Moar megahertz! • Moar RAM! • Moar cores! • Moar computers! • Moar cloud!
  4. 4. Embarrassingly Parallel Problems Easy to speed things up when: • Calculating similar things many times – Iterations in a loop, chunks of data, … • Calculations are independent of each other • Each calculation takes a decent amount of time Just run multiple calculations at the same time
  5. 5. The Birthday Paradox What is the likelihood that there are two people in this room who share the same birthday?
  6. 6. Birthday Problem Simulator

  pbirthdaysim <- function(n) {
    ntests <- 100000
    pop <- 1:365
    anydup <- function(i)
      any(duplicated(sample(pop, n, replace=TRUE)))
    sum(sapply(seq(ntests), anydup)) / ntests
  }
  bdayp <- sapply(1:100, pbirthdaysim)

  About 3 minutes (on this laptop)
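As a quick sanity check on the simulator, base R's pbirthday() (in the stats package, loaded by default) gives the analytic probability for a given group size. A minimal sketch, with ntests reduced from the slide's 100,000 for speed:

```r
# Simulate the birthday probability for a group of size n, then compare
# with the analytic value from stats::pbirthday().
pbirthdaysim <- function(n, ntests = 10000) {
  anydup <- function(i) any(duplicated(sample(1:365, n, replace = TRUE)))
  mean(sapply(seq_len(ntests), anydup))  # fraction of trials with a shared birthday
}
set.seed(42)
sim   <- pbirthdaysim(23)  # simulated estimate for 23 people
exact <- pbirthday(23)     # analytic value, approximately 0.507
```

With only 23 people the probability already exceeds 50%, which is the "paradox".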
  7. 7. library(foreach) Looping with the foreach package on CRAN – x is a list of results – each entry calculated from the RHS of %dopar% Learn more about foreach: cda.ms/tc

  x <- foreach (n=1:100) %dopar% pbirthdaysim(n)
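The list-of-results behaviour is easy to see with a toy loop. A minimal sketch (no parallel backend registered yet, so %dopar% falls back to sequential execution with a warning); .combine is a foreach option not shown on the slide:

```r
library(foreach)
# Each iteration's result becomes one element of the returned list;
# .combine = c collapses the list into a plain vector instead.
as_list   <- foreach(n = 1:4) %dopar% n^2
as_vector <- foreach(n = 1:4, .combine = c) %dopar% n^2
as_vector  # 1 4 9 16
```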
  8. 8. Parallel processing with foreach • Change how processing is done by registering a backend – registerDoSEQ() sequential processing (default) – registerDoMC() local cores via multicore (Mac / Linux) – registerDoParallel() local cluster via library(parallel) – registerDoFuture() HPC schedulers incl LSF, OpenLava & Slurm – registerDoAzureParallel() remote cluster in Azure Batch • Whatever you use, the call to foreach does not change – Also: no need to worry about data, packages etc. (mostly)
  9. 9. Local multi-process: foreach + doParallel

  library(doParallel)
  cl <- makeCluster(2)  # local cluster, 2 workers
  registerDoParallel(cl)
  bdayp <- foreach(n=1:100) %dopar% pbirthdaysim(n)

  Launches a separate R process for each task Works on all platforms (Windows / Mac / Unix)
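One housekeeping detail the slide omits: shut the local cluster down when you're finished. A sketch along the same lines, using a trivial loop body in place of the birthday simulation:

```r
library(doParallel)            # also loads foreach and parallel
cl <- makeCluster(2)           # two local R worker processes
registerDoParallel(cl)
getDoParWorkers()              # confirms the backend is using 2 workers
res <- foreach(n = 1:4, .combine = c) %dopar% { n * 10 }
stopCluster(cl)                # release the worker processes
registerDoSEQ()                # revert foreach to sequential processing
res  # 10 20 30 40
```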
  10. 10. Single node multicore: doMC on 16-CPU VM

  > library(doMC)
  > registerDoMC(cores = 16)
  > system.time(
      bdayp <- foreach(n=1:100) %dopar% pbirthdaysim(n)
    )
     user  system elapsed
  291.617   0.760  20.743

  About 11x faster than sequential • 20s vs 220s (using an Azure DS5v2 instance) Note: disable multithreaded BLAS to avoid core contention • In Microsoft R Open: setMKLthreads(1)
  11. 11. Cloud cluster: foreach + doAzureParallel doAzureParallel: A simple, open-source R package that uses the Azure Batch cluster service as a parallel-backend for foreach github.com/Azure/doAzureParallel
  12. 12. Birthday simulation: cluster 8-node cluster (standard D2v2: 2 vCPU, 7 GB) • specify VM class in cluster.json • specify credentials for Azure Batch and Azure Storage in credentials.json

  library(doAzureParallel)
  setCredentials("credentials.json")
  cluster <- makeCluster("cluster.json")
  registerDoAzureParallel(cluster)
  bdayp <- foreach(n=1:100) %dopar% pbirthdaysim(n)
  bdayp <- unlist(bdayp)

  cluster.json (excerpt):
  "name": "davidsmi8caret",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 8,
  "poolSize": {
    "dedicatedNodes": { "min": 8, "max": 8 }
  }

  45 seconds (more than 5 times faster) on a warm start
  13. 13. Cross-validation with caret • Most predictive modeling algorithms have "tuning parameters" • Example: Boosted Trees – Boosting iterations – Max Tree Depth – Shrinkage • Parameters affect model performance • Try 'em out: cross-validate

  grid <- data.frame(
    nrounds = …,
    max_depth = …,
    gamma = …,
    colsample_bytree = …,
    min_child_weight = …,
    subsample = …)
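The slide elides the actual grid values. For caret's "xgbTree" method the grid also needs an eta (shrinkage) column; the values below are purely illustrative, not the ones from the talk:

```r
# A hypothetical tuning grid for method = "xgbTree"; expand.grid builds
# every combination of the candidate values (here 2 x 3 = 6 rows).
grid <- expand.grid(
  nrounds          = c(100, 200),  # boosting iterations
  max_depth        = c(2, 4, 6),   # max tree depth
  eta              = 0.1,          # shrinkage (learning rate)
  gamma            = 0,
  colsample_bytree = 0.8,
  min_child_weight = 1,
  subsample        = 0.75
)
nrow(grid)  # 6 candidate parameter combinations
```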
  14. 14. Cross-validation in parallel • Caret's train function will automatically use the registered foreach backend • Just register your cluster first: registerDoAzureParallel(cluster) • Handles sending objects, packages to nodes

  mod <- train(
    Class ~ ., data = dat,
    method = "xgbTree",
    trControl = ctrl,
    tuneGrid = grid,
    nthread = 1
  )
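The train() call above references a ctrl object that isn't shown on the slide. A plausible definition (an assumption, not the talk's exact settings) using caret's trainControl with 5-fold cross-validation:

```r
library(caret)
# allowParallel = TRUE (the default) lets train() hand the resampling
# loop to whatever foreach backend is currently registered.
ctrl <- trainControl(
  method = "cv",    # k-fold cross-validation
  number = 5,       # five folds
  allowParallel = TRUE
)
```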
  15. 15. Low Priority Nodes • Low-priority VMs come from Azure's surplus capacity at very low cost – up to 80% discount • Clusters can mix dedicated VMs and low-priority VMs (diagram: a local R session submitting to an Azure Batch pool of dedicated and low-priority VMs)

  "poolSize": {
    "dedicatedNodes": { "min": 3, "max": 3 },
    "lowPriorityNodes": { "min": 9, "max": 9 }
  }
  16. 16. caret speed-ups • Max Kuhn benchmarked various hardware and OS for local parallel – cda.ms/6V • Let's see how it works with doAzureParallel
  17. 17. Packages and Containers • Docker images used to spawn nodes – Default: rocker/tidyverse:latest – Lots of R packages pre-installed • But this cross-validation also needs: – xgboost, e1071 • Easy fix: add to cluster.json

  {
    "name": "davidsmi8caret",
    "vmSize": "Standard_D2_v2",
    "maxTasksPerNode": 8,
    "poolSize": {
      "dedicatedNodes": { "min": 4, "max": 4 },
      "lowPriorityNodes": { "min": 4, "max": 4 },
      "autoscaleFormula": "QUEUE"
    },
    "containerImage": "rocker/tidyverse:latest",
    "rPackages": {
      "cran": ["xgboost","e1071"],
      "github": [],
      "bioconductor": []
    },
    "commandLine": []
  }
  18. 18.

  ===================================================================================
  Id: job20180126022301
  chunkSize: 1
  enableCloudCombine: TRUE
  packages: caret; errorHandling: stop
  wait: TRUE
  autoDeleteJob: TRUE
  ===================================================================================
  Submitting tasks (1250/1250)
  Submitting merge task. . .
  Job Preparation Status: Package(s) being installed
  Waiting for tasks to complete. . .
  | Progress: 13.84% (173/1250) | Running: 59 | Queued: 1018 | Completed: 173 | Failed: 0 |

  MY LAPTOP: 78 minutes THIS CLUSTER: 16 minutes (almost 5x faster)
  19. 19. Compute costs • Pay by the minute for VMs used in cluster • Using D2v2 Virtual Machines – Ubuntu 16, 7 GB RAM, 2-core "compute optimized" • 17 minutes × 8 VMs @ $0.14 / hour – about 32 cents (not counting startup or storage)
  20. 20. Thank you! Find foreach on CRAN: CRAN.R-project.org/package=foreach Code for birthday problem simulation: cda.ms/7d Docs for the foreach package: cda.ms/tc Get doAzureParallel: cda.ms/7w Get Azure subscription with $200 free credits: cda.ms/td David Smith davidsmi@microsoft.com
