Tackling repetitive tasks with serial or parallel programming in R
Speaker: Lun-Hsien CHANG
Affiliation: Immunology in Cancer and Infection, QIMR Berghofer
Meeting: R user group meeting #14
Time: 1-2 PM, 28th July 2020
Place: Seminar room, Level 6, Central building, QIMR, Brisbane
It is the central dogma, but…
Even if your program is working fine, you may still want to
● Try a faster R package than the current one
● Rewrite code so that it is less error-prone
● Revise code for simplicity and efficiency
Outline
R programming basics
● Syntax forms, data structure, vector, elapsed time
Serial computing
● For loop, vectorised functions, *apply() functions
Parallel computing
● The doParallel, parallel and foreach packages
Compare time performance in serial and parallel computing
Common syntax forms in an R program
# Comments are preceded by a hash
# Assign value "A.1" to variable.1
variable.1 <- "A.1"
library(package.A)
# Use function.A from package.A (package loaded with library())
function.A( argument1=values
           ,argument2=values
           ,...)
# Use function.A from package.A (no library() call needed)
package.A::function.A( argument1=values
                      ,argument2=values
                      ,...)
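As a concrete instance of the two calling styles above, here is a minimal sketch using the tools package, which ships with R. The file name is made up for illustration.

```r
# A made-up file name used only for illustration
image.file <- "C:/images/sample_01.jpg"

# Style 1: load the package, then call the function by name
library(tools)
ext.1 <- file_ext(image.file)

# Style 2: call the function with the package:: prefix (no library() needed)
ext.2 <- tools::file_ext(image.file)

ext.1                     # "jpg"
identical(ext.1, ext.2)   # TRUE
```

The package:: form is handy when two loaded packages export a function with the same name, because it states which one is meant.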
Data structure in R
What is a vector in R?
A vector is a one-dimensional collection of numbers, characters or logicals; all elements share one type
v1 <- c(1:5)
v1
# [1] 1 2 3 4 5
v2 <- c("a","b","c","d","e")
v2
# [1] "a" "b" "c" "d" "e"
v3 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
v3
# [1] TRUE TRUE FALSE FALSE TRUE
v4 <- c(1, "a", TRUE, 4, "b")
v4
# [1] "1" "a" "TRUE" "4" "b"  (mixed types are coerced to character)
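The four vectors above can be inspected with typeof(), which is one way to see why v4 prints with quotes. This is a small sketch that only restates the slide's own vectors.

```r
v1 <- c(1:5)
v2 <- c("a", "b", "c", "d", "e")
v3 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
v4 <- c(1, "a", TRUE, 4, "b")

# typeof() reports the storage type shared by every element of a vector
typeof(v1)  # "integer"
typeof(v2)  # "character"
typeof(v3)  # "logical"
typeof(v4)  # "character": the numbers and the logical were coerced to text

# Vectors are one-dimensional: they have a length but no dim attribute
length(v4)        # 5
is.null(dim(v4))  # TRUE
```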
What is elapsed time in R?
User time : CPU time spent executing the instructions of the R process itself (as reported by your operating system, OS)
System time : CPU time spent by the OS on behalf of the R process
Elapsed time : the amount of time that passes from the start of a program to its finish
Start.time <- proc.time()
# run some R code
End.time <- proc.time() - Start.time # time used since Start.time
system.time( ... ) # replace ... with some R code
# user system elapsed
# 0.4 0.1 132.2
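Both timing approaches can be tried with a stand-in workload. The sketch below assumes a hypothetical slow.task() that simply waits half a second, so elapsed time is large while user and system time stay small.

```r
# Stand-in workload: Sys.sleep() waits without using much CPU time
slow.task <- function() Sys.sleep(0.5)

# Approach 1: difference between two proc.time() snapshots
start.time <- proc.time()
slow.task()
time.used <- proc.time() - start.time
time.used["elapsed"]   # roughly 0.5 seconds

# Approach 2: wrap the code in system.time()
timing <- system.time({
  slow.task()
})
timing["elapsed"]      # roughly 0.5 seconds
```

A gap between elapsed time and user + system time, as in the slide's 132.2 versus 0.5, usually means the program spent most of its time waiting rather than computing.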
Serial computing in R
What is serial (sequential) computing?
Runs on a single CPU core, solving one task at a time
Ideal for dependent tasks (e.g. Task 2 uses result from task 1)
Run time is a function of the number of tasks
[Figure: Tasks 1 to 4 run one after another along a time axis on a single-core processor (CPU)]
R functions that run serial computing
● for loop
● Vectorised functions
○ Most R functions take a vector, usually as their first argument
○ A few R functions take only a single value (e.g. dir.create())
● lapply(), sapply() from the apply family
Syntax form of a for loop in R
for (i in 1:10){
  Command1
  Command2
  ...
}
● for (i in 1:10){ : Create a variable i with values 1 to 10
● Command1, Command2, ... : Take each i value and do something using it
● } : Close the for loop with }
Syntax forms of a for loop in R
# This works
for (i in 1:10){
  Command1
  Command2
  ...
}
# This works too
for(i in c(1:10)){
  Command1
  Command2
  ...
}
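A runnable version of the loop skeleton might look like the sketch below, where the placeholder commands are replaced by squaring i. The variable name squares is chosen only for this example.

```r
# Pre-allocate a vector to hold one result per iteration
squares <- numeric(10)

for (i in 1:10) {
  # i takes the values 1, 2, ..., 10 in turn
  squares[i] <- i^2
}

squares
# [1]   1   4   9  16  25  36  49  64  81 100
```

Pre-allocating the result vector avoids growing it inside the loop, which is a common cause of slow for loops in R.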
Vectorised operations in R
Many operations in R are vectorised, meaning that a single call operates on all elements of a vector at once (conceptually in parallel, with no explicit loop)
Task : Look up JPG images in 3 folders and get their file paths
dir.1 <- "C:/images"
dir.2 <- "D:/images"
dir.3 <- "E:/images"
# "\\.jpg$" matches file names that end in .jpg
list.files( path = c( dir.1, dir.2, dir.3)
           ,pattern = "\\.jpg$"
           ,full.names = TRUE )
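The same vectorised look-up can be tried without the C:, D: and E: drives. This sketch assumes three throw-away folders under tempdir() and made-up file names.

```r
# Three throw-away folders under tempdir() stand in for the image folders
dirs <- file.path(tempdir(), c("images_1", "images_2", "images_3"))
for (d in dirs) dir.create(d, showWarnings = FALSE)

# Put one empty .jpg file in each folder, plus one .txt file to be ignored
file.create(file.path(dirs, "photo.jpg"))
file.create(file.path(dirs[1], "notes.txt"))

# One vectorised call searches all three folders; no loop is needed
jpg.paths <- list.files(path = dirs, pattern = "\\.jpg$", full.names = TRUE)
length(jpg.paths)    # 3
basename(jpg.paths)  # "photo.jpg" "photo.jpg" "photo.jpg"
```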
The *apply() functions
● Examples: lapply(X=, FUN=), sapply(X=, FUN=)
● Use them when the function to apply is simple
● Misconception: *apply() functions are faster than a for loop. In fact they are loops internally: they apply a function (FUN=) to all the elements of a vector or list (X=), so they are not inherently faster than a for loop!
The *apply() functions
● Task: Create 3 folders under C:/images
# Specify the full path of new folders
new.folder.1 <- "C:/images/JPG"
new.folder.2 <- "C:/images/TIF"
new.folder.3 <- "C:/images/PNG"
# Create new folders using dir.create()
lapply( X= c( new.folder.1
,new.folder.2
,new.folder.3)
,FUN = function(x) dir.create(x, recursive = TRUE))
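To try this without writing to C:/, the sketch below uses hypothetical folder names under tempdir(). It also contrasts lapply(), which returns a list, with sapply(), which simplifies the result to a vector.

```r
# Hypothetical folder names under tempdir(), so nothing is written to C:/
new.folders <- file.path(tempdir(), "images_demo", c("JPG", "TIF", "PNG"))

# lapply() always returns a list, one element per input
created.list <- lapply(X = new.folders,
                       FUN = function(x) dir.create(x, recursive = TRUE,
                                                    showWarnings = FALSE))
class(created.list)    # "list"
length(created.list)   # 3

# sapply() does the same kind of work but simplifies the result to a vector
exists.vec <- sapply(X = new.folders, FUN = dir.exists)
class(exists.vec)      # "logical"
unname(exists.vec)     # TRUE TRUE TRUE
```

Because dir.exists() is itself vectorised, the sapply() call here only illustrates the return type; calling dir.exists(new.folders) directly is simpler.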
An unnecessary usage of lapply()
Task: check to see if 3 image folders exist
# Check the existence of 3 image folders by lapply()
unlist(lapply( X = c( new.folder.1
                     ,new.folder.2
                     ,new.folder.3)
              ,FUN = function(x) dir.exists(x))
)
# [1] TRUE TRUE TRUE
# By vectorised operation
dir.exists(paths = c( new.folder.1
                     ,new.folder.2
                     ,new.folder.3)
)
# [1] TRUE TRUE TRUE
Task: read multiple text files to a single data frame with lapply()
● Specify paths of input folders
● Check to see if these input folders exist
● Get full paths of input txt files
● Read these files to a list
● Concatenate these files as a single data frame
Read multiple text files to a single data frame (1/4)
# Specify full paths of data folders
drive.dir.C <- 'C:/Lab_MarkS'
input.data.dir <- file.path(drive.dir.C,"lunC/Immunohistochemistry_images/data_output")
input.data.folder.1 <- file.path(input.data.dir, "MT_Exp023.2_18-001-A","analysis-results")
input.data.folder.2 <- file.path(input.data.dir, "MT_Exp023.2_18-001-B","analysis-results")
input.data.folder.3 <- file.path(input.data.dir, "MT_Exp023.2_18-001-C","analysis-results")
input.data.folder.4 <- file.path(input.data.dir, "MT_Exp023.2_18-002-A","analysis-results")
input.data.folder.5 <- file.path(input.data.dir, "MT_Exp023.2_18-002-B","analysis-results")
Read multiple text files to a single data frame (2/4)
# Check if input folders exist
dir.exists(paths = c( input.data.folder.1
                     ,input.data.folder.2
                     ,input.data.folder.3
                     ,input.data.folder.4
                     ,input.data.folder.5))
Read multiple text files to a single data frame (3/4)
# Get full paths of input files
input.data.file.paths <- list.files( path = c( input.data.folder.1
                                              ,input.data.folder.2
                                              ,input.data.folder.3
                                              ,input.data.folder.4
                                              ,input.data.folder.5)
                                    ,pattern = ".*cell-segmentation-summary_long-format_based-on-merged-cell-seg-file\\.tsv$"
                                    ,full.names = TRUE ) # length(input.data.file.paths) 5
Read multiple text files to a single data frame (4/4)
# Read multiple tsv files to a list
input.data.list <- lapply( X = input.data.file.paths
                          ,FUN = function(x) read.delim( file = x
                                                        ,header = TRUE
                                                        ,stringsAsFactors = FALSE)
) # class(input.data.list) "list" # length(input.data.list) 5
# Combine list elements to a single data.frame
input.data.read <- do.call(what = "rbind", args = input.data.list) # dim(input.data.read) 375 14
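The four steps above depend on files that exist only on the speaker's drive. A self-contained sketch of the same pattern, assuming three small made-up tab-separated files written to tempdir(), could look like this:

```r
# Write three small tab-separated files to a temporary folder (made-up data)
demo.dir <- file.path(tempdir(), "tsv_demo")
dir.create(demo.dir, showWarnings = FALSE)
for (k in 1:3) {
  demo.data <- data.frame(sample = paste0("S", k), cell.count = c(10, 20) * k)
  write.table(demo.data,
              file = file.path(demo.dir, paste0("summary_", k, ".tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Get the full paths of the input files
demo.paths <- list.files(path = demo.dir, pattern = "\\.tsv$", full.names = TRUE)

# Read every file into one list element
demo.list <- lapply(X = demo.paths,
                    FUN = function(x) read.delim(file = x, header = TRUE,
                                                 stringsAsFactors = FALSE))

# Stack the list elements into a single data frame
demo.read <- do.call(what = "rbind", args = demo.list)
dim(demo.read)   # 6 2
```

do.call("rbind", ...) requires every file to have the same column names, which holds here because all three files are written from the same template.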
Parallel computing
What is parallel computing?
[Figure: Tasks 1 to 4 run side by side over the same time span on a multi-core processor]
Runs on multiple CPU cores, solving tasks in parallel (simultaneously).
Ideal for independent tasks (i.e. Task 2 does not rely on the result from Task 1)
Run time is usually shorter than in serial computing
Parallelised programming in R
Use it when you run a batch of similar tasks that are independent of each other
● Call an R script from a shell script multiple times on a supercomputer
● Use the doParallel, parallel and foreach packages on a local computer
Syntax forms of multiple CPU cores and 1 foreach loop
# Load required packages
library(doParallel)
library(parallel)
library(foreach)
# Detect number of CPU cores in your computer
parallel::detectCores() # 4 cores detected
# Set up a backend with 2 CPU cores (4 detected minus 2 left free)
cluster <- parallel::makeCluster(parallel::detectCores() - 2)
doParallel::registerDoParallel(cluster)
# foreach general form
foreach( i = 1:10
        ,.combine = 'c'
        ,.packages = c("package.A","package.B")) %dopar% {
  Command.1
  Command.2
  ...
}
● Load required packages into R (Windows users)
● Find number of CPU cores in your computer
● Use 2 CPU cores for R, leave the other 2 for software running in the background
● Register the cluster
● Specify arguments in a foreach loop
● Parallelise tasks with %dopar%
The serial for loop, for comparison:
for (i in 1:10){
  Command1
  Command2
  ...
}
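A minimal runnable instance of the foreach general form is sketched below. It assumes the foreach and doParallel packages are installed, uses a fixed 2 workers rather than detectCores() - 2, and adds a stopCluster() call that is not on the slides.

```r
# Assumes the foreach and doParallel packages are installed
library(foreach)
library(doParallel)

# Start 2 worker processes (a fixed number chosen for this sketch)
cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cluster)

# Each iteration is independent, so the workers can share the 10 tasks;
# .combine = 'c' concatenates the 10 results into one vector
squares <- foreach(i = 1:10, .combine = 'c') %dopar% {
  i^2
}

# Release the worker processes when finished
parallel::stopCluster(cluster)

squares
# [1]   1   4   9  16  25  36  49  64  81 100
```

Unlike a for loop, foreach returns a value, so the result is assigned directly instead of being filled in element by element.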
Syntax form of nested foreach loops
# nested foreach general form
foreach( i = 1:10
        ,.combine = 'rbind') %:%
  foreach( j = 1:5
          ,.combine = 'rbind'
          ,.packages = c("package.A","package.B")) %dopar% {
    command.1
    command.2
  }
● %:% joins the outer and inner foreach loops into one nested loop
● Parallelise computation using %dopar%
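The nested form can be sketched with a small multiplication table. As before, this assumes foreach and doParallel are installed and uses 2 workers; the names products, i and j are chosen for the example.

```r
# Assumes the foreach and doParallel packages are installed
library(foreach)
library(doParallel)

cluster <- parallel::makeCluster(2)   # 2 workers, chosen for this sketch
doParallel::registerDoParallel(cluster)

# Every (i, j) pair is one independent task; each task returns a one-row
# data frame, and .combine = 'rbind' stacks the rows
products <- foreach(i = 1:3, .combine = 'rbind') %:%
  foreach(j = 1:2, .combine = 'rbind') %dopar% {
    data.frame(i = i, j = j, product = i * j)
  }

parallel::stopCluster(cluster)

dim(products)          # 6 3
sum(products$product)  # 18
```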
Serial versus parallel computing
Compare time used by vectorised & parallel computing
The testing tool: a birthday simulator
● A function to calculate the probability that at least 2 people share a birthday, given N people in the same room
● Returns N probabilities
The timing tool: system.time()
Compare time used by vectorised & parallel computing
The birthday simulation function
# Birthday problem simulator
pbirthdaysim <- function(n){
  ## n: number of people in the room
  ## ntests: number of simulations; the results are averaged
  ntests <- 100000
  pop <- 1:365
  anydup <- function(i)
    any(duplicated(
      sample(pop, n, replace=TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}
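One way to check the simulator is to compare it with the closed-form answer from stats::pbirthday(). The sketch below uses a variant, pbirthdaysim2(), that exposes ntests as an argument; that variant and the tolerance of 0.03 are choices made for this example, not part of the talk.

```r
# Same simulator, with ntests exposed as an argument (a change made for this sketch)
pbirthdaysim2 <- function(n, ntests = 10000) {
  pop <- 1:365
  anydup <- function(i) any(duplicated(sample(pop, n, replace = TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}

set.seed(1)                            # make the simulation reproducible
simulated <- pbirthdaysim2(n = 23)
exact     <- stats::pbirthday(n = 23)  # closed-form answer, about 0.507

# The two values should agree to within simulation noise
abs(simulated - exact) < 0.03
```

With 10,000 simulations the standard error is about 0.005, so a larger ntests mainly buys precision at the cost of run time, which is what makes this function a convenient timing benchmark.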
Compare the time used by vectorised and parallel computing
system.time( ... ) # run birthday simulator using lapply()
system.time( ... ) # run birthday simulator using sapply()
system.time( ... ) # run birthday simulator using a for loop
system.time( ... ) # run birthday simulator using 1 CPU core and 1 foreach loop
system.time( ... ) # run birthday simulator using all CPU cores and 1 foreach loop
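The placeholders above could be filled in along the following lines. This is only a sketch: it assumes foreach and doParallel are installed, uses a reduced ntests and 20 room sizes so that it finishes quickly, and runs the foreach loop on 2 workers rather than on 1 core and on all cores as in the talk.

```r
# Assumes the foreach and doParallel packages are installed
library(foreach)
library(doParallel)

# Simulator with a small default ntests (hypothetical variant for this sketch)
pbirthdaysim2 <- function(n, ntests = 2000) {
  pop <- 1:365
  anydup <- function(i) any(duplicated(sample(pop, n, replace = TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}
n.values <- 1:20   # room sizes to simulate

# Serial approaches
time.lapply <- system.time(res.lapply <- lapply(n.values, pbirthdaysim2))
time.sapply <- system.time(res.sapply <- sapply(n.values, pbirthdaysim2))
time.for <- system.time({
  res.for <- numeric(length(n.values))
  for (i in seq_along(n.values)) res.for[i] <- pbirthdaysim2(n.values[i])
})

# Parallel approach: foreach on 2 workers
cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cluster)
time.foreach <- system.time(
  res.foreach <- foreach(n = n.values, .combine = 'c') %dopar% pbirthdaysim2(n)
)
parallel::stopCluster(cluster)

# Elapsed seconds per approach; the values depend on the machine
elapsed.times <- c(lapply   = time.lapply[["elapsed"]],
                   sapply   = time.sapply[["elapsed"]],
                   for.loop = time.for[["elapsed"]],
                   foreach  = time.foreach[["elapsed"]])
elapsed.times
```

On a workload this small the cost of sending tasks to the workers can outweigh the gain, so the parallel run is not guaranteed to be the fastest; the difference should become clearer as ntests grows.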
Timing serial and parallel programming
Testing conditions:
● Dell E7440 laptop (Intel Core i5-4300U 2 x 1.9 - 2.9 GHz, Haswell)
● 1 million simulations
Function Elapsed time
lapply
sapply
For loop
Foreach + 1 CPU core
Foreach + all CPU cores detected
sessionInfo()
# R version 4.0.0 (2020-04-24)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
Don’t hesitate to ask yourself
● What is the time performance of my working code?
● Can I replace a loop with vectorised functions?
● If my computing tasks are independent, why haven’t I used multiple CPU cores and parallelised computing?
Q & A
My Qs:
How many CPU cores are detected on your computer?
What are the elapsed times when you run the birthday simulator in your R session?
Your Qs?
Serial and parallel processing in the real world
https://slideplayer.com/slide/7066858/
What is a CPU core?
A core, or CPU core, is the actual hardware component.
It is the "brain" of a CPU. It receives instructions, and performs calculations, or
operations, to satisfy those instructions. A CPU can have multiple cores.
A processor with two cores is called a dual-core processor; with four cores, a
quad-core; six cores, hexa-core; eight cores, octa-core.
As of 2019, the majority of consumer CPUs feature between 2 and 12 cores.

More Related Content

What's hot

Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207Jay Coskey
 
Python programming: Anonymous functions, String operations
Python programming: Anonymous functions, String operationsPython programming: Anonymous functions, String operations
Python programming: Anonymous functions, String operationsMegha V
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparationKushaal Singla
 
Why we cannot ignore Functional Programming
Why we cannot ignore Functional ProgrammingWhy we cannot ignore Functional Programming
Why we cannot ignore Functional ProgrammingMario Fusco
 
4. python functions
4. python   functions4. python   functions
4. python functionsin4400
 
Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Pedro Rodrigues
 
Practical Functional Programming Presentation by Bogdan Hodorog
Practical Functional Programming Presentation by Bogdan HodorogPractical Functional Programming Presentation by Bogdan Hodorog
Practical Functional Programming Presentation by Bogdan Hodorog3Pillar Global
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in PythonSujith Kumar
 
Introduction to Python and TensorFlow
Introduction to Python and TensorFlowIntroduction to Python and TensorFlow
Introduction to Python and TensorFlowBayu Aldi Yansyah
 
Functional programming seminar (haskell)
Functional programming seminar (haskell)Functional programming seminar (haskell)
Functional programming seminar (haskell)Bikram Thapa
 
Dynamic memory allocation in c++
Dynamic memory allocation in c++Dynamic memory allocation in c++
Dynamic memory allocation in c++Tech_MX
 
A brief introduction to lisp language
A brief introduction to lisp languageA brief introduction to lisp language
A brief introduction to lisp languageDavid Gu
 
Python Workshop. LUG Maniapl
Python Workshop. LUG ManiaplPython Workshop. LUG Maniapl
Python Workshop. LUG ManiaplAnkur Shrivastava
 
9780538745840 ppt ch03
9780538745840 ppt ch039780538745840 ppt ch03
9780538745840 ppt ch03Terry Yoast
 
Introduction to the basics of Python programming (part 1)
Introduction to the basics of Python programming (part 1)Introduction to the basics of Python programming (part 1)
Introduction to the basics of Python programming (part 1)Pedro Rodrigues
 

What's hot (20)

Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
 
Python programming: Anonymous functions, String operations
Python programming: Anonymous functions, String operationsPython programming: Anonymous functions, String operations
Python programming: Anonymous functions, String operations
 
Day3
Day3Day3
Day3
 
Day2
Day2Day2
Day2
 
Python course Day 1
Python course Day 1Python course Day 1
Python course Day 1
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
 
Why we cannot ignore Functional Programming
Why we cannot ignore Functional ProgrammingWhy we cannot ignore Functional Programming
Why we cannot ignore Functional Programming
 
4. python functions
4. python   functions4. python   functions
4. python functions
 
Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)
 
Practical Functional Programming Presentation by Bogdan Hodorog
Practical Functional Programming Presentation by Bogdan HodorogPractical Functional Programming Presentation by Bogdan Hodorog
Practical Functional Programming Presentation by Bogdan Hodorog
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
 
Introduction to Python and TensorFlow
Introduction to Python and TensorFlowIntroduction to Python and TensorFlow
Introduction to Python and TensorFlow
 
Functional programming seminar (haskell)
Functional programming seminar (haskell)Functional programming seminar (haskell)
Functional programming seminar (haskell)
 
Dynamic memory allocation in c++
Dynamic memory allocation in c++Dynamic memory allocation in c++
Dynamic memory allocation in c++
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
A brief introduction to lisp language
A brief introduction to lisp languageA brief introduction to lisp language
A brief introduction to lisp language
 
Python Workshop. LUG Maniapl
Python Workshop. LUG ManiaplPython Workshop. LUG Maniapl
Python Workshop. LUG Maniapl
 
9780538745840 ppt ch03
9780538745840 ppt ch039780538745840 ppt ch03
9780538745840 ppt ch03
 
Introduction to the basics of Python programming (part 1)
Introduction to the basics of Python programming (part 1)Introduction to the basics of Python programming (part 1)
Introduction to the basics of Python programming (part 1)
 
Functions in python
Functions in pythonFunctions in python
Functions in python
 

Similar to Tackling repetitive tasks with serial or parallel programming in R

Using R on High Performance Computers
Using R on High Performance ComputersUsing R on High Performance Computers
Using R on High Performance ComputersDave Hiltbrand
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
Workshop presentation hands on r programming
Workshop presentation hands on r programmingWorkshop presentation hands on r programming
Workshop presentation hands on r programmingNimrita Koul
 
(1) cpp introducing the_cpp_programming_language
(1) cpp introducing the_cpp_programming_language(1) cpp introducing the_cpp_programming_language
(1) cpp introducing the_cpp_programming_languageNico Ludwig
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosEuangelos Linardos
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptanshikagoel52
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_PennonsoftPennonSoft
 
Apache Spark Introduction - CloudxLab
Apache Spark Introduction - CloudxLabApache Spark Introduction - CloudxLab
Apache Spark Introduction - CloudxLabAbhinav Singh
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_iNico Ludwig
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and InvocationArvind Surve
 

Similar to Tackling repetitive tasks with serial or parallel programming in R (20)

Using R on High Performance Computers
Using R on High Performance ComputersUsing R on High Performance Computers
Using R on High Performance Computers
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
Workshop presentation hands on r programming
Workshop presentation hands on r programmingWorkshop presentation hands on r programming
Workshop presentation hands on r programming
 
(1) cpp introducing the_cpp_programming_language
(1) cpp introducing the_cpp_programming_language(1) cpp introducing the_cpp_programming_language
(1) cpp introducing the_cpp_programming_language
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_Pennonsoft
 
Apache Spark Introduction - CloudxLab
Apache Spark Introduction - CloudxLabApache Spark Introduction - CloudxLab
Apache Spark Introduction - CloudxLab
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and Invocation
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Tackling repetitive tasks with serial or parallel programming in R

  • 1. Tackling repetitive tasks with serial or parallel programming in R Speaker: Lun-Hsien CHANG Affiliation: Immunology in Cancer and Infection, QIMR Berghofer Meeting: R user group meeting #14 Time: 1-2 PM, 28th July 2020 Place: Seminar room, Level 6, Central building, QIMR, Brisbane
  • 2. It is the central dogma but…. Even if your program is working fine, you may still want to
      ● Try a faster R package than the current one
      ● Rewrite code so that it is less error-prone
      ● Revise code for simplicity and efficiency
  • 3. Outline
      R programming basics
      ● Syntax forms, data structure, vector, elapsed time
      Serial computing
      ● For loop, vectorised functions, *apply() functions
      Parallel computing
      ● The doParallel, parallel and foreach packages
      Compare time performance in serial and parallel computing
  • 4. Common syntax forms in an R program
      # Comments are preceded by a hash
      # Assign value "A.1" to variable.1
      variable.1 <- "A.1"
      # Attach package.A, then use function.A from it
      library(package.A)
      function.A( argument1=values
                 ,argument2=values
                 ,...)
      # Use function.A from package.A without attaching the package
      package.A::function.A( argument1=values
                            ,argument2=values
                            ,...)
  • 6. What is a vector in R? A vector is a one-dimensional collection of numbers, characters or logicals
      v1 <- c(1:5)
      v1
      # [1] 1 2 3 4 5
      v2 <- c("a","b","c","d","e")
      v2
      # [1] "a" "b" "c" "d" "e"
      v3 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
      v3
      # [1] TRUE TRUE FALSE FALSE TRUE
      # Mixed types are coerced to a single type (here, character)
      v4 <- c(1, "a", TRUE, 4, "b")
      v4
      # [1] "1" "a" "TRUE" "4" "b"
  • 7. What is elapsed time in R?
      User time: CPU time spent running your R code, as reported by your operating system (OS)
      System time: CPU time spent by the OS on behalf of your R code, as reported by your OS
      Elapsed time: the amount of time that passes from the start of a program to its finish
      Start.time <- proc.time()
      # run some R code
      End.time <- proc.time() - Start.time
      system.time({
        # run some R code
      })
      # user system elapsed
      # 0.4 0.1 132.2
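The two timing approaches on slide 7 can be tried with a small runnable sketch. The workloads here (`Sys.sleep()` and a sum of square roots) are stand-ins chosen for illustration, not code from the talk.

```r
# Approach 1: subtract two proc.time() snapshots
start.time <- proc.time()
Sys.sleep(0.2)                         # stand-in for the code being timed
time.used <- proc.time() - start.time
time.used[["elapsed"]]                 # wall-clock seconds

# Approach 2: wrap the code in system.time()
timing <- system.time({
  x <- sum(sqrt(seq_len(1e6)))         # stand-in workload
})
timing[["elapsed"]]
```

Sleeping uses almost no CPU, so in the first approach the elapsed time is much larger than the user and system times; this is the same pattern as the 132.2 seconds elapsed versus 0.4 user in the slide's output.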
  • 9. What is serial (sequential) computing?
      Runs on a single CPU core, solving one task at a time
      Ideal for dependent tasks (e.g. Task 2 uses result from task 1)
      Run time is a function of the number of tasks
      [Figure: Tasks 1 to 4 along a time axis, single-core processor (CPU)]
  • 10. R functions that run serial computing
      ● for loop
      ● Vectorised functions
        ○ Most R functions take a vector, usually as their first argument
        ○ A few R functions take a single value (e.g. dir.create() )
      ● lapply(), sapply() from the apply family
  • 11. Syntax form of a for loop in R
      for (i in 1:10){
        Command1
        Command2
        ...
      }
      Create a variable i with values 1 to 10
  • 12. Syntax form of a for loop in R (same code as slide 11): take each i value and do something using it
  • 13. Syntax form of a for loop in R (same code as slide 11): close the for loop with }
  • 14. Syntax forms of a for loop in R
      for (i in 1:10){ Command1 Command2 ... }      This works
      for(i in c(1:10)){ Command1 Command2 ... }    This works too
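As a concrete instance of the for loop syntax on slides 11 to 14, here is a minimal sketch that fills a pre-allocated vector. The task (squaring the numbers 1 to 10) and the variable names are illustrative assumptions.

```r
# Pre-allocate the result vector, then fill it one element per iteration
squares <- numeric(10)
for (i in 1:10) {
  squares[i] <- i^2
}
squares

# The same result from a single vectorised expression
squares.vectorised <- (1:10)^2
```

Pre-allocating `squares` avoids growing the vector inside the loop, which is a common cause of slow for loops in R.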
  • 15. Vectorised operations in R
      Many operations are vectorised in R, meaning that a single call operates on all elements of a vector
      Task: Look up JPG images in 3 folders and get their file paths
      dir.1 <- "C:/images"
      dir.2 <- "D:/images"
      dir.3 <- "E:/images"
      # The pattern is a regular expression: match file names ending in .jpg
      list.files(path = c( dir.1, dir.2, dir.3)
                ,pattern = "\\.jpg$"
                ,full.names = TRUE )
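The vectorised `list.files()` call on slide 15 can be tried without the C:, D: and E: drives. The sketch below uses three temporary folders and made-up file names, all of which are assumptions for illustration.

```r
# Create three temporary folders (stand-ins for the slide's image folders)
image.dirs <- file.path(tempdir(), c("images_1", "images_2", "images_3"))
for (d in image.dirs) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
  # One JPG and one non-JPG file per folder
  file.create(file.path(d, "photo.jpg"))
  file.create(file.path(d, "notes.txt"))
}

# One call searches all three folders because path= accepts a vector
jpg.paths <- list.files(path = image.dirs
                       ,pattern = "\\.jpg$"
                       ,full.names = TRUE)
jpg.paths
```

The pattern is a regular expression, so the dot is escaped and `$` anchors the match to the end of the file name.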
  • 16. The *apply() functions
      ● Examples: lapply(X=, FUN=), sapply(X=, FUN=)
      ● Use them when the function to apply is simple
      ● Misconception: that they are faster than a for loop. They are internal loops that apply a function (FUN=) to all the elements of a vector or list (X=), so they are not faster than a for loop!
  • 17. The *apply() functions
      ● Task: Create 3 folders under C:/images
      # Specify the full path of new folders
      new.folder.1 <- "C:/images/JPG"
      new.folder.2 <- "C:/images/TIF"
      new.folder.3 <- "C:/images/PNG"
      # Create new folders using dir.create()
      lapply( X= c( new.folder.1
                   ,new.folder.2
                   ,new.folder.3)
             ,FUN = function(x) dir.create(x, recursive = TRUE))
  • 18. An unnecessary usage of lapply()
      Task: check to see if 3 image folders exist
      # Check the existence of 3 image folders by lapply()
      unlist(lapply(X=c( new.folder.1
                        ,new.folder.2
                        ,new.folder.3)
                   ,FUN = function(x) dir.exists(x)) )
      # [1] TRUE TRUE TRUE
      # By vectorised operation
      dir.exists(paths = c( new.folder.1
                           ,new.folder.2
                           ,new.folder.3) )
      # [1] TRUE TRUE TRUE
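Slides 17 and 18 contrast a function that takes one path at a time with one that is vectorised. A runnable sketch of both follows; it uses a folder under `tempdir()` instead of C:/images, which is an assumption made so the code runs on any machine.

```r
# Full paths of three new folders under a temporary base folder
base.dir <- file.path(tempdir(), "images")
new.folders <- file.path(base.dir, c("JPG", "TIF", "PNG"))

# dir.create() handles a single path, so apply it to each element
created <- lapply(X = new.folders
                 ,FUN = function(x) dir.create(x, recursive = TRUE, showWarnings = FALSE))

# dir.exists() is vectorised, so no lapply() is needed
folders.exist <- dir.exists(paths = new.folders)
folders.exist
```

Three paths go in and three logical values come out, one per folder.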
  • 19. Task: read multiple text files to a single data frame with lapply()
      ● Specify paths of input folders
      ● Check to see if these input folders exist
      ● Get full paths of input txt files
      ● Read these files to a list
      ● Concatenate these files as a single data frame
  • 20. Read multiple text files to a single data frame (1/4)
      # Specify full paths of data folders
      drive.dir.C <- 'C:/Lab_MarkS'
      input.data.dir <- file.path(drive.dir.C,"lunC/Immunohistochemistry_images/data_output")
      input.data.folder.1 <- file.path(input.data.dir, "MT_Exp023.2_18-001-A","analysis-results")
      input.data.folder.2 <- file.path(input.data.dir, "MT_Exp023.2_18-001-B","analysis-results")
      input.data.folder.3 <- file.path(input.data.dir, "MT_Exp023.2_18-001-C","analysis-results")
      input.data.folder.4 <- file.path(input.data.dir, "MT_Exp023.2_18-002-A","analysis-results")
      input.data.folder.5 <- file.path(input.data.dir, "MT_Exp023.2_18-002-B","analysis-results")
  • 21. Read multiple text files to a single data frame (2/4)
      # Check if input folders exist
      dir.exists(path=c( input.data.folder.1
                        ,input.data.folder.2
                        ,input.data.folder.3
                        ,input.data.folder.4
                        ,input.data.folder.5))
  • 22. Read multiple text files to a single data frame (3/4)
      # Get full paths of input files
      input.data.file.paths <- list.files(path = c( input.data.folder.1
                                                   ,input.data.folder.2
                                                   ,input.data.folder.3
                                                   ,input.data.folder.4
                                                   ,input.data.folder.5)
                                         ,pattern = ".*cell-segmentation-summary_long-format_based-on-merged-cell-seg-file.tsv"
                                         ,full.names = TRUE )
      # length(input.data.file.paths) 5
  • 23. Read multiple text files to a single data frame (4/4)
      # Read multiple tsv files to a list
      input.data.list <- lapply( X=input.data.file.paths
                                ,FUN= function(x) read.delim( file=x
                                                             ,header = TRUE
                                                             ,stringsAsFactors = FALSE) )
      # class(input.data.list) "list"
      # length(input.data.list) 5
      # Combine list elements to a single data.frame
      input.data.read <- do.call(what = "rbind", args = input.data.list)
      # dim(input.data.read) 375 14
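The four steps on slides 20 to 23 depend on lab folders that are not available outside QIMR. A self-contained sketch of the same pattern follows; it first writes three small tab-separated files to a temporary folder, and the file names and columns are invented for illustration.

```r
# Write three small TSV files to a temporary folder (hypothetical data)
input.dir <- file.path(tempdir(), "tsv_demo")
dir.create(input.dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(sample.id = i, cell.count = c(10, 20) * i)
             ,file = file.path(input.dir, paste0("sample_", i, ".tsv"))
             ,sep = "\t", row.names = FALSE, quote = FALSE)
}

# Get full paths of the input files
input.paths <- list.files(path = input.dir, pattern = "\\.tsv$", full.names = TRUE)

# Read every file into one list element
input.list <- lapply(X = input.paths
                    ,FUN = function(x) read.delim(file = x, header = TRUE
                                                 ,stringsAsFactors = FALSE))

# Stack the list elements into a single data frame
combined <- do.call(what = "rbind", args = input.list)
dim(combined)
```

`do.call("rbind", ...)` requires every file to have the same column names, which holds here because all three files are written from the same template.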
  • 25. What is parallel computing?
      Runs on multiple CPU cores, solving tasks in parallel (simultaneously).
      Ideal for independent tasks (i.e. Task 2 does not rely on the result from task 1)
      Run time is shorter than in serial computing
      [Figure: Tasks 1 to 4 along a time axis, multi-core processor]
  • 26. Parallelised programming in R
      Use it when you run a batch of similar tasks that are independent of each other
      ● Call an R script from a shell script multiple times on a supercomputer
      ● Use the doParallel, parallel and foreach packages on a local computer
  • 27. Syntax forms of multiple CPU cores and 1 foreach loop
      # Load required packages
      library(doParallel)
      library(parallel)
      library(foreach)
      # Detect number of CPU cores in your computer
      parallel::detectCores() # 4 cores detected
      # Set up a backend with 2 CPU cores
      cluster <- parallel::makeCluster(parallel::detectCores() - 2)
      doParallel::registerDoParallel(cluster)
      # foreach general form
      foreach( i=1:10
              ,.combine = 'c'
              ,.packages = c("package.A","package.B"))%dopar%{
        Command.1
        Command.2
        ...
      }
      Load required packages into R (Windows users)
  • 28. Syntax forms of multiple CPU cores and 1 foreach loop (same code as slide 27): find number of CPU cores in your computer
  • 29. Syntax forms of multiple CPU cores and 1 foreach loop (same code as slide 27): use 2 CPU cores for R, leave the other 2 for software running in the background
  • 30. Syntax forms of multiple CPU cores and 1 foreach loop (same code as slide 27): register the cluster
  • 31. Syntax forms of multiple CPU cores and 1 foreach loop (same code as slide 27, shown alongside the serial form for (i in 1:10){ Command1 Command2 ... }): specify arguments in a foreach loop
  • 32. Syntax forms of multiple CPU cores and 1 foreach loop (same code as slide 27, shown alongside the serial form for (i in 1:10){ Command1 Command2 ... }): parallelise tasks with %dopar%
  • 33. Syntax form of nested foreach loops
      # nested foreach general form
      foreach( i=1:10
              ,.combine = 'rbind')%:%
        foreach( j=1:5
                ,.combine = 'rbind'
                ,.packages = c("package.A","package.B"))%dopar%{
          command.1
          command.2
        }
      %:% concatenates the outer and inner foreach loop
      Parallelise computation using %dopar%
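The general forms on slides 27 to 33 use placeholder commands. Below is a minimal runnable sketch of a single and a nested foreach loop, assuming the foreach and doParallel packages are installed. The fixed worker count of 2, the loop bodies and the final `stopCluster()` call are choices made for this illustration and do not appear on the slides.

```r
library(foreach)
library(doParallel)

# Set up and register a backend with 2 worker processes
cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cluster)

# Single foreach loop: results are combined into a vector with c()
squares <- foreach(i = 1:10, .combine = "c") %dopar% {
  i^2
}

# Nested foreach loops: %:% merges the outer and inner loop into one
# stream of (i, j) tasks; each task returns a one-row data frame
products <- foreach(i = 1:3, .combine = "rbind") %:%
  foreach(j = 1:2, .combine = "rbind") %dopar% {
    data.frame(i = i, j = j, product = i * j)
  }

# Release the worker processes when finished
parallel::stopCluster(cluster)
```

Unlike a for loop, a foreach loop returns a value, so the result is assigned directly rather than filled in element by element.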
  • 35. Compare time used by vectorised & parallel computing
      The testing tool: a birthday simulator
      ● A function to calculate the probability that at least 2 of N people in the same room share a birthday
      ● Returns N probabilities
      The timing tool: system.time()
  • 36. Compare time used by vectorised & parallel computing
      The birthday simulation function
      # Birthday problem simulator
      pbirthdaysim <- function(n){
        ## n: number of people in the room
        ## ntests: number of simulations and averaging the results
        ntests <- 100000
        pop <- 1:365
        anydup <- function(i) any(duplicated(sample(pop, n, replace=TRUE)))
        sum(sapply(seq(ntests), anydup)) / ntests
      }
  • 37. Compare the time used by vectorised and parallel computing
      system.time({
        # run birthday simulator using lapply()
      })
      system.time({
        # run birthday simulator using sapply()
      })
      system.time({
        # run birthday simulator using a for loop
      })
      system.time({
        # run birthday simulator using 1 CPU core and foreach loop
      })
      system.time({
        # run birthday simulator using all CPU cores and 1 foreach loop
      })
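The comparison outlined on slides 35 to 37 can be sketched as follows, assuming the foreach and doParallel packages are installed. This is not the speaker's benchmark script: `ntests` is exposed as an argument and reduced to 2000 so the example finishes quickly, and the room sizes and the 2-worker cluster are likewise assumptions.

```r
library(foreach)
library(doParallel)

# Variant of the slide's simulator with ntests as an argument (an assumption)
pbirthdaysim <- function(n, ntests = 2000) {
  pop <- 1:365
  anydup <- function(i) any(duplicated(sample(pop, n, replace = TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}

room.sizes <- c(5, 23, 50)   # hypothetical numbers of people in the room

# Serial: sapply()
time.sapply <- system.time(
  p.sapply <- sapply(room.sizes, pbirthdaysim)
)

# Serial: for loop
time.for <- system.time({
  p.for <- numeric(length(room.sizes))
  for (k in seq_along(room.sizes)) {
    p.for[k] <- pbirthdaysim(room.sizes[k])
  }
})

# Parallel: foreach with 2 worker processes
cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cluster)
time.foreach <- system.time(
  p.foreach <- foreach(n = room.sizes, .combine = "c") %dopar% {
    pbirthdaysim(n)
  }
)
parallel::stopCluster(cluster)

# Elapsed seconds per approach
c(sapply = time.sapply[["elapsed"]]
 ,for.loop = time.for[["elapsed"]]
 ,foreach = time.foreach[["elapsed"]])
```

With a workload this small, the cost of sending tasks to the workers can outweigh the gain, so the parallel version is not guaranteed to be faster. Increasing `ntests` or the number of room sizes makes the comparison more meaningful.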
  • 38. Timing serial and parallel programming
      Testing conditions:
      ● Dell E7440 laptop (Intel Core i5-4300U 2 x 1.9 - 2.9 GHz, Haswell)
      ● 1 million simulations
      Table columns: Function, Elapsed time (elapsed-time values are missing from this transcript)
      Table rows: lapply; sapply; For loop; Foreach + 1 CPU core; Foreach + all CPU cores detected
      sessionInfo()
      # R version 4.0.0 (2020-04-24)
      # Platform: x86_64-w64-mingw32/x64 (64-bit)
      # Running under: Windows 7 x64 (build 7601) Service Pack 1
  • 39. Don’t hesitate to ask yourself ● What is the time performance of my working code? ● Can I replace a loop with vectorised functions? ● If my computing tasks are independent, why haven’t I used multiple CPU cores and parallelised computing?
  • 40. Q & A
      My Qs: How many CPU cores are detected in your computer? What are the elapsed times when running the birthday simulator in your R?
      Your Qs?
  • 41. Serial and parallel processing in real world https://slideplayer.com/slide/7066858/
  • 42. What is a CPU core? A core, or CPU core, is the actual hardware component. It is the "brain" of a CPU. It receives instructions, and performs calculations, or operations, to satisfy those instructions. A CPU can have multiple cores. A processor with two cores is called a dual-core processor; with four cores, a quad-core; six cores, hexa-core; eight cores, octa-core. As of 2019, the majority of consumer CPUs feature between 2 and 12 cores.