SlideShare a Scribd company logo
Scalability comparison: Traditional fork-join-based
parallelism vs. Goroutines
Porting the Barcelona OpenMP Tasks Suite to Go
Artjom Simon
https://github.com/artjomsimon/go-bots
Know Your Gophers
2015-05-12
Traditional approach in C
Cilk:
cilk_spawn task();
[...]
cilk_sync;
OpenMP:
#pragma omp parallel
{
#pragma omp task
[...]
#pragma omp taskwait
[...]
}
Go: Parallel For Loop Pattern1
queue := make(chan int)
done := make(chan bool)
NP := runtime.GOMAXPROCS(0)
go func() {
for i := 0; i < n; i++ { queue <- i }
close(queue)
}()
for i := 0; i < NP; i++ {
go func() {
for i := range queue { work(i) }
done<-true
}()
}
for i := 0; i < NP; i++ { <-done }
1
Benchmarking Usability and Performance of Multicore Languages, PDF:
http://arxiv.org/pdf/1302.2837v2
Barcelona OpenMP Tasks Suite2
2
https://github.com/alcides/bots
...used in academic publications3
3
http://www.sarc-ip.org/files/null/Workshop/1234128788173_
_TSchedStrat-iwomp08.pdf
Micro benchmarks
1 8 16 32 48
1
8
16
32
48
OMP_NUM_THREADS
Speeduprel.toseq.
spc (opteron)
n=1000µs
n=100µs
n=10µs
Figure: Speedup spc (icc), 10 000 Tasks
Task pools: Variations
• notaskpool
Start Goroutines as needed, no limitation, uses WaitGroup for
synchronization
• simple-queue
Buffered channel of func()s holds task queue. n goroutines
receive the func()s and execute them
• goroutines-dispatcher
Dispatcher function, executing tasks in Goroutine only if a
global counter of running goroutines is < n
• const-goroutines
n goroutines remove tasks from a double-linked list
Micro benchmarks
1 8 16 32 48
1
8
16
32
48
OMP_NUM_THREADS
Speeduprel.zusequentiell
spc (opteron)
gcc
icc
clang
go-notaskpool
go-simple-queue
go-const-goroutines
go-goroutine-dispatch
Figure: Speedup spc, n=100µs, 10 000 Tasks
BOTS: nqueens
• N-Queens problem with n=12
• Recursive backtracking search
• No cut-off when creating tasks
Ergebnisse: BOTS (nqueens)
1 8 16 32 48
0
5
10
CPU cores
Speedup nqueens (opteron)
gcc
icc
clang
go-const-goroutines
go-dispatch
go-notaskpool
gccgo-const-goroutines
gccgo-dispatch
gccgo-notaskpool
Figure: Speedup for nqueens -n 12, parallel
BOTS: sparselu
• LU factorization of a sparse block matrix
• 50x50-Matrix, 100x100 sub block matrices
Results: BOTS (sparselu)
1 8 16 32 48
0
10
20
30
CPU cores
Speedup sparselu (opteron)
gcc
icc
clang
go-const-goroutines
go-dispatch
go-notaskpool
go-simplequeue
gccgo-const-goroutines
gccgo-dispatch
gccgo-notaskpool
Figure: Speedup sparselu -n 50 -m 100, parallel
Problem: Recursion (dependencies!)
Memory
opteron
0
0.5
1
1.5
·105
RSS[Kbytes] spc-par 10000 1000, 4 Threads
gcc
icc
clang
go-notaskpool
go-const-goroutines
gccgo-notaskpool
gccgo-const-goroutines
Figure: Memory comparison (Resident Set Size), spc parallel
Side effect: Possible heap corruption bug in Go 1.4?
Questions?
Thank you!
Image credits
Icon N-Queens problem: Colin M.L. Burnett, Wikimedia Commons,
(GFDL & BSD & GPL)
http://commons.wikimedia.org/wiki/File:Chess_d45.svg
(2015-03-09)

More Related Content

What's hot

jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) Microsoft
Jimmy Schementi
 
Introduction to nand2 tetris
Introduction to nand2 tetrisIntroduction to nand2 tetris
Introduction to nand2 tetris
Yodalee
 
Event Loop in Javascript
Event Loop in JavascriptEvent Loop in Javascript
Event Loop in Javascript
DiptiGandhi4
 
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
Igalia
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010
Dirkjan Bussink
 
Multi qubit entanglement
Multi qubit entanglementMulti qubit entanglement
Multi qubit entanglement
Vijayananda Mohire
 
なぜ検索しなかったのか
なぜ検索しなかったのかなぜ検索しなかったのか
なぜ検索しなかったのか
N Masahiro
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)
Sean Krail
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
Pavel Klimiankou
 
Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...
Till Blume
 
ClojureScript - A functional Lisp for the browser
ClojureScript - A functional Lisp for the browserClojureScript - A functional Lisp for the browser
ClojureScript - A functional Lisp for the browser
Falko Riemenschneider
 
MFC Prog
MFC ProgMFC Prog
Introduction to RevKit
Introduction to RevKitIntroduction to RevKit
Introduction to RevKit
Mathias Soeken
 
Dummy log generation using poisson sampling
 Dummy log generation using poisson sampling Dummy log generation using poisson sampling
Dummy log generation using poisson sampling
Kwanghee Choi
 
Compiler presention
Compiler presentionCompiler presention
Compiler presention
Faria Priya
 
Gameboy emulator in rust and web assembly
Gameboy emulator in rust and web assemblyGameboy emulator in rust and web assembly
Gameboy emulator in rust and web assembly
Yodalee
 
Why is a[1] fast than a.get(1)
Why is a[1]  fast than a.get(1)Why is a[1]  fast than a.get(1)
Why is a[1] fast than a.get(1)
kao kuo-tung
 
[COSCUP 2018] uTensor C++ Code Generator
[COSCUP 2018] uTensor C++ Code Generator[COSCUP 2018] uTensor C++ Code Generator
[COSCUP 2018] uTensor C++ Code Generator
Yin-Chen Liao
 
Toy Model Overview
Toy Model OverviewToy Model Overview
Toy Model Overview
Brandon McKinzie
 
Yacf
YacfYacf

What's hot (20)

jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) Microsoft
 
Introduction to nand2 tetris
Introduction to nand2 tetrisIntroduction to nand2 tetris
Introduction to nand2 tetris
 
Event Loop in Javascript
Event Loop in JavascriptEvent Loop in Javascript
Event Loop in Javascript
 
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010
 
Multi qubit entanglement
Multi qubit entanglementMulti qubit entanglement
Multi qubit entanglement
 
なぜ検索しなかったのか
なぜ検索しなかったのかなぜ検索しなかったのか
なぜ検索しなかったのか
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
 
Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...
 
ClojureScript - A functional Lisp for the browser
ClojureScript - A functional Lisp for the browserClojureScript - A functional Lisp for the browser
ClojureScript - A functional Lisp for the browser
 
MFC Prog
MFC ProgMFC Prog
MFC Prog
 
Introduction to RevKit
Introduction to RevKitIntroduction to RevKit
Introduction to RevKit
 
Dummy log generation using poisson sampling
 Dummy log generation using poisson sampling Dummy log generation using poisson sampling
Dummy log generation using poisson sampling
 
Compiler presention
Compiler presentionCompiler presention
Compiler presention
 
Gameboy emulator in rust and web assembly
Gameboy emulator in rust and web assemblyGameboy emulator in rust and web assembly
Gameboy emulator in rust and web assembly
 
Why is a[1] fast than a.get(1)
Why is a[1]  fast than a.get(1)Why is a[1]  fast than a.get(1)
Why is a[1] fast than a.get(1)
 
[COSCUP 2018] uTensor C++ Code Generator
[COSCUP 2018] uTensor C++ Code Generator[COSCUP 2018] uTensor C++ Code Generator
[COSCUP 2018] uTensor C++ Code Generator
 
Toy Model Overview
Toy Model OverviewToy Model Overview
Toy Model Overview
 
Yacf
YacfYacf
Yacf
 

Similar to Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines: Porting the Barcelona OpenMP Tasks Suite to Go

Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
Ian Ozsvald
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
Intel IT Center
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the Seal
Tzung-Bi Shih
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
alpinedatalabs
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
DB Tsai
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff Hammond
Patrick Diehl
 
Доклад Антона Поварова "Go in Badoo" с Golang Meetup
Доклад Антона Поварова "Go in Badoo" с Golang MeetupДоклад Антона Поварова "Go in Badoo" с Golang Meetup
Доклад Антона Поварова "Go in Badoo" с Golang Meetup
Badoo Development
 
CS4961-L9.ppt
CS4961-L9.pptCS4961-L9.ppt
CS4961-L9.ppt
MarlonMagtibay2
 
Job Queue in Golang
Job Queue in GolangJob Queue in Golang
Job Queue in Golang
Bo-Yi Wu
 
Functional go
Functional goFunctional go
Functional go
Geison Goes
 
Concurrency in go
Concurrency in goConcurrency in go
Concurrency in go
borderj
 
Matlab ppt
Matlab pptMatlab ppt
Matlab ppt
Dhammpal Ramtake
 
The Ring programming language version 1.5.3 book - Part 89 of 184
The Ring programming language version 1.5.3 book - Part 89 of 184The Ring programming language version 1.5.3 book - Part 89 of 184
The Ring programming language version 1.5.3 book - Part 89 of 184
Mahmoud Samir Fayed
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
Christian Peel
 
Java 8 - functional features
Java 8 - functional featuresJava 8 - functional features
Java 8 - functional features
Rafal Rybacki
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even Faster
EDB
 
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
Tzung-Bi Shih
 
Ruby Functional Programming
Ruby Functional ProgrammingRuby Functional Programming
Ruby Functional Programming
Geison Goes
 
Multiprocessing with python
Multiprocessing with pythonMultiprocessing with python
Multiprocessing with python
Patrick Vergain
 
Profiling in Python
Profiling in PythonProfiling in Python
Profiling in Python
Fabian Pedregosa
 

Similar to Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines: Porting the Barcelona OpenMP Tasks Suite to Go (20)

Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the Seal
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff Hammond
 
Доклад Антона Поварова "Go in Badoo" с Golang Meetup
Доклад Антона Поварова "Go in Badoo" с Golang MeetupДоклад Антона Поварова "Go in Badoo" с Golang Meetup
Доклад Антона Поварова "Go in Badoo" с Golang Meetup
 
CS4961-L9.ppt
CS4961-L9.pptCS4961-L9.ppt
CS4961-L9.ppt
 
Job Queue in Golang
Job Queue in GolangJob Queue in Golang
Job Queue in Golang
 
Functional go
Functional goFunctional go
Functional go
 
Concurrency in go
Concurrency in goConcurrency in go
Concurrency in go
 
Matlab ppt
Matlab pptMatlab ppt
Matlab ppt
 
The Ring programming language version 1.5.3 book - Part 89 of 184
The Ring programming language version 1.5.3 book - Part 89 of 184The Ring programming language version 1.5.3 book - Part 89 of 184
The Ring programming language version 1.5.3 book - Part 89 of 184
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
 
Java 8 - functional features
Java 8 - functional featuresJava 8 - functional features
Java 8 - functional features
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even Faster
 
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
Global Interpreter Lock: Episode III - cat &lt; /dev/zero > GIL;
 
Ruby Functional Programming
Ruby Functional ProgrammingRuby Functional Programming
Ruby Functional Programming
 
Multiprocessing with python
Multiprocessing with pythonMultiprocessing with python
Multiprocessing with python
 
Profiling in Python
Profiling in PythonProfiling in Python
Profiling in Python
 

Recently uploaded

SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
PIMR BHOPAL
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
Yasser Mahgoub
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 

Recently uploaded (20)

SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 

Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines: Porting the Barcelona OpenMP Tasks Suite to Go

  • 1. Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines Porting the Barcelona OpenMP Tasks Suite to Go Artjom Simon https://github.com/artjomsimon/go-bots Know Your Gophers 2015-05-12
  • 2. Traditional approach in C Cilk: cilk_spawn task(); [...] cilk_sync; OpenMP: #pragma omp parallel { #pragma omp task [...] #pragma omp taskwait [...] }
  • 3. Go: Parallel For Loop Pattern1 queue := make(chan int) done := make(chan bool) NP := runtime.GOMAXPROCS(0) go func() { for i := 0; i < n; i++ { queue <- i } close(queue) }() for i := 0; i < NP; i++ { go func() { for i := range queue { work(i) } done<-true }() } for i := 0; i < NP; i++ { <-done } 1 Benchmarking Usability and Performance of Multicore Languages, PDF: http://arxiv.org/pdf/1302.2837v2
  • 4. Barcelona OpenMP Tasks Suite2 2 https://github.com/alcides/bots
  • 5. ...used in academic publications3 3 http://www.sarc-ip.org/files/null/Workshop/1234128788173_ _TSchedStrat-iwomp08.pdf
  • 6. Micro benchmarks 1 8 16 32 48 1 8 16 32 48 OMP_NUM_THREADS Speeduprel.toseq. spc (opteron) n=1000µs n=100µs n=10µs Figure: Speedup spc (icc), 10 000 Tasks
  • 7. Task pools: Variations • notaskpool Start Goroutines as needed, no limitation, uses WaitGroup for synchronization • simple-queue Buffered channel of func()s holds task queue. n goroutines receive the func()s and execute them • goroutines-dispatcher Dispatcher function, executing tasks in Goroutine only if a global counter of running goroutines is < n • const-goroutines n goroutines remove tasks from a double-linked list
  • 8. Micro benchmarks 1 8 16 32 48 1 8 16 32 48 OMP_NUM_THREADS Speeduprel.zusequentiell spc (opteron) gcc icc clang go-notaskpool go-simple-queue go-const-goroutines go-goroutine-dispatch Figure: Speedup spc, n=100µs, 10 000 Tasks
  • 9. BOTS: nqueens • N-Queens problem with n=12 • Recursive backtracking search • No cut-off when creating tasks
  • 10. Ergebnisse: BOTS (nqueens) 1 8 16 32 48 0 5 10 CPU cores Speedup nqueens (opteron) gcc icc clang go-const-goroutines go-dispatch go-notaskpool gccgo-const-goroutines gccgo-dispatch gccgo-notaskpool Figure: Speedup for nqueens -n 12, parallel
  • 11. BOTS: sparselu • LU factorization of a sparse block matrix • 50x50-Matrix, 100x100 sub block matrices
  • 12. Results: BOTS (sparselu) 1 8 16 32 48 0 10 20 30 CPU cores Speedup sparselu (opteron) gcc icc clang go-const-goroutines go-dispatch go-notaskpool go-simplequeue gccgo-const-goroutines gccgo-dispatch gccgo-notaskpool Figure: Speedup sparselu -n 50 -m 100, parallel
  • 14. Memory opteron 0 0.5 1 1.5 ·105 RSS[Kbytes] spc-par 10000 1000, 4 Threads gcc icc clang go-notaskpool go-const-goroutines gccgo-notaskpool gccgo-const-goroutines Figure: Memory comparison (Resident Set Size), spc parallel
  • 15. Side effect: Possible heap corruption bug in Go 1.4?
  • 18. Image credits Icon N-Queens problem: Colin M.L. Burnett, Wikimedia Commons, (GFDL & BSD & GPL) http://commons.wikimedia.org/wiki/File:Chess_d45.svg (2015-03-09)