SlideShare a Scribd company logo
Realizing Robust and Scalable Evolutionary
Algorithms toward Exascale Era
Masaharu Munetomo
Information Initiative Center,
Hokkaido University, Sapporo, JAPAN.
Toward the next generations
supercomputing
• Next year (2012): 10 PFlops systems will be installed
• “K” Computer (Fujitsu) : 10PFlops
• Sequoia (IBM) : 20PFlops
• Near Future (~2018): Toward exascale (1018
)
• International Exa-Scale Software Project
Potential Architecture in Exa-Flops Era
• # of nodes: O(106
)
• # of concurrency
per node: O(103
)
• Total concurrency: O(109
)
• Memory BW: 400GB/s
• Inter-node BW: 50GB/s
(Jack Dongarra
Presentation@SC10)
Uniform or Heterogeneous
• Uniform Architecture: using cores with the same architecture
• PROS: Natural approach, easy to program
• CONS: Needs more power compared with heterogeneous architecture
• Heterogeneous Architecture: CPU + Accelerator with different architecture
• PROS: Energy efficiency, concurrency
• CONS: Difficult to program and optimize compared with uniform one
Challenges toward Exa-Scale
• Improvement of bandwidth and delay of the interconnect are not enough
compared to that by increasing the number of cores
• It is difficult to develop massively parallel library for the many-core
architecture
• It is also difficult to develop scalable algorithms for many-core
• In the field of high performance computing, their efforts concentrate on
parallel numerical library such as large-scale matrix calculations. Extreme
scale parallel optimization is not considered.
Scalability of Master-Worker Model
• When delay is proportional to # of processors
• When delay is proportional to log of # of processors (ideal situation)
Extremely Parallel Master-Slave Model
• P* should be around
100,000 ~ 1,000,000
• X
should be more
than 1010
!!
• OK
• It is preferable if we can
hide communication
overheads.
Toward exa-scale massive parallelization
• Communication latency should be at most O(log n)
• Network hardware architecture: Hypercube, etc.
• Network software stack: Optimizing collective communications, etc.
• Algorithms that reduces communications
• Employing heterogeneous architecture: GPGPU, etc.
Massive parallelization of Evolutionary
computation
• Master-Worker Model: Same as general parallelized algorithms
• Island models: difficult to implement massive parallelism as it is
• Localize inter-node communications such as cellular GA
• Massive parallelization of advanced methods like EDAs
Dependency (communications) among variables should be considered→
   ex) DBOA, pLINC, pD5 etc.
• Massive parallelization on cloud systems: MapReduce GA, ECGA, etc.
Research toward exa-scale ECs
• We should assume more than million-way parallelization
• Making communication overheads as less as possible
• Designing algorithms with less communications necessary
• Dependency on the target application problems
• When fitness evaluations are costly, simply parallelize them is enough
• Parallelization of “intelligent” algorithms are necessary to consider
• Analyzing interactions among genes to solve complex problems
Promising approach toward massive
parallelization
• Massively parallel cellular GAs with local communications on many-core
• Massive parallelization of linkage identification: pLINC, pLIEM, pD5
• Parallelization of EDAs: pBOA, gBOA (A GPGPU implementation)
• GPGPU implementations: gBOA, MAXSAT, MINLP
• MapReduce implementations: MapReduce GA, ECGA, linkage identification
A cellular GA implemented on CellBE [Asim
2009]
• A cellular GA (cGA) to solve Capacitated Vehicle Routing
Problem (CVRP) implemented on a many-core architecture,
Cell BroadBand Engine (Sony, IBM, Toshiba).
• Cellular GA was employed due
to limited communication BW
between SPE and PPE.
Parallelization of linkage identification
techniques
• Linkage identification techniques are based on pair-wise perturbations to
detect nonlinearity/non-monotonicity.
• It is relatively easy to parallelize by
assigning calculation of each pair
to each processors
• pLINC, pLIEM, pD5
have been proposed.
• We have succeeded in solving a half-million bit problems by employing pD5
A half-million-bit optimization with pD5
[Tsuji
2004]
• A 500,000-bit problem
(5bit trap x 100,000 )
• # of local optima
= 2100,000
-1 10≒ 30,000
• HITACHI SR-11000
model K1 (5.4TFlops)
• 32 days by 1 node (16 CPU)
• 1 day by whole system
(40 nodes)
pBOA on GPGPU [Asim 2008]
• Implementation of parallel Bayesian Optimization
Algorithm on NVIDIA CUDA.
Massive Parallelization over the Cloud:
MapReduce
• MapReduce is a simple but powerful approach to realize massive
parallelization over Cloud computing systems
• There are several papers have been published on MapReduce
implementations on GA, ECGA.
• MapReduce ECGA: Calculation of Marginal Product Model (MPM) and
generating offsprings can be parallelized via
“Map” process.
• Fault tolerance can be handled byMapReduce
for massive parallelization
Concluding Remarks
• Designing robust and scalable algorithms is not easy, since analysis of
linkage or interactions among genes should be necessary which usually
needs at least O(l2
) computational overheads and communications among
processors.
• In order to adapt to exa-scale massively parallel processing cores of more
than O(106
), we need to reduce communication overheads of each message
exchange less than O(log n). Otherwise, the parallel algorithm cannot scale
well to such massively parallel architectures.
• We also need to adapt to modern architecture and systems such as
heterogeneous architecture with GPGPUs and cloud computing environment.

More Related Content

What's hot

Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
Unai Lopez-Novoa
 
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloudEnergy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
Linda J
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural network
Jaey Jeong
 
HyperCuP Lightweight Implementation
HyperCuP Lightweight ImplementationHyperCuP Lightweight Implementation
HyperCuP Lightweight Implementation
Slawek
 
Circuit Simplifier
Circuit SimplifierCircuit Simplifier
Circuit Simplifier
Vineet Markan
 
Extracting a Rails Engine to a separated application
Extracting a Rails Engine to a separated applicationExtracting a Rails Engine to a separated application
Extracting a Rails Engine to a separated application
Jônatas Paganini
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Fme goes nuclear (part 2)
Fme goes nuclear (part 2)Fme goes nuclear (part 2)
Fme goes nuclear (part 2)
Safe Software
 
Parallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC SystemsParallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC Systems
HPCC Systems
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
Matthias Feys
 
Nanosatellite Components Catalogue German Orbital Systems
Nanosatellite Components Catalogue German Orbital SystemsNanosatellite Components Catalogue German Orbital Systems
Nanosatellite Components Catalogue German Orbital Systems
IKosenkov
 
Container orchestration in geo-distributed cloud computing platforms
Container orchestration in geo-distributed cloud computing platformsContainer orchestration in geo-distributed cloud computing platforms
Container orchestration in geo-distributed cloud computing platforms
FogGuru MSCA Project
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC Applications
Wim Vanderbauwhede
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
MLconf
 
Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow
Rajiv Shah
 
COMSOL Training Series (NNMDC Initiative)
COMSOL Training Series (NNMDC Initiative)COMSOL Training Series (NNMDC Initiative)
COMSOL Training Series (NNMDC Initiative)Aniket Tekawade
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
Bishnu Rawal
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Lalit Jain
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
Brodmann17
 

What's hot (20)

Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
 
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloudEnergy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural network
 
HyperCuP Lightweight Implementation
HyperCuP Lightweight ImplementationHyperCuP Lightweight Implementation
HyperCuP Lightweight Implementation
 
Circuit Simplifier
Circuit SimplifierCircuit Simplifier
Circuit Simplifier
 
Extracting a Rails Engine to a separated application
Extracting a Rails Engine to a separated applicationExtracting a Rails Engine to a separated application
Extracting a Rails Engine to a separated application
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Fme goes nuclear (part 2)
Fme goes nuclear (part 2)Fme goes nuclear (part 2)
Fme goes nuclear (part 2)
 
Parallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC SystemsParallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC Systems
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
Nanosatellite Components Catalogue German Orbital Systems
Nanosatellite Components Catalogue German Orbital SystemsNanosatellite Components Catalogue German Orbital Systems
Nanosatellite Components Catalogue German Orbital Systems
 
Container orchestration in geo-distributed cloud computing platforms
Container orchestration in geo-distributed cloud computing platformsContainer orchestration in geo-distributed cloud computing platforms
Container orchestration in geo-distributed cloud computing platforms
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC Applications
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow
 
COMSOL Training Series (NNMDC Initiative)
COMSOL Training Series (NNMDC Initiative)COMSOL Training Series (NNMDC Initiative)
COMSOL Training Series (NNMDC Initiative)
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 

Similar to Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era

FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
Wim Vanderbauwhede
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
mbreternitz
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
inside-BigData.com
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
Facultad de Informática UCM
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spacejsvetter
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
khstandrews
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
Mateusz Dymczyk
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Edge AI and Vision Alliance
 
What is 3d torus
What is 3d torusWhat is 3d torus
What is 3d torus
Eurotech Aurora
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Xavier Llorà
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
Arnaud Rachez
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
Esteban Hernandez
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
mustafa sarac
 
EXASXALE COMPUTING
EXASXALE COMPUTINGEXASXALE COMPUTING
EXASXALE COMPUTING
balakrishnan Bsk
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
inside-BigData.com
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
Ganesan Narayanasamy
 
Performance Analysis of Lattice QCD with APGAS Programming Model
Performance Analysis of Lattice QCD with APGAS Programming ModelPerformance Analysis of Lattice QCD with APGAS Programming Model
Performance Analysis of Lattice QCD with APGAS Programming Model
Koichi Shirahata
 

Similar to Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era (20)

FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design space
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
What is 3d torus
What is 3d torusWhat is 3d torus
What is 3d torus
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
EXASXALE COMPUTING
EXASXALE COMPUTINGEXASXALE COMPUTING
EXASXALE COMPUTING
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
 
Performance Analysis of Lattice QCD with APGAS Programming Model
Performance Analysis of Lattice QCD with APGAS Programming ModelPerformance Analysis of Lattice QCD with APGAS Programming Model
Performance Analysis of Lattice QCD with APGAS Programming Model
 

More from Masaharu Munetomo

APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)
Masaharu Munetomo
 
インタークラウドシステムの実用化に向けて
インタークラウドシステムの実用化に向けてインタークラウドシステムの実用化に向けて
インタークラウドシステムの実用化に向けて
Masaharu Munetomo
 
研究者のためのアカデミックインタークラウド
研究者のためのアカデミックインタークラウド研究者のためのアカデミックインタークラウド
研究者のためのアカデミックインタークラウド
Masaharu Munetomo
 
遺伝的アルゴリズムにおけるリンケージ同定
遺伝的アルゴリズムにおけるリンケージ同定遺伝的アルゴリズムにおけるリンケージ同定
遺伝的アルゴリズムにおけるリンケージ同定
Masaharu Munetomo
 
進化計算シンポジウム200712
進化計算シンポジウム200712進化計算シンポジウム200712
進化計算シンポジウム200712Masaharu Munetomo
 
分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術
Masaharu Munetomo
 
20110824弱小クラウド連合は大規模クラウドに勝てるか
20110824弱小クラウド連合は大規模クラウドに勝てるか20110824弱小クラウド連合は大規模クラウドに勝てるか
20110824弱小クラウド連合は大規模クラウドに勝てるかMasaharu Munetomo
 
研究支援に係るアカデミッククラウド システムの調査検討
研究支援に係るアカデミッククラウド システムの調査検討研究支援に係るアカデミッククラウド システムの調査検討
研究支援に係るアカデミッククラウド システムの調査検討
Masaharu Munetomo
 
分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術
Masaharu Munetomo
 
ビッグデータ時代のアカデミッククラウド
ビッグデータ時代のアカデミッククラウドビッグデータ時代のアカデミッククラウド
ビッグデータ時代のアカデミッククラウド
Masaharu Munetomo
 
北海道大学情報基盤センター10周年記念講演スライド(公開版)
北海道大学情報基盤センター10周年記念講演スライド(公開版)北海道大学情報基盤センター10周年記念講演スライド(公開版)
北海道大学情報基盤センター10周年記念講演スライド(公開版)
Masaharu Munetomo
 
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
Masaharu Munetomo
 
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
Masaharu Munetomo
 

More from Masaharu Munetomo (13)

APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)
 
インタークラウドシステムの実用化に向けて
インタークラウドシステムの実用化に向けてインタークラウドシステムの実用化に向けて
インタークラウドシステムの実用化に向けて
 
研究者のためのアカデミックインタークラウド
研究者のためのアカデミックインタークラウド研究者のためのアカデミックインタークラウド
研究者のためのアカデミックインタークラウド
 
遺伝的アルゴリズムにおけるリンケージ同定
遺伝的アルゴリズムにおけるリンケージ同定遺伝的アルゴリズムにおけるリンケージ同定
遺伝的アルゴリズムにおけるリンケージ同定
 
進化計算シンポジウム200712
進化計算シンポジウム200712進化計算シンポジウム200712
進化計算シンポジウム200712
 
分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術
 
20110824弱小クラウド連合は大規模クラウドに勝てるか
20110824弱小クラウド連合は大規模クラウドに勝てるか20110824弱小クラウド連合は大規模クラウドに勝てるか
20110824弱小クラウド連合は大規模クラウドに勝てるか
 
研究支援に係るアカデミッククラウド システムの調査検討
研究支援に係るアカデミッククラウド システムの調査検討研究支援に係るアカデミッククラウド システムの調査検討
研究支援に係るアカデミッククラウド システムの調査検討
 
分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術分散クラウドシステムにおける遠隔連携技術
分散クラウドシステムにおける遠隔連携技術
 
ビッグデータ時代のアカデミッククラウド
ビッグデータ時代のアカデミッククラウドビッグデータ時代のアカデミッククラウド
ビッグデータ時代のアカデミッククラウド
 
北海道大学情報基盤センター10周年記念講演スライド(公開版)
北海道大学情報基盤センター10周年記念講演スライド(公開版)北海道大学情報基盤センター10周年記念講演スライド(公開版)
北海道大学情報基盤センター10周年記念講演スライド(公開版)
 
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
研究支援のためのアカデミッククラウド(CloudWeek2013@Hokkaido University)
 
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
Hokkaido University Academic Cloud: Largest Academic Cloud System in Japan
 

Recently uploaded

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 

Recently uploaded (20)

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 

Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era

  • 1. Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era Masaharu Munetomo Information Initiative Center, Hokkaido University, Sapporo, JAPAN.
  • 2. Toward the next generations supercomputing • Next year (2012): 10 PFlops systems will be installed • “K” Computer (Fujitsu) : 10PFlops • Sequoia (IBM) : 20PFlops • Near Future (~2018): Toward exascale (1018 ) • International Exa-Scale Software Project
  • 3. Potential Architecture in Exa-Flops Era • # of nodes: O(106 ) • # of concurrency per node: O(103 ) • Total concurrency: O(109 ) • Memory BW: 400GB/s • Inter-node BW: 50GB/s (Jack Dongarra Presentation@SC10)
  • 4. Uniform or Heterogeneous • Uniform Architecture: using cores with the same architecture • PROS: Natural approach, easy to program • CONS: Needs more power compared with heterogeneous architecture • Heterogeneous Architecture: CPU + Accelerator with different architecture • PROS: Energy efficiency, concurrency • CONS: Difficult to program and optimize compared with uniform one
  • 5. Challenges toward Exa-Scale • Improvement of bandwidth and delay of the interconnect are not enough compared to that by increasing the number of cores • It is difficult to develop massively parallel library for the many-core architecture • It is also difficult to develop scalable algorithms for many-core • In the field of high performance computing, their efforts concentrate on parallel numerical library such as large-scale matrix calculations. Extreme scale parallel optimization is not considered.
  • 6. Scalability of Master-Worker Model • When delay is proportional to # of processors • When delay is proportional to log of # of processors (ideal situation)
  • 7. Extremely Parallel Master-Slave Model • P* should be around 100,000 ~ 1,000,000 • X should be more than 1010 !! • OK • It is preferable if we can hide communication overheads.
  • 8. Toward exa-scale massive parallelization • Communication latency should be at most O(log n) • Network hardware architecture: Hypercube, etc. • Network software stack: Optimizing collective communications, etc. • Algorithms that reduces communications • Employing heterogeneous architecture: GPGPU, etc.
  • 9. Massive parallelization of Evolutionary computation • Master-Worker Model: Same as general parallelized algorithms • Island models: difficult to implement massive parallelism as it is • Localize inter-node communications such as cellular GA • Massive parallelization of advanced methods like EDAs Dependency (communications) among variables should be considered→    ex) DBOA, pLINC, pD5 etc. • Massive parallelization on cloud systems: MapReduce GA, ECGA, etc.
  • 10. Research toward exa-scale ECs • We should assume more than million-way parallelization • Making communication overheads as less as possible • Designing algorithms with less communications necessary • Dependency on the target application problems • When fitness evaluations are costly, simply parallelize them is enough • Parallelization of “intelligent” algorithms are necessary to consider • Analyzing interactions among genes to solve complex problems
  • 11. Promising approach toward massive parallelization • Massively parallel cellular GAs with local communications on many-core • Massive parallelization of linkage identification: pLINC, pLIEM, pD5 • Parallelization of EDAs: pBOA, gBOA (A GPGPU implementation) • GPGPU implementations: gBOA, MAXSAT, MINLP • MapReduce implementations: MapReduce GA, ECGA, linkage identification
  • 12. A cellular GA implemented on CellBE [Asim 2009] • A cellular GA (cGA) to solve Capacitated Vehicle Routing Problem (CVRP) implemented on a many-core architecture, Cell BroadBand Engine (Sony, IBM, Toshiba). • Cellular GA was employed due to limited communication BW between SPE and PPE.
  • 13. Parallelization of linkage identification techniques • Linkage identification techniques are based on pair-wise perturbations to detect nonlinearity/non-monotonicity. • It is relatively easy to parallelize by assigning calculation of each pair to each processors • pLINC, pLIEM, pD5 have been proposed. • We have succeeded in solving a half-million bit problems by employing pD5
  • 14. A half-million-bit optimization with pD5 [Tsuji 2004] • A 500,000-bit problem (5bit trap x 100,000 ) • # of local optima = 2100,000 -1 10≒ 30,000 • HITACHI SR-11000 model K1 (5.4TFlops) • 32 days by 1 node (16 CPU) • 1 day by whole system (40 nodes)
  • 15. pBOA on GPGPU [Asim 2008] • Implementation of parallel Bayesian Optimization Algorithm on NVIDIA CUDA.
  • 16. Massive Parallelization over the Cloud: MapReduce • MapReduce is a simple but powerful approach to realize massive parallelization over Cloud computing systems • There are several papers have been published on MapReduce implementations on GA, ECGA. • MapReduce ECGA: Calculation of Marginal Product Model (MPM) and generating offsprings can be parallelized via “Map” process. • Fault tolerance can be handled byMapReduce for massive parallelization
  • 17. Concluding Remarks • Designing robust and scalable algorithms is not easy, since analysis of linkage or interactions among genes should be necessary which usually needs at least O(l2 ) computational overheads and communications among processors. • In order to adapt to exa-scale massively parallel processing cores of more than O(106 ), we need to reduce communication overheads of each message exchange less than O(log n). Otherwise, the parallel algorithm cannot scale well to such massively parallel architectures. • We also need to adapt to modern architecture and systems such as heterogeneous architecture with GPGPUs and cloud computing environment.

Editor's Notes

  1. I’m Masaharu Munetomo from information initiative center of Hokkaido university, one of the Japanese national supercomputing centers. This talk is to review current status of massively parallel evolutionary algorithms toward exa-scale era. Robustness means that an algorithm can be applicable to wide-spectrum of applications by analyzing interactions among genes. Realizing robustness and scalability at the same time is considered difficult generally.