Mapping Parallel Programs into Hierarchical
Distributed Computer Systems
Prof. Victor G. Khoroshevsky and Mikhail G. Kurnosov
Computer Systems Laboratory,
The A.V. Rzhanov Institute of Semiconductor Physics of Siberian Branch of
Russian Academy of Sciences,
13 Lavrentyev ave., 630090 Novosibirsk, Russia
E-mail: mkurnosov@isp.nsc.ru
4th International Conference on Software and Data Technologies (ICSOFT 2009)
Sofia, Bulgaria, 26 - 29 July, 2009
Mapping Parallel Programs into
Hierarchical Distributed Computer Systems
Mapping High-Performance Linpack into a hierarchical computer cluster:
• Mapping by standard MPI tools (mpiexec): execution time 118 sec. (44 GFLOPS)
• Optimized mapping: execution time 100 sec. (53 GFLOPS)
[Figure: High-Performance Linpack task graph (NP=8, PMAP=1, BCAST=5)]
[Figure: computer cluster with hierarchical organization (Level 1, Level 2); two SMP nodes: 2 x Intel Xeon 5150]
ICSOFT 2009, July 26 – 29, 2009, Sofia, Bulgaria Mikhail Kurnosov
Related Work
1. Mapping parallel programs into computer systems (CS) with a
fixed network topology (hypercube, 3D-torus, mesh, etc.). A parallel program
is represented by a task graph:
(Yu, 2006), (Chen et al. 2006), (Bhanot et al. 2005), (Jose, 1999),
(Ahmad, 1997), (Kalinowski, 1994), (Yau, 1993), (Ercal et al. 1990), (Lee, 1989),
(Bokhari, 1981).
2. Mapping parallel programs into CSs with arbitrary topology. A parallel
program is represented by an unweighted task graph:
(Ucar et al., 2006), (Prakash et al., 2004), (Miquel et al., 2003), (Träff, 2002),
(Moh, 2001), (Perego, 1998), (Lee, 1989).
Algorithms that take into account the hierarchical organization of modern
distributed computer systems are needed.
The objective of our research is the development of models and algorithms for
mapping parallel programs into modern hierarchical computer systems
(such as multicore computer clusters).
Model of Hierarchical Organization of Distributed Computer System
Example of hierarchical organization of computer cluster:
N = 12; L = 3; n_23 = 2; C_23 = {9, 10, 11, 12}; g(3, 3, 4) = 2; z(1, 7) = 1
Notation:
C = {1, 2, …, N} is the set of processor cores;
L is the number of levels in the communication network;
n_l is the number of elements at level l ∈ {1, 2, …, L};
n_lk is the number of children of element k ∈ {1, 2, …, n_l} at level l;
C_lk is the set of processor cores belonging to the descendants of element k at level l; c_lk = |C_lk|;
b_l is the bandwidth of communication channels at level l (bit/sec.).
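The example values can be reproduced with a small sketch of the hierarchy. It assumes a regular tree (one switch with 3 nodes, 2 processors per node, 2 cores per processor), which is inferred from the example numbers only. It also assumes that g(l, k1, k2) is the level of the lowest common ancestor of elements k1 and k2 at level l, and that z(p, q) is the same quantity for cores p and q. The slide does not define g or z, so both readings and all function names are hypothetical.

```python
from math import prod

# Children per element at levels 1..L. Assumed regular tree, chosen only
# because it reproduces the slide example (N = 12, L = 3, n_23 = 2).
CHILDREN = [3, 2, 2]
L = len(CHILDREN)
N = prod(CHILDREN)  # total number of processor cores


def cores_under(level, k):
    """Set C_lk: cores (1-based) below element k (1-based) at `level`."""
    span = prod(CHILDREN[level - 1:])  # cores per element at this level
    first = (k - 1) * span + 1
    return set(range(first, first + span))


def ancestor(level, k, upper):
    """Index of the ancestor at level `upper` of element k at `level`."""
    return (k - 1) // prod(CHILDREN[upper - 1:level - 1]) + 1


def g(level, k1, k2):
    """Assumed meaning: level of the lowest common ancestor of k1 and k2."""
    for upper in range(level, 0, -1):
        if ancestor(level, k1, upper) == ancestor(level, k2, upper):
            return upper


def z(p, q):
    """Assumed meaning: level through which cores p and q communicate."""
    element_p = (p - 1) // CHILDREN[L - 1] + 1  # level-L element owning p
    element_q = (q - 1) // CHILDREN[L - 1] + 1
    return g(L, element_p, element_q)


print(N, cores_under(2, 3), g(3, 3, 4), z(1, 7))
```

Under these assumptions, cores on different nodes communicate through level 1 with bandwidth b_1, while cores of the same processor communicate through level 3 with bandwidth b_3.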
The Problem of Mapping Parallel Programs into Hierarchical
Distributed Computer Systems
Given a task graph G = (V, E) and a description of the hierarchical organization of a
computer system (CS):
• V = {1, 2, …, M} is the set of parallel processes;
• E ⊆ V × V is the set of inter-process communications;
• d_ij is the volume of data transmitted between processes i and j during program execution;
• b_z(p, q) is the bandwidth of the communication channel between cores p and q.
A mapping is a function f : V → C, which is defined by the values
$$x_{ij} = \begin{cases} 1, & \text{if } f(i) = j; \\ 0, & \text{otherwise.} \end{cases}$$
The objective is to minimize the program execution time T(X):
$$T(X) = \max_{i \in V} \left\{ \sum_{j=1}^{M} \sum_{p=1}^{N} \sum_{q=1}^{N} x_{ip} \cdot x_{jq} \cdot \frac{d_{ij}}{b_{z(p,q)}} \right\} \to \min_{(x_{ij})}$$
subject to the constraints:
$$\sum_{j=1}^{N} x_{ij} = 1, \quad i = 1, 2, \ldots, M,$$
$$\sum_{i=1}^{M} x_{ij} \le 1, \quad j = 1, 2, \ldots, N,$$
$$x_{ij} \in \{0, 1\}, \quad i \in V, \; j \in C.$$
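For a concrete mapping f the triple sum collapses, because x_ip and x_jq are non-zero only for p = f(i) and q = f(j). The objective then reads T = max over i of the sum over j of d_ij / b_z(f(i), f(j)). The sketch below evaluates it for two mappings. The two-node system, the bandwidth values, the data volumes and all names are hypothetical illustration data, not taken from the slides.

```python
def execution_time(f, d, z, b):
    """T(X) = max_i sum_j d[i][j] / b[z(f[i], f[j])] for a mapping f."""
    return max(
        sum(volume / b[z(f[i], f[j])] for j, volume in d[i].items())
        for i in d
    )


def z(p, q):
    # Hypothetical 2-level system: cores 1-2 on node 1, cores 3-4 on node 2.
    # Level 2 = inside a node, level 1 = between nodes.
    return 2 if (p - 1) // 2 == (q - 1) // 2 else 1


# Hypothetical bandwidths per level (arbitrary units): the network is slower.
b = {1: 1.0, 2: 10.0}

# Hypothetical symmetric data volumes d[i][j]: pairs (1, 3) and (2, 4)
# exchange much more data than pairs (1, 2) and (3, 4).
d = {
    1: {2: 10.0, 3: 80.0},
    2: {1: 10.0, 4: 80.0},
    3: {1: 80.0, 4: 10.0},
    4: {2: 80.0, 3: 10.0},
}

identity = {1: 1, 2: 2, 3: 3, 4: 4}   # process i on core i
optimized = {1: 1, 3: 2, 2: 3, 4: 4}  # heavy pairs share a node

t_identity = execution_time(identity, d, z, b)
t_optimized = execution_time(optimized, d, z, b)
print(t_identity, t_optimized)
```

Placing the heavily communicating pairs on the same node moves their traffic to the faster level, which is the effect the objective function rewards.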
The Heuristic Algorithm TMMGP
Step 1 – Partitioning: the task graph is partitioned into k = ⌈M / c_L1⌉ subsets V′_1, V′_2, …, V′_k
Step 2 – Mapping
[Figure: task graph partitioned into subsets V′_1, V′_2, V′_3 and mapped onto a hierarchical system with channel bandwidths b_1, b_2, b_3]
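A minimal sketch of how a partition could be turned into a mapping in the second step. It assumes (the slide does not spell this out) that subset i is placed on the cores of node i and that every subset fits on its node. The function name and the example data are hypothetical.

```python
def map_subsets_to_nodes(subsets, node_cores):
    """Assign the processes of subset i to distinct cores of node i."""
    if len(subsets) > len(node_cores):
        raise ValueError("more subsets than nodes")
    mapping = {}
    for subset, cores in zip(subsets, node_cores):
        if len(subset) > len(cores):
            raise ValueError("subset does not fit on the node")
        # Inside one node the order is arbitrary in this sketch.
        for process, core in zip(sorted(subset), cores):
            mapping[process] = core
    return mapping


# Hypothetical partition of 4 processes and a system of 2 dual-core nodes.
subsets = [{1, 3}, {2, 4}]
node_cores = [[1, 2], [3, 4]]
mapping = map_subsets_to_nodes(subsets, node_cores)
print(mapping)
```

Each process gets its own core, so the constraints of the mapping problem hold by construction.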
Task Graph Partitioning in the TMMGP algorithm
1. Coarsen the graph: Heavy Edge Matching (Karypis, Kumar, 1998)
2. Partition graph G_m into k subsets by recursive bisection (Schloegel et al. 2003)
3. Refine the partition by the FM heuristic (Fiduccia, Mattheyses, 1982)
The computational complexity of the TMMGP algorithm is O(|E| log_2 k + M).
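The coarsening step can be illustrated with a simplified Heavy Edge Matching pass: vertices are visited in random order, and each unmatched vertex is paired with the unmatched neighbour connected by the heaviest edge. This is a sketch of the general technique, not the authors' implementation. Vertex weights, which multilevel partitioners normally track as well, are omitted, and the example graph is hypothetical.

```python
import random


def heavy_edge_matching(adj, seed=0):
    """Return a dict vertex -> partner (the vertex itself if unmatched)."""
    order = list(adj)
    random.Random(seed).shuffle(order)
    match = {}
    for u in order:
        if u in match:
            continue
        # Unmatched neighbours of u, as (edge weight, vertex) pairs.
        candidates = [(w, v) for v, w in adj[u].items()
                      if v != u and v not in match]
        if candidates:
            _, v = max(candidates)  # heaviest incident edge wins
            match[u], match[v] = v, u
        else:
            match[u] = u
    return match


def coarsen(adj, match):
    """Contract matched pairs; parallel edges are merged by summing weights."""
    label = {u: min(u, match[u]) for u in adj}
    coarse = {c: {} for c in set(label.values())}
    for u, neighbours in adj.items():
        for v, w in neighbours.items():
            cu, cv = label[u], label[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse


# Hypothetical weighted path graph 1 - 2 - 3 - 4.
adj = {1: {2: 5}, 2: {1: 5, 3: 1}, 3: {2: 1, 4: 4}, 4: {3: 4}}
match = heavy_edge_matching(adj)
coarse = coarsen(adj, match)
print(match, coarse)
```

The heavy edges (1, 2) and (3, 4) are contracted, so only the light edge remains visible to the partitioner working on the coarser graph.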
Software Tools for Mapping MPI Programs
Experiments Organization
MPI programs:
• NAS Parallel Benchmarks (NPB);
• High-Performance Linpack (HPL).
Computer clusters:
• Cluster Xeon16: 4 nodes (2 x Intel Xeon 5150), interconnect: Gigabit/Fast Ethernet;
• Cluster Opteron10: 5 nodes (2 x AMD Opteron 248), interconnect: Gigabit/Fast Ethernet.
[Figure: HPL task graph: 16 processes, PMAP=0, BCAST=5]
[Figure: NPB Conjugate Gradient task graph: 16 processes, CLASS B]
[Figure: NPB Multigrid task graph: 16 processes, CLASS B]
Experiment Results
The execution time of the TMMGP algorithm on an Intel Core 2 Duo 2.13 GHz processor is less than 1 sec.
Benchmark                  Cluster interconnect   T(X_RR), sec.   T(X_TMMGP), sec.   Speedup T(X_RR) / T(X_TMMGP)
High-Performance Linpack   Fast Ethernet          1108.69         911.81             1.22
High-Performance Linpack   Gigabit Ethernet       263.15          231.72             1.14
NPB Conjugate Gradient     Fast Ethernet          726.02          400.36             1.81
NPB Conjugate Gradient     Gigabit Ethernet       97.56           42.05              2.32
NPB Multigrid              Fast Ethernet          23.94           23.90              1.00
NPB Multigrid              Gigabit Ethernet       4.06            4.03               1.00
• T(X_RR) is the execution time of the MPI benchmark with mapping by the round-robin algorithm
of the mpiexec tool (MPICH2 1.0.6).
• T(X_TMMGP) is the execution time of the MPI benchmark with mapping by the TMMGP algorithm.
The execution time of MPI benchmarks on the Xeon16 cluster
Conclusions and Future Works
Conclusions
• Mapping algorithms must take into account the hierarchical organization of modern
computer systems and the structure of parallel programs.
• The proposed TMMGP algorithm reduces the execution time of
MPI programs by 40% on average.
• New algorithms for mapping parallel programs with full task graphs are
required.
Future Works
• Development of new algorithms for mapping parallel programs into arbitrary
subsystems of hierarchical distributed computer systems.
• Integration of the TMMGP mapping algorithm with the mpiexec tool and resource
management systems (such as TORQUE).
• Application of the described approach to optimizing MPI collective operations.
Thank You For Your Attention
Backup Slides
The k-way Graph Partitioning Problem
It is required to partition graph G′ = (V′, E′) into k disjoint subsets V′_1, V′_2, …, V′_k such that
the maximal sum of edge weights incident to any subset is minimized and |V′_i| ≤ s.
• w(u, v) is the weight of edge (u, v) ∈ E′;
• W(i, j) is an additional weight for edges incident to subsets i and j;
• c(u, v, i, j) = w(u, v)W(i, j) is the total weight of edge (u, v) incident to subsets i and j.
c(u, v, i, j) = d_uv / b_g(L, i, j)
The example of 3-way graph partitioning:
V′ = {1, 2, …, 12}; k = 3; s = 3;
W(1, 2) = 3; W(1, 3) = 2; W(2, 3) = 4.
The approximate partition:
edge-weights(V′_1) = w(1, 5)W(1, 2) + w(6, 8)W(1, 3) + w(2, 3)W(1, 3) = 22;
edge-weights(V′_2) = 40;
edge-weights(V′_3) = 38.
[Figure: graph G′ partitioned into subsets V′_1, V′_2, V′_3]
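The partitioning objective can be evaluated for a given partition as sketched below. Every edge that connects two different subsets i and j contributes w(u, v)W(i, j) to both subsets, and the quantity to minimize is the largest subset total. The pair weights W are those of the slide example. The six-vertex graph, the size limit s = 2 and the function name are hypothetical, since the slide's own edge weights are not recoverable.

```python
def subset_costs(edges, part, W):
    """Sum of c(u, v, i, j) = w(u, v) * W(i, j) over cut edges, per subset."""
    costs = {i: 0 for i in set(part.values())}
    for (u, v), w in edges.items():
        i, j = part[u], part[v]
        if i == j:
            continue  # internal edges do not count
        c = w * W[frozenset((i, j))]
        costs[i] += c
        costs[j] += c
    return costs


# Additional weights from the slide example.
W = {frozenset((1, 2)): 3, frozenset((1, 3)): 2, frozenset((2, 3)): 4}

# Hypothetical weighted graph (a 6-cycle) and a 3-way partition of it.
edges = {(1, 2): 5, (2, 3): 1, (3, 4): 5, (4, 5): 2, (5, 6): 5, (1, 6): 1}
part = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
s = 2  # hypothetical upper bound on subset size

sizes = {i: sum(1 for v in part if part[v] == i) for i in set(part.values())}
costs = subset_costs(edges, part, W)
objective = max(costs.values())
print(costs, objective, all(size <= s for size in sizes.values()))
```

With W(i, j) read as 1 / b_g(L, i, j) and w(u, v) as d_uv, the same function scores a partition by the communication time of its most loaded subset.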
Heavy Edge Matching algorithm
[Figure: Heavy Edge Matching (HEM): matching on a 16-vertex source graph and the resulting coarser graph]
Graph Bisection
[Figure: graph bisection starting from an initial vertex]

More Related Content

What's hot

Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
Chanuk Lim
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
RAHUL BHOJWANI
 
A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...
journalBEEI
 
LeNet-5
LeNet-5LeNet-5
LeNet-5
佳蓉 倪
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Universitat Politècnica de Catalunya
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
Amgad Muhammad
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
Universitat Politècnica de Catalunya
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
Preferred Networks
 
(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2
Serhii Havrylov
 
Learning Communication with Neural Networks
Learning Communication with Neural NetworksLearning Communication with Neural Networks
Learning Communication with Neural Networks
hytae
 
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
IOSR Journals
 
ujava.org Deep Learning with Convolutional Neural Network
ujava.org Deep Learning with Convolutional Neural Network ujava.org Deep Learning with Convolutional Neural Network
ujava.org Deep Learning with Convolutional Neural Network
신동 강
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Ryo Takahashi
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Universitat Politècnica de Catalunya
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
ArchiLab 7
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Universitat Politècnica de Catalunya
 
Enhancement and Analysis of Chaotic Image Encryption Algorithms
Enhancement and Analysis of Chaotic Image Encryption Algorithms Enhancement and Analysis of Chaotic Image Encryption Algorithms
Enhancement and Analysis of Chaotic Image Encryption Algorithms
cscpconf
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTM
Kyuri Kim
 

What's hot (20)

Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...
 
LeNet-5
LeNet-5LeNet-5
LeNet-5
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2(Kpi summer school 2015) theano tutorial part2
(Kpi summer school 2015) theano tutorial part2
 
Learning Communication with Neural Networks
Learning Communication with Neural NetworksLearning Communication with Neural Networks
Learning Communication with Neural Networks
 
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...
 
ujava.org Deep Learning with Convolutional Neural Network
ujava.org Deep Learning with Convolutional Neural Network ujava.org Deep Learning with Convolutional Neural Network
ujava.org Deep Learning with Convolutional Neural Network
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
Enhancement and Analysis of Chaotic Image Encryption Algorithms
Enhancement and Analysis of Chaotic Image Encryption Algorithms Enhancement and Analysis of Chaotic Image Encryption Algorithms
Enhancement and Analysis of Chaotic Image Encryption Algorithms
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTM
 

Viewers also liked

ELE2611 Classe 7 - Circuits non-linéaires statiques
ELE2611 Classe 7 - Circuits non-linéaires statiquesELE2611 Classe 7 - Circuits non-linéaires statiques
ELE2611 Classe 7 - Circuits non-linéaires statiques
Jerome LE NY
 
LPIC1 10 03 cron
LPIC1 10 03 cronLPIC1 10 03 cron
LPIC1 10 03 cronNoël
 
Le dessalement,l'alternative durable
Le dessalement,l'alternative durableLe dessalement,l'alternative durable
Le dessalement,l'alternative durable
Degrémont
 
Tema LOE
Tema LOETema LOE
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
Divya Grover
 
Real time pedestrian detection, tracking, and distance estimation
Real time pedestrian detection, tracking, and distance estimationReal time pedestrian detection, tracking, and distance estimation
Real time pedestrian detection, tracking, and distance estimation
omid Asudeh
 
A la découverte du Web sémantique
A la découverte du Web sémantiqueA la découverte du Web sémantique
A la découverte du Web sémantique
Gautier Poupeau
 

Viewers also liked (7)

ELE2611 Classe 7 - Circuits non-linéaires statiques
ELE2611 Classe 7 - Circuits non-linéaires statiquesELE2611 Classe 7 - Circuits non-linéaires statiques
ELE2611 Classe 7 - Circuits non-linéaires statiques
 
LPIC1 10 03 cron
LPIC1 10 03 cronLPIC1 10 03 cron
LPIC1 10 03 cron
 
Le dessalement,l'alternative durable
Le dessalement,l'alternative durableLe dessalement,l'alternative durable
Le dessalement,l'alternative durable
 
Tema LOE
Tema LOETema LOE
Tema LOE
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Real time pedestrian detection, tracking, and distance estimation
Real time pedestrian detection, tracking, and distance estimationReal time pedestrian detection, tracking, and distance estimation
Real time pedestrian detection, tracking, and distance estimation
 
A la découverte du Web sémantique
A la découverte du Web sémantiqueA la découverte du Web sémantique
A la découverte du Web sémantique
 

Similar to Mapping Parallel Programs into Hierarchical Distributed Computer Systems

New Approach of Preprocessing For Numeral Recognition
New Approach of Preprocessing For Numeral RecognitionNew Approach of Preprocessing For Numeral Recognition
New Approach of Preprocessing For Numeral Recognition
IJERA Editor
 
Dc project 1
Dc project 1Dc project 1
Dc project 1
shwetha mk
 
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
Kim Hammar
 
TUKE System for MediaEval 2014 QUESST
TUKE System for MediaEval 2014 QUESSTTUKE System for MediaEval 2014 QUESST
TUKE System for MediaEval 2014 QUESST
multimediaeval
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
supratikmondal6
 
Optimized Network-coded Scalable Video Multicasting over eMBMS Networks
Optimized Network-coded Scalable Video Multicasting over eMBMS NetworksOptimized Network-coded Scalable Video Multicasting over eMBMS Networks
Optimized Network-coded Scalable Video Multicasting over eMBMS Networks
Andrea Tassi
 
CSC 347 – Computer Hardware and Maintenance
CSC 347 – Computer Hardware and MaintenanceCSC 347 – Computer Hardware and Maintenance
CSC 347 – Computer Hardware and Maintenance
Sumaiya Ismail
 
IJCCI2023.pdf
IJCCI2023.pdfIJCCI2023.pdf
IJCCI2023.pdf
Gabriella Casalino
 
Unsupervised learning networks
Unsupervised learning networksUnsupervised learning networks
Unsupervised learning networks
Dr. C.V. Suresh Babu
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
Bomm Kim
 
Low complexity turbo decoder with modified acs
Low complexity turbo decoder with modified acsLow complexity turbo decoder with modified acs
Low complexity turbo decoder with modified acs
IAEME Publication
 
11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps
Alexander Decker
 
Parallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using openclParallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using opencl
eSAT Publishing House
 
Parallel knn on gpu architecture using opencl
Parallel knn on gpu architecture using openclParallel knn on gpu architecture using opencl
Parallel knn on gpu architecture using opencl
eSAT Journals
 
Data acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor NetworkData acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor Network
Rutvik Pensionwar
 
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITSEXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
VLSICS Design
 
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
ijngnjournal
 
Implementation and validation of multiplier less fpga based digital filter
Implementation and validation of multiplier less fpga based digital filterImplementation and validation of multiplier less fpga based digital filter
Implementation and validation of multiplier less fpga based digital filter
IAEME Publication
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Chenghao Jin
 

Similar to Mapping Parallel Programs into Hierarchical Distributed Computer Systems (20)

New Approach of Preprocessing For Numeral Recognition
New Approach of Preprocessing For Numeral RecognitionNew Approach of Preprocessing For Numeral Recognition
New Approach of Preprocessing For Numeral Recognition
 
Dc project 1
Dc project 1Dc project 1
Dc project 1
 
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
Kim Hammar & Konstantin Sozinov - Distributed LSTM training - Predicting Huma...
 
TUKE System for MediaEval 2014 QUESST
TUKE System for MediaEval 2014 QUESSTTUKE System for MediaEval 2014 QUESST
TUKE System for MediaEval 2014 QUESST
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
Optimized Network-coded Scalable Video Multicasting over eMBMS Networks
Optimized Network-coded Scalable Video Multicasting over eMBMS NetworksOptimized Network-coded Scalable Video Multicasting over eMBMS Networks
Optimized Network-coded Scalable Video Multicasting over eMBMS Networks
 
CSC 347 – Computer Hardware and Maintenance
CSC 347 – Computer Hardware and MaintenanceCSC 347 – Computer Hardware and Maintenance
CSC 347 – Computer Hardware and Maintenance
 
IJCCI2023.pdf
IJCCI2023.pdfIJCCI2023.pdf
IJCCI2023.pdf
 
Unsupervised learning networks
Unsupervised learning networksUnsupervised learning networks
Unsupervised learning networks
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Low complexity turbo decoder with modified acs
Low complexity turbo decoder with modified acsLow complexity turbo decoder with modified acs
Low complexity turbo decoder with modified acs
 
11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps
 
Parallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using openclParallel k nn on gpu architecture using opencl
Parallel k nn on gpu architecture using opencl
 
Parallel knn on gpu architecture using opencl
Parallel knn on gpu architecture using openclParallel knn on gpu architecture using opencl
Parallel knn on gpu architecture using opencl
 
Data acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor NetworkData acquisition and storage in Wireless Sensor Network
Data acquisition and storage in Wireless Sensor Network
 
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITSEXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
 
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
Implementation of Pipelined Architecture for Physical Downlink Channels of 3G...
 
Implementation and validation of multiplier less fpga based digital filter
Implementation and validation of multiplier less fpga based digital filterImplementation and validation of multiplier less fpga based digital filter
Implementation and validation of multiplier less fpga based digital filter
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
 

More from Mikhail Kurnosov

Векторизация кода (семинар 2)
Векторизация кода (семинар 2)Векторизация кода (семинар 2)
Векторизация кода (семинар 2)
Mikhail Kurnosov
 
Векторизация кода (семинар 3)
Векторизация кода (семинар 3)Векторизация кода (семинар 3)
Векторизация кода (семинар 3)
Mikhail Kurnosov
 
Векторизация кода (семинар 1)
Векторизация кода (семинар 1)Векторизация кода (семинар 1)
Векторизация кода (семинар 1)
Mikhail Kurnosov
 
Лекция 7. Декартовы деревья (Treaps, дучи, дерамиды)
Лекция 7. Декартовы деревья (Treaps, дучи, дерамиды)Лекция 7. Декартовы деревья (Treaps, дучи, дерамиды)
Mapping Parallel Programs into Hierarchical Distributed Computer Systems

  • 1. Mapping Parallel Programs into Hierarchical Distributed Computer Systems Prof. Victor G. Khoroshevsky and Mikhail G. Kurnosov Computer Systems Laboratory, The A.V. Rzhanov Institute of Semiconductor Physics of Siberian Branch of Russian Academy of Sciences, 13 Lavrentyev ave., 630090 Novosibirsk, Russia E-mail: mkurnosov@isp.nsc.ru 4th International Conference on Software and Data Technologies (ICSOFT 2009) Sofia, Bulgaria, 26 - 29 July, 2009
  • 2. Mapping Parallel Programs into Hierarchical Distributed Computer Systems
     Mapping High-Performance Linpack into a hierarchical computer cluster:
       • Mapping by standard MPI tools (mpiexec): execution time 118 sec. (44 GFLOPS)
       • Optimized mapping: execution time 100 sec. (53 GFLOPS)
     [Figure: High-Performance Linpack task graph (NP=8, PMAP=1, BCAST=5).]
     [Figure: computer cluster with hierarchical organization (Level 1, Level 2). Two SMP-nodes: 2 x Intel Xeon 5150.]
     ICSOFT 2009, July 26 – 29, 2009, Sofia, Bulgaria. Mikhail Kurnosov
  • 3. Related Work
     1. Mapping parallel programs into computer systems (CS) with a fixed network topology (hypercube, 3D-torus, mesh, etc.). A parallel program is represented by a task graph: (Yu, 2006), (Chen et al. 2006), (Bhanot et al. 2005), (Jose, 1999), (Ahmad, 1997), (Kalinowski, 1994), (Yau, 1993), (Ercal et al. 1990), (Lee, 1989), (Bokhari, 1981).
     2. Mapping parallel programs into CSs with arbitrary topology. A parallel program is represented by an unweighted task graph: (Ucar et al., 2006), (Prakash et al., 2004), (Miquel et al., 2003), (Träff, 2002), (Moh, 2001), (Perego, 1998), (Lee, 1989).
     Algorithms that consider the hierarchical organization of modern distributed computer systems are needed.
     The objective of our research is the development of models and algorithms for mapping parallel programs into modern hierarchical computer systems (such as multicore computer clusters).
  • 4. Model of Hierarchical Organization of Distributed Computer System
     Example of hierarchical organization of a computer cluster: $N = 12$; $L = 3$; $n_{23} = 2$; $C_{23} = \{9, 10, 11, 12\}$; $g(3, 3, 4) = 2$; $z(1, 7) = 1$.
     Notation:
       • $C = \{1, 2, \ldots, N\}$ is the set of processor cores;
       • $L$ is the number of levels in the communication network;
       • $n_l$ is the number of elements placed at level $l \in \{1, 2, \ldots, L\}$;
       • $n_{lk}$ is the number of children of element $k \in \{1, 2, \ldots, n_l\}$ at level $l$;
       • $C_{lk}$ is the set of processor cores belonging to the descendants of element $k$ at level $l$; $c_{lk} = |C_{lk}|$;
       • $b_l$ is the bandwidth of communication channels at level $l$ (bit/sec.).
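The model on slide 4 can be illustrated with a small Python sketch. The transcript does not define $g$ and $z$, so this sketch assumes that $g(l, k_1, k_2)$ is the level of the lowest common ancestor of elements $k_1$ and $k_2$ at level $l$, and that $z(p, q)$ is the same quantity for two cores; that reading reproduces the example values on the slide. The uniform tree shape (3 elements at level 2, 6 at level 3, 2 cores each) and the names `N_CHILDREN`, `parent` and `cores` are also assumptions made for illustration.

```python
# Hypothetical encoding of the slide's example (N = 12, L = 3), assuming a uniform tree.
# N_CHILDREN[l][k - 1] is n_lk, the number of children of element k at level l.
# The children of level-L elements are the processor cores themselves.
L = 3
N_CHILDREN = {1: [3], 2: [2, 2, 2], 3: [2, 2, 2, 2, 2, 2]}
N = sum(N_CHILDREN[L])  # total number of cores


def parent(level, k):
    """Return the 1-based index, at level - 1, of the parent of element k at `level`.

    Cores are treated as elements of a virtual level L + 1.
    """
    first = 1
    for idx, count in enumerate(N_CHILDREN[level - 1], start=1):
        if first <= k < first + count:
            return idx
        first += count
    raise ValueError(f"no element {k} at level {level}")


def g(level, k1, k2):
    """Level of the lowest common ancestor of elements k1, k2 at `level` (assumed meaning)."""
    while k1 != k2:
        k1, k2 = parent(level, k1), parent(level, k2)
        level -= 1
    return level


def z(p, q):
    """Level of the lowest common ancestor of two distinct cores p, q (assumed meaning)."""
    return g(L + 1, p, q)


def cores(level, k):
    """C_lk: the set of cores that descend from element k at `level`."""
    result = set()
    for core in range(1, N + 1):
        node, lvl = core, L + 1
        while lvl > level:
            node, lvl = parent(lvl, node), lvl - 1
        if node == k:
            result.add(core)
    return result


print(sorted(cores(2, 3)), g(3, 3, 4), z(1, 7))
```

A larger value of `z(p, q)` means the two cores meet lower in the hierarchy, for example inside one node instead of across the cluster switch.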
  • 5. The Problem of Mapping Parallel Programs into Hierarchical Distributed Computer Systems
     Given a task graph $G = (V, E)$ and a description of the hierarchical organization of a computer system (CS):
       • $V = \{1, 2, \ldots, M\}$ is the set of parallel processes;
       • $E \subseteq V \times V$ is the set of inter-process communications;
       • $d_{ij}$ is the volume of data transmitted between processes $i$ and $j$ during program execution;
       • $b_{z(p, q)}$ is the bandwidth of the communication channel between cores $p$ and $q$.
     A mapping is a function $f : V \to C$, which is defined by the values

     $$x_{ij} = \begin{cases} 1, & \text{if } f(i) = j; \\ 0, & \text{otherwise.} \end{cases}$$

     The objective is to minimize the program execution time $T(X)$:

     $$T(X) = \max_{i \in V} \left\{ \sum_{j=1}^{M} \sum_{p=1}^{N} \sum_{q=1}^{N} x_{ip} \cdot x_{jq} \cdot \frac{d_{ij}}{b_{z(p, q)}} \right\} \to \min_{(x_{ij})}$$

     subject to the constraints:

     $$\sum_{j=1}^{N} x_{ij} = 1, \quad i = 1, 2, \ldots, M,$$
     $$\sum_{i=1}^{M} x_{ij} \le 1, \quad j = 1, 2, \ldots, N,$$
     $$x_{ij} \in \{0, 1\}, \quad i \in V, \; j \in C.$$
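As a rough illustration of the objective on slide 5, the sketch below evaluates $T(X)$ for a mapping given directly as a dictionary `f` from processes to cores, rather than as the 0/1 matrix $x_{ij}$. It assumes $d_{ij}$ is symmetric and stored once per undirected edge, so each edge's transfer time is charged to both endpoints. The two-node machine, the bandwidth values and all function names are hypothetical and are not taken from the slides.

```python
def execution_time(f, d, z, b):
    """Evaluate T(X) = max over processes i of sum over j of d_ij / b_z(f(i), f(j)).

    f: dict process -> core (must be injective: at most one process per core)
    d: dict (i, j) -> data volume, one entry per undirected edge (assumption)
    z: function (p, q) -> level of the channel connecting cores p and q
    b: dict level -> bandwidth of channels at that level
    """
    if len(set(f.values())) != len(f):
        raise ValueError("two processes are mapped to the same core")
    per_process = {i: 0.0 for i in f}
    for (i, j), volume in d.items():
        t = volume / b[z(f[i], f[j])]
        per_process[i] += t  # time seen by process i
        per_process[j] += t  # the same transfer seen by process j
    return max(per_process.values())


def z_two_nodes(p, q):
    """Hypothetical machine: cores 1-2 on one node, cores 3-4 on another.

    Level 2 = inside a node, level 1 = between nodes.
    """
    return 2 if (p - 1) // 2 == (q - 1) // 2 else 1


bandwidth = {1: 1.0, 2: 10.0}               # hypothetical units
traffic = {(1, 2): 10, (3, 4): 10, (2, 3): 1}

scattered = {1: 1, 2: 3, 3: 2, 4: 4}        # heavy pairs split across nodes
packed = {1: 1, 2: 2, 3: 3, 4: 4}           # heavy pairs kept inside a node

print(execution_time(scattered, traffic, z_two_nodes, bandwidth))
print(execution_time(packed, traffic, z_two_nodes, bandwidth))
```

In this toy case, keeping the heavily communicating pairs inside one node lowers the modelled time, which is the effect the mapping problem tries to exploit.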
  • 6. The Heuristic Algorithm TMMGP
     Step 1. Partitioning: task graph partitioning into $k = \lceil M / c_{L1} \rceil$ subsets.
     Step 2. Mapping.
     [Figure: the two steps of TMMGP; labels: subsets $V'_1$, $V'_2$, $V'_3$ and channel bandwidths $b_1$, $b_2$, $b_3$.]
  • 7. Task Graph Partitioning in the TMMGP algorithm
     1. Coarsen the graph: Heavy Edge Matching (Karypis, Kumar, 1998)
     2. Partition the graph $G_m$ into $k$ subsets by recursive bisection (Schloegel et al. 2003)
     3. Refine the partition by the FM heuristic (Fiduccia, Mattheyses, 1982)
     The computational complexity of the TMMGP algorithm is $O(|E| \log_2 k + M)$.
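The coarsening step named on slide 7 can be sketched as follows. This is a simplified illustration of heavy-edge matching in general, not the authors' implementation: vertices are visited in sorted order (real implementations often randomize the order), each unmatched vertex is merged with the unmatched neighbour connected by the heaviest edge, and ties are broken by the larger vertex id. The function names and the example graph are hypothetical.

```python
def heavy_edge_matching(adj, order=None):
    """Return a dict vertex -> coarse vertex id.

    adj: symmetric adjacency dict {u: {v: weight}}.
    Each unmatched vertex is paired with its unmatched neighbour of maximum
    edge weight; a vertex with no free neighbour stays alone.
    """
    coarse_id = {}
    next_id = 0
    for u in (order if order is not None else sorted(adj)):
        if u in coarse_id:
            continue
        candidates = [(w, v) for v, w in adj[u].items()
                      if v != u and v not in coarse_id]
        coarse_id[u] = next_id
        if candidates:
            _, v = max(candidates)  # heaviest edge; ties go to the larger id
            coarse_id[v] = next_id
        next_id += 1
    return coarse_id


def coarsen(adj, coarse_id):
    """Build the coarser graph, summing the weights of parallel edges."""
    coarse = {c: {} for c in set(coarse_id.values())}
    for u, neighbours in adj.items():
        for v, w in neighbours.items():
            cu, cv = coarse_id[u], coarse_id[v]
            if cu != cv:  # edges inside a merged pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse


# Hypothetical path graph 1 -5- 2 -1- 3 -4- 4 (numbers on edges are weights).
graph = {1: {2: 5}, 2: {1: 5, 3: 1}, 3: {2: 1, 4: 4}, 4: {3: 4}}
matching = heavy_edge_matching(graph)
print(matching)
print(coarsen(graph, matching))
```

Merging along heavy edges hides the largest communication volumes inside coarse vertices, so the later bisection is less likely to cut them.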
  • 8. Software Tools for Mapping MPI Programs
  • 9. Experiments Organization
     MPI programs:
       • NAS Parallel Benchmarks (NPB);
       • High-Performance Linpack (HPL).
     Computer clusters:
       • Cluster Xeon16: 4 nodes (2 x Intel Xeon 5150), interconnect: Gigabit/Fast Ethernet;
       • Cluster Opteron10: 5 nodes (2 x AMD Opteron 248), interconnect: Gigabit/Fast Ethernet.
     [Figure: HPL task graph: 16 processes, PMAP=0, BCAST=5.]
     [Figure: NPB Conjugate Gradient task graph: 16 processes, CLASS B.]
     [Figure: NPB Multigrid task graph: 16 processes, CLASS B.]
  • 10. Experiment Results
     The execution time of the TMMGP algorithm on an Intel Core 2 Duo 2.13 GHz processor is less than 1 sec.
     The execution time of MPI benchmarks on the Xeon16 cluster:

     Benchmark                  Cluster interconnect  T(X_RR), sec.  T(X_TMMGP), sec.  Speedup T(X_RR) / T(X_TMMGP)
     High-Performance Linpack   Fast Ethernet         1108.69        911.81            1.22
     High-Performance Linpack   Gigabit Ethernet      263.15         231.72            1.14
     NPB Conjugate Gradient     Fast Ethernet         726.02         400.36            1.81
     NPB Conjugate Gradient     Gigabit Ethernet      97.56          42.05             2.32
     NPB Multigrid              Fast Ethernet         23.94          23.90             1.00
     NPB Multigrid              Gigabit Ethernet      4.06           4.03              1.00

       • T(X_RR) is the execution time of the MPI benchmark with mapping by the round-robin algorithm of the mpiexec tool (MPICH2 1.0.6).
       • T(X_TMMGP) is the execution time of the MPI benchmark with mapping by the TMMGP algorithm.
  • 11. Conclusions and Future Work
     Conclusions
       • Mapping algorithms must take into account the hierarchical organization of modern computer systems and the structure of parallel programs.
       • The proposed TMMGP algorithm reduces the execution time of MPI programs by 40% on average.
       • New algorithms for mapping parallel programs with full task graphs are required.
     Future Work
       • Development of new algorithms for mapping parallel programs into arbitrary subsystems of hierarchical distributed computer systems.
       • Integration of the TMMGP mapping algorithm with the mpiexec tool and resource management systems (such as TORQUE).
       • Application of the described approach to optimizing MPI collective operations.
  • 12. Thank You For Your Attention
  • 13. Backup Slides
  • 14. The k-way Graph Partitioning Problem
     It is required to partition the graph $G' = (V', E')$ into $k$ disjoint subsets $V'_1, V'_2, \ldots, V'_k$ such that the maximal sum of edge weights incident to any subset is minimized and $|V'_i| \le s$.
       • $w(u, v)$ is the weight of edge $(u, v) \in E'$;
       • $W(i, j)$ is an additional weight for edges incident to subsets $i$ and $j$;
       • $c(u, v, i, j) = w(u, v) W(i, j)$ is the total weight of edge $(u, v)$ incident to subsets $i$ and $j$.

     $$c(u, v, i, j) = d_{uv} / b_{g(L, i, j)}$$

     Example of 3-way graph partitioning: $V' = \{1, 2, \ldots, 12\}$; $k = 3$; $s = 3$; $W(1, 2) = 3$; $W(1, 3) = 2$; $W(2, 3) = 4$.
     The approximate partition:
       • edge-weights($V'_1$) $= w(1, 5) W(1, 2) + w(6, 8) W(1, 3) + w(2, 3) W(1, 3) = 22$;
       • edge-weights($V'_2$) $= 40$;
       • edge-weights($V'_3$) $= 38$.
     [Figure: the example graph partitioned into subsets $V'_1$, $V'_2$, $V'_3$.]
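The cost that slide 14 asks to minimize can be computed with a short sketch. It uses the slide's additional weights $W(1, 2) = 3$, $W(1, 3) = 2$, $W(2, 3) = 4$, but the six-vertex graph, its edge weights, the size limit `s=2` and the function names are hypothetical, since the slide's own 12-vertex graph is not recoverable from the transcript.

```python
def subset_edge_weights(edges, part, W):
    """Sum c(u, v, i, j) = w(u, v) * W(i, j) over cut edges, per subset.

    edges: dict (u, v) -> w(u, v), one entry per undirected edge
    part:  dict vertex -> subset index
    W:     dict (i, j) -> additional weight, keyed with i < j
    """
    totals = {s: 0 for s in set(part.values())}
    for (u, v), w in edges.items():
        i, j = part[u], part[v]
        if i != j:  # only edges between different subsets contribute
            c = w * W[(min(i, j), max(i, j))]
            totals[i] += c
            totals[j] += c
    return totals


def is_feasible(part, s):
    """Check the size constraint |V'_i| <= s for every subset."""
    sizes = {}
    for subset in part.values():
        sizes[subset] = sizes.get(subset, 0) + 1
    return all(size <= s for size in sizes.values())


W = {(1, 2): 3, (1, 3): 2, (2, 3): 4}                  # values from the slide
edges = {(1, 2): 5, (2, 3): 2, (4, 5): 1, (1, 6): 3}   # hypothetical graph
part = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}            # hypothetical partition

totals = subset_edge_weights(edges, part, W)
print(totals, max(totals.values()), is_feasible(part, s=2))
```

The quantity to minimize is `max(totals.values())`, the heaviest subset, which mirrors the max over processes in the mapping objective.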
  • 15. Heavy Edge Matching algorithm
     [Figure: Heavy Edge Matching (HEM) on a 16-vertex graph; panels: "Matching (source graph)" and "Coarser graph".]
  • 16. Graph Bisection
     [Figure: graph bisection example; labels: "Initial vertex" and "Bisection".]