Parallel Computing: State-of-the-Art Perspective,
E.H. D’Hollander, G.R. Joubert, F.J. Peters, D.
Trystram (Eds.), Elsevier, 1996
A programming environment for heterogeneous network
computing with transparent workload redistribution
M. Angelaccio, M. Cermele and M. Colajanni
Dipartimento di Informatica, Sistemi e Produzione
Università di Roma “Tor Vergata”, Via della Ricerca Scientifica, Roma, Italy
The project presented in this paper aims to extend the SPMD programming
paradigm to a computational platform composed of a network of heterogeneous
workstations with time-varying conditions. Presently, packages such as PVM and
MPI allow us to use a network of distributed nodes as a single parallel machine but
do not overcome the potential inefficiencies due both to heterogeneity and the
unpredictable variability of usually shared resources. The aim of this paper is to
illustrate an environment that supports SPMD programming on a network of
workstations and provides transparent dynamic data re-distribution. Our
experiments demonstrate that workload re-distribution support is necessary to
achieve satisfactory efficiency when the computational platform is subject to heavy
modifications.
1. INTRODUCTION
SPMD programming is a widely adopted paradigm for a large class of problems.
Nevertheless, it becomes hard to preserve efficiency when the computing platform is
highly irregular and subject to dynamically varying conditions. The SPMD
programming paradigm, in fact, requires the choice of a specific data decomposition,
and the insertion of primitives in a decomposition dependent way. This approach
yields parallel programs that correspond to a single data distribution and guarantee
adequate efficiency only for regular problems running on homogeneous static
platforms. On the other hand, there are several cases where both the data
decomposition and the hardware platform are subject to dynamic variations: for
example, problems such as molecular dynamics, in which the workload intrinsically
changes at run-time; heterogeneous network computing, where load balancing must
be adjusted when the available resources dynamically vary; and the recovery of
parallel programs in the presence of faulty nodes, provided that a run-time
process/data reconfiguration support is available. In all these cases the use of static
environments would lead to serious inefficiencies that can be avoided by adapting
the workload distribution (in this case corresponding to data decomposition) to the
modified framework. This can be obtained by decomposition and machine
independent (DMI) parallel programs that do not require specification of data
decomposition and target machine at compile-time.
Presently, two main frameworks (i.e. PVM and MPI) allow us to use a network of
distributed nodes as a single parallel machine, thus yielding the design of machine
independent (MI) programs. These packages effectively hide the differences among the
nodes of a distributed platform from the programmer, but they do not overcome the
potential inefficiencies due to heterogeneity and unpredictable variability of usually
shared resources. At the moment, the solution to this problem is completely left to
the programmer who has to face any random modification of the computing platform.
The intent of our project, called DAME (DAta Migration Environment), is twofold:
first, to allow writing DMI programs in an explicit message-passing environment; second,
to support dynamic data re-distribution. The first goal has been accomplished by the
parallel run-time library PLUS, whose theoretical foundations are presented in [1]. PLUS
provides a set of DMI collective primitives that allow the design and implementation of
programs in which the distribution attributes can be set at run-time. The second
goal has been achieved by a transparent mechanism that, at regular intervals,
checks the status of the platform and, if necessary, autonomously provides suitable
data migrations from overloaded to under-loaded nodes.
The paper is organised as follows. Section 2 presents the DAME project focusing
on its aims and comparing them to related frameworks. Section 3 outlines the virtual
architecture and its effects on data decomposition. Section 4 describes the
programming model provided by PLUS and the interactions among the DAME
components. Section 5 presents experimental results on a computational platform
composed of a network of workstations.
2. THE DAME PROJECT
DAME is an environment that supports SPMD programming by means of
primitives that identify node properties (such as memory, current computational
power, etc.), facilitate node grouping operations, and support inter/intra group
communications. DAME provides double independence: from machines and from
data distribution. For SPMD programs the amount of computation performed by each
processing unit is usually proportional to the size of the data it owns. Therefore, at the
beginning, DAME automatically distributes data by taking into account the
differences in the current computational power of each workstation. At run-time,
DAME provides a dynamic data balancing support to preserve efficiency on a
platform subject to modification without forcing the programmer to manage
potentially complex operations such as workload monitoring, process
synchronisation, data migrations, and so on.
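For illustration only, the following C fragment sketches how such a power-proportional partition of matrix rows might be computed; it is a minimal sketch assuming a one-dimensional row decomposition, and it is not DAME's actual data balancing algorithm.

    #include <stdio.h>

    /* Hypothetical sketch (not DAME's published algorithm): split nrows matrix
     * rows among p nodes in proportion to their measured computational powers.
     * Rows lost to integer truncation are handed out round-robin afterwards. */
    void proportional_partition(int nrows, int p, const double power[], int rows[])
    {
        double total = 0.0;
        int assigned = 0;

        for (int i = 0; i < p; i++)
            total += power[i];

        for (int i = 0; i < p; i++) {
            rows[i] = (int)(nrows * power[i] / total);
            assigned += rows[i];
        }
        for (int i = 0; assigned < nrows; i = (i + 1) % p, assigned++)
            rows[i]++;                       /* place the leftover rows */
    }

    int main(void)
    {
        double power[3] = { 2.0, 1.0, 1.0 };     /* node 0 is twice as fast */
        int rows[3];
        proportional_partition(1000, 3, power, rows);
        for (int i = 0; i < 3; i++)
            printf("node %d gets %d rows\n", i, rows[i]);   /* 500, 250, 250 */
        return 0;
    }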
The literature presents various strategies for load balancing. Task
migration strategies for highly parallel computers are shown in [7], whereas optimal
scheduling algorithms for network computing are presented in [4]. Piranha
dynamically adapts Linda computations to the number of available workstations [2].
Nedeljkovic and Quinn propose a modification of the run-time system of Dataparallel
C (DPC) by adapting it to heterogeneous networks and providing transparent
workload migration [6]. Automatic Data Movement (ADM) furnishes a set of functions
that help the programmer to achieve load balancing by means of data migration [3].
Comparing DAME to the existing strategies for SPMD applications, it should be
noted that ADM is not yet transparent to the programmer, whereas DPC presents
some similarities even though it is carried out in a completely different way. In particular,
the programming language provided by the latter is Dataparallel C, a SIMD language
oriented to virtual processors without explicit communication primitives, whereas
DAME supports PLUS, a decomposition-independent message-passing language for
SPMD computations. In addition, DAME achieves dynamic load balancing by data
migration only, instead of the virtual parallel processor migration needed in DPC.
Moreover, since DAME is partially built over PVM [5], it inherits all the portability
advantages of the latter framework.
3. VIRTUAL COMPUTATIONAL ARCHITECTURE
DAME supports a virtual mesh topology because SPMD programming is
considerably simplified if an underlying regular platform is assumed. Nevertheless,
workstations are heterogeneous and irregularly connected. Their topology is usually
composed of a main backbone that connects several subnets by means of some
bridges (Figure 1.a). Even if widely used protocols such as TCP/IP provide complete
interconnection among nodes, efficiency considerations suggest clustering together
the nodes that are most quickly connected to each other.
To this end, DAME groups together the nodes of the same physical subnet to
form the rows of the virtual mesh topology (the so-called row subnets). In addition,
DAME emulates a regular platform (i.e. each group having the same number of nodes)
by splitting some nodes into several virtual nodes, whose number depends on the
offered computational power of each workstation.
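The exact splitting rule is not spelled out here; a minimal sketch of one plausible rule, in which the least powerful workstation acts as the unit of computational power, is the following (the function name and the rounding rule are assumptions). In practice the resulting counts would also be adjusted so that every row subnet ends up with the same number of virtual nodes, as in Figure 1.b.

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical rule: assign each workstation a number of virtual nodes
     * roughly proportional to its offered computational power, taking the
     * weakest machine as the unit. */
    static int virtual_nodes(double power, double min_power)
    {
        int v = (int)floor(power / min_power + 0.5);   /* round to nearest */
        return v > 0 ? v : 1;                          /* at least one virtual node */
    }

    int main(void)
    {
        double power[] = { 1.0, 2.1, 2.9, 1.2 };       /* relative powers, min = 1.0 */
        for (int i = 0; i < 4; i++)
            printf("workstation %d -> %d virtual node(s)\n",
                   i, virtual_nodes(power[i], 1.0));   /* 1, 2, 3, 1 */
        return 0;
    }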
[Diagrams: the actual workstation network (nodes A–I) and the corresponding virtual mesh of virtual nodes (A1, A2, B, C1, C2, D, E1–E4, F1, F2, G, H1, H2, I1–I3).]
Figure 1.a. Actual network. Figure 1.b. Virtual network.
For example, once the computational parameters have been evaluated, DAME
maps the irregular physical network of Figure 1.a into the virtual mesh of Figure 1.b.
The virtual mesh seems the best compromise because it introduces fewer virtual
links (grey lines in Figure 1.b) than unbounded-degree topologies, and it does not
represent a severe limitation, since several practical applications can be immediately
mapped onto such a domain or can easily be reduced to it.
As a consequence of this virtual topology definition, DAME always maps the data
domain onto an m×n virtual mesh (e.g. 3×6 in Figure 1.b). For example, in the case of a
2D matrix domain, the partition algorithm decomposes the matrix into m groups of
rows and n groups of columns (Figure 2.a). In this way, a programmer deals with
virtual nodes/decomposition and can adopt the usual SPMD paradigm for 2D regular
topologies. Figure 2.b shows the actual mapping of data onto the physical network:
each node has an amount of data proportional to the offered computational power,
thus implying a very irregular actual decomposition. The dynamic load balancing support that
causes data migration and run-time modifications of the physical data distribution
does not require any adjustment to the high-level code oriented to virtual nodes,
thanks to the decomposition-independent paradigm provided by the PLUS run-time
library underlying DAME. The PLUS language, in fact, overcomes the difficulties of
programming on irregular and variable domains by providing a suitable set of
functions whose syntax appears quite similar to that of traditional data-parallel
primitives. The DMI PLUS primitives are characterised by semantic flexibility, in the
sense that they self-adapt their effect to any data distribution.
[Diagrams: the matrix decomposed over the virtual nodes of the mesh, and the corresponding irregular decomposition over the physical workstations A–I.]
Figure 2.a. Virtual data decomposition. Figure 2.b. Actual data decomposition.
4. DAME COMPONENTS
DAME is organised into two logical components: master and computing nodes.
The whole evolution of programs is governed by the master, which is a process
resident on one node. Since the master is idle during most of the program execution, one
node (possibly the most powerful) carries out the double role of master and
computing node. The master starts the PVM daemon on each workstation and groups
nodes according to the network configuration. The static data distribution is carried
out by a “data balancing algorithm” on the basis of the network monitor function that
quantifies the current computational power of each workstation (in Figure 3 these
activities are evidenced by the grey arrows). Afterwards, each node can start the
execution of the parallel code.
During program execution, a plus_check() call guarantees load balancing by
performing, if necessary, a data migration. In such a case, the program execution is
interrupted, information about the current computational power is collected by the
network monitor and, if heavy modifications have occurred, the dynamic data
distribution algorithm is executed (in Figure 3 these activities are evidenced by the
black arrows). The re-distribution is not performed by the master, which only indicates
to each node which data are to be sent and received. In this way, each row subnet can
concurrently re-distribute data among its nodes. For the sake of efficiency we
distinguish between local and global reconfiguration, in the sense that data
exchange can happen among nodes belonging to the same row subnet only (local)
or among row subnets (global). The scalability requirement is satisfied since, if we
increase the number of nodes, the complexity of load balancing grows in proportion
to the square root of the number of nodes: on a roughly square mesh of p nodes each
row subnet contains about √p nodes, and the row subnets re-distribute their data concurrently.
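The criterion used to decide whether the monitored modifications are heavy enough to justify a migration is a design parameter; a plausible sketch of such a decision step, with an assumed 10% tolerance and an invented function name, is the following.

    #include <math.h>

    /* Hypothetical decision phase of plus_check(): ask for a re-distribution only
     * when some node's share of the data deviates from its share of the currently
     * monitored computational power by more than a tolerance. The function name
     * and the 10% threshold are assumptions, not part of the DAME interface. */
    #define IMBALANCE_TOL 0.10

    static int needs_redistribution(int p,
                                    const double data_share[],   /* fractions, sum to 1 */
                                    const double power_share[])  /* fractions, sum to 1 */
    {
        for (int i = 0; i < p; i++)
            if (fabs(data_share[i] - power_share[i]) > IMBALANCE_TOL)
                return 1;      /* heavy modification: trigger data migration */
        return 0;              /* platform almost unchanged: skip migration */
    }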
Each node behaves as in a usual SPMD programming environment. The
programmer inserts PLUS communication primitives as would be done for a
regular virtual mesh. The decomposition can be set and/or modified at run-time
by means of the plus_check() primitive, which can be called either by the programmer or
automatically by the system if heavy and unexpected events require a sudden
re-evaluation of the data partition.
The node program is written in C enriched with the PLUS primitives. Figure 3
illustrates the typical structure of a PLUS code and how the different DAME components
interact. The self-adapting behaviour of the PLUS primitives cannot be illustrated
there because it takes place at an underlying level.
Figure 3. Template of a PLUS node program and interactions among function calls
and DAME components.
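The following C skeleton gives a hedged idea of the node program structure sketched in Figure 3; only plus_init(), plus_check() and plus_end() are names taken from the text, while the header name, plus_mesh_info(), plus_broadcast_row() and all signatures are illustrative assumptions.

    #include "plus.h"        /* hypothetical header of the PLUS run-time library */

    #define MAX_ITER 100
    #define BUF_LEN  1024

    static double pivot_buffer[BUF_LEN];

    static void compute_local_block(void)
    {
        /* application-specific owner-computes work on the locally owned data */
    }

    int main(int argc, char *argv[])
    {
        int rows, cols, my_row, my_col;

        plus_init(&argc, &argv);                 /* join the DAME environment */

        /* identification primitives, called once before the main loop
           (plus_mesh_info() is an illustrative name, not the documented API) */
        plus_mesh_info(&rows, &cols, &my_row, &my_col);

        for (int iter = 0; iter < MAX_ITER; iter++) {
            compute_local_block();               /* local computation */
            plus_broadcast_row(pivot_buffer, BUF_LEN);  /* illustrative fan-out
                                                            within a row subnet */
            plus_check();                        /* re-balance if the platform changed */
        }

        plus_end();                              /* leave the environment */
        return 0;
    }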
The PLUS primitives can be divided into four groups. Some of them are currently
built on top of PVM [5], thus representing an auxiliary layer.
Identification primitives. Usually called once before the main loop of the program,
they return global (such as number of nodes involved in computation, number of row
subnets) and local information (such as the position of each node in the mesh and its
row subnet number).
Loop dependent primitives. Used inside the main loop, they can be distinguished
between owner-compute functions, which determine the owners of a given set of data,
and indexing rules, which allow the programmer to access local data by means of
their global indexes in the original data structure. These primitives are the
fundamental basis that supports the decomposition independence paradigm of
PLUS, since the programmer is never required to express exactly where data are
located.
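As a hedged illustration of how such primitives might be used, the fragment below scales one global matrix row only on the node that owns it; plus_owner_row(), plus_my_id() and plus_global_to_local_row() are invented names standing for the owner-compute and indexing rules described above, declared in the (hypothetical) PLUS header.

    /* Hypothetical use of loop-dependent primitives: scale global row g of the
     * matrix only on the node that owns it, addressing the row through its
     * global index rather than through decomposition-specific arithmetic. */
    void scale_global_row(double *local_block, int ncols, int g, double factor)
    {
        if (plus_owner_row(g) == plus_my_id()) {      /* owner-computes test */
            int l = plus_global_to_local_row(g);      /* global -> local index */
            for (int j = 0; j < ncols; j++)
                local_block[l * ncols + j] *= factor;
        }
    }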
Communication primitives. They conform to the PVM standard by supporting
several types of data exchange among nodes and among row subnets (such as fan-
in, fan-out, gathering). Some primitives are implemented by means of PVM
functions, others are designed and implemented ex novo.
DAME interface primitives. They represent the only non-transparent interface
between a traditional SPMD code and the irregular computational platform. At
present, three primitives belong to this class: plus_init(), plus_end(), and plus_check().
5. EXPERIMENTAL RESULTS
DAME is currently implemented on an Ethernet-based local area network
composed of four HP-9000, four Sun SPARCstations and one IBM RISC/6000 that
are connected as in Figure 1.a. Experiments were carried out on a dedicated network
and workstations. In some experiments, though, synthetic overheads were
added to the computational platform with the aim of emulating network and/or
machine contention. We have run several SPMD numerical algorithms such as
matrix multiplication, Gaussian and Cholesky factorisation, and block Jacobi. For
reasons of space, we restrict ourselves here to the LU factorisation algorithm, whose
results are representative of the performance achieved by DAME. We evaluate the
efficacy of the supports for irregular data decomposition, virtual network and dynamic
data re-distribution.
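The way the synthetic overheads were generated is not described; one common technique, given here as a minimal sketch and not as the authors' actual tool, is a competing CPU-burning process launched on the workstation to be slowed down.

    #include <time.h>
    #include <unistd.h>

    /* Hypothetical synthetic overhead: a competing process that burns roughly
     * `fraction` of one CPU for `seconds` seconds, alternating a busy-wait with
     * a short sleep within every 10 ms period. */
    static void burn_cpu(double fraction, int seconds)
    {
        time_t end = time(NULL) + seconds;
        while (time(NULL) < end) {
            clock_t start = clock();
            while ((double)(clock() - start) / CLOCKS_PER_SEC < 0.010 * fraction)
                ;                                            /* consume CPU */
            usleep((useconds_t)(10000 * (1.0 - fraction)));  /* yield the rest */
        }
    }

    int main(void)
    {
        burn_cpu(0.5, 600);   /* steal about half of one CPU for ten minutes */
        return 0;
    }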
The first set of experiments has been carried out on a dedicated computational
platform. The aim is to demonstrate that the DAME supports do not add heavy
overheads to the execution times under static conditions. Before starting the computation,
the irregular data decomposition support partitions the workload in proportion
to the current computational power of each workstation. It has been
verified that, for any number of machines and any data size, DAME execution times
are lower than those obtained with a workload equally partitioned among the nodes.
In particular, Figure 4 shows the execution time (in seconds) of a parallel algorithm
for the factorisation of a dense matrix running on different numbers of workstations
under the hypothesis that no modification occurs in the computational platform. The
figure shows that considerable speed-up is achieved with up to four workstations,
thus demonstrating that the irregular data decomposition and virtual
network supports do not degrade performance. The loss of efficiency for a higher
number of nodes is due to an increased number of communications and, mainly,
to the fact that the additional workstations belong to different physical subnets
connected through bridges.
[Plots: execution time (seconds) versus matrix size (100–1100). Figure 4: curves for 1, 2, 4, 6 and 8 nodes. Figure 5: curves for 2, 4 and 8 nodes, each with and without one plus_check() call.]
Figure 4. Execution times for LU factorisation of a dense matrix with varying dimensions (dedicated computational platform).
Figure 5. Overhead of one plus_check() call without data migration (dedicated computational platform).
The efficacy of the dynamic data re-distribution support has to be evaluated under
static and dynamic conditions. A trade-off exists between the performance
degradation due to load imbalance and the overhead due to the execution of the
plus_check() primitive. The latter consists of four phases: process synchronisation,
network monitoring, decision algorithm, and data re-distribution. Since DAME
efficiently implements the second and third phases, the main cost factors of the
plus_check() execution are process synchronisation and data re-distribution.
Figure 5 shows the execution times of a DAME program with and without the
plus_check() call. Since no modification occurs in the computational
platform, no data re-distribution is carried out. Therefore, the gap between the two
curves shows the cost of the first three phases. In particular, the small differences
demonstrate the scalability of the plus_check() primitive: the introduced overhead
does not increase for higher numbers of nodes. It should be noted, though, that
this low overhead is also due to the characteristics of the considered SPMD
algorithm, which implicitly synchronises the different processes at the end of each
iteration if the workload is well balanced.
Figure 6 shows the execution time t_ex of the same parallel algorithm when some
modification of the computational power of the workstations occurs. To evaluate the
impact of data re-distribution only, we preserve the global power of the
computational platform. In particular, at time t_ex/4, one workstation is burdened with
three synthetic workloads that cause a loss of power equal to 10%, 30% and 50%,
respectively. At the same time, some other workstations gain an analogous amount
of power. In this experiment the DAME program executes only one plus_check() call, at
time t_ex/2. The (plain) curves point out the importance of a dynamic data migration
support, especially when the modifications are heavy (for the considered
algorithm, at least 30%) and/or the computational cost of the problem is high (i.e. in
case of long execution times).
[Plots: execution time (seconds) versus matrix size (100–1100); curves for 10%, 30% and 50% power variations, each with and without data migration (plus_check).]
Figure 6. Execution times with and without data migration for different variations of the computational platform (1 plus_check() call).
Figure 7. Execution times with and without data migration for different variations of the computational platform (3 plus_check() calls).
Figure 7 illustrates the same experiments with a different frequency of the
plus_check() call, namely at times t_ex/4, t_ex/2 and 3t_ex/4. In this case, the modification of
the computational power occurs at time t_ex/8. We can observe that the additional
overhead caused by the multiple plus_check() calls is amply
compensated if heavy modifications occur in the platform: the execution time is
reduced if a power variation of at least 30% occurs, whereas a longer execution time
is observed when the modifications are light (less than 30%).
In addition, by considering Figures 6 and 7 together, we can observe that three
plus_check() calls improve the performance of the 50%-modification case to the extent that
the resulting execution time becomes lower than that of the unbalanced 30%-modification
case (compare the 30% and 50%-plus_check curves in the two figures). It should be noted,
though, that here the checkpoint frequency was chosen empirically, once the
program execution time was known. The optimal checkpoint insertion for an arbitrary SPMD
algorithm is one of the open problems still under study.
6. CONCLUSIONS
The DAME project presented in this paper addresses some of the intrinsic
difficulties of SPMD programming on heterogeneous and time-varying network
platforms. DAME supplies the programmer with four kinds of transparent supports: a
run-time library (PLUS) of decomposition and machine independent primitives; a
virtual mesh abstraction that hides irregularities of the network; a static mechanism
that automatically distributes workload in a way which is proportional to the current
computational power of each workstation; a dynamic and transparent data migration
support that masks any modification of the underlying platform. The satisfactory
experimental results obtained with all these supports demonstrate that DAME is a
theoretically grounded and effective framework for SPMD network computing that
preserves efficiency when the platform is subject to dynamic variations.
References
[1] M. Angelaccio, M. Colajanni, “Unifying and optimizing parallel linear algebra
algorithms”, IEEE Trans. on Parallel and Distributed Systems, v. 4, no. 12, pp.
1382-1397, Dec. 1993.
[2] N. Carriero, D. Kaminsky, “Adaptive parallelism and Piranha”, IEEE Computer, v.
28, no. 1, Jan. 1995.
[3] J. Casas, R. Konuru, S.W. Otto, R. Prouty, J. Walpole, “Adaptive load migration
systems for PVM”, Proc. of Supercomputing ’94, pp. 390-399, Nov. 1994.
[4] K. Efe, V. Krishnamoorty, “Optimal scheduling of compute-intensive tasks on a
network of workstations”, IEEE Trans. on Parallel and Distributed Systems, v. 6,
no. 6, pp. 668-673, June 1995.
[5] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, V. Sunderam, PVM
3.0 User’s Guide and Reference Manual, Feb. 1993 (available via ftp).
[6] N. Nedeljkovic, M.J. Quinn, “Data-parallel programming on a network of
heterogeneous workstations”, Concurrency: Practice and Experience, v. 5, no. 4,
pp. 257-268, June 1993.
[7] M.H. Willebeek-Le Mair, A.P. Reeves, “Strategies for dynamic load balancing on
highly parallel computers”, IEEE Trans. on Parallel and Distributed Systems, v. 4,
no. 9, pp. 979-993, Sept. 1993.
