NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU... (IJCNC Journal)
The rapid development of diverse computer architectures and hardware accelerators means that the design of parallel systems faces new problems arising from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to be run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The system's execution engine is open to optimizer implementations focusing on various criteria. In this paper, we propose a new optimizer for KernelHive that utilizes distributed databases and performs data prefetching to optimize the execution time of applications that process large input data. Employing a versatile data management scheme that allows various distributed data providers to be combined, we propose using NoSQL databases for this purpose. We support our solution with results of experiments with real executions of our OpenCL implementation of a regular expression matching application in various hardware configurations. Additionally, we propose a network-aware scheduling scheme for selecting hardware for the proposed optimizer and present simulations that demonstrate its advantages.
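The core idea of the prefetching optimizer can be illustrated in miniature: while a device processes the current input chunk, the next chunk is already being fetched from the data store. The sketch below is not KernelHive's actual API; the store, chunk naming, and the fetch/process functions are hypothetical stand-ins for a distributed data provider and an OpenCL kernel launch.

```python
# Illustrative sketch of data prefetching overlapped with computation.
# All names here (store, fetch, process) are invented placeholders.
from concurrent.futures import ThreadPoolExecutor

def fetch(store, chunk_id):
    """Stand-in for fetching one input chunk from a distributed data provider."""
    return store[chunk_id]

def process(chunk):
    """Stand-in for the compute kernel (e.g. regex matching on the chunk)."""
    return sum(chunk)

def run_with_prefetch(store, chunk_ids):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        nxt = pool.submit(fetch, store, chunk_ids[0])
        for i, cid in enumerate(chunk_ids):
            chunk = nxt.result()                 # wait for the current chunk
            if i + 1 < len(chunk_ids):           # start fetching the next one
                nxt = pool.submit(fetch, store, chunk_ids[i + 1])
            results.append(process(chunk))       # compute overlaps the next fetch
    return results

store = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
print(run_with_prefetch(store, [0, 1, 2]))  # [3, 7, 11]
```

When fetch latency and compute time are comparable, this overlap can hide most of the data-transfer cost, which is the effect the optimizer exploits.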
The theory behind parallel computing is covered here. For more theoretical material, visit https://sites.google.com/view/vajira-thambawita/leaning-materials
PAGE: A PARTITION AWARE ENGINE FOR PARALLEL GRAPH COMPUTATION (Nexgen Technology)
Lecture 4: Principles of parallel algorithm design, updated (Vajira Thambawita)
The main principles of parallel algorithm design are discussed here. For more information, visit https://sites.google.com/view/vajira-thambawita/leaning-materials
Parallel programming platforms are introduced here. For more information about parallel programming and distributed computing, visit https://sites.google.com/view/vajira-thambawita/leaning-materials
Topics: System Interconnect Architectures, Network Properties and Routing, Linear Array, Ring and Chordal Ring, Barrel Shifter, Tree and Star, Fat Tree, Mesh and Torus, Dynamic Interconnection Networks, Dynamic Bus, Switch Modules, Multistage Networks, Omega Network, Baseline Network, Crossbar Networks.
Along with idling and contention, communication is a major overhead in parallel programs.
The cost of communication is dependent on a variety of features including the programming model semantics, the network topology, data handling and routing, and associated software protocols.
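The cost of a single message is commonly approximated with a startup-plus-per-word model. The sketch below uses made-up parameter values to show why batching communication amortizes the fixed startup cost, one of the overheads the passage above refers to.

```python
# Minimal sketch of the classic startup + per-word communication cost model.
# Parameter values (10 us startup, 0.5 us per word) are invented for illustration.
def comm_time(m_words, t_startup, t_per_word):
    """Time to send an m-word message: fixed startup cost plus per-word cost."""
    return t_startup + m_words * t_per_word

# One 1000-word message is cheaper than ten 100-word messages because the
# startup cost is paid once instead of ten times.
one_big = comm_time(1000, 10.0, 0.5)          # 510.0 us
ten_small = 10 * comm_time(100, 10.0, 0.5)    # 600.0 us
print(one_big, ten_small)
```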
This is the keynote talk I delivered at GeekCamp.SG 2014.
The main purpose of the talk is to create awareness, where it does not yet exist, in the community when it comes to choosing and wanting to build a distributed system.
This presentation is not meant to be a survey of distributed computing through the ages, but hopefully it serves as a good starting point from which the journeyman can begin.
I want to thank Jonas, CTO of Typesafe, as his work on Akka strongly influenced my own, and I hope it helps you in the way his work helped me.
His Eminence Sheikh Fawzy Mohammed Abuzeid is one of the great working scholars, who devoted his life, time, effort, and wealth to God, calling to God with insight, wisdom, and good counsel; a leading figure of the call to God in the twenty-first century, one of the scholars following the way of the Prophet (peace be upon him), holding to his Sunnah and renewing the understanding of religious matters to suit the afflictions and ailments of this age, on the approach of Ahl al-Sunnah wal-Jama'ah and the path of the noble Companions; and a virtuous educator with students in all countries of the world.
The detection of moving objects is important in many applications, such as vehicle identification in a traffic monitoring system and human detection in crime investigation. In this paper we identify a vehicle in a video sequence; the paper briefly explains the detection of a moving vehicle in a video. We introduce a new algorithm, BGS, for identifying a vehicle in a video sequence. First, we differentiate the foreground from the background in frames by learning the background. Then, the image is divided into many small non-overlapping frames. Candidates for the vehicle part can be found in the frames if there is some change in gray level between the current image and the background. The extracted background subtraction method is used in subsequent analysis to detect a vehicle and classify moving vehicles.
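The background-subtraction step described above can be sketched in a toy form: a pixel is marked as foreground when its gray level differs from the learned background by more than a threshold. Real systems learn the background over many frames; here the "background" is a single reference frame and the threshold value is invented.

```python
# Toy background subtraction: compare each pixel's gray level against a
# learned background and mark large differences as foreground (1) vs. not (0).
def foreground_mask(frame, background, threshold=25):
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(row, brow)]
            for row, brow in zip(frame, background)]

background = [[10, 10, 10],
              [10, 10, 10]]
frame      = [[10, 200, 12],
              [10, 180, 10]]   # a bright "vehicle" in the middle column
print(foreground_mask(frame, background))
# [[0, 1, 0], [0, 1, 0]]
```

The resulting mask is what later stages would analyze to detect and classify the vehicle.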
International Journal of Engineering Research and Development (IJERD), IJERD Editor
Elastic neural network method for load prediction in cloud computing grid (IJECE/IAES)
Cloud computing still has no standard definition, yet it is concerned with on-demand delivery of resources and services over the Internet or a network. It has gained much popularity in the last few years due to rapid growth in technology and the Internet. Many issues among cloud computing's technical challenges are yet to be tackled, such as virtual machine migration, server association, fault tolerance, scalability, and availability. What we are most concerned with in this research is balancing server load: the way the load is spread between the various nodes of a distributed system, which helps utilize resources, improve job response time, enhance scalability, and increase user satisfaction. A load rebalancing algorithm with dynamic resource allocation is presented to adapt to the changing needs of a cloud environment. This research presents a modified elastic adaptive neural network (EANN) with modified adaptive smoothing of errors, to build an evolving system to predict virtual machine load. To evaluate the proposed balancing method, we conducted a series of simulation studies using a cloud simulator and made comparisons with approaches suggested in previous work. The experimental results show that the suggested method significantly outperforms the existing approaches.
Efficient load rebalancing for distributed file system in Clouds (IJERA Editor)
Cloud computing is an upcoming era in the software industry; it is a vast and developing technology. Distributed file systems play an important role in cloud computing applications based on MapReduce techniques. When distributed file systems are used for cloud computing, nodes serve computing and storage functions at the same time. A given file is divided into small parts so that MapReduce algorithms can operate on them in parallel. The problem is that in cloud computing, nodes may be added, deleted, or modified at any time, and operations on files may be performed dynamically. This causes unequal distribution of load among the nodes, which leads to a load imbalance problem in the distributed file system. Newly developed distributed file systems mostly depend upon a central node for load distribution, but this method is not helpful at large scale, where the chances of failure are greater. Use of a central node for load distribution creates a single point of dependency and increases the chance of a performance bottleneck. Issues such as movement cost and network traffic caused by migration of nodes and file chunks also need to be resolved. We therefore propose an algorithm that overcomes these problems and helps achieve uniform load distribution efficiently. To verify the feasibility and efficiency of our algorithm, we use a simulation setup and compare our algorithm with existing techniques on factors such as load imbalance factor, movement cost, and network traffic.
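One of the evaluation factors named above, the load imbalance factor, can be sketched together with a naive rebalancing step that migrates one chunk from the most loaded to the least loaded node. The exact metric definition and migration policy used in the paper may differ; this is only an illustration of the idea.

```python
# Illustrative load imbalance metric and a single greedy chunk migration.
# The 'chunks per node' numbers and the metric definition are invented.
def imbalance_factor(loads):
    """Relative deviation of the maximum load from the ideal (mean) load."""
    ideal = sum(loads) / len(loads)
    return (max(loads) - ideal) / ideal

def move_one_chunk(loads, chunk_size=1):
    """Move chunk_size units of load from the heaviest to the lightest node."""
    loads = list(loads)
    src = loads.index(max(loads))
    dst = loads.index(min(loads))
    loads[src] -= chunk_size
    loads[dst] += chunk_size
    return loads

loads = [10, 2, 6, 2]            # chunks per node; the ideal is 5 per node
print(imbalance_factor(loads))   # 1.0 (max load is double the ideal)
loads = move_one_chunk(loads, chunk_size=4)
print(loads, imbalance_factor(loads))
```

Repeating such migrations until the factor drops below a threshold is one simple decentralized strategy; movement cost then corresponds to the total chunk volume migrated.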
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters (Nexgen Technology)
A load balancing strategy for reducing data loss risk on cloud using a re-modified throttled algorithm (IJECE/IAES)
Cloud computing constantly deals with new problems to fulfill the demands of challenging organizations around the world. Reducing response time without the risk of data loss is a critical issue for user requests on cloud computing. Load balancing ensures quick response of virtual machines (VMs), proper usage of VMs, throughput, and minimal cost of VMs. This paper introduces a re-modified throttled algorithm (RTMA) that reduces the risk of data hampering and data loss by considering the availability of VMs, which increases the system's performance. The response time of virtual machines has been considered in our work, so that while the migration process is running, data will not overflow the VMs. Thus, the data migration process becomes highly reliable. We have completed an overall simulation of our proposed algorithm on the CloudAnalyst tool and successfully reduced the risk of data loss while maintaining response time.
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING (cscpconf)
Parallel computing systems compose task partitioning strategies in a true multiprocessing manner. Such systems share the algorithm and processing units as computing resources, which leads to highly inter-process communication capabilities. The main part of the proposed algorithm is the resource management unit, which performs task partitioning and co-scheduling. In this paper, we present a technique for integrated task partitioning and co-scheduling on a privately owned network. We focus on real-time and non-preemptive systems. A large variety of experiments have been conducted on the proposed algorithm using synthetic and real tasks. The goal of the computation model is to provide a realistic representation of the costs of programming. The results show the benefit of the task partitioning. The main characteristics of our method are optimal scheduling and a strong link between partitioning, scheduling, and communication. Some important models for task partitioning are also discussed in the paper. We target an algorithm for task partitioning that improves inter-process communication between the tasks and uses the resources of the system efficiently. The proposed algorithm contributes to minimizing the inter-process communication cost among the executing processes.
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs (inventionjournals)
ABSTRACT: The important challenge in a cloud computing environment is to design a scheduling strategy to handle jobs and to process them in a heterogeneous environment with shared data centers. In this paper, we investigate a new analytical framework that enables an existing private cloud data center to schedule jobs while minimizing the overall computation and energy cost together. Our model is based on the Divisible Load Theory (DLT) model and derives closed-form solutions for the load fractions to be assigned to each machine, considering computation and energy cost. Our analysis also attempts to schedule jobs in such a way that the cloud provider can gain maximum benefit from the service while meeting the Quality of Service (QoS) requirements of the user's job. Finally, we quantify the performance of the strategies via rigorous simulation studies.
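The divisible-load idea behind such closed-form solutions can be sketched in its simplest textbook form: with a linear cost model and no communication delays, assigning each machine a load fraction inversely proportional to its cost per unit load makes all machines finish at the same time. This is the general DLT intuition only, not the paper's actual solution, which also folds in energy cost and QoS constraints.

```python
# Textbook DLT sketch: load fractions inversely proportional to per-unit cost
# equalize finish times. The cost values below are invented for illustration.
def load_fractions(unit_costs):
    """unit_costs[i]: time (or combined cost) to process one unit of load on machine i."""
    inv = [1.0 / c for c in unit_costs]
    total = sum(inv)
    return [x / total for x in inv]

costs = [1.0, 2.0, 4.0]          # machine 0 is 4x faster than machine 2
fracs = load_fractions(costs)
print(fracs)                     # fractions proportional to speed, ~[0.571, 0.286, 0.143]
# All machines finish together: fraction * unit cost is the same for each.
print([f * c for f, c in zip(fracs, costs)])
```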
Research Inventy: International Journal of Engineering and Science (inventy)
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an online as well as print open-access journal that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic, and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
Cloud computing is a new computing paradigm that aims to transform computing into a utility, just as electricity was first generated at home and evolved to be supplied by a few utility providers. Load balancing is a mapping strategy that efficiently equilibrates the task load across multiple computational resources in the network, based on the system status, to improve performance. The objective of this research paper is to show the results of a hybrid DE-GA, in which GA is applied after DE.
Detailed Simulation of Large-Scale Wireless Networks (Gabriele D'Angelo)
WiFra is a new framework for the detailed simulation of very large-scale wireless networks. It is based on the parallel and distributed simulation approach and provides high scalability in terms of the size of the simulated networks and the number of execution units running the simulation. To improve the performance of distributed simulation, additional techniques are proposed; their aim is to reduce communication overhead and maintain a good level of load balancing. Simulation architectures composed of low-cost Commercial-Off-The-Shelf (COTS) hardware are specifically supported by WiFra. The framework dynamically reconfigures the simulation, tracking the performance of each part of the execution architecture and dealing with unpredictable fluctuations in the available computation power and communication load on the individual execution units. A fine-grained model of the 802.11 DCF protocol has been used for the performance evaluation of the proposed framework. The results demonstrate that the distributed approach is suitable for the detailed simulation of very large-scale wireless networks.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Data Distribution Handling on Cloud for Deployment of Big Data (neirew J)
Cloud computing is a new emerging model in the field of computer science. Cloud computing presents a large-scale, on-demand infrastructure for varying workloads. The primary use of clouds in practice is to process massive amounts of data; processing large datasets has become crucial in research and business environments. The big challenge associated with processing large datasets is the vast infrastructure required, and cloud computing provides vast infrastructure to store and process Big Data. VMs can be provisioned on demand in the cloud to process the data by forming a cluster of VMs. The MapReduce paradigm can be used to process the data, wherein the mapper assigns part of the task to particular VMs in the cluster and the reducer combines the individual outputs from each VM to produce the final result. We have proposed an algorithm to reduce the overall data distribution and processing time. We tested our solution in the CloudAnalyst simulation environment, where we found that our proposed algorithm significantly reduces the overall data processing time in the cloud.
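The map-then-reduce flow described above can be shown with a minimal in-process sketch, where each data chunk stands in for the work assigned to one VM and word counting is used as a stand-in workload.

```python
# Minimal in-process MapReduce sketch: map over chunks ("VMs"), then reduce
# the partial results into one final answer. Word counting is the stand-in job.
from collections import Counter
from functools import reduce

def mapper(chunk):
    """Each VM would count the words in its assigned chunk of the data."""
    return Counter(chunk.split())

def reducer(a, b):
    """Combine the partial counts produced by two VMs."""
    return a + b

chunks = ["big data big", "cloud data", "big cloud cloud"]
partials = [mapper(c) for c in chunks]   # map phase, one partial result per VM
final = reduce(reducer, partials)        # reduce phase combines all partials
print(dict(final))                       # {'big': 3, 'data': 2, 'cloud': 3}
```

In a real cluster the map calls run on different machines and the shuffle/reduce step moves partial results over the network, which is exactly the data-distribution cost the proposed algorithm targets.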
Parallel Computing: State-of-the-Art Perspective, E.H. D'Hollander, G.R. Joubert, F.J. Peters, D. Trystram (Eds.), Elsevier, 1996
A programming environment for heterogeneous network
computing with transparent workload redistribution
M. Angelaccio, M. Cermele and M. Colajanni
Dipartimento di Informatica, Sistemi e Produzione
Università di Roma “Tor Vergata”, Via della Ricerca Scientifica, Roma, Italy
The project presented in this paper aims to extend the SPMD programming paradigm to a computational platform composed of a network of heterogeneous workstations with time-varying conditions. Presently, packages such as PVM and MPI allow us to use a network of distributed nodes as a single parallel machine, but do not overcome the potential inefficiencies due both to heterogeneity and to the unpredictable variability of usually shared resources. The aim of this paper is to illustrate an environment that both supports SPMD programming on a network of workstations and provides transparent dynamic data re-distribution. Our experiments demonstrate that workload re-distribution support is necessary to achieve satisfactory efficiency when the computational platform is subject to heavy modifications.
1. INTRODUCTION
SPMD programming is a widely adopted paradigm for a large class of problems.
Nevertheless, it becomes hard to preserve efficiency when the computing platform is
highly irregular and subject to dynamically varying conditions. The SPMD
programming paradigm, in fact, requires the choice of a specific data decomposition,
and the insertion of primitives in a decomposition-dependent way. This approach
yields parallel programs that correspond to a single data distribution and guarantee
adequate efficiency only for regular problems running on homogeneous static
platforms. On the other hand, there are several cases where both data
decomposition and hardware platform are subject to dynamic variations. For
example, in all the problems such as molecular dynamics in which the workload
intrinsically changes at run-time; in case of heterogeneous network computing to
adjust load balancing when the available resources dynamically vary; in the recovery
of parallel programs in the presence of faulty nodes provided that a run-time
process/data reconfiguration support is available. In all these cases the use of static
environments would lead to serious inefficiencies that can be avoided by adapting
the workload distribution (in this case corresponding to data decomposition) to the
modified framework. This can be obtained by decomposition and machine
independent (DMI) parallel programs that do not require specification of data
decomposition and target machine at compile-time.
Presently, two main frameworks (i.e. PVM and MPI) allow us to use a network of
distributed nodes as a single parallel machine, thus yielding the design of machine
independent (MI) programs. These packages finely hide differences among the
nodes of a distributed platform from the programmer, but they do not overcome the
potential inefficiencies due to heterogeneity and unpredictable variability of usually
shared resources. At the moment, the solution to this problem is completely left to
the programmer who has to face any random modification of the computing platform.
The intent of our project called DAME (DAta Migration Environment) is twofold:
firstly to write DMI programs in an explicit message passing environment, secondly to
support dynamic data re-distribution. The first goal has been accomplished by the
parallel run-time library PLUS, the theoretical foundations of which are in [1]. PLUS
provides a set of DMI collective primitives that allow the design and implementation of
programs in which the distribution attributes can be settled at run-time. The second
goal has been achieved by a transparent mechanism that, at regular intervals,
checks the status of the platform and, if necessary, autonomously provides suitable
data migrations from overloaded to under-loaded nodes.
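The transparent mechanism just described, a periodic status check followed by migration from overloaded to under-loaded nodes when warranted, can be sketched abstractly as follows. The function names, the relative-load metric, and the 25% migration threshold are invented for illustration and are not DAME's actual internals.

```python
# Abstract sketch of periodic check-and-migrate rebalancing.
# All names and the trigger threshold are illustrative, not DAME's.
def rebalance_step(rows_per_node, power):
    """Shift one data row from the most overloaded to the most underloaded
    node, judged relative to each node's current computational power."""
    rel = [r / p for r, p in zip(rows_per_node, power)]  # load per unit power
    src, dst = rel.index(max(rel)), rel.index(min(rel))
    if rel[src] - rel[dst] > 0.25 * rel[dst]:   # migrate only if worth the cost
        rows_per_node[src] -= 1
        rows_per_node[dst] += 1
    return rows_per_node

rows = [50, 50]          # equal data distribution...
power = [1.0, 3.0]       # ...but node 1 now offers 3x the computational power
for _ in range(5):       # a few periodic checks
    rows = rebalance_step(rows, power)
print(rows)              # data has drifted toward the faster node
```

The threshold plays the same role as DAME's decision of whether a migration is "necessary": it keeps the system from thrashing on small, transient fluctuations.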
The paper is organised as follows. Section 2 presents the DAME project focusing
on its aims and comparing them to related frameworks. Section 3 outlines the virtual
architecture and its effects on data decomposition. Section 4 describes the
programming model provided by PLUS and the interactions among the DAME
components. Section 5 presents experimental results on a computational platform
composed of a network of workstations.
2. THE DAME PROJECT
DAME is an environment that supports SPMD programming by means of
primitives that identify node properties (such as memory, current computational
power, etc.), facilitate node grouping operations, and support inter/intra group
communications. DAME provides double independence: from machines and from
data distribution. For SPMD programs the amount of computation performed by each
processing unit is usually proportional to the size of data owned. Therefore at the
beginning, DAME automatically distributes data by taking into account the
differences among current computational power of each workstation. At run-time,
DAME provides a dynamic data balancing support to preserve efficiency on a
platform subject to modification without forcing the programmer to manage
potentially complex operations such as workload monitoring, process
synchronisation, data migrations, and so on.
Literature presents various examples of strategies for load balancing. Task
migration strategies for highly parallel computers are shown in [7], whereas optimal
scheduling algorithms for network computing are presented in [4]. Piranha
dynamically adapts Linda computations to the number of available workstations [2].
Nedeljkovic and Quinn propose a modification of the run-time system of Dataparallel
C (DPC) by adapting it to heterogeneous networks and providing transparent
workload migration [6]. Automatic Data Movement (ADM) furnishes a set of functions
that help the programmer to achieve load balancing by means of data migration [3].
By comparing DAME to the existing strategies for SPMD applications, it should be
noted that ADM is not yet transparent to the programmer, whereas DPC presents
some similarities even though it is carried out in a completely different way. In
particular, the programming language provided by the latter is Dataparallel C, a
SIMD language oriented to virtual processors without explicit communication
primitives, whereas DAME supports PLUS, a decomposition-independent
message-passing language for SPMD computations. In addition, DAME achieves
dynamic load balancing by data migration only, instead of the virtual parallel
processor migration needed in DPC. Moreover, since DAME is partially built over
PVM [5], it inherits all the portability advantages of that framework.
3. VIRTUAL COMPUTATIONAL ARCHITECTURE
DAME supports a virtual mesh topology because SPMD programming is
considerably simplified if an underlying regular platform is assumed. Nevertheless,
workstations are heterogeneous and irregularly connected. Their topology is usually
composed of a main backbone that connects several subnets by means of some
bridges (Figure 1.a). Even if widely used protocols such as TCP/IP provide complete
interconnection among nodes, efficiency considerations suggest clustering together
the nodes that are connected by faster links.
To this purpose, DAME groups nodes of the same physical subnet to form the rows
of the virtual mesh topology (the so-called row subnets). In addition, DAME
emulates a regular platform (i.e. each group with the same number of nodes) by
splitting some nodes into several virtual nodes, whose number depends on the
offered computational power of each workstation.
Figure 1.a. Actual network. Figure 1.b. Virtual network.
For example, once the computational parameters have been evaluated, DAME
maps the irregular physical network of Figure 1.a into the virtual mesh of Figure 1.b.
The virtual mesh seems the best compromise: it introduces fewer virtual links (grey
lines in Figure 1.b) than unbounded-degree topologies, and it is not a severe
limitation, since several practical applications map immediately onto such a domain
or can easily be reduced to it.
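The node-splitting rule described above can be sketched as follows. This is a minimal illustration, not the actual DAME code: the rounding rule (one virtual node per unit of the weakest machine's power) and the function name are our own assumptions.

```c
#include <assert.h>

/* Illustrative rule (not DAME's actual algorithm): split each
 * workstation into a number of virtual nodes proportional to its
 * offered computational power, measured in units of the weakest
 * machine's power. */
static int virtual_nodes(double power, double min_power) {
    int n = (int)(power / min_power + 0.5);  /* round to nearest */
    return n > 0 ? n : 1;                    /* every host hosts >= 1 node */
}
```

Under this rule, a machine twice as fast as the slowest one would appear as two virtual nodes, in the spirit of the splits A1, A2 of Figure 1.b.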
As a consequence of this virtual topology definition, DAME always maps the data
domain onto an m×n virtual mesh (e.g. 3×6 in Figure 1.b). For example, in the case
of a 2D matrix domain, the partition algorithm decomposes the matrix into m groups
of rows and n groups of columns (Figure 2.a). In this way, a programmer deals with
virtual nodes and a virtual decomposition, and can adopt the usual SPMD paradigm
for 2D regular topologies. Figure 2.b shows the actual mapping of data onto the
physical network: each node holds an amount of data proportional to its offered
computational power, thus implying a very irregular topology. The dynamic load
balancing support that causes data migration and run-time modifications of the
physical data distribution does not require any adjustment to the high-level code
oriented to virtual nodes, thanks to the decomposition-independent paradigm
provided by the PLUS run-time library underlying DAME. The PLUS language, in
fact, overcomes the difficulties of programming on irregular and variable domains by
providing a suitable set of functions whose syntax appears quite similar to that of
traditional data-parallel primitives. The PLUS primitives are characterised by
semantic flexibility, in the sense that they self-adapt their effect to any data
distribution.
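The proportional decomposition can be sketched as follows. The helper below is our own illustration, not a PLUS primitive: it divides N matrix rows among m row subnets in proportion to their aggregate computational power, with a simple correction so that the group sizes sum exactly to N.

```c
#include <assert.h>

/* Illustrative sketch: partition N rows into m groups whose sizes are
 * proportional to each row subnet's aggregate power. Truncation
 * leftovers are handed out round-robin so the sizes sum to N. */
static void partition_rows(int N, int m, const double power[], int rows[]) {
    double total = 0.0;
    for (int i = 0; i < m; i++) total += power[i];

    int assigned = 0;
    for (int i = 0; i < m; i++) {
        rows[i] = (int)(N * power[i] / total);  /* truncating share */
        assigned += rows[i];
    }
    /* distribute the leftover rows lost to truncation */
    for (int i = 0; assigned < N; i = (i + 1) % m, assigned++)
        rows[i]++;
}
```

The same scheme, applied along the columns with the per-node powers inside each row subnet, yields the irregular decomposition of Figure 2.b.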
Figure 2.a. Virtual data decomposition. Figure 2.b. Actual data decomposition.
4. DAME COMPONENTS
DAME is organised into two logical components: the master and the computing
nodes. The whole evolution of a program is governed by the master, a process
resident on one node. Since the master is idle during most of the program execution,
one node (possibly the most powerful) carries out the double activity of master and
computing node. The master starts the PVM daemon on each workstation and
groups nodes according to the network configuration. The static data distribution is
carried out by a “data balancing algorithm” on the basis of the network monitor
function that quantifies the current computational power of each workstation (in
Figure 3 these activities are marked by the grey arrows). Afterwards, each node can
start the execution of the parallel code.
During program execution, a plus_check() call guarantees load balancing by
performing, if necessary, a data migration. In such a case, the program execution is
interrupted, information about the current computational power is collected by the
network monitor and, if heavy modifications have occurred, the dynamic data
distribution algorithm is executed (in Figure 3 these activities are marked by the
black arrows). The re-distribution is not performed by the master, which only
indicates to each node which data are to be sent and received. In this way, each row
subnet can concurrently re-distribute data among its nodes. For the sake of
efficiency we distinguish between local and global reconfiguration, in the sense that
data exchange can happen either only among nodes belonging to the same row
subnet (local) or among row subnets (global). The scalability requirement is
satisfied since, if we increase the number of nodes, the complexity of load balancing
grows in proportion to the square root of the number of nodes.
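The decision phase inside plus_check() can be sketched as a threshold test on the monitored powers. The function name and the relative-change criterion below are illustrative assumptions, not the actual DAME decision algorithm.

```c
#include <assert.h>
#include <math.h>

/* Illustrative decision step: trigger a re-distribution only when the
 * relative change of some node's power, compared with the power
 * assumed by the last distribution, exceeds a threshold. */
static int needs_redistribution(const double old_power[],
                                const double new_power[],
                                int n, double threshold) {
    for (int i = 0; i < n; i++) {
        double rel = fabs(new_power[i] - old_power[i]) / old_power[i];
        if (rel > threshold) return 1;  /* heavy modification detected */
    }
    return 0;  /* platform essentially unchanged: skip migration */
}
```

A threshold of roughly 30% would be consistent with the experimental observation, reported in Section 5, that migration pays off for power variations of at least 30%.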
Each node behaves as in a usual SPMD programming environment. The
programmer inserts PLUS communication primitives as if working with a regular
virtual mesh. The decomposition can be set and/or modified at run-time by means of
the plus_check() primitive, which can be called either by the programmer or
automatically by the system if heavy and unexpected events require a sudden
re-evaluation of the data partition.
The node program is written in C, enriched by the PLUS primitives. Figure 3
illustrates the typical structure of a PLUS code and how the different DAME
components interact. The self-adapting characteristic of the PLUS primitives cannot
be illustrated, because it operates at an underlying level.
Figure 3. Template of a PLUS node program and interactions among function calls
and DAME components.
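Since the figure itself is not reproduced here, a hypothetical skeleton in the spirit of Figure 3 may help. plus_init(), plus_check() and plus_end() are the real DAME interface primitives named in this section; their bodies below are no-op stubs of our own making, so that the sketch compiles stand-alone, and the loop structure is the part being illustrated.

```c
#include <assert.h>

/* Stubs standing in for the DAME run-time (our assumption): */
static void plus_init(void)  { /* join the virtual mesh, receive data */ }
static int  plus_check(void) { /* re-balance if the platform changed  */ return 0; }
static void plus_end(void)   { /* leave the computation               */ }

/* Typical shape of a PLUS node program: init, SPMD main loop with an
 * optional balancing checkpoint, then shutdown. */
static int node_program(int iterations) {
    int work_done = 0;
    plus_init();
    for (int it = 0; it < iterations; it++) {
        work_done++;      /* local SPMD computation on owned data */
        plus_check();     /* let DAME migrate data if necessary   */
    }
    plus_end();
    return work_done;
}
```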
The PLUS primitives can be divided into four groups. Some of them are currently
built on top of PVM [5], thus representing an auxiliary layer.
Identification primitives. Usually called once before the main loop of the program,
they return global information (such as the number of nodes involved in the
computation and the number of row subnets) and local information (such as the
position of each node in the mesh and its row subnet number).
Loop-dependent primitives. Used inside the main loop, they fall into two classes:
owner-compute functions, which determine the owners of a given set of data, and
indexing rules, which allow the programmer to access local data by means of their
global indexes in the original data structure. These primitives are the fundamental
basis of the decomposition-independence paradigm of PLUS, since the programmer
is never required to express exactly where data are located.
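The two families can be illustrated on a contiguous row distribution. owner_of() and local_index() are invented names, not PLUS primitives, and the contiguous layout is an assumption: the actual PLUS primitives self-adapt to any distribution.

```c
#include <assert.h>

/* Illustration only. first[i] holds the first global row owned by
 * node i, and first[n] holds the total number of rows, so node i owns
 * global rows first[i] .. first[i+1]-1. */

/* Owner-compute rule: which node owns a given global row? */
static int owner_of(int global_row, const int first[], int n) {
    for (int i = 0; i < n; i++)
        if (global_row < first[i + 1]) return i;
    return -1;  /* out of range */
}

/* Indexing rule: translate a global row index into the owner's
 * local offset. */
static int local_index(int global_row, const int first[], int owner) {
    return global_row - first[owner];
}
```

A PLUS program would use such rules implicitly: the programmer writes loops over global indexes, and the run-time resolves them against whatever distribution is currently in force.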
Communication primitives. They conform to the PVM standard by supporting
several types of data exchange among nodes and among row subnets (such as
fan-in, fan-out, gathering). Some primitives are implemented by means of PVM
functions; others are designed and implemented ex novo.
DAME interface primitives. They represent the only non-transparent interface
between a traditional SPMD code and the irregular computational platform. At
present, three primitives belong to this class: plus_init(), plus_end(), and plus_check().
5. EXPERIMENTAL RESULTS
DAME is currently implemented on an Ethernet-based local area network
composed of four HP-9000s, four Sun SparcStations and one IBM RISC-6000,
connected as in Figure 1.a. Experiments were carried out on a dedicated network
and workstations. In some experiments, though, synthetic overheads were added to
the computational platform with the aim of emulating network and/or machine
contention. We have run several SPMD numerical algorithms, such as matrix
multiplication, Gaussian and Cholesky factorisation, and block Jacobi. For lack of
space, we restrict ourselves here to the LU factorisation algorithm, whose results
are representative of the performance achieved by DAME. We evaluate the efficacy
of the supports for irregular data decomposition, the virtual network, and dynamic
data re-distribution.
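For reference, the sequential kernel of the benchmarked LU factorisation can be sketched as follows (Doolittle scheme, without pivoting; the paper does not state whether the DAME version pivots, so this detail is an assumption). The parallel version distributes the matrix rows over the virtual mesh as in Section 3.

```c
#include <assert.h>
#include <math.h>

/* In-place LU factorisation without pivoting (Doolittle): on return,
 * a[] holds the multipliers of L below the diagonal (L has a unit
 * diagonal) and U on and above the diagonal. a is n x n, row-major. */
static void lu_factor(double *a, int n) {
    for (int k = 0; k < n; k++)
        for (int i = k + 1; i < n; i++) {
            a[i * n + k] /= a[k * n + k];             /* multiplier l_ik */
            for (int j = k + 1; j < n; j++)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];  /* update */
        }
}
```

The outer k-loop is the iteration that, in the DAME runs, is bracketed by the plus_check() checkpoints discussed below.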
The first set of experiments was carried out on a dedicated computational
platform. The aim is to demonstrate that the DAME supports do not add heavy
overheads to the execution times under static conditions. Before starting the
computation, the irregular data decomposition support partitions the workload
proportionally to the current computational power of each workstation. It has been
verified that, for any number of machines and any data size, DAME execution times
are lower than those achieved with a workload equally partitioned among the nodes.
In particular, Figure 4 shows the execution time (in seconds) of a parallel algorithm
for the factorisation of a dense matrix running on different numbers of workstations,
under the hypothesis that no modification occurs in the computational platform. The
figure shows that considerable speed-up is achieved with up to four workstations,
thus demonstrating that the irregular data decomposition and virtual network
supports do not degrade performance. The loss of efficiency for a higher number of
nodes is due both to an increased number of communications and, mainly, to the
fact that the additional workstations belong to different physical subnets connected
through bridges.
Figure 4. Execution times for LU factorisation of a dense matrix with varying
dimensions (dedicated computational platform).
Figure 5. Overhead of one plus_check() call without data migration (dedicated
computational platform).
The efficacy of the dynamic data re-distribution support has to be evaluated under
both static and dynamic conditions. A trade-off exists between the performance
degradation due to load unbalance and the overhead due to the execution of the
plus_check() primitive. The latter consists of four phases: process synchronisation,
network monitoring, the decision algorithm, and data re-distribution. Since DAME
efficiently implements the second and third phases, the main cost factors of the
plus_check() execution are process synchronisation and data re-distribution.
Figure 5 shows the execution times of a DAME program with and without the
plus_check() call. Since no modification occurs in the computational platform, no
data re-distribution is carried out; therefore, the gap between the two curves reflects
the cost of the first three phases. In particular, the small differences demonstrate the
scalability of the plus_check() primitive: the introduced overhead does not increase
for higher numbers of nodes. It should be noted, though, that this low overhead is
also due to the characteristics of the considered SPMD algorithm, which, if the
workload is well balanced, implicitly synchronises the different processes at the end
of each iteration.
Figure 6 shows the execution time t_ex of the same parallel algorithm when some
modification of the computational power of the workstations occurs. To evaluate the
impact of data re-distribution only, we preserve the global power of the
computational platform. In particular, at time t_ex/4, one workstation is burdened
with three synthetic workloads that cause a loss of power equal to 10%, 30% and
50%, respectively. At the same time, some other workstations gain an analogous
amount of power. In this experiment the DAME program executes only one
plus_check() call, at time t_ex/2. The (plain) curves point out the importance of a
dynamic data migration support, especially when the modifications are heavy (for
the considered algorithm, at least 30%) and/or the computational cost of the problem
is high (i.e. in case of long execution times).
Figure 6. Execution times with and without data migration for different variations of
the computational platform (1 plus_check() call).
Figure 7. Execution times with and without data migration for different variations of
the computational platform (3 plus_check() calls).
Figure 7 illustrates the same experiments under a different frequency of the
plus_check() call, namely at times t_ex/4, t_ex/2 and 3t_ex/4. In this case, the
modification of the computational power occurs at time t_ex/8. We can observe that
the additional overhead caused by the multiple plus_check() calls is amply
compensated if heavy modifications occur in the platform: the execution time is
reduced if a power variation of at least 30% occurs, whereas a longer execution time
is observed when the modifications are light (less than 30%).
In addition, by considering Figures 6 and 7 together, we can observe that three
plus_check() calls improve the performance of the 50%-modification case to the
extent that the resulting execution time is lower than that of the unbalanced
30%-modification case (compare the 30% and 50%-plus_check curves in the two
figures). It should be noted, though, that here the checkpoint frequency was chosen
empirically, once the program execution time was known. Optimal checkpoint
insertion for an arbitrary SPMD algorithm is one of the open problems still under
study.
6. CONCLUSIONS
The DAME project presented in this paper aims to address some of the intrinsic
difficulties of SPMD programming on heterogeneous and time-varying network
platforms. DAME supplies the programmer with four kinds of transparent support: a
run-time library (PLUS) of decomposition- and machine-independent primitives; a
virtual mesh abstraction that hides the irregularities of the network; a static
mechanism that automatically distributes the workload proportionally to the current
computational power of each workstation; and a dynamic and transparent data
migration support that masks any modification of the underlying platform. The
satisfying experimental results achieved by all these supports demonstrate that
DAME is a theoretically grounded and effective framework for SPMD network
computing, and that it preserves efficiency when the platform is subject to dynamic
variations.
References
[1] M. Angelaccio, M. Colajanni, “Unifying and optimizing parallel linear algebra
algorithms”, IEEE Trans. on Parallel and Distributed Systems, v. 4, no. 12, pp.
1382-1397, Dec. 1993.
[2] N. Carriero, D. Kaminsky, “Adaptive parallelism and Piranha”, IEEE Computer, v.
28, no. 1, Jan. 1995.
[3] J. Casas, R. Konuru, S.W. Otto, R. Prouty, J. Walpole, “Adaptive load migration
systems for PVM”, Proc. of Supercomputing ’94, pp. 390-399, Nov. 1994.
[4] K. Efe, V. Krishnamoorty, “Optimal scheduling of compute-intensive tasks on a
network of workstations”, IEEE Trans. on Parallel and Distributed Systems, v. 6,
no. 6, pp. 668-673, June 1995.
[5] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, V. Sunderam, PVM
3.0 User’s Guides and Reference Manual, Feb. 1993 (available via ftp).
[6] N. Nedeljkovic, M.J. Quinn, “Data-parallel programming on a network of
heterogeneous workstations”, Concurrency: Practice and Experience, v. 5, no. 4,
pp. 257-268, June 1993.
[7] M.H. Willebeek-LeMair, A.P. Reeves, “Strategies for dynamic load balancing on
highly parallel computers”, IEEE Trans. on Parallel and Distributed Systems, v. 4,
no. 9, pp. 979-993, Sept. 1993.