Many machine-learning inference workloads compute predictions with a limited number of models deployed together in the same system. These models often share common structure and state, which leaves significant room for runtime and memory optimizations. Current systems fall short of exploiting this because they treat ML models and tasks as black boxes and are therefore unaware of optimization and sharing opportunities.
By contrast, Pretzel adopts a white-box description of ML models, which allows the framework to optimize across deployed models and running tasks, saving memory and improving overall system performance. In this talk we will present the motivation behind Pretzel, its current design, and possible future developments.
Elementary Parallel Algorithms - sum of n numbers on Hypercube, Shuffle-Exchange, and Mesh SIMD computers and on UMA multiprocessors; broadcasting and prefix sum on multicomputers.
In today's world, developers face the problem of writing high-performance algorithms that scale efficiently across a range of multi-core processors. Traditional blocked algorithms must be tuned to each processor, but the discovery of cache-oblivious algorithms gives developers new tools to tackle this emerging challenge. In this talk you will learn about the external-memory model, the cache-oblivious model, and how to use these tools to create faster, scalable algorithms.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 21st
Abstract. The maritime industry is vast and involves many complex processes, since it carries most of the world's goods transportation. During transportation, a crew serves each vessel, which raises the problem of optimally distributing crew across vessels. The problem can be formalized as an integer program, but in practice solving it is time-consuming because of the large number of free variables, making the solution impractical for end users. In this work, we describe an approach that speeds up crew optimization for the maritime industry using the Rolling Time Horizon technique. Our approach is 3.5 times faster than the benchmark and deviates from the optimal solution by less than 1%.
Google BACAT talk.
Abstract: Dialectica categories (also known as dialectica spaces), the main construction of my PhD thesis, have had several (unrelated) applications. I've used them to model Linear Logic, FILL (Full Intuitionistic Linear Logic), the Lambek Calculus, and classes of Petri nets (with C. Brown and D. Gurr). They were also used to model state in programming language semantics, after U. Reddy (with M. Correa and H. Hausler), fuzzy Petri nets (with A. Syropoulos), and several 'superpower games' (A. Blass). Recently Mihai Budiu, Joel Galenson and Gordon Plotkin used dialectica categories in the modelling of partial compilers. I want to discuss this application, presented in the preprint "The Compiler Forest" (#90), available from Plotkin's webpage, to see if I understand it. Since I know little about compilers, audience participation will be very welcome!
Optimized Assignment of Independent Task for Improving Resources Performance ... (ijgca)
Grid computing has emerged from the category of distributed and parallel computing in which heterogeneous resources from different networks are used simultaneously to solve a particular problem that needs a huge amount of resources. The potential of grid computing depends on many issues, such as security of resources, heterogeneity of resources, fault tolerance, resource discovery, and job scheduling. Scheduling is one of the core steps in efficiently exploiting the capabilities of heterogeneous distributed computing resources and is an NP-complete problem. To achieve the promising potential of grid computing, an effective and efficient job scheduling algorithm is proposed that optimizes two important criteria for improving resource performance: makespan and resource utilization. In addition, we classify various task scheduling heuristics in the grid on the basis of their characteristics.
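As a hypothetical illustration (not code from the paper), the two criteria named above can be computed for a given assignment of independent tasks to grid resources; the task names, resource names, and execution times below are made up:

```python
def makespan_and_utilization(assignment, exec_time):
    """assignment: task -> resource; exec_time[(task, resource)] -> seconds."""
    finish = {}  # total busy time accumulated on each resource
    for task, res in assignment.items():
        finish[res] = finish.get(res, 0.0) + exec_time[(task, res)]
    makespan = max(finish.values())
    # average utilization: busy time of each resource relative to the makespan
    utilization = sum(finish.values()) / (len(finish) * makespan)
    return makespan, utilization

# Two tasks on r1 (3 + 4 = 7) and one on r2 (5): makespan 7, utilization 12/14
assignment = {"t1": "r1", "t2": "r1", "t3": "r2"}
exec_time = {("t1", "r1"): 3.0, ("t2", "r1"): 4.0, ("t3", "r2"): 5.0}
print(makespan_and_utilization(assignment, exec_time))  # (7.0, 0.857...)
```

A scheduler optimizing both criteria would search over assignments, trading a shorter makespan against keeping every resource busy.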
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ... (Pooyan Jamshidi)
Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they expose different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement efforts by transferring knowledge about performance behavior of systems across environments. Previously, research has shown that statistical models are indeed transferable across environments. In this work, we investigate identifiability and transportability of causal effects and statistical relations in highly-configurable systems. Our causal analysis agrees with previous exploratory analysis [Jamshidi17] and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly-configurable systems.
VaMoS 2022 - Transfer Learning across Distinct Software Systems (Luc Lesoil)
Many research studies predict the performance of configurable software using machine learning techniques, thus requiring large amounts of data. Transfer learning aims to reduce the amount of data needed to train these models and has been successfully applied across different execution environments (hardware) or software versions. In this paper we investigate, for the first time, the idea of applying transfer learning between distinct configurable systems. We design a study involving two video encoders (namely x264 and x265) coming from different code bases. Our results are encouraging, since transfer learning outperforms traditional learning for two performance properties (out of three). We discuss the open challenges to overcome for a more general application.
A Framework and Methods for Dynamic Scheduling of a Directed Acyclic Graph on... (IDES Editor)
The data-flow model is gaining popularity as a programming paradigm for multi-core processors. Efficient scheduling of an application modeled by a Directed Acyclic Graph (DAG) is a key issue when performance is very important. A DAG represents a computational solution in which the nodes represent tasks to be executed and the edges represent precedence constraints among the tasks. The task scheduling problem in general is NP-complete [2]. Several static scheduling heuristics have been proposed, but the major problems in static list scheduling are the inherent difficulty of exactly estimating task and edge costs in a DAG and its inability to cope with the runtime behavior of tasks. This underlines the need for dynamic scheduling of a DAG. This paper presents how dynamic scheduling of a DAG can be done in general and proposes four simple methods to perform it. These methods have been simulated and evaluated using a representative set of DAG-structured computations from both synthetic and real problems. The proposed dynamic scheduler's performance is found to be comparable with that of static scheduling methods. A performance comparison of the proposed dynamic scheduling methods is also carried out.
Parallel Implementation of K-Means Clustering on CUDA (prithan)
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time-consuming, and in an attempt to minimize this time, our project is a parallel implementation of the K-Means clustering algorithm on CUDA using C. We present the performance analysis and implementation of our approach to parallelizing K-Means clustering.
Scheduling Using Multi Objective Genetic Algorithm (iosrjce)
It is rather surprising that in software engineering, standard measurement units have yet to be widely accepted and used; every other engineering discipline has its own. By and large, effort is the most commonly used parameter for measuring software initiatives. The problem, of course, is that effort is not an independent variable: it depends on who is doing the work and how it is done. This presentation looks at an approach that has been used to convert the large amount of effort data usually collected in an organization into something that can meaningfully be used for estimation and comparison purposes.
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP... (ecij)
In this paper, a hybrid swarm intelligence algorithm (named VNABCSA) is presented for scheduling non-preemptive soft real-time tasks on heterogeneous multiprocessor platforms. The method combines the artificial bee colony and simulated annealing algorithms. The multi-objective function of the VNABCSA algorithm is defined to minimize the total tardiness of all tasks, the total number of utilized processors, the total completion time, the total waiting time of all tasks, and the total waiting time of all processors. We introduce a hybrid variable neighborhood search strategy to improve the convergence speed of the algorithm. Simulation results demonstrate the efficiency of the proposed methodology compared with existing scheduling algorithms.
Multiprocessor scheduling of dependent tasks to minimize makespan and reliabi... (ijfcstjournal)
Algorithms developed for scheduling applications on heterogeneous multiprocessor systems focus on a single objective such as execution time, cost, or total data transmission time. However, if more than one objective (e.g., execution cost and time, which may be in conflict) is considered, the problem becomes more challenging. This project develops a multiobjective scheduling algorithm using evolutionary techniques for scheduling a set of dependent tasks on the available resources in a multiprocessor environment, minimizing both makespan and reliability cost. A Non-dominated Sorting Genetic Algorithm-II (NSGA-II) procedure has been developed to obtain the Pareto-optimal solutions. NSGA-II is an elitist evolutionary algorithm: it carries the parental solutions unchanged into every iteration to avoid losing Pareto-optimal solutions, and it uses the crowding-distance concept to maintain diversity among the solutions.
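The crowding-distance measure that NSGA-II uses for diversity can be sketched as follows. This is my own illustration of the standard computation, not the paper's code: boundary solutions on each objective get infinite distance, and interior solutions accumulate the normalized gap between their neighbors.

```python
def crowding_distance(front):
    """front: list of objective vectors, all in the same non-dominated front."""
    n = len(front)
    if n == 0:
        return []
    m = len(front[0])
    dist = [0.0] * n
    for k in range(m):  # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for j in range(1, n - 1):  # interior points: normalized neighbor gap
            i = order[j]
            dist[i] += (front[order[j + 1]][k] - front[order[j - 1]][k]) / (hi - lo)
    return dist

# Boundary solutions score infinity; more crowded interior ones score lower.
print(crowding_distance([(1, 5), (2, 3), (4, 1)]))  # [inf, 2.0, inf]
```

Selection then prefers solutions with larger crowding distance among equal-rank candidates, spreading the population along the Pareto front.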
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem... (IJRESJOURNAL)
With the development of productivity and the fast growth of the economy, environmental pollution, resource utilization, and low product recovery rates have emerged, so more and more attention has been paid to the recycling and reuse of products. However, since the complexity of the disassembly line balancing problem (DLBP) increases with the number of parts in the product, finding the optimal balance is computationally intensive. To improve the ability of particle swarm optimization (PSO) to solve the DLBP, this paper proposes an improved adaptive multi-objective particle swarm optimization (IAMOPSO) algorithm. First, an evolution-factor parameter is introduced to judge the state of evolution using the idea of fuzzy classification, and the feedback from the evolutionary environment is then used to dynamically adjust the inertia weight and acceleration coefficients. Finally, a dimensional learning strategy based on information entropy is used in which each learning object is uncertain. Test results on a series of instances of different sizes verify the effectiveness of the proposed algorithm.
Efficient Dynamic Scheduling Algorithm for Real-Time Multi-Core Systems (iosrjce)
The imprecise computation model is used in a dynamic scheduling algorithm with a heuristic function to schedule task sets. A task is characterized by its ready time, worst-case computation time, deadline, and resource requirements. A task failing to meet its deadline and resource requirements on time is split into a mandatory part and an optional part. These sub-tasks can execute concurrently on multiple cores, thus exploiting the parallelism provided by the multi-core system. The mandatory part produces acceptable results, while the optional part refines the result further. To study the effectiveness of the proposed scheduling algorithm, extensive simulation studies have been carried out, comparing its performance with the myopic and improved myopic scheduling algorithms. The simulation studies show that the schedulability of the task-split myopic algorithm is always higher than that of the myopic and improved myopic algorithms.
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems (Iosif Itkin)
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Evolutionary Algorithms in Test Generation for digital systems
Yuriy Skobtsov, Vadim Skobtsov, St. Petersburg Polytechnic University
For the presentation, follow the link: https://www.youtube.com/watch?v=gUnKmPg614k
Fault-Tolerance Aware Multi Objective Scheduling Algorithm for Task Schedulin... (csandit)
A Computational Grid (CG) creates a large heterogeneous and distributed paradigm for managing and executing computationally intensive applications. In grid scheduling, tasks are assigned to the appropriate processors in the grid system for execution, taking into account the execution policy and the optimization objectives. In this paper, makespan and the fault tolerance of the grid's computational nodes, two important parameters for task execution, are considered and optimized. As grid scheduling is NP-hard, meta-heuristic evolutionary techniques are often used to find a solution, and we propose an NSGA-II for this purpose. The performance of the proposed Fault-Tolerance Aware NSGA-II (FTNSGA-II) has been estimated with a program written in Matlab. The simulation results evaluate the performance of the proposed algorithm and compare it with the existing Min-Min and Max-Min algorithms, demonstrating the effectiveness of the model.
Observations on DAG Scheduling and Dynamic Load-Balancing Using Genetic Algorithm
3. Introduction
1. Heterogeneous Computing System: electronic systems that use a variety of different types of computational units.
2. Task Scheduling: the multiprocessor scheduling problem is to allocate the tasks of a parallel program to processors in a way that minimizes the completion time and optimizes performance.
3. Load Balancing: the technique of distributing load among processors in order to avoid overloading any single processor.
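The load-balancing idea defined above can be sketched with a simple greedy rule (my illustration, not the thesis algorithm): always place the next task on the currently least-loaded processor, so no processor ends up overloaded. Task names and costs are made up.

```python
import heapq

def balance(task_costs, num_procs):
    """Greedy longest-processing-time placement: returns task -> processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    placement = {}
    # Place the heaviest tasks first; each goes to the least-loaded processor.
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        placement[task] = p
        heapq.heappush(heap, (load + cost, p))
    return placement

print(balance({"a": 5, "b": 4, "c": 3, "d": 2}, 2))
# each processor ends with load 7: {'a': 0, 'b': 1, 'c': 1, 'd': 0}
```

The genetic-algorithm approach studied in Part II searches over such placements instead of committing to one greedy rule.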
4. 4
Thesis Objective
This thesis comprises the study of two research projects: DAG
Scheduling using Genetic Algorithm in Part I and Dynamic Load
Balancing using Genetic Algorithm in Part II.
Part I: The objective of Part I is to design an algorithm to schedule
the DAG tasks on heterogeneous processors in such a way that the
total completion time (makespan) is minimized.
Part II: This part is based on designing an algorithm for distributing
the load among the processors in such a way that none of the processors
is overloaded.
Comparison of various metrics is to be done for DAG-Scheduling and
Dynamic Load Balancing.
5. 5
Directed Acyclic Graph
A process or an application can be broken down into a set of tasks;
we represent these tasks in the form of a directed acyclic graph
(DAG).
A parallel program with n tasks can be represented by a 4-tuple (T, E,
D, [Ai])
1) T = {t1, t2, . . . , tn} is the set of tasks.
2) E, the edges, represents the communication between tasks.
3) D is an n x n matrix, where the element dij of D is the data volume
which ti should transmit to tj.
4) Ai, 1 <= i <= n, is a vector [ei1, ei2, . . . , eim], where eiu is the
execution time of ti on processor pu.
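As an illustration, the 4-tuple (T, E, D, [Ai]) can be sketched as a small Python structure; the class and field names here are hypothetical, not part of the thesis:

```python
from dataclasses import dataclass

@dataclass
class TaskGraph:
    """A parallel program as the 4-tuple (T, E, D, [Ai])."""
    n_tasks: int    # |T|, tasks t0 .. t(n-1)
    edges: set      # E: (i, j) means ti sends data to tj
    data: list      # D: data[i][j] = volume ti transmits to tj
    exec_time: list # [Ai]: exec_time[i][u] = execution time of ti on pu

    def predecessors(self, j):
        """Tasks whose output tj needs before it can start."""
        return [i for (i, k) in self.edges if k == j]

# A toy 3-task graph on 2 processors: t0 -> t1, t0 -> t2
g = TaskGraph(
    n_tasks=3,
    edges={(0, 1), (0, 2)},
    data=[[0, 4, 2], [0, 0, 0], [0, 0, 0]],
    exec_time=[[2, 3], [4, 2], [3, 3]],
)
print(g.predecessors(2))  # [0]
```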
7. 7
DAG-Scheduling
Basic Assumptions
Any processor can execute a task and communicate with other
machines at the same time.
Each processor can only execute one process at each moment.
The processor network is fully connected.
Once a processor has started task execution it continues without
interruption, and after completing the execution it immediately sends
the output data to the child tasks in parallel.
Intra-processor communication cost is negligible compared to the
inter-processor communication cost.
8. 8
DAG-Scheduling
1. Task Selection and Schedule Phase
Task Selection phase
Tasks are selected according to their height in the DAG.
Calculation of a task's start and finish time:
ST(ti, pu) = max( PAT(ti, pu), max over predecessors tk of DAT(ti, tk, pu) )
FT(ti, pu) = ST(ti, pu) + ET(ti, pu)
where PAT(ti, pu) = processor available time (when pu becomes free),
DAT(ti, tk, pu) = data available time (when the output of tk reaches pu),
ET(ti, pu) = execution time of ti on pu.
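A minimal Python sketch of this calculation, assuming a unit inter-processor transfer rate and using the stated assumption that intra-processor communication is free; all names are illustrative:

```python
def start_and_finish(i, u, pat, finish, proc_of, data, exec_time, bw=1.0):
    """ST and FT of task i on processor u.

    pat[u]     : PAT(ti, pu), time at which pu becomes free
    finish[k]  : FT of an already-scheduled task k (dict)
    proc_of[k] : processor that runs task k
    data[k][i] : d_ki, data volume task k transmits to task i
    bw         : assumed transfer rate; intra-processor transfer is free
    """
    def dat(k):  # DAT(ti, tk, pu): when tk's output is available on pu
        comm = 0.0 if proc_of[k] == u else data[k][i] / bw
        return finish[k] + comm

    preds = [k for k in finish if data[k][i] > 0]
    st = max([pat[u]] + [dat(k) for k in preds])  # ST(ti, pu)
    return st, st + exec_time[i][u]               # FT = ST + ET

# t0 finished at time 2 on p0; where can t1 (which needs 4 units from t0) run?
data = [[0, 4, 2], [0, 0, 0], [0, 0, 0]]
et = [[2, 3], [4, 2], [3, 3]]
print(start_and_finish(1, 1, [2.0, 0.0], {0: 2.0}, {0: 0}, data, et))  # (6.0, 8.0)
print(start_and_finish(1, 0, [2.0, 0.0], {0: 2.0}, {0: 0}, data, et))  # (2.0, 6.0)
```

Running t1 on p0 avoids the communication delay, so it finishes earlier despite p0 being the slower choice in pure execution time.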
9. 9
DAG-Scheduling
2. Scheduling Encoding (Chromosome)
A string is a candidate solution for the problem. String consists of
several lists. Each list is associated with a processor.
Suppose that for an application of 10 tasks the generated schedule is:
• 1st Processor : t3 t4 t8
• 2nd Processor : t5 t7 t9
• 3rd Processor : t0 t1 t2 t6
Then the chromosome can be represented as a matrix of size [No. of
Tasks x No. of Processors]:
P1 P2 P3
t3 t5 t0
t4 t7 t1
t8 t9 t2
10. 10
DAG-Scheduling
3. Initialization
A population of size POP_SIZE is initialized.
4.Fitness Function
The GA requires a fitness function that assigns a score to each
chromosome in the population.
The fitness function in a GA is the objective function that is to be
optimized.
In the proposed algorithm, the fitness function returns the time at which all
tasks in the DAG complete their executions (the makespan). A fitness function f
of a string x is defined as follows:
11. 11
DAG-Scheduling
5.Roulette-Wheel Selection
Roulette-wheel selection is used for selecting potentially useful solutions
for recombination ( Crossover ).
The probability of any chromosome being selected is proportional to its fitness.
Example: fitness values 0.5, 1.5, 4 and 2, so the sum of fitness = 8.
Rand(8) = 3, which falls in the third chromosome's slice of the wheel,
so Chromosome 3 is the parent.
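Roulette-wheel selection can be sketched as follows; the `rnd` hook is an illustrative addition that makes the spin of Rand(8) = 3 from the example reproducible:

```python
import random

def roulette_select(fitness, rnd=random.random):
    """Pick an index with probability fitness[i] / sum(fitness)."""
    total = sum(fitness)
    r = rnd() * total          # the spin: a point in [0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f               # slices: [0,.5), [.5,2), [2,6), [6,8)
        if r < acc:
            return i
    return len(fitness) - 1    # guard against float round-off

# Slide example: fitness .5, 1.5, 4, 2 (sum 8); a spin landing at 3
# falls in the slice [2, 6) of the third chromosome.
print(roulette_select([0.5, 1.5, 4, 2], rnd=lambda: 3 / 8))  # 2 (chromosome 3)
```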
12. 12
DAG-Scheduling
6.Crossover
New chromosomes are generated with this operator.
A parent chromosome is selected by the roulette-wheel operator, and then
two processors are selected from this chromosome.
Single-point crossover is applied to the two selected processor lists.
Figure: Modified Single Point Crossover
13. 13
DAG-Scheduling
7. Mutation
A mutation operation is designed to reduce the idle time of a
processor waiting for data from other processors.
The Data Dominating Parent (DDP) of a task ti is the task which
transmits the largest volume of data to ti. That is,
DDP(ti) = tk such that dki = max over all j of dji.
Example of Mutation :
Figure : Mutation.
14. 14
DAG-Scheduling
8. Termination Conditions
Condition 1: If we find an individual whose makespan is less than
the specified minimum, the GA stops evolving.
Condition 2: The variable gen stores how many generations the GA
should run; the user supplies it on each run. When the generation
count exceeds gen, the GA stops evolving.
15. 15
DAG-Scheduling
9. Pseudo code
Begin
  initialize P(k);  {create an initial population of size POP_SIZE}
  evaluate P(k);    {evaluate the fitness of all the chromosomes}
  Repeat
    For i = 1 to POP_SIZE do
      Select a chromosome as parent from the population;
      Child1 <= Crossover( parent );
      Child2 <= Mutation( Child1 );
      Add( new temporary population, Child1, Child2 );
    End For;
    Make( new population, new temporary population, old population );
    population = new population;
  While (not termination condition);
  Select the best chromosome in the population as the solution
  and return it;
End
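The pseudo code above translates roughly into the following generic Python skeleton; the plug-in callables and the toy problem at the bottom are hypothetical stand-ins for the thesis's operators:

```python
import random

def genetic_algorithm(init_population, fitness, crossover, mutate,
                      max_gen=100, target=None):
    """Generic GA loop mirroring the pseudo code above.

    fitness is minimized (a makespan), so roulette weights use 1/fitness.
    """
    pop = init_population()
    for _ in range(max_gen):
        scores = [fitness(c) for c in pop]
        if target is not None and min(scores) < target:
            break                                     # termination condition 1
        weights = [1.0 / (1e-9 + s) for s in scores]  # roulette on inverted fitness
        children = []
        for _ in range(len(pop)):
            parent = random.choices(pop, weights=weights)[0]
            child1 = crossover(parent)
            child2 = mutate(child1)
            children += [child1, child2]
        # Make(new pop, new temp pop, old pop): keep the POP_SIZE fittest
        pop = sorted(pop + children, key=fitness)[:len(pop)]
    return min(pop, key=fitness)

# Toy run: one chromosome, mutation decrements every gene, so the
# minimal fitness (here just the sum) is driven down each generation.
best = genetic_algorithm(lambda: [[9, 9, 9]], sum,
                         lambda p: p[:],
                         lambda c: [max(0, v - 1) for v in c],
                         max_gen=20)
print(best)  # [0, 0, 0]
```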
16. 16
Dynamic Load Balancing
1. Basic Definitions
Load calculation: Load(pi) is the sum of the execution times of the
processes allocated to processor pi.
Maxspan: the maximal finishing time over all processes.
maxspan(T) = max( Load(pi) )
∀ 1 ≤ i ≤ Number of Processors
Processor utilization: the ratio of Load(pi) to maxspan.
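These three definitions can be sketched directly in Python; the dict-of-lists assignment encoding and the names are illustrative:

```python
def load(assignment, exec_time, p):
    """Load(p): total execution time of the tasks placed on processor p."""
    return sum(exec_time[t] for t in assignment[p])

def maxspan(assignment, exec_time):
    """maxspan(T) = max over all processors of Load(pi)."""
    return max(load(assignment, exec_time, p) for p in assignment)

def utilization(assignment, exec_time, p):
    """Processor utilization: Load(pi) / maxspan."""
    return load(assignment, exec_time, p) / maxspan(assignment, exec_time)

# 3 tasks on 2 processors; exec_time[t] is task t's execution time
a = {0: [0, 1], 1: [2]}
print(maxspan(a, [2, 3, 4]))         # loads are 5 and 4 -> 5
print(utilization(a, [2, 3, 4], 1))  # 4 / 5 = 0.8
```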
17. 17
Dynamic Load Balancing
2. Basic Assumptions
Each processor can only execute one process at each moment.
Tasks are non-preemptive.
Tasks are totally independent, i.e. no data transfer takes place
among tasks and there are no precedence relations.
Heterogeneity of processors is defined by a multiplying factor x. If the
1st processor's speed is P1, then the ith processor's speed can be
calculated as
Pi = (1 + (i-1)*x) * P1
18. 18
Dynamic Load Balancing
3. Sliding Window Technique
The size of the sliding window decides how many tasks are selected at
a time from the task pool.
The size is supplied by the user. The number of tasks in a chromosome
is equal to the size of the sliding window.
The sliding window contains task IDs.
Example: Sliding window of size 10
1 2 3 4 5 6 7 8 9 10
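The windowing itself is plain chunking of the task pool; a sketch (the function name is illustrative):

```python
def sliding_windows(task_pool, size):
    """Yield successive windows of task IDs from the task pool."""
    for start in range(0, len(task_pool), size):
        yield task_pool[start:start + size]

# 24 tasks, window size 10: two full windows and a final partial one
windows = list(sliding_windows(list(range(1, 25)), 10))
print(len(windows))  # 3
print(windows[0])    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```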
19. 19
Dynamic Load Balancing
4. Scheduling Encoding (Chromosome)
This is a 2D matrix of size[no. of processors x size of sliding
window].
Figure: Chromosome Representation
5. Initialization
Population of size POP_SIZE is initialized by randomly assigning
tasks to processors.
P1 P2 P3
3 5 0
4 7 1
8 9 2
6
20. 20
Dynamic Load Balancing
6. Fitness Function
Fitness function attaches a value to each chromosome in the
population, which indicates the quality of the schedule.
7. Roulette-wheel selection
Roulette-wheel selection is used, as described in the DAG-Scheduling
section.
21. 21
Dynamic Load Balancing
8. Cycle Crossover
Single-point crossover cannot be used in this GA, as it may cause some
tasks to be assigned more than once while others are not assigned at all.
A new crossover operator called cycle crossover is designed. The example
below shows how it works.
Parents:
A 8 6 4 10 9 7 1 5 3 2
B 10 2 3 5 6 9 8 7 4 1
Step 1 (start the cycle at the first position):
A` 8 - - - - - - - - -
B` 10 - - - - - - - - -
Step 2 (follow the cycle to completion):
A` 8 6 - 10 9 7 1 5 - 2
B` 10 2 - 5 6 9 8 7 - 1
Step 3 (fill the remaining positions from the other parent):
A` 8 6 3 10 9 7 1 5 4 2
B` 10 2 4 5 6 9 8 7 3 1
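A sketch of the operator in Python, replayed on the parents from the worked example (the function name is illustrative):

```python
def cycle_crossover(a, b):
    """Cycle crossover: each offspring keeps one parent's genes on the
    cycle positions and takes the other parent's genes elsewhere, so no
    task is duplicated or lost."""
    n = len(a)
    in_cycle = [False] * n
    i = 0
    while not in_cycle[i]:   # follow the cycle starting at position 0
        in_cycle[i] = True
        i = a.index(b[i])    # jump to the position in a holding b's gene
    child_a = [a[i] if in_cycle[i] else b[i] for i in range(n)]
    child_b = [b[i] if in_cycle[i] else a[i] for i in range(n)]
    return child_a, child_b

A = [8, 6, 4, 10, 9, 7, 1, 5, 3, 2]
B = [10, 2, 3, 5, 6, 9, 8, 7, 4, 1]
A2, B2 = cycle_crossover(A, B)
print(A2)  # [8, 6, 3, 10, 9, 7, 1, 5, 4, 2]
print(B2)  # [10, 2, 4, 5, 6, 9, 8, 7, 3, 1]
```

Only positions 3 and 9 (1-based) lie outside the cycle here, so only those two genes are exchanged between the offspring.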
22. 22
Dynamic Load Balancing
9. Random Swap Mutation
Random swap mutation is applied to each newly generated child.
Two processors are selected at random from the processor list; they
must be different.
Then one random task is selected from each processor, and the two
tasks are swapped.
Example (task 8 on P1 is swapped with task 1 on P3):
Before mutation:
P1 P2 P3
3 5 0
4 7 1
8 9 2
6
After mutation:
P1 P2 P3
3 5 0
4 7 8
1 9 2
6
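A sketch of the operator in Python, using an illustrative dict-of-lists encoding of the chromosome:

```python
import random

def random_swap_mutation(chrom, rng=random):
    """Pick two different (non-empty) processors and swap one randomly
    chosen task between them; the parent chromosome is left untouched."""
    child = {p: tasks[:] for p, tasks in chrom.items()}      # deep-ish copy
    p1, p2 = rng.sample([p for p in child if child[p]], 2)   # distinct processors
    i = rng.randrange(len(child[p1]))
    j = rng.randrange(len(child[p2]))
    child[p1][i], child[p2][j] = child[p2][j], child[p1][i]  # the swap
    return child

# Slide example encoding: with the right random draws this swaps
# task 8 (on P1) with task 1 (on P3)
before = {"P1": [3, 4, 8, 6], "P2": [5, 7, 9], "P3": [0, 1, 2]}
after = random_swap_mutation(before)
```

Whatever the draws, the mutated child is still a valid schedule: the same multiset of tasks, merely redistributed.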
23. 23
Dynamic Load Balancing
10. Termination Condition
The variable gen stores how many generations the GA should run.
When the generation count exceeds gen, the GA stops evolving.
11. Task Allocation and Updating the Window
When the termination condition is met, the fittest chromosome is
assigned to the final schedule.
The window is then refilled by sliding along to the subsequent tasks
waiting in the task pool.
24. 24
Dynamic Load Balancing
12. Pseudo Code
Begin
  Repeat
    save the next tasks into the sliding window;
    initialize P(k);  {create an initial population of size POP_SIZE}
    evaluate P(k);    {evaluate the fitness of all the chromosomes}
    Repeat
      For i = 1 to POP_SIZE do
        Select two chromosomes as parents from the population;
        Child1, Child2 <= Crossover( parent1, parent2 );
        Child3 <= Mutation( Child1 );
        Child4 <= Mutation( Child2 );
        Add( new temporary population, Child1, Child2, Child3, Child4 );
      End For;
      Make( new population, new temporary population, old population );
      population = new population;
    While (not termination condition);
    Assign the best chromosome in the population to the final schedule;
  While (task pool has more tasks);
End
25. 25
Experimental results and Discussion
1.Dynamic Load Balancing
1. Test Parameters
The measurement of performance of the proposed algorithm was based
on two metrics: total completion time (makespan) and average
processor utilization. The calculation of these metrics depends on
the following parameters (default values):
• Population size ( POP_SIZE ): 100
• Sliding window size ( sizeSlidingWindow ): 10
• No. of generations ( gen ): 100
• No. of processors ( no_of_Proc ): 10
26. 26
Experimental results and Discussion
1.Dynamic Load Balancing
2. Changing the population size
The population size ranged from 20 to 200.
It was observed that increasing the population does not increase
the performance beyond a certain limit: after a size of 120, the
completion time is approximately constant.
Increasing the population size had a positive effect on processor
utilization.
27. 27
Experimental results and Discussion
1.Dynamic Load Balancing
3. Changing the No. of Generation Cycles
The number of generation cycles was changed from 1 to 500.
As the number of generation cycles was increased, the performance of
the schedule also increased. The total completion time was significantly
reduced as the number of generations was increased from 1 to 50.
Increasing the number of generations also had a positive effect on
processor utilization.
28. 28
Experimental results and Discussion
1.Dynamic Load Balancing
4. Changing the No. Processors
The no. of processors was changed from 2 to 20.
As the number of processors was increased for the same number of
tasks, the completion time decreased, because the system then has
more processing elements.
When the number of processors was increased, avg. processor
utilization decreased.
29. 29
Experimental results and Discussion
1.Dynamic Load Balancing
5. Changing the No. Tasks
The number of tasks was varied from 10 to 1000.
As the number of tasks increases, the completion time also increases.
When the number of tasks is large, avg. utilization is more than 96%.
30. 30
Experimental results and Discussion
1.Dynamic Load Balancing
6. Changing the Sliding Window
The sliding window size was changed from 2 to 50;
the effect on completion time and avg. processor utilization is
shown below:
31. 31
Experimental results and Discussion
2.DAG-Scheduling
1. Test Parameters
The performance of the DAG scheduling algorithm was measured by
speedup.
The speedup value for a given graph is computed by dividing the
sequential execution time by the parallel execution time.
The speedup of the proposed DAG scheduling algorithm depends on
the following parameters:
• No. of Generations
32. 32
Experimental results and Discussion
2.DAG-Scheduling
2. Changing the no. of Generation cycles
The no. of generations was varied from 1 to 1000.
As the number of generation cycles was increased, the performance of
the schedule also increased.
After 250 generations, it is observed that running the GA further does
not improve performance much.
33. 33
Experimental results and Discussion
2.DAG-Scheduling
3. Changing the no. of Tasks
The no. of tasks was varied from 10 to 60.
34. 34
Conclusion
The results generated by the proposed dynamic load-balancing
mechanism using a Genetic Algorithm were extremely good when the
number of tasks is large.
The avg. processor utilization of the proposed algorithm was found to
be more than 97-98%.
The complete genetic algorithm for DAG scheduling was
implemented and tested on various input task graphs in a
heterogeneous system.
The proposed DAG-Scheduling algorithm gives the best speedup when
the number of generation cycles is more than 250.
35. 35
References
1) Albert Y. Zomaya, Chris Ward, and Ben Macey, "Genetic Scheduling for Parallel Processor Systems: Comparative
Studies and Performance Issues," IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 8, August 1999.
2) Andrew J. Page , Thomas M. Keane, Thomas J. Naughton, "Multi-heuristic dynamic task allocation using genetic algorithms in a
heterogeneous distributed system" Journal of Parallel and Distributed Computing Volume 70, Issue 7, July 2010, Pages 758–766.
3) Yuan, Yuan , Xue, Huifeng “Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems” Jisuanji Celiang yu Kongzhi/
Computer Measurement & Control (China). Vol. 13, no. 5, pp. 488-490. May 2005
4) Y.K. Kwok and I. Ahmad, “Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors”,
IEEE Trans. Parallel and Distributed Systems, Vol. 7, No. 5, pp. 506-521, May 1996.
5) A.T. Haghighat, K. Faez, M. Dehghan, A. Mowlaei, and Y. Ghahremani, "GA-based heuristic algorithms for bandwidth-delay-constrained
least-cost multicast routing," International Journal of Computer Communications 27, 2004, 111-127.
6) D.E. Goldberg, “Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, Mass” Addison-Wesley, 1989.
7) Albert Y. Zomaya and Yee-Hwei, "Observations on Using Genetic Algorithms for Dynamic Load-Balancing," IEEE
Transactions on Parallel and Distributed Systems, vol. 12, no. 9, September 2001.
8) H.C. Lin and C.S. Raghavendra, “A Dynamic Load-Balancing Policy with a Central Job Dispatcher (LBC)” IEEE Trans. Software Eng.,
vol. 18, no. 2, pp. 148-158, Feb. 1992.
9) M. Munetomo, Y. Takai, and Y. Sato, “A Genetic Approach to Dynamic Load-Balancing in a Distributed Computing System” Proc. First
Int'l Conf. Evolutionary Computation, IEEE World Congress Computational Intelligence, vol. 1, pp. 418-421, 1994.