This document compares different workflow scheduling algorithms in cloud computing. It first introduces workflow management systems and their components like workflow design, structure, models and composition. It then discusses workflow scheduling including architectures, transformation approaches, strategies and task clustering techniques. Four scheduling algorithms - First Come First Serve (FCFS), Min-min, Max-min and Minimum Completion Time (MCT) - are described and their performance is evaluated using simulation on two datasets. The results show that the Max-min algorithm achieves the lowest makespan and total cost compared to the other algorithms.
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Scheduling
1. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 31
Comparative Study of Workflow Scheduling
Algorithms in Cloud Computing
Santhosh B1
, Harshitha2
, Prachi Kaneria A3
, Dr D.H Manjaiah4
Assistant Professor, Department of MCA, AIMIT, St Aloysius College, Mangalore, India1
Msc 3rd semester, Department of Software Technology, AIMIT, St Aloysius College, Mangalore, India2, 3
Professor, Department of Computer Science, Mangalore University, Mangalore, India
4
ABSTRACT: Cloud computing is a new terminology which is achieved by Distributed, Parallel and Grid computing.
Cloud computing provides different types of resources like hardware and software as services via internet. Efficient
task scheduling mechanism can meet users’ requirements, and improve the resource utilization, thereby enhancing the
overall performance of the cloud computing environment. In this paper we have considered workflow scheduling and
conducted comparative analysis of workflow scheduling algorithms. Here Tasks require minimum completion time,
better performance and total cost of execution is considered.
I. INTRODUCTION
Cloud computing now is known as a provider of dynamic, scalable on demand, virtualized resources over the internet.
It also makes it possible to access applications and associated data from anywhere. Companies are able to rent
resources from cloud for storage and other computational purposes so that their infrastructure cost can be reduced
significantly. They can also make use of software, platforms and infrastructures as a services , based on pay-as-you-go
model.
INTRODUCTION TO WORKFLOW MANAGEMENT SYSTEM
Workflow Management System (WMS) is used for management of executingthe workflow tasks on the computing
resources. The major components of WMS are shown in Figure 1.
Fig1: Elements of a WMS [1]
Workflow design describes how components of workflow can be defined andcomposed. The elements of workflow
design as shown in Figure 2.
Fig 2: Taxonomy of Workflow Design [2]
2. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 32
A workflow structure describe the relationship between the tasks of theworkflow. A workflow structure can be of two
types Directed Acyclic Graph(DAG) and Non DAG. Workflow structure can further be categorized asSequence,
Parallelism and Choice in the DAG based structure. In the sequence structure, the tasks execute in a series. A new task
can only be executed if theprevious task has been successfully executed. In parallelism structure, tasks ofworkflow can
executes concurrently. In choice structure, the workflow tasks can be executed in series as well as concurrently.The
Non DAG based structure, contains all the DAG based patterns along withIteration pattern. In Iteration pattern, the task
can executes in an iterativemanner. A complex workflow can be formed with the combination of these fourtypes of
patterns in multiple ways.Workflow model defines the workflow, its tasks and its structure. Workflowmodels are of
two types; abstract and concrete. Abstract workflow describe theworkflow in an abstract form which means that
specific resources are notreferred for task execution. Concrete workflow describes the workflow tasksbinded to specific
resources.Workflow composition system allows users to accumulate component to theworkflow. In the user directed
system, user can modify the workflow structuredirectly by editing in the workflow. But in automatic, system
generatesworkflow automatically. User directed system is further categorized inLanguage based modeling and graph
based modeling. In former, user describeworkflow using some markup language such as XML. It is a difficultapproach
because user have to remember a lot of language syntax. Graph basedmodeling is simple and easy way to modify the
workflow with the help of basicgraph elements [1].
1.1 WORKFLOW SCHEDULING
Mapping and management of workflow task’s execution on shared resources is donewith the help of workflow
scheduling. So workflow scheduling finds a correctsequences of task execution which obeys the business constraint.
The elements ofworkflow scheduling are shown in the Figure 3
Figure 3: Elements of Workflow Scheduling [2]
Scheduling Architecture is very important in case of quality, performance andscalability of the system. In the
centralized workflow environment, schedulingdecisions for all the tasks in the workflow is done by a single central
scheduler.There is no central controller for multiple schedulers in decentralized approachbut communication between
schedulers is possible. Each scheduler schedulesthe workflow to the resource having less load. On the other hand in
hierarchicalscheduling there is a central manager. Not only Workflow execution is controlled by this central manager
but also the sub-workflows are assigned tothe lower-level schedulers [1].Scheduling decisions which are taken on the
basis of task or sub workflow are known as local decisions and which are taken by keeping in mind the wholeworkflow
are called global decisions. Global decisions based scheduling givesbetter overall results because only one task or sub
workflow is considered inlocal decision scheduling.
Transformation of abstract models to concrete models is done in two ways:static and dynamic. Concrete models are
generated before the execution withthe help of current available information about the execution environment andthe
3. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 33
dynamic change of resource is not considered in the static scheme. Staticschemes further categorized in two types;
user-directed and simulation-based.In the user directed planning scheme, decision about resource mapping
andscheduling is done on the basis of user’s knowledge, preferences andperformance criteria. But in case of simulation
based, a best schedule can beachieved by simulating task execution on resources before the workflowexecution starts.
Both static and dynamic information about resources used formaking scheduling decisions at run-time are consider in
case of dynamicscheme. For dynamic scheme, there are two approaches prediction-based andjust-in-time scheduling is
used. In prediction-based, dynamic information isused along with some results on basis of prediction and in simulation-
based static scheduling, in which prediction of performance of execution of tasksusing resources is done along with
generation of optimal schedule before theexecution of task is done. But due to this the initial schedule changed during
theexecution. Just in-time scheduling makes decision at execution time.Scheduling strategies are categorized into
performance driven, market drivenand trust driven. In case of performance driven strategies, focus is to achieve the
highest performance for the user defined QoS parameter. So the workflow’stasks mapped with resources gives the
optimal performance. Most of theperformance driven scheduling strategies focuses to maximize the makespan ofthe
workflow. Market driven Strategies focuses on resource availability,allocation cost and quality for the budget and
deadline. A market model is usedto schedule the workflow tasks to the available resource dynamically whichresults in
less cost. Trust driven focuses on security and reputation of theresources and tasks are scheduled based on the trust by
considering these twoparameters [1].
II. WORKFLOW TASK CLUSTERING
Task clustering is basically used to combine the small sized tasks of the workflow intoa comparatively large sized task
to minimize the makespan of the workflow by reducingthe impact of queue wait time. So task clustering restructure the
workflow bycombining the small sized tasks into a cluster and execute this cluster as a single task.Due to clustering the
number of task in the workflow is reduced and there is less queuewait time as compared to small sized tasks. Task
clustering is categorized as level- andlabel based clustering [3], vertical, blocked and balanced clustering [4]. Figure
4represents a simple workflow. Figure 5 describes horizontal clustering in level 2 andlevel 4 with each cluster contains
two tasks.
4. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 34
Level-based clustering is also called horizontal clustering. In this type ofclustering the tasks are independent and the
tasks of the same level of theworkflow can be combined together. In vertical clustering, tasks of the same pipeline can
be combined together.Blocked clustering represents the combination of both horizontal and verticalclustering in the
same workflow.Figure 6 represents vertical clustering, in which tasks in the same pipeline can be combined together to
form a cluster. Each cluster of vertical clustering contains threetasks. Figure 7 shows the blocked clustering, in which
horizontal and verticalclustering can be combined together.Tasks in the same level of workflow can have different
execution time and wheneverthese tasks are combined without the consideration of their runtime variance then itcauses
the problem of load imbalance, i.e., some cluster may contain the smaller tasks
and other may have larger tasks. Due to this inappropriate clustering, workflowoverhead is produced. Another problem
is data dependency between the tasks ofworkflow in case of level-based clustering which is called dependency
imbalance.
III. SCHEDULING ALGORITHMS
The main objective of scheduling algorithm is to achieve the best system throughputwith proper resource utilization
and high performance by obeying the user’s specifiedQoS parameter. There are various time comparison based
scheduling heuristics existsin the grid and Cloud computing environment, which schedules the tasks by comparingthe
arrival time or execution time of the tasks. In the following section these schedulingheuristics are described.
4.1 First Come First Serve (FCFS)
In this algorithm, tasks are compared on the basis of their arrival time and the taskwhich comes first in the ready queue
is served first. Advantage of this algorithm is itssimplicity and fast execution behavior. But the main disadvantage of
this algorithm isthat sometimes due to the execution of a longer job, which comes in the queue first,small jobs have to
wait for its completion. Due to this problem the waiting time of tasksincreased and overall performance of the
workflow execution decreases.
4.2 Min-min
In this algorithm, small task is executed first so that large task delays for long time.Algorithm begins with by sorting
the set of all unmapped tasks in increasing order oftheir completion time. Then the tasks having the minimum
completion is scheduledfrom the unmapped task set and the mapped task has been removed from unmappedtask list,
and the process repeats until all the tasks of unmapped list is mapped to thecorresponding available resources [5].
4.3 Max-min
In this algorithm, large task is executed first so that small task delays for long time.This algorithm is very similar to
Min-min algorithm, instead of sorting the task in theincreasing order of completion time. This algorithms sorts the tasks
in decreasing orderof their completion time. Then the task with the overall maximum completion time isselected from
this task list and scheduled to the corresponding available resource. Thenthe scheduled task has been removed from
unmapped task set and the process repeatsuntil all tasks of unmapped list is mapped [5].
4.4 Minimum Completion Time (MCT)
In this algorithm, task that takes least time to complete is allocated to a machinerandomly. So MCT behaves somewhat
like Min-min. However, Min-min algorithmconsiders all the unmapped tasks during each mapping decision but on the
other handMCT considers only one task at a time [6].
IV. EXPERIMENTAL RESULTS
The paper uses simulation to test and verify the efficiency and correctness of the scheduling algorithm presented in this
paper.
5.1 Simulation setup
The proposed algorithm is simulated in a simulation toolkit workflowsim[7].We have created a datacenter with the
following properties and created 20 virtual machines.
5. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 35
Arch Os Vmm time_zone cost costPerMem CostPerStorage costPerBw
x86 Linux Xen 10.0 0.3 0.05 0.1 0.1
5.2 Analysis of results
The fig 8 shows workflow of task dependency in CyberShake_50 and fig 9 shows simulation results when MINMIN
simulation algorithm is used. Here the total cost is calculated using the formula
cost= (communication cost + computation cost) = (data (both input and output) *
* unit cost of data + runtime * cpu cost) for each task
Fig 8: CyberShake_50 task dependency graph
6. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 36
Fig 9: simulation results when MINMIN simulation is used
The table shows the makespan and total cost of scheduling of tasks using the algorithms FCFS,MCT,MAXMIN and
MINMIN with Montage_100 and CyberShake_50 datasets
Scheduling Algorithms Data set Make span time Total cost
FCFS Montage_100 114.2 533.58
MCT Montage_100 114.2 533.58
MINMIIN Montage_100 115.19 534.56
MAXMIN Montage_100 113.55 534.7
FCFS CyberShake_50 851.2 5996.73
MCT CyberShake_50 851.2 5996.73
MINMIIN CyberShake_50 851.2 5996.73
MAXMIN CyberShake_50 846.95 5996.69
Table 1 : Results of various scheduling algorithms with different datasets
Our experiment shows that MAXMIN is the most efficient with respect to the makespan and total cost in comparison to
the other algorithms FCFS,MCT,MINMIN
V. CONCLUSIONS
Minimizing Makspan is one of the objective in scheduling algorithm. TheExperiments shown MAXMIN is the most
efficient with respect to the makespan and total cost in comparison to the other algorithms FCFS,MCT,MINMIN.
These algorithm still can be improved by considering multiple objective functions, fault tolerance. because after a job is
submitted to the resource , if the resource becomes unavailable it may affect the makespan and cost of execution.
ACKNOWLEDGEMENT
The authors would like to thank the dedicated research group in the area of Cloud & Grid Computing , wireless
networks at the Dept of Computer Science, AIMIT, St Aloysius College, Mangalore, India, for many stimulating
discussions for further improvement and future aspects of the Paper. Lastly but not least the author would like to thank
everyone, including the anonymous reviewers.
REFERENCES
[1] J. Yu and R. Buyya, “A Taxonomy of Workflow Management Systems for GridComputing,” journel of Grid Computing, Springer, pp. 171-200,
2005.
[2] J. Yu and R. Buyya, “A taxonomy of scientific workflow systems for gridcomputing,” SIGMOD Record, Vol. 34, no. 3, pp. 44-49, 2005.
[3] G. Singh et al. “Workflow task clustering for best effort systems with Pegasus,” InProceedings of the 15th ACM Mardi Gras conference: From
lightweight mash-upsto lambda grids: Understanding the spectrum of distributed computingrequirements, applications, tools, infrastructures,
interoperability, and theincremental adoption of key capabilities. ACM, 2008.
[4] W. Chen, R. Silva, E. Deelman, and R. Sakellariou, “Balanced Task Clustering inScientific Workflows,” In proceedings IEEE 9th International
Conference oneScience, pp. 188-195. 2013.
7. ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol.2, Special Issue 5, October 2014
Copyright to IJIRCCE www.ijircce.com 37
[5] D. Tracy et. al. “A comparison of eleven static heuristics for mapping a class ofindependent tasks onto heterogeneous distributed computing
systems,” Journal ofParallel and Distributed computing, vol.61, no. 6, pp. 810-837, 2001.
[6] F. Dong and S.G. Akl, “Scheduling Algorithms for Grid Computing: State of theArt and Open Problems,” Technical Report of the Open Issues in
Grid SchedulingWorkshop, School of Computing, University Kingston, Ontario, January 2006.
[7] W. Chen and E. Deelman, “WorkflowSim: A toolkit for simulating scientificworkflows in distributed environments,” E-Science (e-Science), 2012
IEEE 8th
International Conference on. IEEE, 2012.