This document discusses scheduling algorithms for batches of MapReduce jobs in heterogeneous cloud environments under budget and deadline constraints. It formulates two optimization problems: 1) given a fixed budget B, how to schedule tasks so as to minimize workflow completion time without exceeding the budget; 2) given a fixed deadline D, how to schedule tasks so as to minimize monetary cost without missing the deadline. For the first problem it presents an optimal dynamic programming algorithm that runs in O(κB²) time, along with two faster greedy algorithms; it also briefly discusses reducing the second problem to a knapsack problem. The goal is to help cloud service providers deploy MapReduce cost-effectively given user constraints.
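The budget-constrained problem can be illustrated with a sketch of the dynamic programming idea. This is a simplified variant, not the paper's exact algorithm: each task may run on one of several VM types with an illustrative (time, cost) pair, tasks run sequentially, and the DP minimizes total time subject to a hard budget.

```python
# Hedged sketch of a budget-constrained scheduling DP (simplified variant;
# the (time, cost) options below are illustrative, not from the paper).
def min_time_under_budget(tasks, budget):
    INF = float("inf")
    dp = [INF] * (budget + 1)   # dp[b] = min total time with cost exactly b
    dp[0] = 0.0
    for options in tasks:       # options: list of (time, cost) per VM type
        ndp = [INF] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == INF:
                continue
            for t, c in options:
                if b + c <= budget and dp[b] + t < ndp[b + c]:
                    ndp[b + c] = dp[b] + t
        dp = ndp
    return min(dp)              # inf means the budget is infeasible

tasks = [
    [(10, 1), (4, 3)],   # task 0: slow/cheap VM or fast/expensive VM
    [(8, 1), (3, 4)],    # task 1: likewise
]
print(min_time_under_budget(tasks, 5))  # -> 12.0 (fast VM for task 0, cheap VM for task 1)
```

The inner loop over budget values is what gives the pseudo-polynomial O(number of tasks × B × options) running time, echoing the knapsack reduction mentioned for the deadline-constrained problem.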
Time Efficient VM Allocation using KD-Tree Approach in Cloud Server Environment
Cloud computing is a rapidly evolving model, with new capabilities and pricing options announced frequently. As the number of cloud users and the variety of services grow, virtual machine (VM) resources must be allocated with minimal latency. Allocating virtual cloud resources to users is a key technical issue because user demand is dynamic and therefore requires dynamic resource allocation. Improving allocation requires a well-balanced scheduling algorithm for the resource allocation technique. The aim of this work is to allocate resources to scientific-experiment requests from multiple users, where customized virtual machines (VMs) are launched on suitable hosts available in the cloud. Properly programmed scheduling is therefore vital, and it is important to develop efficient scheduling methods for mapping VMs onto physical resources. The proposed algorithm reduces the allocation-time complexity to O(log n) by adopting a KD-tree.
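The KD-tree lookup behind the claimed O(log n) allocation can be sketched as follows. The paper's exact construction is not given here; this is a standard 2-D nearest-neighbor KD-tree over hosts keyed by spare capacity (free vCPUs, free RAM), with illustrative host data.

```python
# Hedged sketch: nearest-neighbor KD-tree used to find the host whose spare
# capacity best matches a VM request (host figures are illustrative).
class Node:
    def __init__(self, point, left, right):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    if node is None:
        return best
    def dist2(p):
        return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    if best is None or dist2(node.point) < dist2(best):
        best = node.point
    axis = depth % 2
    near, far = ((node.left, node.right) if target[axis] < node.point[axis]
                 else (node.right, node.left))
    best = nearest(near, target, depth + 1, best)
    # only descend the far side if the splitting plane is closer than best
    if (node.point[axis] - target[axis]) ** 2 < dist2(best):
        best = nearest(far, target, depth + 1, best)
    return best

hosts = [(8, 32), (4, 16), (16, 64), (2, 8)]   # (free vCPUs, free GB RAM)
tree = build(hosts)
print(nearest(tree, (5, 14)))  # -> (4, 16), the closest-capacity host
```

On balanced data the query visits O(log n) nodes on average, which is the source of the time savings over a linear scan of all hosts.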
Challenges in Dynamic Resource Allocation and Task Scheduling in Heterogeneou...
Resource allocation and task scheduling are central concerns in today's dynamic cloud-based applications. Task scheduling assigns tasks to available processors with the aim of minimizing execution time, whereas resource allocation decides on a policy for allocating resources to tasks so as to maximize resource utilization. Algorithms for scheduling resources for virtual machines are designed for both homogeneous and heterogeneous environments. The majority of these algorithms focus on processing ability, often neglecting other factors such as network bandwidth and actual resource requirements. One of the major pitfalls of cloud computing is optimizing the resources being allocated; because of the uniqueness of the model, resource allocation is performed with the objective of minimizing its associated costs. Other challenges of resource allocation are meeting customer demands and application requirements. In this paper we focus on the challenges faced in task scheduling and resource allocation in dynamic heterogeneous clouds.
THRESHOLD BASED VM PLACEMENT TECHNIQUE FOR LOAD BALANCED RESOURCE PROVISIONIN...
Load imbalance is a multi-variate, multi-constraint problem that degrades the performance and efficiency of computing resources. Workload-balancing methods address two undesirable situations: overloading and underloading. Cloud computing uses scheduling and workload balancing in a virtualized environment with resource sharing across the cloud infrastructure; these two factors must be handled well to achieve optimal resource sharing, so efficient resource reservation is required to ensure load optimization in the cloud. This work presents an integrated resource-reservation and workload-balancing algorithm for effective cloud provisioning. The approach develops a Priority-based Resource Scheduling Model to obtain resource reservations, combined with threshold-based load balancing, to improve efficiency in the cloud framework. Utilization of virtual machines is then increased through suitable workload adjustment, dynamically selecting a job from the submitted jobs using the Priority-based Resource Scheduling Model. Experimental evaluations show that the proposed scheme gives better results by reducing execution time, minimizing resource cost, and improving resource utilization under dynamic resource-provisioning conditions.
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
Cloud computing plays an indispensable role in the modern digital scenario. The fundamental challenge of cloud systems is accommodating user requirements that keep varying; this dynamic environment demands sophisticated algorithms to solve the task-allotment problem. The overall performance of cloud systems is rooted in the efficiency of their task scheduling algorithms, and the dynamic nature of clouds makes it challenging to find an optimal solution satisfying all evaluation metrics. The new approach is built on the Round Robin and Shortest Job First algorithms: Round Robin reduces starvation, and Shortest Job First decreases the average waiting time. In this work, the advantages of both are combined to improve the makespan of user tasks.
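One natural way to combine the two policies can be sketched as follows. The paper's exact hybrid is not specified here; in this sketch jobs are first ordered by burst time (the SJF part) and then served round-robin with a fixed quantum (the anti-starvation part), with illustrative burst times.

```python
# Hedged sketch of an SJF-ordered Round Robin dispatcher (one possible
# hybrid, not necessarily the paper's): short jobs start near the front,
# while the quantum bounds how long any job can monopolize the CPU.
from collections import deque

def sjf_round_robin(bursts, quantum):
    # sort jobs by burst time, then serve them round-robin
    queue = deque(sorted(enumerate(bursts), key=lambda x: x[1]))
    remaining = dict(queue)
    clock, finish = 0, {}
    while queue:
        job, _ = queue.popleft()
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] == 0:
            finish[job] = clock          # record completion time
        else:
            queue.append((job, remaining[job]))
    return finish                        # completion time per job id

print(sjf_round_robin([6, 2, 4], quantum=3))  # -> {1: 2, 2: 9, 0: 12}
```

In this run the shortest job (id 1) finishes at time 2 instead of waiting behind the 6-unit job, while the quantum guarantees the long job still makes steady progress.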
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
Cloud collaboration is an emerging technology that enables sharing of computer files using cloud computing: cloud resources are pooled, and cloud services are provided from them. Cloud collaboration technologies allow users to share documents. Resource allocation in the cloud is challenging because resources offer different Quality of Service (QoS), and services running on these resources may fail to meet user demands. We propose a solution for resource allocation based on multi-attribute QoS scoring, considering parameters such as the distance to the resource from the user site, the reputation of the resource, task completion time, task completion ratio, and the load at the resource. The proposed algorithm, referred to as Multi Attribute QoS Scoring (MAQS), uses a neuro-fuzzy system. We have also included a speculative manager to handle fault tolerance. In this paper it is shown that the proposed algorithm performs better than others, including power-trust reputation-based algorithms and the harmony method, which use a single attribute to compute the reputation score of each allocated resource.
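The multi-attribute idea can be sketched with a weighted sum standing in for the neuro-fuzzy system (which learns its own nonlinear combination). The five attributes match the abstract; the weights and resource values are illustrative assumptions, all normalized to [0, 1] with higher meaning better.

```python
# Hedged sketch: weighted-sum stand-in for MAQS's neuro-fuzzy scoring.
# Weights and resource attribute values are illustrative, not from the paper.
WEIGHTS = {"distance": 0.15, "reputation": 0.25, "completion_time": 0.20,
           "completion_ratio": 0.25, "load": 0.15}

def qos_score(attrs):
    # combine the five normalized attributes into one score
    return sum(WEIGHTS[k] * attrs[k] for k in WEIGHTS)

def pick_resource(resources):
    # resources: {name: attribute dict}; returns the best-scoring name
    return max(resources, key=lambda r: qos_score(resources[r]))

resources = {
    "r1": {"distance": 0.9, "reputation": 0.6, "completion_time": 0.7,
           "completion_ratio": 0.8, "load": 0.5},
    "r2": {"distance": 0.4, "reputation": 0.9, "completion_time": 0.8,
           "completion_ratio": 0.9, "load": 0.7},
}
print(pick_resource(resources))  # -> "r2", despite r1 being closer
```

The example illustrates the abstract's point against single-attribute methods: r1 wins on distance alone, but r2 wins once reputation, completion ratio, and the other attributes are weighed together.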
RESOURCE ALLOCATION METHOD FOR CLOUD COMPUTING ENVIRONMENTS WITH DIFFERENT SE...
In a cloud computing environment with multiple data centers spread over a wide area, each data center is likely to provide different service quality to users at different locations. It is also necessary to consider nodes at the edge of the network (the local cloud), which support applications such as IoT that require low latency and location awareness. The authors previously proposed a joint multiple-resource allocation method for a cloud environment consisting of multiple data centers, each with a different network delay. However, the existing method does not account for cases where requests that require a short network delay occur more often than expected. Moreover, it does not account for service processing time in the data centers, and therefore cannot provide optimal resource allocation when the total processing time (both network delay and service processing time in a data center) must be considered.
Management of context aware software resources deployed in a cloud environmen...
In cloud computing environments, context information is continuously created by context providers and consumed by applications on mobile devices. An important characteristic of cloud-based context-aware services is meeting service level agreements (SLAs) to deliver a certain quality of service (QoS), such as guarantees on response time or price. The response time of a request to context-aware software is affected by loading extensive context data from multiple resources on the chosen server, so the software slows down during execution. Proper scheduling of such services is therefore indispensable, because customers face time constraints. In this research, a new scheduling algorithm for context-aware services is proposed, based on classifying similar context consumers and dynamically scoring requests, to improve the performance of the server hosting highly requested context-aware software while reducing the cloud provider's costs. The approach is evaluated via simulation and comparison with the gi-FIFO scheduling algorithm. Experimental results demonstrate the efficiency of the proposed approach.
An efficient resource sharing technique for multi-tenant databases
Multi-tenancy is a key component of the Software as a Service (SaaS) paradigm, and multi-tenant software has gained much attention in academia, research and business. It provides scalability and economic benefits for both cloud service providers and tenants by sharing the same resources and infrastructure, with isolation across shared databases, network and computing resources, under service level agreement (SLA) compliance. In a multi-tenant scenario, active tenants compete for resources to access the database; if one tenant blocks the resources, the performance of all other tenants may be restricted and fair sharing of the resources may be compromised. The performance of tenants must not be affected by the resource-intensive activities and volatile workloads of other tenants. Moreover, the prime goal of providers is to achieve a low cost of operation while satisfying the specific schemas and SLAs of each tenant. Consequently, there is a need to design and develop effective and dynamic resource sharing algorithms that can handle these issues. This work presents a model referred to as the Multi-Tenant Dynamic Resource Scheduling Model (MTDRSM), embracing a query classification and worker sorting technique that enables efficient and dynamic resource sharing among tenants. Experiments show significant performance improvement over the existing model.
A latency-aware max-min algorithm for resource allocation in cloud
Cloud computing is an emerging distributed computing paradigm. However, it requires initiatives tailored to the cloud environment, such as an on-the-fly mechanism for providing resource availability based on rapidly changing customer demands. Although resource allocation is an important and widely studied problem, certain criteria still need attention, including meeting users' quality of service (QoS) requirements; high QoS can be guaranteed only if resources are allocated optimally. This paper proposes a latency-aware max-min algorithm (LAM) for allocating resources in cloud infrastructures. The algorithm is designed to address challenges such as variations in user demand and on-demand access to effectively unlimited resources. It can allocate resources in a cloud-based environment with the goal of enhancing infrastructure-level performance and maximizing profit through optimal allocation. A priority value is also associated with each user, calculated by the analytic hierarchy process (AHP). The results validate the superiority of LAM, showing better performance than other state-of-the-art algorithms and flexibility under fluctuating resource-demand patterns.
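The core max-min loop with a latency term can be sketched as follows. This is an assumed reading of the abstract, with the AHP priority step omitted and illustrative task lengths, VM speeds and latencies: each round, the task whose best-case completion time is largest is placed on the VM that finishes it earliest, counting both queue time and network latency.

```python
# Hedged sketch of a latency-aware max-min assignment loop (AHP priorities
# omitted; all numbers illustrative, not from the paper).
def latency_aware_max_min(task_lengths, vm_speeds, vm_latencies):
    ready = [0.0] * len(vm_speeds)          # time each VM becomes free
    assignment = {}
    pending = dict(enumerate(task_lengths))
    while pending:
        # completion time of task t on VM v: queue + latency + length/speed
        def best(t):
            return min((ready[v] + vm_latencies[v] + pending[t] / vm_speeds[v], v)
                       for v in range(len(vm_speeds)))
        # max-min rule: pick the task whose *best* completion time is largest
        t = max(pending, key=lambda t: best(t)[0])
        finish, v = best(t)
        ready[v] = finish
        assignment[t] = v
        del pending[t]
    return assignment, max(ready)            # task->VM map, and makespan

assignment, makespan = latency_aware_max_min(
    task_lengths=[12, 6, 3], vm_speeds=[3.0, 1.0], vm_latencies=[0.5, 0.1])
print(assignment, makespan)
```

Folding the per-VM latency into the completion-time estimate is what distinguishes this from plain max-min: a fast but distant VM can lose to a slower, closer one for short tasks.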
Cloud computing is a new computing paradigm that aims to transform computing into a utility, much as electricity was first generated at home and evolved to be supplied by a few utility providers. The scheduling considered here is a mapping strategy that efficiently balances the task load across multiple computational resources in the network, based on system status, to improve performance. The objective of this paper is to present the results of a hybrid DE-GA approach, in which a genetic algorithm (GA) is applied after differential evolution (DE).
LOAD BALANCING ALGORITHM ON CLOUD COMPUTING FOR OPTIMIZED RESPONSE TIME
To improve the performance of cloud computing, many parameters and issues must be considered, including resource allocation, resource responsiveness, connectivity to resources, discovery of unused resources, resource mapping, and resource planning. Planning the use of resources can be based on many kinds of parameters, and service response time is one of them. Users can easily measure the response time of their requests, making it one of the important QoS metrics; explored further, response time can drive more efficient distribution and load balancing of resources, which is one of the most promising research directions for improving cloud technology. This paper therefore proposes a load-balancing algorithm based on the response time of requests on the cloud, named APRA (ARIMA Prediction of Response Time Algorithm). The main idea is to use ARIMA models to predict the coming response time, giving a better way of resolving resource allocation against a threshold value. The experimental results are promising for load balancing with predicted response time and show that prediction is a strong direction for load balancing.
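The predict-then-threshold idea can be sketched with a minimal AR(1) predictor standing in for a full ARIMA fit (APRA's actual model orders and threshold are not given here; the response-time series is illustrative): predict the next response time from recent observations, and route new requests away from a VM whose prediction crosses the threshold.

```python
# Hedged sketch of the APRA idea: an AR(1) least-squares fit stands in for
# ARIMA, and a threshold test flags an overloaded VM. Data is illustrative.
def ar1_predict(series):
    xs, ys = series[:-1], series[1:]         # lag-1 pairs
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var if var else 0.0
    intercept = my - slope * mx
    return intercept + slope * series[-1]    # one-step-ahead forecast

def overloaded(series, threshold):
    return ar1_predict(series) > threshold

rt = [100, 110, 125, 138, 150]   # recent response times (ms), trending up
print(round(ar1_predict(rt), 1), overloaded(rt, threshold=155))
```

The point of predicting rather than reacting is visible here: the last observed value (150 ms) is still under the 155 ms threshold, but the forecast (about 163 ms) already triggers rebalancing before users experience the breach.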
Differentiating Algorithms of Cloud Task Scheduling Based on various Parameters
Cloud computing is a new design structure for large, distributed data centers. Cloud computing systems promise end users a "pay as you go" model. To meet users' expected quality requirements, cloud computing needs to offer differentiated services; QoS differentiation is very important for satisfying different users with different QoS requirements. In this paper, various QoS-based scheduling algorithms, their scheduling parameters, and the future scope of the discussed algorithms are studied. The paper summarizes various cloud scheduling algorithms, their findings, scheduling factors, types of scheduling, and the parameters considered.
Current perspective in task scheduling techniques in cloud computing: a review
Cloud computing is a development of parallel, distributed and grid computing that provides computing potential as a service to clients rather than as a product. Through cloud computing, clients can access software resources, valuable information and hardware devices as a subscribed and monitored service over a network. Due to the large number of requests for access to resources and the service level agreements between cloud service providers and clients, several pressing issues in the cloud environment demand the research community's attention: QoS, power, privacy and security, VM migration, and resource allocation and scheduling. Resource allocation among multiple clients has to be ensured as per service level agreements. Several techniques have been invented and tested by the research community for generating optimal schedules in cloud computing; promising approaches such as metaheuristic, greedy, heuristic and genetic techniques have been applied to task scheduling in several parallel and distributed systems. This paper presents a review of scheduling proposals in the cloud environment.
AN OPEN JACKSON NETWORK MODEL FOR HETEROGENEOUS INFRASTRUCTURE AS A SERVICE O...
Cloud computing is an environment that provides services on user demand, such as software, platform, and infrastructure. Applications deployed on cloud computing have become more varied and complex to accommodate growing end-user numbers and fluctuating workloads. One notable characteristic of cloud computing is the heterogeneity of networks, hosts and virtual machines (VMs). There have been many studies modeling cloud computing with queuing theory, but most have focused on the homogeneous case. In this study, we propose a cloud computing model based on an open Jackson network for multi-tier application systems deployed on heterogeneous VMs in IaaS cloud computing. Important metrics are analyzed in our experiments, such as mean waiting time, mean request count, and system throughput. Furthermore, the model's metrics are used to adjust the number of VMs allocated to applications. The experimental results show that the open queueing network provides high efficiency.
MCCVA: A NEW APPROACH USING SVM AND KMEANS FOR LOAD BALANCING ON CLOUD
Nowadays, the demand for using resources and services via intranet systems or over the Internet is growing rapidly, which raises the problem of how to use these resources effectively in terms of time and quality. Network QoS and its economics are therefore key concerns, and cloud computing was born of this inevitable trend. However, managing resources and scheduling tasks in virtualized data centres on the cloud are challenging tasks. Many load-balancing algorithms for clouds have been proposed by authors, scholars, and experts, but these existing methods are mostly natural or heuristic; the application of AI or modern data-mining technologies to load balancing is not yet common, owing to the particular characteristics of the cloud. In this paper, we propose an algorithm to reduce processing time (makespan) on cloud computing, helping load balancing work more efficiently. We use the SVM algorithm to classify incoming requests and K-Means to cluster the VMs in the cloud; the load balancer then allocates requests to VMs in the most reasonable way, so that the request with the least processing time is allocated to the VM with the lowest usage. We name this proposal MCCVA (Makespan Classification & Clustering VM Algorithm). We have experimented with and evaluated this algorithm in CloudSim, a cloud simulation environment, and obtained better results than some other well-known algorithms. MCCVA shows the big potential of AI and data mining in load balancing, which can be developed further to achieve ever better QoS.
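The clustering half of the pipeline can be sketched with a tiny 1-D k-means over VM utilizations (standing in for the paper's K-Means step; the SVM request classifier is not reproduced here, and all loads are illustrative): once VMs are grouped by load, a request classified as long-running is steered to the lightest-loaded cluster.

```python
# Hedged sketch of MCCVA's clustering step: hand-rolled 1-D k-means over
# VM loads; the SVM classifier is assumed to have already labeled the request.
def kmeans_1d(values, k, iters=20):
    # seed centers with evenly spaced sorted values
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

loads = [0.1, 0.15, 0.5, 0.55, 0.9, 0.95]   # current VM utilizations
centers, groups = kmeans_1d(loads, k=3)
lightest = groups[min(range(3), key=lambda i: centers[i])]
print(sorted(lightest))   # VMs eligible for a long-running request
```

Clustering first means the dispatcher compares a request against three cluster centers instead of every VM, which is where the makespan savings at scale would come from.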
Role of Operational System Design in Data Warehouse Implementation: Identifyi...
The data warehouse design process takes input from the organization's operational system, and the quality of a data warehousing solution depends on the design of that operational system. Often, organizations' operational system implementations have limitations, so we cannot proceed to data warehouse design straightforwardly. In this paper, we investigate the organization's operational system to identify such limitations and determine the role of operational system design in the process of data warehouse design and implementation. We have worked to find possible methods of handling these limitations and have proposed techniques for obtaining a quality data warehousing solution despite them. To ground the work in a live example, the National Rural Health Mission (NRHM) project has been taken: a national health-sector project managed by the Indian Government across the country. Its complex structure and high volume of data make it an ideal case for data warehouse implementation.
Using Transcendental Number to Encrypt BlackBerry Video (Jun Steed Huang)
The basic concept of a motion-image-based wireless monitoring and control system, the main requirements from the M2M communities, and the related encryption method for the wireless system are described. M2M video streaming solutions can be explained in four different ways: Machine2Man (for security applications), Man2Man (for commercial fast delivery services), Man2Machine (for the mining and oil industries) and Machine2Machine (for environmental protection). The scrambling of motion-image-based video signals is done using a transcendental number iterated over a Fibonacci prime number sequence, together with the video time stamp and a user pass phrase; the simulation and experiment are reported. The major challenge in this area is providing a low-computation algorithm that runs easily in an embedded Java application without compromising hack-proof security.
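The scrambling construction can be sketched as follows. This is only an illustration of the idea, not the paper's cipher, and it offers no real cryptographic security: digits of a transcendental number (pi here) sampled at plain Fibonacci indices (standing in for the paper's Fibonacci-prime sequence) are mixed with the user pass phrase to form an XOR keystream over the video bytes; the timestamp input is omitted.

```python
# Hedged sketch: XOR scrambling keyed by pi digits at Fibonacci indices plus
# a pass phrase. Illustrative only; NOT a secure cipher, NOT the paper's exact
# scheme (no timestamp, plain Fibonacci rather than Fibonacci primes).
PI_DIGITS = ("1415926535897932384626433832795028841971"
             "6939937510582097494459230781640628620899")

def fib_indices(n):
    a, b, out = 1, 2, []
    while len(out) < n:
        out.append(a % len(PI_DIGITS))   # wrap into the digit table
        a, b = b, a + b
    return out

def keystream(passphrase, n):
    idx = fib_indices(n)
    return bytes((int(PI_DIGITS[i]) * 26 + passphrase[k % len(passphrase)]) % 256
                 for k, i in enumerate(idx))

def scramble(data, passphrase):
    ks = keystream(passphrase, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

frame = b"video frame payload"
key = b"user pass phrase"
enc = scramble(frame, key)
print(scramble(enc, key) == frame)   # XOR scrambling is its own inverse
```

Because the keystream depends only on public constants and the pass phrase, the low-computation property the abstract emphasizes comes essentially for free: scrambling and descrambling are the same XOR pass.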
RESOURCE ALLOCATION METHOD FOR CLOUD COMPUTING ENVIRONMENTS WITH DIFFERENT SE...IJCNCJournal
In a cloud computing environment with multiple data centers over a wide area, it is highly likely that each data center would provide the different service quality to users at different locations. It is also required to consider the nodes at the edge of the network (local cloud) which support applications such as IoTs that require low latency and location awareness. The authors proposed the joint multiple resource allocation method in a cloud computing environment that consists of multiple data centers and each data center provides the different network delay. However, the existing method does not take account of cases where requests that require a short network delay occur more than expected. Moreover, the existing method does not take account of service processing time in data centers and therefore cannot provide the optimal resource allocation when it is necessary to take the total processing time (both network delay and service processing time in a data center) into consideration in resource allocation.
Management of context aware software resources deployed in a cloud environmen...ijdpsjournal
In cloud computing environments, context information is continuously created by context providers and consumed by applications on mobile devices. An important characteristic of cloud-based context-aware services is meeting service level agreements (SLAs) that guarantee a certain quality of service (QoS), such as bounds on response time or price. The response time to a request of context-aware software is affected by loading extensive context data from multiple resources on the chosen server, so such software slows down during execution. Proper scheduling of these services is therefore indispensable, because customers face time constraints. In this research, a new scheduling algorithm for context-aware services is proposed, based on classifying similar context consumers and dynamically scoring requests, to improve the performance of servers hosting highly requested context-aware software while reducing the cloud provider's costs. The approach is evaluated via simulation and comparison with the gi-FIFO scheduling algorithm. Experimental results demonstrate the efficiency of the proposed approach.
An efficient resource sharing technique for multi-tenant databases IJECEIAES
Multi-tenancy is a key component of the Software as a Service (SaaS) paradigm. Multi-tenant software has gained a lot of attention in academia, research and business. It provides scalability and economic benefits for both cloud service providers and tenants by sharing the same resources and infrastructure, with isolation of shared databases, network and computing resources under service level agreement (SLA) compliance. In a multi-tenant scenario, active tenants compete for resources in order to access the database. If one tenant monopolizes the resources, the performance of all the other tenants may be restricted and fair sharing of the resources may be compromised. The performance of tenants must not be affected by the resource-intensive activities and volatile workloads of other tenants. Moreover, the prime goal of providers is to achieve a low cost of operation while satisfying the specific schemas and SLAs of each tenant. Consequently, there is a need to design and develop effective, dynamic resource sharing algorithms that can handle these issues. This work presents a model referred to as the Multi-Tenant Dynamic Resource Scheduling Model (MTDRSM), embracing a query classification and worker sorting technique that enables efficient and dynamic resource sharing among tenants. The experiments show significant performance improvement over the existing model.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
A latency-aware max-min algorithm for resource allocation in cloud IJECEIAES
Cloud computing is an emerging distributed computing paradigm. However, it requires certain initiatives tailored to the cloud environment, such as an on-the-fly mechanism for providing resource availability based on the rapidly changing demands of customers. Although resource allocation is an important and widely studied problem, certain criteria still need to be considered, including meeting users' quality of service (QoS) requirements: high QoS can be guaranteed only if resources are allocated optimally. This paper proposes a latency-aware max-min algorithm (LAM) for allocating resources in cloud infrastructures. The proposed algorithm was designed to address challenges associated with resource allocation, such as variations in user demand and on-demand access to unlimited resources. It is capable of allocating resources in a cloud-based environment with the target of enhancing infrastructure-level performance and maximizing profit through optimal allocation. A priority value is also associated with each user, calculated by the analytic hierarchy process (AHP). The results validate the superiority of LAM, which performs better than other state-of-the-art algorithms and allocates resources flexibly under fluctuating demand patterns.
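The abstract does not spell out LAM's latency and AHP extensions, but its core is the classic max-min heuristic, which can be sketched as follows: repeatedly pick the task whose best (minimum) completion time is largest and give it the VM that achieves that minimum.

```python
def max_min(task_lengths, vm_speeds):
    """Max-min task-to-VM assignment for independent tasks on heterogeneous VMs."""
    ready = [0.0] * len(vm_speeds)      # time at which each VM next becomes free
    assignment = {}
    remaining = dict(enumerate(task_lengths))
    while remaining:
        best = None                      # (completion_time, task, vm)
        for t, length in remaining.items():
            # minimum completion time of task t over all VMs
            c, v = min((ready[v] + length / vm_speeds[v], v)
                       for v in range(len(vm_speeds)))
            if best is None or c > best[0]:
                best = (c, t, v)         # keep the task whose best time is worst
        c, t, v = best
        ready[v] = c
        assignment[t] = v
        del remaining[t]
    return assignment, max(ready)
```

For example, `max_min([4, 2, 8], [1.0, 2.0])` schedules the largest task on the fast VM first and yields a makespan of 5.0. LAM layers latency awareness and AHP-derived user priorities on top of this skeleton.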
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Cloud computing is a new computing paradigm that aims to transform computing into a utility, just as electricity was first generated at home and later came to be supplied by a few utility providers. Load balancing is a mapping strategy that efficiently distributes the task load across multiple computational resources in the network, based on system status, to improve performance. The objective of this research paper is to present the results of Hybrid DEGA, in which GA is applied after DE.
LOAD BALANCING ALGORITHM ON CLOUD COMPUTING TO OPTIMIZE RESPONSE TIMEijccsa
To improve the performance of cloud computing, many parameters and issues must be considered, including resource allocation, resource responsiveness, connectivity to resources, discovery of unused resources, resource mapping and resource planning. Resource planning can be based on many kinds of parameters, and service response time is one of them. Users can easily observe the response time of their requests, making it one of the important QoS metrics. Explored further, response time can drive solutions for distributing and load-balancing resources more efficiently, one of the most promising research directions for improving cloud technology. This paper therefore proposes a load balancing algorithm based on the response time of requests on the cloud, named APRA (ARIMA Prediction of Response Time Algorithm). The main idea is to use ARIMA to predict upcoming response times, giving a better way of resolving resource allocation against a threshold value. The experimental results are promising for load balancing with predicted response times and show that prediction is a valuable direction for load balancing.
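The routing idea behind APRA can be sketched without a full ARIMA fit. The stand-in below uses simple exponential smoothing as the forecaster (our substitution, not the paper's model) and keeps the paper's threshold idea: predict each VM's next response time and route to the smallest prediction among VMs under the threshold.

```python
def forecast_next(history, alpha=0.5):
    """One-step forecast by simple exponential smoothing, a lightweight
    stand-in for the ARIMA model used in APRA."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def pick_vm(histories, threshold=500.0):
    """Predict each VM's next response time; route to the smallest prediction,
    preferring VMs whose prediction is under the threshold."""
    preds = {vm: forecast_next(h) for vm, h in histories.items()}
    under = {vm: p for vm, p in preds.items() if p <= threshold}
    pool = under or preds            # fall back if every VM is over threshold
    return min(pool, key=pool.get), preds
```

With histories `{"vm0": [100, 100, 100], "vm1": [200, 180, 160]}` the request goes to `vm0`, since its predicted response time (100) is the lowest.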
Differentiating Algorithms of Cloud Task Scheduling Based on various Parametersiosrjce
Cloud computing is a new design structure for large, distributed data centers. Cloud computing systems promise end users a "pay as you go" model. To meet the expected quality requirements of users, cloud computing needs to offer differentiated services, and QoS differentiation is very important for satisfying different users with different QoS requirements. In this paper, various QoS-based scheduling algorithms, their scheduling parameters and the future scope of the discussed algorithms are studied. The paper summarizes various cloud scheduling algorithms, their findings, scheduling factors, the type of scheduling and the parameters considered.
Current perspective in task scheduling techniques in cloud computing a reviewijfcstjournal
Cloud computing is a development of parallel, distributed and grid computing that provides computing potential as a service to clients rather than as a product. Through cloud computing, clients can access software resources, valuable information and hardware devices as a subscribed and monitored service over a network. Due to the large number of requests for access to resources and the service level agreements between cloud service providers and clients, several pressing issues in the cloud environment, such as QoS, power, privacy and security, VM migration, and resource allocation and scheduling, need the attention of the research community. Resource allocation among multiple clients has to be ensured as per service level agreements. Several techniques have been invented and tested by the research community for generating optimal schedules in cloud computing. Promising approaches such as metaheuristic, greedy, heuristic and genetic techniques have been applied to task scheduling in several parallel and distributed systems. This paper presents a review of scheduling proposals in the cloud environment.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance and readability. The articles published in our journal can be accessed online.
AN OPEN JACKSON NETWORK MODEL FOR HETEROGENEOUS INFRASTRUCTURE AS A SERVICE O...IJCNCJournal
Cloud computing is an environment that provides services on user demand, such as software, platform and infrastructure. Applications deployed on cloud computing have become more varied and complex in order to adapt to growing numbers of end users and fluctuating workloads. One popular characteristic of cloud computing is the heterogeneity of networks, hosts and virtual machines (VMs). There have been many studies modeling cloud computing with queueing theory, but most have focused on homogeneous systems. In this study, we propose a cloud computing model based on an open Jackson network for multi-tier application systems deployed on heterogeneous VMs of IaaS cloud computing. Important metrics such as mean waiting time, mean request count and system throughput are analyzed in our experiments, and the model's metrics are used to adjust the number of VMs allocated to applications. Experimental results show that the open queueing network provides high efficiency.
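In an open Jackson network, each station behaves as an independent M/M/1 queue once the traffic equations are solved, so the metrics named above follow from textbook formulas. A minimal sketch for the simplest case, a tandem of tiers all seeing the same arrival rate (the paper's heterogeneous multi-class model is richer than this):

```python
def tier_metrics(lam, mu):
    """M/M/1 metrics for one tier: utilization, mean number in system,
    and mean response time (via Little's law)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("tier is unstable: arrival rate >= service rate")
    mean_n = rho / (1 - rho)     # mean number of requests at the tier
    mean_t = mean_n / lam        # mean response time, by Little's law
    return rho, mean_n, mean_t

def tandem_response(lam, service_rates):
    """End-to-end mean response time of a multi-tier (tandem Jackson) system:
    each tier can be analyzed independently and the times summed."""
    return sum(tier_metrics(lam, mu)[2] for mu in service_rates)
```

For instance, at 2 requests/s into two tiers each serving 4 requests/s, every tier runs at 50% utilization and the end-to-end mean response time is 1.0 s; raising a tier's VM count (i.e., its effective service rate) lowers this figure, which is exactly the adjustment loop the abstract describes.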
MCCVA: A NEW APPROACH USING SVM AND KMEANS FOR LOAD BALANCING ON CLOUDijccsa
Nowadays, the demand for using resources and services via intranet systems or the Internet is growing rapidly, and the accompanying problem is how to use these resources effectively in terms of time and quality. Network QoS and its economics are therefore widespread concerns, and cloud computing was born as an inevitable trend. However, managing resources and scheduling tasks in virtualized data centres on the cloud are challenging tasks. Many load balancing algorithms for clouds have been proposed by authors, scholars and experts, but these existing methods are mostly natural or heuristic; the application of AI or modern data mining technologies to load balancing is not yet common, owing to the particular characteristics of the cloud. In this paper, we propose an algorithm to reduce processing time (makespan) on cloud computing, helping load balancing work more efficiently. We use the SVM algorithm to classify incoming requests and K-means to cluster the VMs in the cloud; the load balancer then allocates requests to VMs in the most reasonable way, so that the request with the least processing time is allocated to the VM with the lowest usage. We name this proposal MCCVA (Makespan Classification & Clustering VM Algorithm). We have experimented with and evaluated this algorithm in CloudSim, a cloud simulation environment, and obtained better results than several other well-known algorithms. MCCVA shows the large potential of AI and data mining in load balancing, which can be developed further to achieve ever better QoS.
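Of MCCVA's two halves, the VM-clustering side is easy to sketch; the SVM request classifier is omitted here. The sketch below clusters scalar VM load values with a tiny 1-D Lloyd's algorithm (the paper presumably clusters richer VM features), after which a light request would be routed to the cluster with the lowest center.

```python
def kmeans_1d(values, k, iters=25):
    """Cluster scalar VM load values into k groups (Lloyd's algorithm in 1-D, k >= 2)."""
    vs = sorted(values)
    # spread the initial centers across the sorted range
    centers = [vs[i * (len(vs) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # recompute each center as its cluster mean (keep old center if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

Clustering VM loads `[10, 12, 11, 80, 85]` into two groups separates the lightly loaded VMs (center 11.0) from the heavily loaded ones (center 82.5), giving the load balancer its "lowest usage" target pool.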
Role of Operational System Design in Data Warehouse Implementation: Identifyi...iosrjce
The data warehouse design process takes input from an organization's operational system, and the quality of a data warehousing solution depends on the design of that operational system. Operational system implementations often have limitations, so data warehouse design cannot proceed straightforwardly. In this paper, we investigate an organization's operational system to identify such limitations and to determine the role of operational system design in the process of data warehouse design and implementation. We identify possible methods to handle these limitations and propose techniques to obtain a quality data warehousing solution despite them. To ground the work in a live example, the National Rural Health Mission (NRHM) project has been taken: a national health-sector project managed by the Indian Government across the country. Its complex structure and high volume of data make it an ideal case for data warehouse implementation.
Web Technologies: Fundamentals and Selection of Suitable Web Frameworks adoubleu
This lecture presents the fundamentals of the technologies used on the web. Use cases show how small web applications are built and how they differ from large web applications based on web frameworks such as Apache Wicket or the Google Web Toolkit.
Scheduling Divisible Jobs to Optimize the Computation and Energy Costsinventionjournals
ABSTRACT: An important challenge in cloud computing environments is to design a scheduling strategy that handles jobs and processes them in a heterogeneous environment with shared data centers. In this paper, we investigate a new analytical framework that enables an existing private cloud data center to schedule jobs while minimizing overall computation and energy cost together. Our model is based on the Divisible Load Theory (DLT) model and derives a closed-form solution for the load fractions to be assigned to each machine, considering computation and energy cost. Our analysis also schedules jobs in such a way that the cloud provider gains maximum benefit from its service while meeting the Quality of Service (QoS) requirements of users' jobs. Finally, we quantify the performance of the strategies via rigorous simulation studies.
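The paper's closed form includes energy terms that the abstract does not state, but the computation-only core of DLT is standard: when all machines must finish simultaneously, each machine's load fraction is proportional to its speed. A minimal sketch of that base case:

```python
def dlt_fractions(total_load, speeds):
    """Equal-finish-time split from Divisible Load Theory (computation cost only):
    machine i gets fraction speeds[i] / sum(speeds), so all machines finish together."""
    total_speed = sum(speeds)
    fractions = [s / total_speed for s in speeds]
    makespan = total_load / total_speed
    return fractions, makespan
```

For 100 units of load on machines of speed 1 and 3, the split is 25/75 and both machines finish at time 25; the paper's full model re-derives these fractions with energy-cost terms added to the finish-time equations.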
Recently, much interest has been shown by researchers in improving workload scheduling on cloud platforms. However, executing scientific workflows on a cloud platform is time-consuming and expensive. As users are charged by the hour of usage, much research has emphasized minimizing processing time to reduce cost. Processing cost can also be reduced by minimizing energy consumption, especially when resources are heterogeneous in nature; yet very limited work has considered optimizing cost together with energy and processing time while meeting task quality of service (QoS) requirements. This paper presents a cost- and performance-aware workload scheduling (CPA-WS) technique for heterogeneous cloud platforms, with a cost optimization model that minimizes processing time and energy dissipation for task execution. Experiments are conducted using two widely used workflows, Inspiral and CyberShake. The outcome shows that CPA-WS significantly reduces energy, time and cost in comparison with a standard workload scheduling model.
Research Inventy : International Journal of Engineering and Scienceinventy
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed
Cloud computing is an emerging technology that processes huge amounts of data, so the scheduling mechanism plays a vital role in cloud computing. The proposed protocol is designed to minimize switching time, improve resource utilization, and improve server performance and throughput. The method is based on scheduling jobs in the cloud and resolving the drawbacks of existing protocols. Here we assign priorities to jobs, which yields better performance, and aim to minimize waiting time and switching time. Every effort has been made to manage the scheduling of jobs so as to overcome the drawbacks of existing protocols and to improve the efficiency and throughput of the server.
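The abstract does not define its priority rule, so the following is a generic non-preemptive priority-scheduling sketch showing how per-job waiting times (the metric the abstract targets) fall out of the service order; job names, bursts and priorities are illustrative.

```python
def schedule_by_priority(jobs):
    """Non-preemptive priority scheduling. jobs is a list of
    (name, burst_time, priority) tuples; a lower priority value is served first.
    Returns each job's waiting time and the average waiting time."""
    order = sorted(jobs, key=lambda j: j[2])   # serve in priority order
    clock, waits = 0, {}
    for name, burst, _ in order:
        waits[name] = clock                    # time the job waited before service
        clock += burst
    avg_wait = sum(waits.values()) / len(jobs)
    return waits, avg_wait
```

For jobs a(burst 5, prio 2), b(3, 1), c(1, 3) the service order is b, a, c, giving waits of 0, 3 and 8 and an average wait of 11/3; changing the priority assignment is exactly the lever the abstract's protocol tunes.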
Job scheduling in hybrid cloud using deep reinforcement learning for cost opt...ArchanaKalapgar
Project description: Using deep reinforcement learning, my teammate and I are solving the problem of job scheduling in a hybrid environment, aiming for a good average job response time at minimal cost. The software programming environment used is TensorFlow, Google Colab, and AWS.
Intelligent Workload Management in Virtualized Cloud EnvironmentIJTET Journal
Abstract: Cloud computing is a rising high-performance computing environment with a large-scale, heterogeneous collection of autonomous systems and an elastic computational design. To improve the overall performance of the cloud environment under deadline constraints, a task scheduling model is established for reducing the system's power consumption and improving the profit of service providers. For this scheduling model, a solving technique based on a multi-objective genetic algorithm (MO-GA) is designed, and the study concentrates on encoding rules, crossover operators, selection operators and the scheme for sorting Pareto solutions. The model is implemented on the open-source cloud simulation platform CloudSim; compared with existing scheduling algorithms, the results show that the proposed algorithm obtains a better solution, balancing the load across multiple objectives.
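The Pareto sorting step that MO-GA relies on reduces to a dominance test; a minimal sketch (objective vectors here are illustrative, e.g. (power, makespan) pairs to be minimized):

```python
def dominates(p, q):
    """p dominates q when p is no worse in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Keep the non-dominated points, as MO-GA does when ranking its population."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Given candidate schedules scored `[(1, 5), (2, 2), (3, 1), (4, 4)]`, only `(4, 4)` is dominated (by `(2, 2)`), so the front keeps the other three; the GA then breeds preferentially from that front.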
Load Balancing Algorithm to Improve Response Time on Cloud Computingneirew J
Load balancing techniques in cloud computing can be applied at different levels. There are two main levels: load balancing on physical servers and load balancing on virtual servers. Load balancing on a physical server is the policy of allocating physical servers to virtual machines, while load balancing on virtual machines is the policy of allocating resources from physical servers to the virtual machines for the tasks or applications running on them. Depending on whether the user's request on cloud computing is for SaaS (Software as a Service), PaaS (Platform as a Service) or IaaS (Infrastructure as a Service), an appropriate load balancing policy is needed. When receiving tasks, the cloud data center has to allocate them efficiently so that response time is minimized and congestion is avoided. Load balancing should also be performed between different data centers in the cloud to ensure minimum transfer time. In this paper, we propose a virtual-machine-level load balancing algorithm that aims to improve the average response time and average processing time of the system in the cloud environment. The proposed algorithm is compared with the Avoid Deadlocks [5], Max-min [6] and Throttled [8] algorithms, and the results show that our algorithm achieves optimized response times.
An efficient cloudlet scheduling via bin packing in cloud computingIJECEIAES
In this ever-developing technological world, one way to manage and deliver services is through cloud computing, a massive web of heterogeneous autonomous systems comprising an adaptable computational design. Cloud computing can be improved through task scheduling, albeit the most challenging aspect to improve. Better task scheduling can improve response time, reduce power consumption and processing time, enhance makespan and throughput, and increase profit by reducing operating costs and raising system reliability. This study aims to improve job scheduling by transforming the job scheduling problem into a bin packing problem. Three modified implementations of bin packing algorithms are proposed for task scheduling (MBPTS), based on minimizing makespan. The results, based on the open-source simulator CloudSim, demonstrate that the proposed MBPTS is adequate to optimize load balance, reduce waiting time and makespan, and improve resource utilization in comparison with current scheduling algorithms such as particle swarm optimisation (PSO) and first come first serve (FCFS).
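The abstract does not name the three bin-packing variants, so the sketch below uses first-fit decreasing, a classic choice, to show the task-scheduling-as-bin-packing framing: tasks are lengths, bins are VM time slots of fixed capacity, and fewer bins means a tighter schedule.

```python
def first_fit_decreasing(task_lengths, capacity):
    """Pack tasks into as few capacity-limited bins (VM time slots) as possible:
    sort tasks longest-first, then place each in the first bin with room."""
    bins = []
    for t in sorted(task_lengths, reverse=True):
        for b in bins:
            if sum(b) + t <= capacity:
                b.append(t)
                break
        else:                        # no existing bin had room: open a new one
            bins.append([t])
    return bins
```

Tasks `[5, 4, 3, 2, 2]` with capacity 8 pack into two bins, `[5, 3]` and `[4, 2, 2]`, both fully or nearly fully used, which is the balanced, low-makespan outcome MBPTS is after.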
Reliable and efficient webserver management for task scheduling in edge-cloud...IJECEIAES
The development of cloud webserver management for executing workflows while meeting quality-of-service (QoS) prerequisites in a distributed cloud environment has been a challenging task, though a body of work has been presented on scheduling workflows in heterogeneous cloud environments. Moreover, rapid developments in cloud computing, such as edge-cloud computing, create new ways to schedule workflows in a heterogeneous cloud environment to process tasks such as IoT, event-driven applications, and various network applications. Current workflow scheduling methods have failed to provide a good trade-off between reliable performance and minimal delay. In this paper, a novel webserver resource management framework, the reliable and efficient webserver management (REWM) framework, is presented for the edge-cloud environment. Experiments are conducted on complex bioinformatics workflows; the results show a significant reduction of cost and energy by the proposed REWM in comparison with a standard webserver management methodology.
Task Scheduling using Hybrid Algorithm in Cloud Computing Environmentsiosrjce
Task scheduling is an important aspect of improving resource utilization in cloud computing. This paper proposes a divide-and-conquer-based approach to the heterogeneous earliest finish time (HEFT) algorithm. The proposed system works in two phases: in the first phase it ranks the incoming tasks with respect to their size, and in the second phase it assigns and manages each task on a virtual machine while taking the idle time of the respective virtual machine into consideration. This yields more effective resource utilization in cloud computing. Experimental results using the CyberShake scientific workflow show that the proposed divide-and-conquer HEFT (DCHEFT) performs better than HEFT in terms of task finish time and response time.
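The two phases described above can be sketched for the simplified case of independent tasks (full HEFT computes upward ranks over a DAG of dependent tasks, which this sketch deliberately omits): rank tasks largest-first, then give each to the VM whose idle point yields the earliest finish.

```python
def dc_heft_sketch(task_sizes, vm_speeds):
    """Phase 1: rank tasks by size (largest first).
    Phase 2: assign each task to the VM with the earliest finish time,
    accounting for when that VM next becomes idle."""
    ready = [0.0] * len(vm_speeds)       # when each VM next becomes idle
    plan = []
    for size in sorted(task_sizes, reverse=True):
        finish, vm = min((ready[v] + size / vm_speeds[v], v)
                         for v in range(len(vm_speeds)))
        ready[vm] = finish
        plan.append((size, vm, finish))
    return plan, max(ready)
```

For tasks of size `[6, 2, 4]` on VMs of speed 1 and 2, the largest task lands on the fast VM and the schedule finishes at time 4.0, with both VMs fully busy until the end.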
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
SUBMISSION TO IEEE TRANSACTIONS ON CLOUD COMPUTING
Budget-Driven Scheduling Algorithms for
Batches of MapReduce Jobs in Heterogeneous
Clouds
Yang Wang and Wei Shi IEEE Member
Abstract—In this paper, we consider task-level scheduling algorithms with respect to budget and deadline constraints for a batch of
MapReduce jobs on a set of provisioned heterogeneous (virtual) machines in cloud platforms. The heterogeneity is manifested in the
popular "pay-as-you-go" charging model where the service machines with different performance would have different service rates. We
organize the batch of jobs as a κ-stage workflow and study two related optimization problems, depending on whether the constraints
are on monetary budget or on scheduling length of the workflow. First, given a total monetary budget B, by combining an in-stage
local greedy algorithm (whose optimality is also proven) and dynamic programming (DP) techniques, we propose a global optimal
scheduling algorithm to achieve minimum scheduling length of the workflow within O(κB²) time. Although the optimal algorithm is efficient
when B is polynomially bounded by the number of tasks in the MapReduce jobs, the quadratic time complexity is still high. To improve
the efficiency, we further develop two greedy algorithms, called Global Greedy Budget (GGB) and Gradual Refinement (GR), each
adopting different greedy strategies. In GGB we extend the idea of the local greedy algorithm to the efficient global distribution of the
budget with minimum scheduling length as a goal whilst in GR we iteratively apply the DP algorithm to the distribution of exponentially
reduced budget so that the solutions are gradually refined. Second, we consider the optimization problem of minimizing cost when
the (time) deadline of the computation D is fixed. We convert this problem into the standard Multiple-Choice Knapsack Problem via a
parallel transformation. Our empirical studies verify the proposed optimal algorithms and show the efficiencies of the greedy algorithms
in cost-effectiveness to distribute the budget for performance optimizations of the MapReduce workflows.
Index Terms—MapReduce scheduling, cost and time constraints, optimal greedy algorithm, optimal parallel scheduling algorithm,
dynamic programming, Cloud computing
1 INTRODUCTION
Due to their abundant on-demand computing resources
and elastic billing models, clouds have
emerged as a promising platform to address various
data processing and task computing problems [1]–[4].
MapReduce [5], characterized by its remarkable sim-
plicity, fault tolerance, and scalability, is becoming a
popular programming framework to automatically par-
allelize large scale data processing as in web indexing,
data mining [6], and bioinformatics [7]. MapReduce has
proven both expressive and efficient across these
application areas.
Since a cloud supports on-demand “massively par-
allel” applications with loosely coupled computational
tasks, it is amenable to the MapReduce framework and
thus suitable for the MapReduce applications in different
areas. Therefore, many cloud infrastructure providers
(CIPs) have deployed the MapReduce framework on
their commercial clouds as one of their infrastructure
services (e.g., Amazon Elastic MapReduce (Amazon
EMR) [8]). Often, some cloud service providers (CSPs)
• Y. Wang is with the Faculty of Computer Science, University of New
Brunswick, Fredericton, Canada, E3B 5A3.
E-mail: {ywang8@unb.ca}.
• W. Shi is with the Faculty of Business and I.T., University of Ontario
Institute of Technology, Ontario, Canada, H1L 7K4.
E-mail: {wei.shi@uoit.ca}.
also offer their own MapReduce as a Service (MRaaS)
which is typically set up as a kind of Software as a
Service (SaaS) on the owned or provisioned MapReduce
clusters of cloud instances (e.g., Microsoft's Apache
Hadoop on Windows Azure Services [9], Hadoop on
Google Cloud Platform [10] and Teradata Aster Discov-
ery Platform [11]). Traditionally, these cloud instances
are composed of a homogeneous set of commodity hard-
ware multiplexed by virtualization technology. However,
with the advance of computing technologies and the
ever-growing diversity of end-user requirements, a
heterogeneous set of resources that takes advantage of
different network accelerators, machine architectures, and
to the deployments of the MapReduce framework for
various applications [12], [13].
Clearly, for CSPs to reap the benefits of such a deploy-
ment, many challenging problems have to be addressed.
However, most current studies focus solely on system
issues pertaining to deployment, such as overcoming the
limitations of the cloud infrastructure to build up the
framework [14], [15], evaluating the performance penalty
of running the framework on virtual machines [16],
and other issues in fault tolerance [17], reliability [18],
data locality [19], etc. We are also aware of some recent
research tackling the scheduling problem of MapReduce
as well as the heterogeneity in clouds [12], [20]–[25].
Some contributions mainly address the scheduling is-
sues with various concerns placed on dynamic load-
ing [21], energy reduction [23], task-slot assignment [26],
and network performance [24] while others optimize
the MapReduce framework for heterogeneous Hadoop
clusters with respect to data placements [25], resource
utilization [27], and performance modelling [28].
To the best of our knowledge, prior work squarely on
optimizing the scheduling of a batch of MapReduce jobs
with budget constraints at task level in heterogeneous
clusters is scarce [29]. In our opinion, two major
factors may account for this status quo. First, as
mentioned above, the MapReduce service, like other
basic database and system services, could be provided
as an infrastructure service by CIPs (e.g., Amazon),
rather than CSPs. Consequently, it would be charged
together with other infrastructure services. Second, some
properties of the MapReduce framework (e.g., automatic
fault tolerance with speculative execution [12]) make it
difficult for CSPs to track job execution in a reasonable
way, thus making scheduling very complex.
Since cloud resources are typically provisioned on
demand with a "pay-as-you-go" billing model, cloud-based
applications are usually budget driven. Consequently, in
practice, the effective use of resources to satisfy relevant
performance requirements within budget is always a
pragmatic concern for CSPs.
In this paper, we investigate the problem of schedul-
ing a batch of MapReduce jobs as a workflow within
budget and deadline constraints. This workflow could
be an iterative MapReduce job, a set of independent
MapReduce jobs, or a collection of jobs related to some
high-level applications such as Hadoop Hive [30]. We
address task-level scheduling, which is fine grained com-
pared to the frequently-discussed job-level scheduling,
where the scheduled unit is a job instead of a task. More
specifically, we focus on the following two optimiza-
tion problems (whose solutions are of particular interest
to CSPs intending to deploy MRaaS on heterogeneous
cloud instances in a cost-effective way):
1) Given a fixed budget B, how to efficiently select
the machine from a candidate set for each task so
that the total scheduling length of the workflow is
minimum without breaking the budget;
2) Given a fixed deadline D, how to efficiently select
the machine from a candidate set for each task
so that the total monetary cost of the workflow is
minimum without missing the deadline.
At first sight, both problems appear to be mirror
cases of one another: solving one may be sufficient to
solve the other. However, we will show that there are
still some asymmetries in their solutions. In this paper,
we focus mainly on the first problem, and then briefly
discuss the second. To solve the fixed-budget problem,
we first design an efficient in-stage greedy algorithm for
computing the minimum execution time with a given
budget for each stage. Based on the structure of this
problem and the adopted greedy approach, we then
prove the optimality of this algorithm with respect to
execution time and budget use. With these results, we
develop a dynamic programming algorithm to achieve a
global optimal solution with scheduling time of O(κB²).
Although the optimal algorithm is efficient when B is
polynomially bounded by the number of tasks in the
workflow, the quadratic time complexity is still high. To
improve the efficiency, we further develop two greedy al-
gorithms, called Global Greedy Budget (GGB) and Gradual
Refinement (GR), each having different greedy strategies.
Specifically, in GGB we extend the idea of the in-stage
greedy algorithm to the efficient global distribution of
the budget with minimum scheduling length as a goal
whilst in GR we iteratively run a DP-based algorithm to
distribute exponentially reduced budget in the workflow
so that the final scheduling length could be gradually
refined. Our evaluations reveal that both the GGB and
GR algorithms, each exhibiting a distinct advantage over
the other, are very close to the optimal algorithm in
terms of scheduling lengths but entail much lower time
overhead.
In contrast, a solution to the second problem is rel-
atively straightforward as we can reduce it into the
standard multiple-choice knapsack (MCKS) problem [31],
[32] via a parallel transformation. Our results show
that the two problems can be efficiently solved if the
total budget B and the deadline D are polynomially
bounded by the number of tasks and the number of
stages, respectively in the workflow, which is usually
the case in practice. Our solutions to these problems
facilitate the deployment of the MapReduce framework
as a MRaaS for CSPs to match diverse user requirements
(again with budget or deadline constraints) in reality.
The rest of this paper is organized as follows: in
Section 2, we introduce some background knowledge
regarding the MapReduce framework and survey some
related work. Section 3 presents our problem formula-
tion. The proposed budget-driven and time-constrained
algorithms are discussed in Section 4. We follow with
the results of our empirical studies in Section 5 and
conclude the paper in Section 6.
2 BACKGROUND AND RELATED WORK
The MapReduce framework was first advocated by
Google in 2004 as a programming model for its in-
ternal massive data processing [33]. Since then it has
been widely discussed and accepted as the most pop-
ular paradigm for data intensive processing in different
contexts. Therefore there are many implementations of
this framework in both industry and academia (such as
Hadoop [34], Dryad [35], Greenplum [36]), each with its
own strengths and weaknesses.
Since Hadoop MapReduce is the most popular open
source implementation, it has become the de facto re-
search prototype on which many studies are conducted.
We thus use the terminology of the Hadoop community
in the rest of this paper, and focus here mostly on related
work built using the Hadoop implementation.
Fig. 1: MapReduce framework.
From an abstract viewpoint, a MapReduce job essen-
tially consists of two sets of tasks: map tasks and reduce
tasks, as shown in Fig. 1. The executions of both sets
of tasks are synchronized into a map stage followed by
a reduce stage. In the map stage, the entire dataset is
partitioned into several smaller chunks in the form of
key-value pairs, each chunk being assigned to a map node for
partial computation results. The map stage ends up with
a set of intermediate key-value pairs on each map node,
which are further shuffled based on the intermediate
keys into a set of scheduled reduce nodes where the
received pairs are aggregated to obtain the final results.
For an iterative MapReduce job, the final results could be
tentative and further partitioned into a new set of map
nodes for the next round of the computation. A batch of
MapReduce jobs may have multiple stages of MapRe-
duce computation, each stage running either map or
reduce tasks in parallel, with enforced synchronization
only between them. Therefore, the executions of the jobs
can be viewed as a fork&join workflow characterized
by multiple synchronized stages, each consisting of a
collection of sequential or parallel map/reduce tasks.
An example of such a workflow is shown in Fig. 2
which is composed of 4 stages, respectively with 8,
2, 4 and 1 (map or reduce) tasks. These tasks are to
be scheduled on different nodes for parallel execution.
However, in heterogeneous clouds, different nodes may
have different performance and/or configuration spec-
ifications, and thus may have different service rates.
Therefore, because resources are provisioned on-demand
in cloud computing, the CSPs are faced with a general
practical problem: how are resources to be selected and
utilized for each running task in a cost-effective way?
This problem is, in particular, directly relevant to CSPs
wanting to compute their MapReduce workloads, espe-
cially when the computation budget is fixed.
Hadoop MapReduce is made up of an execution
runtime and a distributed file system. The execution
runtime is responsible for job scheduling and execution.
It is composed of one master node called JobTracker and
multiple slave nodes called TaskTrackers. The distributed
file system, referred to as HDFS, is used to manage tasks
and data across nodes. When the JobTracker receives a
Fig. 2: A 4-stage MapReduce workflow.
submitted job, it first splits the job into a number of map
and reduce tasks and then allocates them to the Task-
Trackers, as described earlier. As with most distributed
systems, the performance of the task scheduler greatly
affects the scheduling length of the job, as well as, in our
particular case, the budget consumed.
Hadoop MapReduce provides a FIFO-based default
scheduler at job level, while at task level, it offers
developers a TaskScheduler interface to design their
own schedulers. By default, each job will use the whole
cluster and execute in order of submission. In order
to overcome this inadequate strategy and share the
cluster fairly among jobs and users over time, Facebook
and Yahoo! leveraged the interface to implement Fair
Scheduler [37] and Capacity Scheduler [38], respectively.
Beyond fairness, there exists additional research on the
scheduler of Hadoop MapReduce aiming at improving
its scheduling policies. For instance, Hadoop adopts
speculative task scheduling to minimize the slowdown in
the synchronization phases caused by straggling tasks in
a homogeneous environment [34].
To extend this idea to heterogeneous clusters, Zaharia
et al. [12] proposed the LATE algorithm. But this algo-
rithm does not consider the phenomenon of dynamic
loading, which is common in practice. This limitation
was studied by You et al. [21] who proposed a load-
aware scheduler. Zaharia’s work on a delay scheduling
mechanism [19] to improve data locality with relaxed
fairness is another example of research on Hadoop’s
scheduling. There is also, for example, work on power-
aware scheduling [39], deadline constraint scheduling [40],
and scheduling based on automatic task slot assign-
ments [41]. While these contributions do address differ-
ent aspects of MapReduce scheduling, they are mostly
centred on system performance and do not consider
budget, which is our main focus.
Budget constraints have been considered in studies
focusing on scientific workflow scheduling on HPC
platforms including the Grid and Cloud [42]–[44]. For
example, Yu et al. [42] discussed this problem based
on service Grids and presented a QoS-based workflow
scheduling method to minimize execution cost and yet
meet the time constraints imposed by the user. In the
same vein, Zeng et al. [43] considered the executions of
large scale many-task workflows in clouds with budget
constraints. They proposed ScaleStar, a budget-conscious
scheduling algorithm to effectively balance execution
time with the monetary costs. Now recall that, in the
context of this paper, we view the executions of the
jobs as a fork&join workflow characterized by multi-
ple synchronized stages, each consisting of a collection
of sequential or parallel map/reduce tasks. From this
perspective, the abstracted fork&join workflow can be
viewed as a special case of general workflows. However,
our focus is on MapReduce scheduling with budget and
deadline constraints, rather than on general workflow
scheduling. Therefore, the characteristics of MapReduce
framework are fully exploited in the designs of the
scheduling algorithms.
A more recent work that is highly related to ours is
the Dynamic Priority Task Scheduling algorithm (DPSS) for
heterogeneous Hadoop clusters [29]. Although this
algorithm also targets task-level scheduling with budget
optimization as a goal, it is different from ours in two
major aspects. First, DPSS is designed to allow capacity
distribution across concurrent users to change dynam-
ically based on user preferences. In contrast, our
algorithms assume sufficient capacities, each with different
prices, for task scheduling and the goal is to minimize
the scheduling length (budget) within the given budget
(deadline). Second, DPSS optimizes the budget on per-
job basis by allowing users to adjust their spending over
time whereas our algorithms optimize the scheduling of
a batch of jobs as a whole. Therefore, our algorithms
and DPSS can complement each other for different
requirements.
3 PROBLEM FORMULATION
We model a batch of MapReduce jobs as a multi-stage
fork&join workflow that consists of κ stages (called
a κ-stage job), each stage j having a collection of
independent map or reduce tasks, denoted as J_j =
{J_{j0}, J_{j1}, ..., J_{jn_j}}, where 0 ≤ j < κ, and n_j + 1 is the
size of stage j. In a cloud, each map or reduce task may
be associated with a set of machines provided by cloud
infrastructure providers to run this task, each with pos-
sibly distinct performance and configuration and thus
having different charge rates. More specifically, for task
J_{jl}, 0 ≤ j < κ and 0 ≤ l ≤ n_j, the available machines
and corresponding prices (service rates) are listed in
Table 1, where t^u_{jl}, 1 ≤ u ≤ m_{jl}, represents the time to
run task J_{jl} on machine M_u, p^u_{jl} represents the
corresponding price for using that machine, and m_{jl}
is the total number of machines that can run J_{jl}.

TABLE 1: Time-price table of task J_{jl}

  t^1_{jl}   t^2_{jl}   ...   t^{m_{jl}}_{jl}
  p^1_{jl}   p^2_{jl}   ...   p^{m_{jl}}_{jl}

The values of these variables could be determined by
the VM power and the computational load of each
task. Here, for the sake of simplicity, we assume that
there are sufficient resources for each task in the Cloud,
which implies that a machine is never contended for by
more than one task, and its allocation is charged based
on the actual time it is used at a fixed service
rate. Although there are some studies on dynamic pricing to
maximize revenue [45], static pricing is still the dominant
strategy today. Therefore, we believe this model accurately
reflects the on-demand provisioning and billing of
cloud resources.
Without loss of generality, we further assume that
times have been sorted in increasing order and prices
in decreasing order, and furthermore, that both time
and price values are unique in their respective sorted
sequence. These assumptions are reasonable since given
any two machines with the same run time for a task, the
more expensive one should never be selected. Similarly, given
any two machines with the same price for a task, the slower
machine should never be chosen.
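These dominance rules can be applied as a pre-processing pass over each task's machine list. The sketch below is illustrative only (the list-of-(time, price)-pairs layout and the function name are ours, not the paper's): it discards dominated machines and leaves the survivors sorted by increasing time and strictly decreasing price, matching the assumption on Table 1.

```python
def prune_options(options):
    """Keep only useful (time, price) machine options for one task.

    A machine is dominated if another machine is both no slower and
    no more expensive; as argued above, dominated machines should
    never be selected.
    """
    kept = []
    # Sort by time ascending, breaking ties by cheaper price first.
    for t, p in sorted(options, key=lambda tp: (tp[0], tp[1])):
        # Skip any machine at least as expensive as one already kept
        # (the kept one is no slower, so this one is dominated).
        if kept and p >= kept[-1][1]:
            continue
        kept.append((t, p))
    return kept
```

After pruning, the remaining options form exactly the sorted, duplicate-free time-price table assumed in the model.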
Note that we do not model the communication cost
inherent to these problems since, in our particular case,
communication between the map/reduce tasks is handled
by the MapReduce framework via the underlying
network file systems, and is transparent to the scheduler.1
For clarity and quick reference, we provide in Table 2 a
summary of some symbols frequently used hereafter.
3.1 Budget Constraints
Given budget B_{jl} for task J_{jl}, the shortest time to finish
it, denoted as T_{jl}(B_{jl}), is defined as

    T_{jl}(B_{jl}) = t^u_{jl},  if p^u_{jl} ≤ B_{jl} < p^{u-1}_{jl}    (1)

Obviously, if B_{jl} < p^{m_{jl}}_{jl}, then T_{jl}(B_{jl}) = +∞.
The time to complete a stage j with budget B_j, denoted
as T_j(B_j), is defined as the time consumed when the
last task in that stage completes within the given budget:

    T_j(B_j) = min_{Σ_{l∈[0,n_j]} B_{jl} ≤ B_j}  max_{l∈[0,n_j]} {T_{jl}(B_{jl})}    (2)

For fork&join, one stage cannot start until its immediately
preceding stage has terminated. Thus the total makespan
within budget B to complete the workflow is defined as
the sum of all stages' times. Our goal is to minimize this
time within the given budget B:

    T(B) = min_{Σ_{j∈[0,κ)} B_j ≤ B}  Σ_{j∈[0,κ)} T_j(B_j)    (3)
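Under this model, evaluating T_{jl}(B_{jl}) in Equation (1) reduces to scanning the task's time-price table for the fastest affordable machine. A minimal sketch, assuming a hypothetical list of (time, price) pairs sorted by increasing time and decreasing price as in Table 1 (the representation and function name are ours):

```python
import math

def fastest_within_budget(options, budget):
    """Equation (1): shortest time T_jl(B_jl) for a single task.

    `options` lists (time, price) pairs sorted by increasing time and
    decreasing price.  Pick the fastest machine whose price fits the
    budget; if even the cheapest machine exceeds the budget, the task
    cannot run and the time is +infinity.
    """
    for t, p in options:      # fastest (most expensive) machine first
        if p <= budget:
            return t
    return math.inf           # budget below the cheapest price p^{m_jl}
```

For example, with options [(2, 10), (4, 6), (7, 3)], a budget of 6 buys the middle machine (time 4), while a budget below 3 makes the task infeasible.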
3.2 Deadline Constraints
Given deadline D_j for stage j, the minimum cost to
finish stage j is

    C_j(D_j) = Σ_{l∈[0,n_j]} C_{jl}(D_j)    (4)
1. In practice the tasks in workflow computations usually commu-
nicate with each other via the file systems in the Cloud.
TABLE 2: Notation frequently used in model and algorithm
descriptions

  Symbol          Meaning
  κ               the number of stages
  J_{jl}          the l-th task in stage j
  J_j             the task set of stage j
  n_j             the number of tasks in stage j
  n               the total number of tasks in the workflow
  t^u_{jl}        the time to run task J_{jl} on machine M_u
  p^u_{jl}        the cost rate for using M_u
  m_{jl}          the total number of machines that can run J_{jl}
  m               the total size of the time-price tables of the workflow
  B_{jl}          the budget used by J_{jl}
  B               the total budget for the MapReduce job
  T_{jl}(B_{jl})  the shortest time to finish J_{jl} given B_{jl}
  T_j(B_j)        the shortest time to finish stage j given B_j
  T(B)            the shortest time to finish the job given B
  D_j             the deadline for stage j
  C_{jl}(D_j)     the minimum cost of J_{jl} in stage j within D_j
  C(D)            the minimum cost to finish the job within D
where C_{jl}(D_j) is the minimum cost to finish J_{jl} in stage
j within D_j. Note that we require t^1_{jl} ≤ D_j ≤ t^{m_{jl}}_{jl};
otherwise C_{jl}(D_j) = +∞. Finally, our optimization
problem can be written as

    C(D) = min_{Σ_{j∈[0,κ)} D_j ≤ D}  Σ_{j∈[0,κ)} C_j(D_j)    (5)
Some readers may question the feasibility of this
model since the number of stages and the number of
tasks in each stage need to be known a priori to the
scheduler. But, in reality, it is entirely possible since
a) the number of map tasks for a given job is driven
by the number of input splits (which is known to the
scheduler) and b) the number of reduce tasks can be
preset as with all other parameters (e.g., parameter
mapred.reduce.tasks in Hadoop). As for the number
of stages, it is not always possible to predefine it for
MapReduce workflows. This is the main limitation of
our model. But under the default FIFO job scheduler,
we can treat a set of independent jobs as a single
fork&join workflow. Therefore, we believe our model is
still representative of most cases in reality.
4 BUDGET-DRIVEN ALGORITHMS
In this section, we propose our task-level scheduling
algorithms for MapReduce workflows with the goals
of optimizing Equations (3) and (5) under respective
budget and deadline constraints. We first consider the
optimization problem under budget constraint where an
in-stage local greedy algorithm is designed and com-
bined with dynamic programming techniques to obtain
an optimal global solution. To overcome the inherent
complexity of the optimal solution, we also present
two efficient greedy algorithms, called Global-Greedy-
Budget algorithm (GGB) and Gradual-Refinement algo-
rithm (GR). With these results, we then briefly discuss
Algorithm 1 In-stage greedy distribution algorithm
 1: procedure T_j(n_j, B_j)                    ▷ distribute B_j among J_j
 2:   B'_j ← B_j − Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl}
 3:   if B'_j < 0 then return (+∞)
 4:   end if
 5:   ▷ Initialization
 6:   for J_{jl} ∈ J_j do                      ▷ O(n_j)
 7:     T_{jl} ← t^{m_{jl}}_{jl}               ▷ record exec. time
 8:     B_{jl} ← p^{m_{jl}}_{jl}               ▷ record budget dist.
 9:     M_{jl} ← m_{jl}                        ▷ record assigned machine
10:   end for
11:   while B'_j ≥ 0 do                        ▷ O(B_j log n_j / min_{0≤l≤n_j}{δ_{jl}})
12:     l* ← arg max_{l∈[0,n_j]} {T_{jl}}      ▷ get the slowest task
13:     u ← M_{jl*}
14:     if u = 1 then
15:       return (T_{jl*})
16:     end if
17:     ▷ Look up the matrix in Table 1
18:     <p^{u−1}_{jl*}, p^u_{jl*}> ← Lookup(J_{jl*}, u − 1, u)
19:     δ_{jl*} ← p^{u−1}_{jl*} − p^u_{jl*}
20:     if B'_j ≥ δ_{jl*} then                 ▷ reduce J_{jl*}'s time
21:       B'_j ← B'_j − δ_{jl*}
22:       ▷ Update
23:       B_{jl*} ← B_{jl*} + δ_{jl*}
24:       T_{jl*} ← t^{u−1}_{jl*}
25:       M_{jl*} ← u − 1
26:     else
27:       return (T_{jl*})
28:     end if
29:   end while
30: end procedure
the second optimization problem with respect to the
deadline constraints.
4.1 Optimization under Budget Constraints
The proposed algorithm should be capable of distributing
the budget among the stages, and in each stage distribut-
ing the assigned budget to each constituent task in an
optimal way. To this end, we design the algorithm in
two steps:
1) Given budget Bj for stage j, distribute the budget
to all constituent tasks in such a way that Tj(Bj)
is minimum (see Equation (2)). Clearly, the com-
putation for each stage is independent of other
stages. Therefore such computations can be treated
in parallel using κ machines.
2) Given budget B for a workflow and the results in
Equation (2), optimize our goal of Equation (3).
4.1.1 In-Stage Distribution
To address the first step, we develop an optimal in-stage
greedy algorithm to distribute budget Bj between the
nj + 1 tasks in such a way that Tj(Bj) is minimized.
Based on the structure of this problem, we then prove
the optimality of this local algorithm.
The idea of the algorithm is simple. To ensure that
all the tasks in stage j have sufficient budget to finish
while minimizing T_j(B_j), we first require B'_j = B_j −
Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl} ≥ 0 and then iteratively distribute
B'_j in a greedy manner, each time to a task whose current
execution time determines T_j(B_j) (i.e., the slowest one).
This process continues until the remaining budget is insufficient.
Algorithm 1 shows the pseudo code of this algorithm.
In this algorithm, we use three profile variables T_{jl}, B_{jl}
and M_{jl} for each task J_{jl} to record, respectively, its
execution time, assigned budget, and the selected machine
(Lines 6-10). After setting these variables to
their initial values, the algorithm enters its main
loop to iteratively update the profile variables associated
with the current slowest task (i.e., J_{jl*}) (Lines 11-30).
By searching the time-price table of J_{jl*} (i.e., Table 1),
the Lookup function obtains the costs of the machines
indexed by its second and third arguments. Each time,
the next faster machine (i.e., u − 1) is selected by paying
an extra δ_{jl*}. The final distribution information is updated
in the profile variables (Lines 18-28).
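For concreteness, the following Python sketch mirrors Algorithm 1's greedy loop. The nested-list input format and function name are illustrative only; each task's time-price table is a list of (time, price) pairs sorted by increasing time and decreasing price, so its last entry is the cheapest machine.

```python
import math

def in_stage_greedy(tasks, budget):
    """Sketch of Algorithm 1: distribute a stage budget among tasks.

    `tasks` is a list of time-price tables, one per task; each table
    is a list of (time, price) pairs sorted by increasing time and
    decreasing price.  Start every task on its cheapest (slowest)
    machine, then repeatedly spend the leftover budget on the
    currently slowest task, moving it to the next faster machine,
    until the budget runs out or the slowest task is already on its
    fastest machine.  Returns the resulting stage completion time.
    """
    # Cheapest machine for each task is the last entry of its table.
    spare = budget - sum(table[-1][1] for table in tasks)
    if spare < 0:
        return math.inf                           # budget infeasible
    choice = [len(table) - 1 for table in tasks]  # machine index per task
    while True:
        # The slowest task determines the stage completion time.
        slowest = max(range(len(tasks)),
                      key=lambda l: tasks[l][choice[l]][0])
        u = choice[slowest]
        if u == 0:                       # already on its fastest machine
            return tasks[slowest][u][0]
        # Extra cost delta to move the slowest task one machine up.
        delta = tasks[slowest][u - 1][1] - tasks[slowest][u][1]
        if spare < delta:                # cannot afford the next upgrade
            return tasks[slowest][u][0]
        spare -= delta                   # pay delta, shift the task up
        choice[slowest] = u - 1
```

Organizing the current task times in a heap would bring the per-iteration selection of the slowest task down to O(log n_j), matching the paper's complexity analysis.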
Theorem 4.1: Given budget B_j for stage j having n_j
tasks, Algorithm 1 yields the optimal solution to the
distribution of the budget B_j to all the n_j tasks in that
stage within time O(B_j log n_j / min_{0≤l≤n_j}{δ_{jl}} + n_j).
Proof: Given budget B_j allocated to stage j, by
following the greedy algorithm, we can obtain a solution
Δ_j = {b_0, b_1, ..., b_{n_j}} where b_l is the budget allocated
to task J_{jl}, 0 ≤ l ≤ n_j. Based on this sequence, we
can further compute the corresponding finish time
sequence of the tasks as t_0, t_1, ..., t_{n_j}. Clearly, there exists
k ∈ [0, n_j] that determines the stage completion time to
be T_j(B_j) = t_k.
Suppose Δ*_j = {b*_0, b*_1, ..., b*_{n_j}} is an optimal
solution and its corresponding finish time sequence is
t*_0, t*_1, ..., t*_{n_j}. Given budget B_j, there exists k' ∈ [0, n_j]
satisfying T*_j(B_j) = t*_{k'}. Obviously, t_k ≥ t*_{k'}. In the
following we will show that, necessarily, t_k = t*_{k'}.
To this end, we consider two cases:
1) If for ∀l ∈ [0, n_j], t*_l ≤ t_l, then we have b*_l ≥ b_l. This
is impossible because given b*_l ≥ b_l for ∀l ∈ [0, n_j],
the greedy algorithm would have sufficient budget
≥ b*_k − b_k to further reduce t_k of task_k, which is
contradictory to T_j(B_j) = t_k, unless b*_k = b_k; but in
that case, T*_j(B_j) would be t_k, rather than t*_k. Thus,
case 1 is indeed impossible.
2) Given the result in 1), there must exist i ∈ [0, n_j]
that satisfies t_i < t*_i. This indicates that in the process
of the greedy choice, task_i is allocated budget
to reduce the execution time at least from t*_i to t_i,
and this happens no later than when T_j(B_j) = t_k.
Therefore, we have t*_i ≥ t_k ≥ t*_{k'} ≥ t*_i, and thus t_k = t*_{k'}.
Overall, t_k = t*_{k'}; that is, the algorithm making the
greedy choice at every step does produce an optimal
solution.
The (scheduling) time complexity of this algorithm is
straightforward. It consists of the overhead of the
initialization (Lines 6-10) and of the main loop that
updates the profile variables (Lines 11-30). Since the size
of J_j is n_j, the initialization overhead is O(n_j). If we
adopt an advanced data structure to organize T_{jl},
0 ≤ l ≤ n_j, for efficient identification of the slowest task,
l* can be obtained within O(log n_j) (Line 12). On the
other hand, there are at most O(B_j / min_{0≤l≤n_j}{δ_{jl}})
iterations (Line 11). Overall, we have a time complexity of
O(B_j log n_j / min_{0≤l≤n_j}{δ_{jl}} + n_j).
Since all the κ stages can be computed in parallel, the
total time complexity of the parallel pre-computation is
O(max_{j∈[0,κ)} {B_j log n_j / min_{0≤l≤n_j}{δ_{jl}} + n_j}).
Given Theorem 4.1, we immediately obtain the follow-
ing corollary, which is a direct result of the first case in
the proof of Theorem 4.1.
Corollary 4.2: Algorithm 1 minimizes the budget to
achieve the optimal stage execution time.
To illustrate the algorithm, Fig. 3 shows an example
where a stage has four (map/reduce) tasks, each being
able to run on 3 or 4 candidate machines with differ-
ent prices and anticipated performance. For instance,
in Fig. 3(a), t3 is the execution time of task0 running
on a certain machine. After paying extra delta3, the
task can be shifted to the next faster machine with t2
as the execution time. Since the stage completion time is determined by the slowest task, the algorithm first invests some budget in task2, allowing it to move to the next faster machine (Fig. 3(a)). Unfortunately, after this investment, task2 is still the slowest one, so the algorithm continues to invest budget in this task so that it
can run on the next faster machine (Fig. 3(b)). Eventually
task2 is no longer the slowest task and instead task3 is.
Consequently, the algorithm starts to allocate budget to
task3 (Fig. 3(c)) in order to minimize its execution time.
This process can be repeated until no more budget is
left over to invest (Fig. 3(d)). At that time, the algorithm
completes and the minimum stage execution time under
the given budget constraint is computed.
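The per-stage greedy procedure illustrated above can be sketched in a few lines of Python. This is a simplified illustration rather than the paper's Algorithm 1 verbatim; the `tables` layout (per-task lists of (time, price) pairs, sorted from the slowest, cheapest machine to the fastest, most expensive one) is an assumption standing in for Table 1:

```python
import heapq

def stage_min_time(tables, budget):
    """Greedy in-stage budget distribution (sketch): repeatedly pay the
    price delta that moves the current slowest task to its next faster
    machine, until the budget runs out or no upgrade can help.
    tables[l] = [(time, price), ...] for task l, sorted from the slowest
    (cheapest) machine to the fastest (most expensive) one."""
    idx = [0] * len(tables)                       # current machine per task
    left = budget - sum(t[0][1] for t in tables)  # budget beyond the minimum
    if left < 0:
        return float('inf')                       # insufficient budget
    # Max-heap keyed on current execution time (negated for heapq).
    heap = [(-tables[l][0][0], l) for l in range(len(tables))]
    heapq.heapify(heap)
    while heap:
        _, l = heap[0]                            # slowest task l*
        i = idx[l]
        if i + 1 == len(tables[l]):               # already on fastest machine
            break                                 # stage time cannot improve
        delta = tables[l][i + 1][1] - tables[l][i][1]
        if delta > left:                          # next upgrade unaffordable
            break
        left -= delta
        idx[l] += 1
        heapq.heapreplace(heap, (-tables[l][i + 1][0], l))
    # Stage completion time = slowest task under the final assignment.
    return max(tables[l][idx[l]][0] for l in range(len(tables)))
```

The max-heap identifies the slowest task in O(log n_j) per step, matching the complexity argument given above.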
4.1.2 Global Distribution
Now we consider the second step. Given the results
of Algorithm 1 for all the κ stages, we try to use a
dynamic programming recursion to compute the global
optimal result. To this end, we use T (j, r) to represent
the minimum total time to complete stages indexed from
j to κ when budget r is available, and have the following
recursion (0 < j ≤ κ, 0 < r ≤ B):
    T(j, r) = min_{0<q≤r} { T_j(n_j, q) + T(j + 1, r − q) }   if j < κ
    T(j, r) = T_j(n_j, r)                                     if j = κ      (6)
where the optimal solution can be found in T (1, B). The
scheduling scheme can be reconstructed from T (1, B)
by recursively backtracking the Dynamic Programming
(DP) matrix in (6) up to the initial budget distribution
at stage κ which can, phase by phase, steer to the final
optimal result. To this end, in addition to the time value,
we only store the budget q and the index of the previous
SUBMISSION TO IEEE TRANSACTIONS ON CLOUD COMPUTING
[Fig. 3 consists of four panels (a)-(d), each plotting TaskExecutionTime (0-24) against Task (0-5), showing the times t0-t3, the price deltas delta1-delta3, the successively removed slowest-task bars, and the final stage time.]
Fig. 3: An illustrative example of the in-stage greedy algorithm on budget distribution
stage (i.e., T (j + 1, r − q)) in each cell of the matrix
since, given the budget for each stage, we can simply
use Algorithm 1 to recompute the budget distribution.
Theorem 4.3: Given budget B for a κ-stage MapReduce job, each stage j having n_j tasks, Recursion (6) yields an optimal solution to the distribution of budget B among all the κ stages with time complexity O(κB²) when T_j(n_j, q), 0 < j ≤ κ, 0 < q ≤ B, is pre-computed. Otherwise, O(nB³) is required if computed online.
Proof: We prove this by induction on the number of
stages (κ). Let the number of stages, κ = 1. Clearly, given
budget r, the optimal solution is obtained by Tj(nj, r).
Suppose there are κ stages. Consider stages j and j + 1.
As an induction hypothesis, let T (j + 1, p) be an optimal
solution to stages from j + 1 to κ given budget p. We
will show that T (j, r) is an optimal solution to stages
from j to κ under budget constraint r. In order to find
the optimal distribution of the budget r among κ − j + 1
stages, we need to consider all possibilities. To this end,
we assign q units to the first stage j and the remaining
r−q units to the leftover stages from j+1 to κ, and allow
q to be varied in the range of (0, r]. Clearly, the recursion
chooses the minimum of all these, thus serving all the
stages from j to κ using minimum time.
Finally, at stage 1, since there is no previous stage, Recursion (6) yields the optimal result T(1, B) for the workflow.
There are O(κB) elements in the DP matrix (6). For each element, the computation complexity is at most O(B) when T_j(n_j, q), 0 < q ≤ r, has been pre-computed. Therefore, the total time complexity is O(κB²). Otherwise, it would be written as O(Σ_{j=1}^{κ} Σ_{q=0}^{B} (q log n_j / min_{0≤l≤n_j}{δ_{jl}} + n_j)), which is upper bounded by O(nB³) given n = Σ_{j=1}^{κ} n_j.
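To make the two-level scheme concrete, the sketch below implements Recursion (6) over a hypothetical pre-computed table `stage_time[j][q]`, an assumed stand-in for T_j(n_j, q); it returns the minimum workflow time and the per-stage budget split recovered by backtracking:

```python
def distribute_budget(stage_time, B):
    """Global budget distribution per Recursion (6): T[j][r] is the minimum
    completion time of stages j..kappa-1 under budget r.
    stage_time[j][q] stands in for the pre-computed T_j(n_j, q)."""
    kappa = len(stage_time)
    INF = float('inf')
    T = [[INF] * (B + 1) for _ in range(kappa)]
    choice = [[0] * (B + 1) for _ in range(kappa)]  # budget given to stage j
    for r in range(B + 1):                          # base case: last stage
        T[kappa - 1][r] = stage_time[kappa - 1][r]
        choice[kappa - 1][r] = r
    for j in range(kappa - 2, -1, -1):
        for r in range(B + 1):
            # q = 0 is admitted here for simplicity; budgets are assumed
            # normalized so each stage's lower bound is already covered.
            for q in range(r + 1):
                t = stage_time[j][q] + T[j + 1][r - q]
                if t < T[j][r]:
                    T[j][r], choice[j][r] = t, q
    split, r = [], B                                # backtrack the split
    for j in range(kappa):
        q = choice[j][r]
        split.append(q)
        r -= q
    return T[0][B], split
```

The stored `choice` values play the role of the budget index kept in each DP cell for schedule reconstruction.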
4.2 Efficiency Improvements
In the previous subsection, we presented an optimal
solution to the distribution of a given budget among
different stages to minimize the workflow execution
time. The time complexity of the proposed algorithm is
pseudo-polynomial and proportional to the square of the
budget, which is fairly high. To address this problem, we
now propose two heuristic algorithms that are based on
different greedy strategies. The first one, called Global
Greedy Budget (GGB) extends the idea of Algorithm 1
in computing Tj(nj, Bj) to the whole multi-stage work-
flow. The second one, called Gradual Refinement (GR), is
built upon the recursion (6) in an iterative way, each
iteration using different budget allocation granularities
to compute the DP matrix so as to gradually refine the
final results. Each algorithm offers a specific advantage
over the other one. However, our empirical studies show
that both are very close to the optimal results in terms of
scheduling lengths but enjoy much lower time overhead.
4.2.1 Global-Greedy-Budget Algorithm (GGB)
This algorithm applies the idea of Algorithm 1 with some
extensions to the selection of candidate tasks for budget
assignments across all the stages of the workflow. The
pseudo code of GGB is shown in Algorithm 2. Similar
to Algorithm 1, we also need to ensure that the given budget is no less than the lower bound Σ_{j∈[1,κ]} B'_j, where B'_j = Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl}, which guarantees the completion of the workflow (Lines 2-3). We also use the three profile variables T_{jl}, B_{jl}, and M_{jl} for each task J_{jl} in stage j to record its execution time, assigned budget, and selected machine (Lines 6-12).
Since in each stage, the slowest task determines the
stage completion time, we first need to allocate the
budget to the slowest task in each stage. After the
slowest task is allocated, the second slowest will become
the bottleneck. In our heuristic, we must consider this
fact. To this end, we first identify the slowest and the
second slowest tasks in each stage j, which are indexed
by jl and jl′
, respectively. Then we gather these index
pairs in a set L thereby determining which task in L
should be allocated budget (Lines 14-18). To measure the
quality of a budget investment, we define a utility value,
vu
jl, for each given task Jjl, which is a value assigned to
an investment on the basis of anticipated performance:
2
vu
jl = αβj + (1 − α)β′
j (7)
where βj =
tu
jl−tu′
jl′
pu−1
jl −pu
jl
≥ 0, β′
j =
tu
jl−tu−1
jl
pu−1
jl −pu
jl
≥ 0, and α is
defined as:
α =
1 if κ
j=1 βj > 0
0 Otherwise
(8)
2. Recall that the sequences of tu
jl and pu
jl are sorted, respectively in
Table 1.
Algorithm 2 Global-Greedy-Budget Algorithm (GGB)
1: procedure T(1, B)   ⊲ Dist. B among κ stages
2:   B' = B − Σ_{j∈[1,κ]} B'_j   ⊲ B'_j = Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl}
3:   if B' < 0 then return (+∞)
4:   end if   ⊲ No sufficient budget!
5:   ⊲ Initialization
6:   for j ∈ [1, κ] do   ⊲ O(Σ_{j=1}^{κ} n_j) = # of tasks
7:     for J_{jl} ∈ J_j do
8:       T_{jl} ← t^{m_{jl}}_{jl}   ⊲ record exec. time
9:       B_{jl} ← p^{m_{jl}}_{jl}   ⊲ record budget dist.
10:      M_{jl} ← m_{jl}   ⊲ record assigned machine index
11:    end for
12:  end for
13:  while B' ≥ 0 do   ⊲ ≤ O(B / min_{1≤j≤κ, 0≤l≤n_j}{δ_{jl}})
14:    L ← ∅
15:    for j ∈ [1, κ] do   ⊲ O(Σ_{j=1}^{κ} log n_j)
16:      <jl, jl'>* ← arg max_{l∈[0,n_j]} {T_{jl}(B_{jl})}
17:      L ← L ∪ {<jl, jl'>*}   ⊲ |L| = κ
18:    end for
19:    V ← ∅
20:    for <jl, jl'> ∈ L do   ⊲ O(κ)
21:      u ← M_{jl}
22:      if u > 1 then
23:        <p^{u−1}_{jl}, p^u_{jl}> ← Lookup(J_{jl}, u − 1, u)
24:        v^u_{jl} ← αβ_j + (1 − α)β'_j
25:        V ← V ∪ {v^u_{jl}}   ⊲ |V| ≤ κ
26:      end if
27:    end for
28:    while V ≠ ∅ do   ⊲ O(κ log κ)
29:      ⊲ sel. task with max. u.value
30:      jl* ← arg max_{v^u_{jl} ∈ V} {v^u_{jl}}
31:      u ← M_{jl*}   ⊲ Lookup matrix in Table 1
32:      δ_{jl*} ← p^{u−1}_{jl*} − p^u_{jl*}   ⊲ u > 1
33:      if B' ≥ δ_{jl*} then   ⊲ reduce J_{jl*}'s time
34:        B' ← B' − δ_{jl*}
35:        B_{jl*} ← B_{jl*} + δ_{jl*}
36:        T_{jl*} ← t^{u−1}_{jl*}
37:        M_{jl*} ← u − 1
38:        break   ⊲ restart from scratch
39:      else
40:        V ← V \ {v^u_{jl*}}   ⊲ select the next one in V
41:      end if
42:    end while
43:    if V = ∅ then
44:      return   ⊲ B_j = Σ_{l∈[0,n_j]} B_{jl}
45:    end if
46:  end while
47: end procedure
β_j represents the time saving per budget unit when task J_{jl} is moved from machine u to the next faster machine u − 1 in stage j (β_j > 0), while β'_j is used when there are multiple slowest tasks in stage j (β_j = 0). α is defined to give β_j a higher priority than β'_j in task selection. Put simply, unless β_j = 0 for all j ∈ [1, κ], in which case β'_j is used, we use the value of β_j, j ∈ [1, κ], as the criterion to select the tasks to be allocated budget.
In the algorithm, all the values of the tasks in L are
collected into a set V (Lines 19-28). We note that the
tasks running on machine u = 1 in each stage have no
definition of this value since they are already running on
the fastest machine under the given budget (and thus no
further improvement is available).
Given set V , we can iterate over it to select the task in
V that has the largest utility value, indexed by jl∗
, to be
allocated budget for minimizing the stage computation
time (Lines 29-30). We first obtain the machine u to which
the selected task is currently mapped and then compute
the extra monetary cost δjl∗ if the task is moved from
u to the next faster machine u − 1 (Lines 31-32). If the
leftover budget B' is insufficient, the selected task will no longer be considered and is removed from V (Line 40). In the next step, a task in a different stage will be selected for budget allocation (given that each stage has at most one task in V). This process continues until either the leftover budget B' is sufficient for a selected task or V becomes empty. In the former case, δ_{jl*} is deducted from B' and added to the selected task; at the same time, the other profile information related to this allocation is also updated (Lines 33-37). After this, the algorithm exits the loop and repeats the computation of L (Line 13), since L has changed due to this allocation. In the latter case, when V becomes empty, the algorithm returns directly, indicating that the final results of the budget distribution and the associated execution time of each task in each stage are available in the corresponding profile variables.
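The utility values of Eqs. (7) and (8) can be sketched as follows; the `cand` tuple layout is a hypothetical encoding of the per-stage quantities (current and next-faster time/price of the slowest task, plus the second slowest task's time), not an interface defined in the paper:

```python
def ggb_utilities(cand):
    """Per-stage utility values, per Eqs. (7)-(8).
    cand[j] = (t_u, t_prev, p_u, p_prev, t_second): stage j's slowest task's
    time/price on its current machine u, its time/price on the next faster
    machine u-1, and the second slowest task's time."""
    betas, betas2 = [], []
    for t_u, t_prev, p_u, p_prev, t_second in cand:
        dp = p_prev - p_u                     # extra cost of faster machine
        betas.append((t_u - t_second) / dp)   # gap to 2nd slowest, per cost unit
        betas2.append((t_u - t_prev) / dp)    # own time saving, per cost unit
    alpha = 1 if sum(betas) > 0 else 0        # Eq. (8): prefer beta when nonzero
    return [alpha * b + (1 - alpha) * b2 for b, b2 in zip(betas, betas2)]
```

When every stage's two slowest tasks tie (all β_j = 0), the fallback β'_j ranks stages by the raw per-cost time saving instead.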
Theorem 4.4: The time complexity of GGB is not
greater than O(B(n + κ log κ)). In particular, when n ≥
κ log κ, the complexity of GGB is upper bounded by
O(nB).
Proof: The time complexity of this algorithm is largely determined by the nested loops (Lines 13-42). Since each allocation of the budget B' is at least min_{1≤j≤κ, 0≤l≤n_j}{δ_{jl}}, the algorithm has at most O(B / min{δ_{jl}}) iterations at Line 13. On the other hand, if some advanced data structure such as a priority queue is used to optimize the search process, the algorithm can achieve a time complexity of O(Σ_{j=1}^{κ} log n_j) at Line 15 and O(κ log κ) at Line 29. Therefore, the overall time complexity can be written as

    O(n + (B / min{δ_{jl}}) (Σ_{j=1}^{κ} log n_j + κ log κ)) < O(B(n + κ log κ))      (9)

where δ_{jl} = p^{u−1}_{jl} − p^u_{jl}, 1 ≤ j ≤ κ, 0 ≤ l ≤ n_j, and n = Σ_{j=1}^{κ} n_j is the total number of tasks in the workflow. Here, we leverage the fact that log n < n. Obviously, when n ≥ κ log κ, which is reasonable for multi-stage MapReduce jobs, we obtain a time complexity of O(nB).
4.2.2 Gradual Refinement Algorithm (GR)
Given the results of the per-stage and global budget
distributions, in this subsection we propose the GR
algorithm to drastically reduce time complexity in most
cases.
Algorithm 3 Gradual Refinement Algorithm (GR)
1: procedure T(1, B)   ⊲ Dist. B among κ stages
2:   B' = B − Σ_{j∈[1,κ]} B_j   ⊲ B_j = Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl}
3:   if B' < 0 then return +∞
4:   end if   ⊲ No sufficient budget!
5:   ⊲ Update Table 1 of each task in each stage
6:   ⊲ Stages is a global var.
7:   for j ∈ [0, κ) do
8:     TaskTabs ← Stages.getStage(j)
9:     for l ∈ [0, n_j] do
10:      TaskTabs[l].substractPrice(p^{m_{jl}}_{jl})
11:    end for
12:  end for
13:  r ← 10^k
14:  while r ≥ 1 do
15:    <C, R> ← <B'/r, B'%r>
16:    for b ∈ [0, C] do
17:      T[κ − 1][b] ← <T_{κ−1}(n_{κ−1}, b/r), 0>
18:    end for
19:    for j ∈ [κ − 2, 0] do
20:      for b ∈ [0, C] do
21:        T[j][b] ← <+∞, 0>
22:        q ← 0
23:        while q ≤ b do
24:          t1 ← T_j(n_j, q/r)
25:          t2 ← T[j + 1][b − q]
26:          if T[j][b].tval > t1 + t2 then
27:            T[j][b] ← <t1 + t2, b − q>
28:          end if
29:          q++
30:        end while
31:      end for
32:    end for
33:    b' ← constructSchedule(0, C, R)
34:    r ← r/10
35:    B' ← b'   ⊲ b' = Σ_{i=1}^{κ} b_i + R
36:  end while
37: end procedure
This algorithm consists of two parts. First, we consider the distribution of B' = B − Σ_{j∈[1,κ]} B'_j instead of B in Recursion (6), where B'_j = Σ_{0≤l≤n_j} p^{m_{jl}}_{jl} is the lower bound of the budget of stage j. This optimization
lower bound of the budget of stage j. This optimization
is simple yet effective to minimize the size of the DP
matrix. Second, we optimize the selection of the size of q
to iterate over the B in (6). Instead of using a fixed value
of 1 as the indivisible cost unit, we can continuously
select 10^k, 10^{k−1}, ..., 1 units as the incremental budget
rates in the computation of (6), each being built upon
its immediately previous result. In this way, we can
progressively approach the optimal result while drasti-
cally reducing the time complexity. The details of the
algorithm are formally described in Algorithm 3.
After getting the remaining budget B′
, we update the
time-price table (Table 1) of each task by subtracting its
minimal service rate from each price (Lines 7-12). This
step is necessary as now we are considering the distribu-
tion of B′
instead of B. It is accomplished by accessing a
global variable Stages that stores the information of all
the stages. Then the algorithm enters a main loop (Lines
Algorithm 4 Construct scheduler and gather unused budget
1: procedure constructSchedule(i, j, R)
2:   <t, p> ← T[i][j]
3:   b ← j − p
4:   TaskTabs ← Stages.getStage(i)
5:   if i = κ − 1 then
6:     b' ← T_i(n_i, b)   ⊲ return allocated budget
7:     for l ∈ [0, n_i] do
8:       TaskTabs[l].substractPrice(b')
9:     end for
10:    b_i ← b − b'
11:    R ← R + b_i
12:    return R
13:  end if
14:  b' ← T_i(n_i, b)
15:  for l ∈ [0, n_i] do
16:    TaskTabs[l].substractPrice(b')
17:  end for
18:  b_i ← b − b'
19:  R ← R + b_i
20:  return constructSchedule(i + 1, p, R)
21: end procedure
14-37). Each loop leverages Recursion (6) to compute a
DP matrix T [κ][C+1], C ← B′
/r using a different budget
rate r (initialized by 10k
). The distributed and remaining
budgets under r are stored in C and R respectively so
that they can be used in the current and the next rounds
(Line 15). In the computation of Recursion (6), we not
only keep the execution time but also store the budget
index of the previous step in each cell of the matrix
(Lines 17 and 27). This is necessary for reconstructing the schedule, as well as for gathering the allocated budget that is not used in the current loop and prorating it to the next round. For example, given r ← 10, suppose we
compute T (4, 70) = min{T4(2, 30)+ T (3, 40)}. If T4(2, 30)
is allocated 30 units but only uses 27, then 3 units are
left for the next round (where r ← 1).
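The coarse-to-fine mechanics of the budget rate r can be illustrated with the simplified sketch below. It mirrors only the <C, R> bookkeeping of Lines 14-15 and 34 of Algorithm 3 and is a hypothetical simplification: it carries just the remainder R forward, whereas GR also feeds the unused allocated budget b_i of each stage back into the next round:

```python
def rate_schedule(b_prime, k):
    """Coarse-to-fine budget rates used by GR: at each round the leftover
    budget is split into C units of size r plus a remainder R that is
    carried to the finer rounds. Returns the (r, C, R) triple per round."""
    r, rounds = 10 ** k, []
    while r >= 1:
        C, R = divmod(b_prime, r)   # C units of size r, remainder R
        rounds.append((r, C, R))
        # In GR proper, the DP is solved at granularity r here and the
        # unused allocated budget joins R; we carry only R forward.
        b_prime = R
        r //= 10
    return rounds
```

Each round thus works on roughly one decimal digit of the leftover budget, which is why the DP matrix shrinks so sharply compared with a unit-granularity pass.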
Following the computation of the DP matrix T[κ][C + 1], the loop ends by invoking constructSchedule(0, C, R),
that constructs the allocation schedule based on the
current value of r. There are two other purposes for this
construction. First, we can gather the allocated budget
that is not used in the current loop (stored in bi for stage
i). Second, we can update the time-price tables of each
stage to reflect the current optimal distribution, which
forms the basis for the next loop. This step makes the
algorithm efficient but non-optimal. The details of these
steps are shown in Algorithm 4.
In this algorithm, we first compute the budget allo-
cated to the current stage i (Lines 2-3) and then obtain
its tasks. The algorithm is recursive from stage 0 down
to stage κ − 1 where it is returned with the total unused
budget represented by R. It is worthwhile to point out
the update of the time-price tables in Lines 7-9 and Lines
15-17. The newly allocated budget (i.e., b′
) to each stage is
deducted from each task’s machine price (if it is not zero
in that stage) so that the current optimal allocation can
be taken as the starting point for the next round as previ-
ously discussed. Clearly, the constructSchedule function
walks over all the stages, and in each stage, it modifies
all tasks' time-price tables. Therefore, the time complexity of this function is O(Σ_{j=1}^{κ} Σ_{l=0}^{n_j} log m_{jl}) = O(m), where m is the total size of the time-price tables. (Recall that the prices in each table have been sorted.)
However, the time complexity of the GR algorithm
as a whole is difficult to analyze since it is very hard,
if not impossible, to bound the remaining budget of
each stage (i.e., bi) for each round of distributions, and
thus to determine the remaining budget of the current
round to be used in the next one. Consequently, here we
only roughly estimate the complexity. To this end, we denote B' = ρ_k 10^k + ρ_{k−1} 10^{k−1} + ρ_{k−2} 10^{k−2} + ... + ρ_0, and have the following lemma to estimate the remaining budget of each round.
Lemma 4.5: Given μ_j = Σ_{i=1}^{j−1} γ_{j−i}/10^i, the remaining budget in the t-th round of the while loop is b'_t ≤ μ_k μ_{k−1} ... μ_t ρ_k 10^k + μ_{k−1} ... μ_t ρ_{k−1} 10^{k−1} + ... + μ_t ρ_t 10^t + ρ_{t−1} 10^{t−1} + ... + ρ_0, where γ_i ≤ 9, 0 ≤ i < k, are dependent on the allocated budget that is not used by each stage.
Proof: We prove this by induction on the number of rounds (t) of the while loop. Initially (t = 1), given B' = ρ_k 10^k + ρ_{k−1} 10^{k−1} + ρ_{k−2} 10^{k−2} + ... + ρ_0, we have C = B'/r = ρ_k and R = B' % r = ρ_{k−1} 10^{k−1} + ρ_{k−2} 10^{k−2} + ... + ρ_0. According to procedure constructSchedule(0, C, R), the allocated budget that is not used is b'_k = Σ_{i=1}^{κ} b_i + R, where Σ_{i=1}^{κ} b_i ≤ Σ_{i=1}^{ρ_k} (10^k − T_i(n_i, 10^k)) since there are at most C = ρ_k stages allocated.³ Therefore, there exist γ_{k−1}, ..., γ_0 such that Σ_{i=1}^{ρ_k} b_i ≤ C γ_{k−1} 10^{k−1} + C γ_{k−2} 10^{k−2} + ... + C γ_0 = C (γ_{k−1}/10 + γ_{k−2}/10² + γ_{k−3}/10³ + ... + γ_0/10^k) 10^k = C μ_k 10^k. Then,

    b'_k ≤ C μ_k 10^k + ρ_{k−1} 10^{k−1} + ... + ρ_0
         = μ_k ρ_k 10^k + ρ_{k−1} 10^{k−1} + ... + ρ_0      (10)
Consider rounds t and t + 1. As an induction hypothesis, let b'_t ≤ μ_k μ_{k−1} ... μ_t ρ_k 10^k + μ_{k−1} ... μ_t ρ_{k−1} 10^{k−1} + ... + μ_t ρ_t 10^t + ρ_{t−1} 10^{t−1} + ... + ρ_0. In the (t + 1)-th round, we have C = μ_k μ_{k−1} ... μ_t ρ_k 10^{k−t+1} + μ_{k−1} μ_{k−2} ... μ_t ρ_{k−1} 10^{k−t} + ... + μ_t ρ_t 10 + ρ_{t−1} and R = ρ_{t−2} 10^{t−2} + ... + ρ_0. Since at most C units are allocated, we have Σ_{i=1}^{ρ_k} b_i ≤ C γ_{t−2} 10^{t−2} + C γ_{t−3} 10^{t−3} + ... + C γ_0 = C (γ_{t−2}/10 + γ_{t−3}/10² + γ_{t−4}/10³ + ... + γ_0/10^{t−1}) 10^{t−1} = C μ_{t−1} 10^{t−1}. Then we have

    b'_{t+1} ≤ C μ_{t−1} 10^{t−1} + ρ_{t−2} 10^{t−2} + ... + ρ_0
           = (μ_k μ_{k−1} ... μ_t ρ_k 10^{k−t+1} + μ_{k−1} μ_{k−2} ... μ_t ρ_{k−1} 10^{k−t} + ... + μ_t ρ_t 10 + ρ_{t−1}) μ_{t−1} 10^{t−1} + ρ_{t−2} 10^{t−2} + ... + ρ_0
           = μ_k μ_{k−1} ... μ_t μ_{t−1} ρ_k 10^k + μ_{k−1} ... μ_t μ_{t−1} ρ_{k−1} 10^{k−1} + ... + μ_t μ_{t−1} ρ_t 10^t + μ_{t−1} ρ_{t−1} 10^{t−1} + ρ_{t−2} 10^{t−2} + ... + ρ_0      (11)

3. If ρ_k > κ, multiple 10^k units could be assigned to the same stage. In this case, we can split the stage into several dummy stages, each being allocated 10^k units. Then, we can follow the same arguments.

Hence, the proof.
Since ∀j, µj < 1, this lemma demonstrates that for GR,
the remaining budget in each round of the while loop is
nearly exponentially decreased. With this result, we can
have the following theorem.
Theorem 4.6: The time complexity of GR is O(Σ_{t=1}^{log B} (κ C_t² + m)), where C_t = μ_k μ_{k−1} ... μ_{t+1} ρ_k 10^{k−t} + μ_{k−1} μ_{k−2} ... μ_{t+1} ρ_{k−1} 10^{k−t−1} + ... + ρ_t.
Proof: There are a total of log B rounds in the while loop, and in each round t, we need to a) compute the DP matrix with a size of C_t according to Recursion (6), which has a complexity of O(κ C_t²), and then b) count the time used in constructSchedule(0, C, R), which is O(m).
Ideally, if the allocated budget is fully used in each stage (where μ_j = 0), the algorithm takes κ Σ_{i=0}^{k} ρ_i² time, which is the lower bound. In practice, however, the actual speed of the algorithm is also determined by the γ sequence, which differs from stage to stage and from job to job. We will investigate this when discussing our empirical studies.
4.3 Optimization under Deadline Constraints
In this section, we discuss task-level scheduling for
MapReduce workflows with the goal of optimizing
Equation (5), which pertains to deadline constraints.
Since most of the techniques we presented earlier can
be applied to this problem, the discussion is brief. We
partition the total deadline D into κ parts, denoted by
D_0, D_1, ..., D_{κ−1}, such that Σ_{0≤j<κ} D_j ≤ D. For a given deadline D_j for stage j, we must ensure that all tasks of this stage can be finished within D_j. Thus, in order to minimize cost, we need to select, for each task, the machine on which the execution time of this task is closest to D_j. Formally, C_{jl}(D_j) = p^u_{jl}, where t^{u−1}_{jl} < D_j < t^{u+1}_{jl}. (Obviously, C_{jl}(D_j) is the minimum cost to finish task J_{jl} within D_j; if the task cannot be finished within D_j, C_{jl}(D_j) = +∞.) We can then compute Equation (4).
By following the same approach as in Section 4.1, we
can derive the optimal solution. However, this strategy
is not efficient since allocation to each stage, as well
as optimal distribution within each stage, cannot be
computed in a simple way.
Alternatively, we can transform this problem into the
standard MCKS problem by constructing κ classes in
parallel, each corresponding to a stage of the workflow.
The class j consists of a set of tuples (D_{ji}, C_{ji}), where 1 ≤ i ≤ Σ_{l∈[0,n_j]} p^{m_{jl}}_{jl}, representing the total minimum cost C_{ji} for stage j under the given deadline D_{ji}. These pairs are computed as follows:
1) for each task Jjl in stage j, gather its execution time
on the candidate machines and put into set S;
2) sort S in ascending order;
3) for each element ti in S, Dji ← ti and then compute
Cjl(Dji) for each task l in stage j. (This step can
be further parallelized based on ti.)
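Computing C_{jl}(D_{ji}) for every task in step 3 amounts to a binary search over each task's sorted execution times. A minimal sketch, assuming (as a hypothetical layout) that each task's table is a list of (time, price) pairs sorted by increasing time, with price decreasing as time increases:

```python
import bisect

def stage_cost(task_tables, deadline):
    """Minimum cost to finish one stage within `deadline`: for each task,
    pick the cheapest machine whose execution time still fits.
    task_tables[l] = [(time, price), ...] sorted by increasing time,
    with faster machines costing more."""
    total = 0
    for pairs in task_tables:
        times = [t for t, _ in pairs]
        # Rightmost machine with time <= deadline (slowest that still fits,
        # hence the cheapest feasible one under the assumed price ordering).
        i = bisect.bisect_right(times, deadline) - 1
        if i < 0:
            return float('inf')   # this stage cannot meet the deadline
        total += pairs[i][1]
    return total
```

Evaluating this for every candidate deadline D_{ji} yields the (D_{ji}, C_{ji}) tuples of class j for the MCKS instance.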
[Fig. 4 consists of four sub-graphs plotting SchedulingLength and SchedulingTime (s) against Budget for Opt, GGB, and GR with TP table sizes 4, 8, 16, and 32.]
Fig. 4: Impact of time-price table (TP) size on the scheduling length and the scheduling time (Stage: 8, Task: ≤ 20/each stage, and the numbers in the brackets represent the different TP table sizes)
The aim of the problem then becomes to pick exactly one tuple from each class in order to minimize the total
cost value of this pick, subject to the deadline constraint,
which is a standard multiple-choice knapsack problem
equivalent to Equation (5). To optimize the computation,
we can remove the tuple (Dji, Cji) from the class if Cji =
+∞.
5 EMPIRICAL STUDIES
To verify and evaluate the proposed algorithms and
study their performance behaviours in reality, we de-
veloped a Budget Distribution Solver (BDS) in Java (Java
1.6) that efficiently implements the algorithms for the
specified scheduling problem in Hadoop. Since the monetary cost is our primary interest, in BDS we did not consider some properties and features of the network platforms. Rather, we focused on the factors closely related
to our research goal. In particular, how efficient the
algorithms (i.e., scheduling time) are in minimizing the
scheduling lengths of the workflow subject to different
budget constraints is our major concern. Moreover, since
the remaining budget after workflow scheduling always
reflects the profit that the MapReduce providers could
make, we also compare it between the algorithms.
The BDS accepts as an input a batch of MapReduce
jobs that are organized as a multi-stage fork&join work-
flow by the scheduler at run-time. Each task of the job is
associated with a time-price table, which is pre-defined
by the cloud providers. As a consequence, the BDS can
be configured with several parameters, including those
described time-price tables, the number of tasks in each
stage and the total number of stages in the workflow.
Since there is no model available for these parameters, we assume that they are generated automatically from a uniform distribution, whose results can form a baseline for further studies. In particular, the task execution times and the corresponding prices are assumed to vary in the ranges of [1, 12.5*table size] and [1, 10*table size], respectively. The rationale behind this assumption is twofold. First, the distribution of these parameters does not have any impact on the results of our scheduling algorithms. Second, a big table size usually manifests the heterogeneity of the cluster, which implies a broad range of task execution times. Again, the table sizes are
determined by the available resources and specified by
the cloud providers in advance.
Intuitively, with the table size being increased, the
scheduler has more choices to select the candidate ma-
chines to execute a task. On the other hand, in each experiment we allow the budget to increase from its lower bound to its upper bound, and thereby compare the scheduling lengths and the scheduling times of the proposed algorithms with respect to different
configuration parameters. Here, the lower and upper
[Fig. 5 consists of four sub-graphs plotting SchedulingLength and SchedulingTime (s) against Budget for Opt, GGB, and GR with 4, 8, 16, and 32 stages.]
Fig. 5: Impact of the number of stages on the total scheduling length and scheduling time (Task: ≤ 20, Table Size ≤ 16, and the numbers in the brackets represent the different number of stages)
bounds are defined to be the minimal and maximal bud-
get resources, respectively, that can be used to complete
the workflow.
All the experiments are conducted by comparing the
proposed GGB and GR algorithms with the optimal
algorithm and the numerical scheduling time results
(average over five trials except for Fig. 6) are obtained
from running the scheduling algorithms on a Ubuntu
12.04 platform with a total of 8 logical processors activated (a quad-core CPU with 2 hardware threads per core), each running at 1600 MHz with an 8192K cache.
5.1 Impact of Time-Price Table Size
We first evaluate the impact of the time-price table sizes
on the total scheduling length of the workflow with
respect to different budget constraints. To this end, we fix
an 8-stage workflow with at most 20 tasks in each stage,
and the size of the time-price table associated with each
task is varied by 4, 8, 16 and 32.
The results of algorithms GGB and GR compared to the optimal algorithm are shown in Fig. 4. As the budget increases, for all table sizes, the scheduling lengths decrease super-linearly. These results are interesting and hard to predict from an analysis of the algorithms. We attribute them to the fact that
the opportunities of reducing the execution time of
each stage are super-linearly increased with the budget
growth, especially for those large size workflows. This
phenomenon implies that the ratio performance/cost is
increased if cloud users are willing to pay more for
MapReduce computation.
This figure also provides evidence that the perfor-
mances of both GGB and GR are very close to the opti-
mal algorithm, but their scheduling times are relatively
stable and significantly less than that of the optimal
algorithm, which is quadratic in its time complexity.
These results not only demonstrate how resilient GGB and GR are, compared to the optimal algorithm, against changes in the table size, a desirable feature in practice, but
also show how effective and efficient the proposed GGB
and GR algorithms are to achieve the best performance
for MapReduce workflows subject to different budget
constraints.
5.2 Impact of Workflow Size
In this set of experiments, we evaluate the performance
changes with respect to different workflow sizes when
the budget resources for each workflow are increased
from the lower bound to the upper bound as we defined
before. To this end, we fix the maximum number of
tasks in the MapReduce workflow to 20 in each stage,
[Fig. 6 consists of four sub-graphs plotting RemainingBudget against Budget for Opt, GGB, and GR.]
Fig. 6: Comparison of remaining budget between the optimal algorithm, GGB and GR. The top two sub-graphs
show the case that the number of tasks ≤ 20, stages is 8, and the time-price table sizes are varied from 4 to 32
(shown in the brackets). By contrast, the bottom two sub-graphs are the cases that the number of stages are changed
from 4 to 32 (shown in the brackets) while the table size is fixed at 16 and the number of tasks is ≤ 20.
and each task is associated with a time-price table with
a maximum size of 16. We vary the number of stages
from 4, 8, 16 to 32, and observe the performance and
scheduling time changes in Fig. 5. From this figure,
we can see that all the algorithms exhibit the same performance patterns as those observed when the impact of the table size was considered. These results are expected, as both the number of stages and the size of the tables are linearly correlated with the total workload in the computation. This observation can also be made when the number of tasks in each stage is changed.
5.3 GGB vs. GR
It is interesting to compare GGB and GR. Although their
overall performances are close to each other, there are
still some performance gaps between them, which are
different from case to case. Given the different table
sizes, we can see that GR is consistently better than, or at least no worse than, GGB in terms of the scheduling length and the scheduling time (Fig. 4). However, we cannot observe a similar phenomenon when the number of stages is varied, where neither algorithm is consistently better than the other (Fig. 5). The reasons behind this are complicated, and mostly due to the different behaviours of the algorithms. For example, as the budget approaches the upper bound, the execution time of GR could increase, as more budget could be left for computing the DP matrix in each iteration.
In addition to the scheduling length and the execution
time, another interesting feature of the algorithms is the
remaining budget after the allocation of the computation.
The remaining budget is important as it could be an
indicator of the profit that the MapReduce providers
can earn. Fig. 6 compares the remaining budget between
the optimal algorithm and the two proposed greedy
algorithms. In most cases, the optimal algorithm has
the minimum remaining budget, which means it fully utilizes the resources to achieve the best performance, whereas for the greedy algorithms a small remaining budget only indicates that they made unwise allocations at some point during the scheduling process. By comparing GGB and GR, one can easily see that GGB in most cases leaves more budget unused in the computation than GR, which is also consistent with the observation that GR performs better than GGB.
In summary, based on our experiments, GR is preferable,
as in most cases it outperforms GGB. However, when
the budget is not a primary concern for the computation,
GGB may be the better choice: while offering performance
very close to that of GR, it always leaves a larger
remaining budget after the computation, which can be
viewed as profit for the cloud service providers.
5.4 Optimality Verification
We verify the optimality of the GR algorithm when r is
initially set to one, using the same set of workflows as
in the previous experiments. To this end, following the
principle of Recursion (6), we design another dynamic
programming algorithm for the per-stage budget distribution,
shown in Recursion (12), where Ti[j, b] represents the
minimum time to complete the tasks indexed from j to ni
in stage i given budget b, with 0 ≤ i < κ,
0 ≤ j ≤ ni, and 0 < b ≤ Bi.
$$
T_i[j,\,b] \;=\; \min_{0 < q \le b}\Big\{\max\big\{\,T_{ij}[q],\; T_i[j+1,\; b-q]\,\big\}\Big\},
\qquad
T_i\big[n_i,\, B_i^{n_i}\big] \;=\; T_{in_i}\big(B_i^{n_i}\big),\quad B_i^{n_i} \le B_i
\tag{12}
$$
The optimal solution to stage i can be found in Ti[0, Bi].
Given the proof of Recursion (6), the correctness of this
algorithm follows directly. We combine this algorithm
with Recursion (6) to obtain the global results for the
workloads in our first experiment and compare them
with our current results. The comparison confirms
the optimality of the greedy algorithm (Algorithm 1 in
Section 4.1.1).
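To make Recursion (12) concrete, the per-stage distribution can be sketched as a memoized dynamic program. This is an illustrative sketch only, with tasks reindexed from 0 to n_tasks − 1 and an empty-suffix base case: the callback `task_time(j, q)` is a hypothetical stand-in for T_ij[q], the minimum time to finish task j of the stage under budget q, and is not part of the paper's implementation.

```python
from functools import lru_cache

def stage_min_time(task_time, n_tasks, budget):
    """Recursion (12), minor reindexing: T[j, b] = min over splits
    0 < q <= b of max(task_time(j, q), T[j + 1, b - q]), with
    T[n_tasks, b] = 0 once no tasks remain.

    `task_time(j, q)` stands in for T_ij[q], the minimum time to run
    task j of the stage under budget q (it should return infinity
    when q cannot buy any machine for the task).
    """
    INF = float("inf")

    @lru_cache(maxsize=None)
    def T(j, b):
        # Base case: no tasks left to schedule in this stage.
        if j == n_tasks:
            return 0.0
        best = INF
        # Give budget q to task j and b - q to the remaining tasks;
        # tasks within a stage run in parallel, hence the max().
        for q in range(1, b + 1):
            best = min(best, max(task_time(j, q), T(j + 1, b - q)))
        return best

    return T(0, budget)  # optimal stage time under the full budget
```

For instance, with a two-task stage whose times for budgets 1..4 are [10, 6, 4, 3] and [8, 5, 3, 2], `stage_min_time` splits a budget of 4 as 2 + 2 and returns a stage time of 6. The table has O(n_i · B_i) entries and each entry scans up to B_i splits, matching the per-stage cost of the dynamic program.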
6 CONCLUSIONS
In this paper, we studied two practical constraints on
budget and deadline for the scheduling of a batch of
MapReduce jobs as a workflow on a set of (virtual) ma-
chines in the Cloud. First, we focused on the scheduling-
length optimization under budget constraints. We designed
a globally optimal algorithm by combining dynamic
programming techniques with a local greedy algorithm
for budget distribution on a per-stage basis, which
was itself shown to be optimal.
Then, with this result, we designed two heuristic
algorithms, GGB and GR, which are based on different
greedy strategies to reduce the time complexity in mini-
mizing the scheduling lengths of the workflows without
breaking the budget. Our empirical studies reveal that
both the GGB and GR algorithms, each exhibiting a
distinct advantage over the other, are very close to the
optimal algorithm in terms of the scheduling lengths but
entail much lower time overhead.
Finally, we briefly discussed the scheduling algorithm
under deadline constraints where we convert the prob-
lem into the standard MCKS problem via a parallel
transformation.
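For readers unfamiliar with MCKS, the target of that reduction admits a simple dynamic program. The sketch below is a generic minimisation-form multiple-choice knapsack solver, not the paper's parallel transformation: exactly one (weight, cost) item is chosen from each class so that the total weight respects the capacity and the total cost is minimised.

```python
def mcks_min_cost(classes, capacity):
    """Generic multiple-choice knapsack (MCKS) dynamic program in
    its minimisation form: pick exactly one (weight, cost) item from
    every class so that total weight <= capacity and total cost is
    minimal. Runs in O(total_items * capacity); returns None if no
    feasible selection exists.
    """
    INF = float("inf")
    # dp[w] = minimum cost over the classes processed so far whose
    # chosen items have total weight exactly w.
    dp = [0] + [INF] * capacity
    for items in classes:
        nxt = [INF] * (capacity + 1)
        for w, cost_so_far in enumerate(dp):
            if cost_so_far == INF:
                continue
            for weight, cost in items:
                if w + weight <= capacity:
                    nxt[w + weight] = min(nxt[w + weight],
                                          cost_so_far + cost)
        dp = nxt  # exactly one item per class: dp is replaced, not merged
    best = min(dp)
    return None if best == INF else best
```

For example, with two classes {(2, 5), (3, 3)} and {(1, 4), (2, 2)} and capacity 4, the cheapest feasible selection costs 7.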
Admittedly, our model for the budget-driven scheduling
of MapReduce workflows is relatively simple, and might
not fully reflect some advanced features found in practice,
such as speculative task scheduling, redundant computation
for fault tolerance, dynamic pricing [46], and so on.
However, it at least provides a reasonable baseline use case
to demonstrate how cost-effective scheduling of MapReduce
workflows can be achieved in Cloud computing. Advanced
features of scheduling under budget constraints will be
considered in our future work.
Clearly, the full infrastructure required to manage and
schedule a batch of MapReduce jobs using the proposed
algorithms in Hadoop would be a substantial implementation
project. Our current focus was on providing
simulation-based evidence to illustrate the performance
advantages of the proposed algorithms. Implementing
the full system on a real cloud platform (e.g.,
Amazon) will also be tackled as future work.
ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers
whose valuable suggestions helped to improve the quality
of the manuscript.
REFERENCES
[1] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berri-
man, and J. Good, “On the use of cloud computing for scientific
workflows,” in eScience, 2008. eScience ’08. IEEE Fourth Interna-
tional Conference on, Dec. 2008, pp. 640 –645.
[2] G. Juve and E. Deelman, “Scientific workflows and clouds,”
Crossroads, vol. 16, no. 3, pp. 14–18, Mar. 2010.
[3] G. Juve, E. Deelman, G. B. Berriman, B. P. Berman, and P. Maech-
ling, “An evaluation of the cost and performance of scientific
workflows on amazon ec2,” J. Grid Comput., vol. 10, no. 1, pp.
5–21, Mar. 2012.
[4] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, “The
cost of doing science on the cloud: the montage example,” in
Proceedings of the 2008 ACM/IEEE conference on Supercomputing,
ser. SC ’08, Piscataway, NJ, USA, 2008, pp. 50:1–50:12.
[5] J. Dean and S. Ghemawat, “Mapreduce: simplified data process-
ing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113,
Jan. 2008.
[6] S. Papadimitriou and J. Sun, “Disco: Distributed co-clustering
with map-reduce: A case study towards petabyte-scale end-to-
end mining,” in Proceedings of the 2008 Eighth IEEE International
Conference on Data Mining, ser. ICDM ’08, 2008, pp. 512–521.
[7] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, “Sur-
vey of mapreduce frame operation in bioinformatics,” Briefings in
Bioinformatics, pp. 1–11, 2013.
[8] Amazon Elastic MapReduce, http://aws.amazon.com/
elasticmapreduce/ [Online; accessed Jan-11-2014].
[9] Microsoft's Apache Hadoop on Windows Azure Services
Preview, http://searchcloudapplications.techtarget.com/tip/
The-battle-for-cloud-services-Microsoft-vs-Amazon [Online;
accessed Jan-11-2014].
[10] Hadoop on Google Cloud Platform, https://cloud.google.com/
solutions/hadoop/ [Online; accessed Jan-11-2014].
[11] Teradata Aster Discovery Platform, http://www.asterdata.com/
product/discovery-platform.php [Online; accessed Jan-11-2014].
[12] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and
I. Stoica, “Improving mapreduce performance in heterogeneous
environments,” in Proceedings of the 8th USENIX Conference
on Operating Systems Design and Implementation, ser. OSDI’08.
Berkeley, CA, USA: USENIX Association, 2008, pp. 29–42.
[Online]. Available: http://dl.acm.org/citation.cfm?id=1855741.
1855744
[13] S. Crago, K. Dunn, P. Eads, L. Hochstein, D.-I. Kang, M. Kang,
D. Modium, K. Singh, J. Suh, and J. Walters, “Heterogeneous
cloud computing,” in Cluster Computing (CLUSTER), 2011 IEEE
International Conference on, 2011, pp. 378–385.
[14] H. Liu and D. Orban, “Cloud mapreduce: A mapreduce imple-
mentation on top of a cloud operating system,” in Cluster, Cloud
and Grid Computing (CCGrid), 2011 11th IEEE/ACM International
Symposium on, May 2011, pp. 464–474.
15. SUBMISSION TO IEEE TRANSACTIONS ON CLOUD COMPUTING 15
[15] S. Ibrahim, H. Jin, B. Cheng, H. Cao, S. Wu, and L. Qi, “Cloudlet:
towards mapreduce implementation on virtual machines,” in
Proceedings of the 18th ACM international symposium on High per-
formance distributed computing, ser. HPDC ’09, 2009, pp. 65–66.
[16] S. Ibrahim, H. Jin, L. Lu, L. Qi, S. Wu, and X. Shi, “Evaluating
mapreduce on virtual machines: The hadoop case,” in Proceedings
of the 1st International Conference on Cloud Computing, ser. Cloud-
Com ’09, Jun. 2009, pp. 519–528.
[17] M. Correia, P. Costa, M. Pasin, A. Bessani, F. Ramos, and P. Veris-
simo, “On the feasibility of byzantine fault-tolerant mapreduce
in clouds-of-clouds,” in Reliable Distributed Systems (SRDS), 2012
IEEE 31st Symposium on, 2012, pp. 448–453.
[18] F. Marozzo, D. Talia, and P. Trunfio, “Enabling reliable mapreduce
applications in dynamic cloud infrastructures,” ERCIM News, vol.
2010, no. 83, pp. 44–45, 2010.
[19] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker,
and I. Stoica, “Delay scheduling: a simple technique for achieving
locality and fairness in cluster scheduling,” in Proceedings of the
5th European conference on Computer systems, ser. EuroSys ’10, 2010,
pp. 265–278.
[20] H. Chang, M. Kodialam, R. Kompella, T. V. Lakshman, M. Lee,
and S. Mukherjee, “Scheduling in mapreduce-like systems for fast
completion time,” in INFOCOM, 2011 Proceedings IEEE, 2011, pp.
3074–3082.
[21] H.-H. You, C.-C. Yang, and J.-L. Huang, “A load-aware sched-
uler for mapreduce framework in heterogeneous cloud environ-
ments,” in Proceedings of the 2011 ACM Symposium on Applied
Computing, ser. SAC ’11, 2011, pp. 127–132.
[22] B. Thirumala Rao and L. S. S. Reddy, “Survey on Improved
Scheduling in Hadoop MapReduce in Cloud Environments,”
International Journal of Computer Applications, vol. 34, no. 9, pp.
29–33, Nov. 2011.
[23] Y. Li, H. Zhang, and K. H. Kim, “A power-aware scheduling of
mapreduce applications in the cloud,” in Dependable, Autonomic
and Secure Computing (DASC), 2011 IEEE Ninth International Con-
ference on, 2011, pp. 613–620.
[24] P. Kondikoppa, C.-H. Chiu, C. Cui, L. Xue, and S.-J. Park,
“Network-aware scheduling of mapreduce framework on dis-
tributed clusters over high speed networks,” in Proceedings of the
2012 workshop on Cloud services, federation, and the 8th open cirrus
summit, ser. FederatedClouds ’12, 2012, pp. 39–44.
[25] J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors, A. Manzanares,
and X. Qin, “Improving mapreduce performance through data
placement in heterogeneous hadoop clusters,” in Parallel Dis-
tributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE
International Symposium on, 2010, pp. 1–9.
[26] K. Wang, B. Tan, J. Shi, and B. Yang, “Automatic task slots
assignment in hadoop mapreduce,” in Proceedings of the 1st
Workshop on Architectures and Systems for Big Data, ser. ASBD ’11.
New York, NY, USA: ACM, 2011, pp. 24–29. [Online]. Available:
http://doi.acm.org/10.1145/2377978.2377982
[27] Z. Guo and G. Fox, “Improving mapreduce performance in
heterogeneous network environments and resource utilization,”
in Proceedings of the 2012 12th IEEE/ACM International Symposium
on Cluster, Cloud and Grid Computing (Ccgrid 2012), ser. CCGRID
’12. Washington, DC, USA: IEEE Computer Society, 2012, pp.
714–716. [Online]. Available: http://dx.doi.org/10.1109/CCGrid.
2012.12
[28] Z. Zhang, L. Cherkasova, and B. T. Loo, “Performance modeling
of mapreduce jobs in heterogeneous cloud environments,” in
Proceedings of the 2013 IEEE Sixth International Conference on
Cloud Computing, ser. CLOUD ’13. Washington, DC, USA:
IEEE Computer Society, 2013, pp. 839–846. [Online]. Available:
http://dx.doi.org/10.1109/CLOUD.2013.107
[29] T. Sandholm and K. Lai, “Dynamic proportional share scheduling
in hadoop,” in Job Scheduling Strategies for Parallel Processing,
ser. Lecture Notes in Computer Science, E. Frachtenberg and
U. Schwiegelshohn, Eds. Springer Berlin Heidelberg, 2010, vol.
6253, pp. 110–131.
[30] A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang,
S. Antony, H. Liu, and R. Murthy, “Hive - a petabyte scale data
warehouse using hadoop,” in Data Engineering (ICDE), 2010 IEEE
26th International Conference on, 2010, pp. 996–1005.
[31] P. Sinha and A. A. Zoltners, “The multiple-choice knapsack
problem,” Operations Research, vol. 27, no. 3, pp. 503–515, 1979.
[32] D. Pisinger, “A minimal algorithm for the multiple-choice knap-
sack problem.” European Journal of Operational Research, vol. 83,
pp. 394–410, 1994.
[33] J. Dean and S. Ghemawat, “Mapreduce: simplified data pro-
cessing on large clusters,” in Proceedings of the 6th conference on
Symposium on Opearting Systems Design & Implementation - Volume
6, ser. OSDI’04, 2004, pp. 10–10.
[34] Apache Software Foundation. Hadoop, http://hadoop.apache.
org/core [Online; accessed Jan-11-2014].
[35] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: dis-
tributed data-parallel programs from sequential building blocks,”
in Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference
on Computer Systems 2007, ser. EuroSys ’07, 2007, pp. 59–72.
[36] Greenplum HD, http://www.greenplum.com [Online; accessed
Jan-11-2014].
[37] Hadoop FairScheduler http://hadoop.apache.org/docs/r1.2.1/
fair scheduler.html [Online; accessed Jan-11-2014].
[38] Hadoop CapacityScheduler http://hadoop.apache.org/docs/r0.
19.1/capacity scheduler.html [Online; accessed Jan-11-2014].
[39] Y. Li, H. Zhang, and K. H. Kim, “A power-aware scheduling of
mapreduce applications in the cloud,” in Dependable, Autonomic
and Secure Computing (DASC), 2011 IEEE Ninth International Con-
ference on, 2011, pp. 613–620.
[40] K. Kc and K. Anyanwu, “Scheduling hadoop jobs to meet dead-
lines,” in Cloud Computing Technology and Science (CloudCom), 2010
IEEE Second International Conference on, 2010, pp. 388–392.
[41] K. Wang, B. Tan, J. Shi, and B. Yang, “Automatic task slots assign-
ment in hadoop mapreduce,” in Proceedings of the 1st Workshop on
Architectures and Systems for Big Data, ser. ASBD ’11, 2011, pp.
24–29.
[42] J. Yu and R. Buyya, “Scheduling scientific workflow applications
with deadline and budget constraints using genetic algorithms,”
Sci. Program., vol. 14, no. 3,4, pp. 217–230, Dec. 2006.
[43] L. Zeng, B. Veeravalli, and X. Li, “Scalestar: Budget conscious
scheduling precedence-constrained many-task workflow applica-
tions in cloud,” in Proceedings of the 2012 IEEE 26th International
Conference on Advanced Information Networking and Applications, ser.
AINA ’12, 2012, pp. 534–541.
[44] E. Caron, F. Desprez, A. Muresan, and F. Suter, “Budget con-
strained resource allocation for non-deterministic workflows on
an iaas cloud,” in Proceedings of the 12th international conference on
Algorithms and Architectures for Parallel Processing - Volume Part I,
ser. ICA3PP’12, 2012, pp. 186–201.
[45] H. Xu and B. Li, “Dynamic cloud pricing for revenue maximiza-
tion,” IEEE Transactions on Cloud Computing, vol. 1, no. 2, pp. 158–
171, 2013.
[46] Amazon EC2, Spot Instances, http://aws.amazon.com/ec2/
spot-instances [Online; accessed March-29-2014].
Yang Wang received the BS degree in applied
mathematics from Ocean University of China in
1989, and the MS and PhD degrees in computer
science from Carleton University and University
of Alberta, Canada, in 2001 and 2008, respec-
tively. He is currently working with IBM Center
for Advanced Studies (CAS Atlantic), University
of New Brunswick, Fredericton, Canada. His
research interests include big data analytics in
clouds and exploiting heterogeneous multicore
processors to accelerate the execution of the
Java Virtual Machine.
Wei Shi is an assistant professor at the Uni-
versity of Ontario Institute of Technology. She is
also an adjunct professor at Carleton University.
She holds a Bachelor of Computer Engineering
from Harbin Institute of Technology in China and
received her Ph.D. in Computer Science
from Carleton University in Ottawa, Canada.
Prior to her academic career, as a software
developer and project manager, she was closely
involved in the design and development of a
large-scale Electronic Information system for the
distribution of welfare benefits in China, as well as of a World Wide Web
Information Filtering System for China's National Information Security
Centre.