1. Scaling MapReduce Applications across Hybrid Clouds to Meet Soft Deadlines
   Dongseo University, Division of Computer & Information Engineering, Intelligent Smart Systems Research Lab
   Presented by: Ahmed Abdulhakim Al-Absi
2. About The Paper
   • Michael Mattess, Rodrigo N. Calheiros, and Rajkumar Buyya, Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA 2013, IEEE CS Press, USA), Barcelona, Spain, March 25-28, 2013.
   • Cloud Computing and Distributed Systems (CLOUDS) Laboratory, University of Melbourne, Australia.
3. Outline
   • Motivations
   • Proposed Policy
   • Proposed Policy Scenario
   • Policy Parameters
   • Policy Parameters Algorithm
   • Implementation
   • Performance Evaluation
     • Experimental Testbed and Sample Application
     • Performance Analysis
       • Results – Map Phase
       • Results – Reduce Phase
   • Conclusion
   • My Opinion
4. Motivation
   • As MapReduce is becoming the prevalent programming model for building data processing applications in Clouds, timely execution of such applications becomes a necessity.
   • Existing approaches for executing deadline-constrained MapReduce applications focus on meeting deadlines via admission control. However, admission control alone cannot help when applications with deadlines cannot complete in time on the available local resources.
5. Proposed Policy
   • To tackle this limitation of current approaches, the authors propose a novel policy for dynamic provisioning of public Cloud resources to speed up the execution of MapReduce applications.
   • The authors attach the soft deadline to the Map phase because this phase contains tasks that generally have uniform execution times.
6. Policy System
   1. The initial state of the system consists of local worker nodes registered with the master node and ready to execute MapReduce tasks. Datasets are available to local workers and are also stored in the Cloud provider's storage service.
   2. A MapReduce application is submitted to the master node and the scheduler begins assigning Map tasks to worker nodes;
   3. When a predefined number or fraction of the Map tasks completes, the master uses the provisioning policy and, based on the application deadline, decides how many public Cloud resources to request;
7. Policy System
   4. Requested resources register with the master node as they become available and the scheduler assigns Map tasks to them;
   5. When the Map phase completes, the scheduler begins assigning Reduce tasks to workers;
   6. Each Reduce task obtains the intermediate data to be reduced from other nodes;
   7. The output of the Reduce tasks may remain on the nodes in anticipation of another MapReduce application that will further process the data. Alternatively, the output can be collected by the master node and sent back to the user who submitted the application.
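The ordering of steps 2-5 above can be sketched as a small simulation. This is illustrative only: the function and parameter names (including the trigger fraction) are mine, not Aneka APIs, and each "round" abstractly stands for every registered worker finishing one Map task.

```python
def run_map_phase(num_tasks, workers, trigger_fraction, provision):
    """Sketch of steps 2-5: the scheduler hands Map tasks to registered
    workers, and once `trigger_fraction` of the tasks has completed it
    invokes the provisioning policy exactly once to request extra
    public Cloud workers (hypothetical names, not Aneka APIs)."""
    workers = list(workers)
    completed = 0
    triggered = False
    while completed < num_tasks:
        # one scheduling round: each registered worker finishes one task
        completed = min(num_tasks, completed + len(workers))
        if not triggered and completed / num_tasks >= trigger_fraction:
            workers += provision()  # step 4: new resources register
            triggered = True
    # step 5: Map phase complete, Reduce tasks are scheduled next
    return completed, len(workers)

# usage: 8 local workers; provisioning adds 4 public Cloud workers
# once 25% of the 100 Map tasks have completed
done, pool = run_map_phase(100, ["local"] * 8, 0.25, lambda: ["cloud"] * 4)
```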
8. Proposed Policy Scenario
9. Policy Parameters
   • MARGIN is a 'safety margin' subtracted from the remaining time to account for errors in the prediction of task execution times.
   • LOCAL FACTOR is a multiplier applied to the average run time of the first batch of Map tasks to estimate the expected Map task execution time on the cluster, accounting for variation in worker performance.
   • REMOTE FACTOR likewise predicts the expected Map task execution time on public Cloud resources, accounting for the expected difference in performance between local and public Cloud resources.
   • BOOT TIME is the expected amount of time between requesting a new resource from the public Cloud and that resource being ready to execute tasks.
10. Policy Parameters Algorithm
    The algorithm decides the number of public Cloud resources to be allocated to a MapReduce application.
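The slide presents the decision algorithm as a figure. The sketch below reconstructs a plausible version of it from the parameter descriptions on the previous slide; the function name, variable names, and exact formula are my assumptions, and the paper's algorithm may differ in detail.

```python
import math

def resources_to_provision(remaining_tasks, local_workers,
                           time_to_deadline, avg_first_batch,
                           margin, local_factor, remote_factor,
                           boot_time):
    """Hedged reconstruction of the provisioning decision: how many
    public Cloud workers to request so the Map phase finishes by the
    soft deadline. Times are in seconds."""
    # remaining time, shrunk by the safety MARGIN
    usable = time_to_deadline - margin
    # predicted per-task run times locally and on the public Cloud,
    # scaled from the average of the first batch of Map tasks
    local_task = avg_first_batch * local_factor
    remote_task = avg_first_batch * remote_factor
    # how many tasks the local workers can still finish in time
    local_capacity = local_workers * math.floor(usable / local_task)
    overflow = remaining_tasks - local_capacity
    if overflow <= 0:
        return 0  # deadline reachable with local workers only
    # a remote worker only starts working after BOOT TIME has elapsed
    tasks_per_remote = math.floor((usable - boot_time) / remote_task)
    if tasks_per_remote <= 0:
        return 0  # too late for remote resources to help
    return math.ceil(overflow / tasks_per_remote)

# usage: 300 Map tasks left, 8 local workers, 30 min to the deadline,
# first-batch average of 60 s per task
n = resources_to_provision(300, 8, 1800, 60,
                           margin=120, local_factor=1.1,
                           remote_factor=1.5, boot_time=300)
```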
11. Implementation
    Implemented in the Aneka Cloud Platform with some changes:
    • Dynamic Provisioning: extended the existing MapReduce Service of Aneka to enable its interaction with the Provisioning Service.
    • Data Exchange: enabled Aneka MapReduce to work across local and remote resources, using HTTP and IIS to serve the intermediate data files.
    • Remote Storage: S3 is used as the source of input data files when running on EC2.
12. Performance Evaluation: Experimental Testbed and Sample Application
    The environment is a hybrid Cloud composed of a local cluster and a public Cloud:
    • Local Cluster: 4 IBM System X3200 M3 servers running Citrix XenServer, with 2 Windows 2008 virtual machines each = 8 worker nodes.
    • Public Cloud Resources: provisioned from Amazon EC2, US East Coast data center; m1.small instances with a 1.0-1.2 GHz CPU and 1.7 GB of memory.
    • Dataset: 4.5 GB, copied to both the local cluster and S3.
13. Performance Evaluation: Experimental Testbed and Sample Application
14. Performance Analysis: Results – Map Phase
    The authors sequentially executed several requests for the word count application. For each request, they modified the sleep time inserted in each Map task, while keeping the deadline for completing the Map phase constant at 30 minutes for each application.
15. Performance Analysis: Results – Map Phase
16. Performance Analysis: Results – Reduce Phase
    • 352 MB as small data; 3422 MB as big data.
    • Similarly, a sleep was inserted in the Reduce tasks to increase their execution time and observe how different sizes of Reduce tasks affect the execution time of applications.
    • Two different sleep values were used: 60 seconds and 5 seconds.
17. Conclusion
    • This paper presented a dynamic provisioning policy for MapReduce applications and a prototype in the Aneka Cloud Platform.
    • Results showed that the approach, despite its lower complexity, delivers good results.
    • The policy was able to meet application deadlines, defined in terms of the completion time of the Map phase, for increasing execution times of Map tasks.
18. My Opinion
19. Q & A. Thank You!
