Execution Environment for On-Demand Computing Services Based on Shared Clusters

This thesis talk studies resource management for on-demand computing services on a shared cluster. In this context, the aim was to propose tools that allocate resources automatically to execute on-demand user requests and that share resources proportionally among the services while maximizing their use. Funded by the Minalogic global business cluster through the Ciloe Project (http://ciloe.minalogic.net), this work targets organizations such as SMBs, which cannot bear the cost of purchasing and maintaining a dedicated computing infrastructure. First, we carried out an in-depth survey of the areas of on-demand computing and high performance computing. From this survey, we defined a virtualized architecture that enables dynamic execution of user requests through a dedicated resource manager. Finally, we proposed policies and algorithms flexible enough to offer a suitable tradeoff between equity and resource use. Having worked in a context of industrial collaboration, we developed a prototype of our proposal as a proof of concept. Based on open standards, this prototype relies on existing virtualization tools such as OpenNebula for allocating and manipulating virtual machines over the cluster's nodes. Using this prototype along with various workloads, we carried out experiments to evaluate our architecture and scheduling algorithms. The results show that our contributions achieve the expected goals while being reliable and efficient.

Published in: Technology, Business

  1. 1. 1/40 Execution Environment for On-Demand Computing Services Based on Shared Clusters PhD thesis, Grenoble University By Rodrigue Chakode (LIG/INRIA, Equipe Mescal) Advisors: - Jean-François Méhaut - Maurice Tchuenté
  2. 2. 2/40 Cloud Computing in a Nutshell ◉ Delivers computing features as services ◉ Free or commercial services accessible over the network ◉ On-demand and elastic access, plus utility billing – Customers (users of the service) only pay for what they use, aka pay-as-you-go – Requests for more or fewer features should be satisfied quickly ◉ Services are set up transparently for customers – They don't have to care about how the service is provided
  3. 3. 3/40 Context Statement on Cloud Computing ◉ Various sorts of cloud services – Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service, Data-as-a-Service, Translation-as-a-Service... – Almost everything could be a service (XaaS) ◉ Requires setting up a suitable computing infrastructure – Servers, storage, network fabrics, cooling system... ◉ May need significant investments – Out of reach for many small or medium businesses (SMBs) – Market currently dominated by the biggest organizations Introduction
  4. 4. 4/40 Challenges for HPC ◉ Many software applications require intensive computing capabilities – E.g. EDA Applications (Ciloe Project) – Integrated circuits need to be simulated before manufacturing ◉ Computing architectures are increasingly parallel – SMP, NUMA, GPU, Cluster... and soon many-core architectures ◉ HPC applications run on clusters of multicore nodes (SMP/NUMA) ◉ Also expensive [Figure: example of a cluster. Credit: CEA] Introduction
  5. 5. 5/40 Bring HPC Services into Clouds ◉ Services requiring intensive computations ◉ Services provided from a mutualized cluster – Cluster supported by several businesses – Each business providing its own service – Cluster's resources shared among the services ◉ Study conducted in the context of an industrial collaboration – The Ciloe Project [http://ciloe.minalogic.net] – Three SBEs that publish EDA applications are involved Introduction
  6. 6. 6/40 Outline ◉ Introduction ◉ Problem statement ◉ Background – Existing SaaS clouds and their related RM issues – Survey on existing resource sharing techniques ◉ Contributions – Overview : Scheduling Approach and Execution Model – Architecture Model and Scheduling Strategy – Prototyping ◉ Experimental evaluation – Evaluation Protocol – Results ◉ Conclusion & perspectives
  7. 7. 7/40 Resource Management for HPC SaaS Services ◉What is a service –Computes customer data with a specific application –Input specifies an application and the data –Output retrieved after the computation –No more interactions necessary Problem Statement
  8. 8. 8/40 Related Research Issues ◉Data Management ◉Resilience and Fault Tolerance ◉Security and privacy ◉Resource Management Problem Statement
  9. 9. 9/40 Scheduling Problems ◉Share the cluster's resources among the services – according to the investments of the different businesses ◉Maximize the use of resources – Use idle resources to run pending requests – Run miscellaneous tasks on idle resources in a best-effort way ◉Minimize the impact of selfish behaviors – A business can under-invest while needing a lot of resources Problem Statement
  10. 10. 10/40 Resource Allocation for On-demand Services ◉ Running requests in a dynamic way – Resources should be allocated dynamically – Allocated resources should be freed up automatically once a request has completed – Handle Input/Output data in a transparent way ◉ Need to think of resource partitioning – Modern computing nodes have several cores – The number of cores required by certain tasks can be less than the number of cores available on a node Problem Statement
  11. 11. 11/40 Outline ◉ Introduction ◉ Problem statement ◉ Background – Existing SaaS clouds and their related RM issues – Survey on existing resource sharing techniques ◉ Contributions – Overview : Scheduling Approach and Execution Model – Architecture Model and Scheduling Strategy – Prototyping ◉ Experimental evaluation – Evaluation Protocol – Results ◉ Conclusion & perspectives
  12. 12. 12/40 Background on Existing SaaS Clouds ◉ Target office and collaborative applications – E.g. Google Docs, Salesforce, Office365... – Need for interactivity ◉ SaaS cloud as a layer on top of a PaaS – PaaS can rely on an IaaS layer – IaaS enables on-demand resource allocation • Virtualization plays an important role ◉ Resources belong to a single organization Background on SaaS Clouds
  13. 13. 13/40 Services for Intensive Computations ◉ No need for interactivity ◉ Require highly dynamic and transparent resource handling • Allocation of resources when executing a task • Release of resources once a task has completed ◉ Mutualized resources => Need to deal with sharing the resources among the services Background on SaaS Clouds
  14. 14. 14/40 Scheduling services on mutualized resources ◉ Raises conflicting objectives – Fairness toward the service suppliers – Efficiency in the use of resources ◉ Prioritizing one objective penalizes the other => Requires making a tradeoff Background on resource management
  15. 15. 15/40 Common resource scheduling strategies ◉ First-come, First-served (FCFS) + Fair against users – Inefficient in terms of utilization – May be unfair against some businesses in our context ◉ FCFS along with Backfilling (EASY/Conservative) + Improves utilization – May significantly delay the biggest tasks + Possible optimization with a conservative backfilling – Remains unfair in our context Background on resource management
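The contrast between the two strategies on this slide can be sketched as follows. This is a simplified illustration and not the talk's implementation: the function names and the `(name, cores)` queue format are assumptions, and the backfilling variant deliberately omits the reservations that EASY and conservative backfilling use to protect waiting tasks, which is exactly why big tasks can be delayed.

```python
def fcfs_start(queue, free_cores):
    """Strict FCFS: start tasks in arrival order, stop at the first that does not fit."""
    started = []
    for name, cores in queue:
        if cores > free_cores:
            break  # the head of the queue blocks everything behind it
        started.append(name)
        free_cores -= cores
    return started


def backfill_start(queue, free_cores):
    """Naive backfilling: skip tasks that do not fit and start later ones that do.

    No reservation is made for skipped tasks (an intentional simplification).
    """
    started = []
    for name, cores in queue:
        if cores <= free_cores:
            started.append(name)
            free_cores -= cores
    return started


# Hypothetical queue: (task name, cores requested), in arrival order.
queue = [("t1", 2), ("t2", 8), ("t3", 1), ("t4", 3)]
print(fcfs_start(queue, 4))      # only t1 starts; t2 blocks the queue
print(backfill_start(queue, 4))  # t3 also starts in the idle cores
```

With 4 free cores, strict FCFS leaves 2 cores idle behind the large task `t2`, while the backfilling variant uses one of them for `t3`.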
  16. 16. 16/40 How Resources are Assigned to Tasks ◉ Simple assignment strategies – Greedy and round-robin algorithms ◉ Assignments guided by performance requirements – Notion of match-making (affinities between resources and tasks) ◉ Prioritization – Higher-priority tasks get access to resources first • Preemption can be introduced => Notion of best-effort when certain tasks only run on idle resources ◉ Reservation and leasing – Resources are allocated for a given time slot Background on resource management
  17. 17. 17/40 Common resource sharing strategies ◉ Static sharing (partitioning) + Fair and easy to set up – Inefficient in terms of utilization in our context ◉ Fair-sharing (no partitioning + dynamic priorities) + Tradeoff between fairness and utilization – May still raise unfair situations in our context [Figure: resources R1 to R7 shared among Business 1, Business 2 and Business 3 under each strategy] Background on resource management
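The "dynamic priorities" idea behind fair-sharing can be sketched as below. This is only one common way to express it, under stated assumptions: the function name, the dictionaries, and the rule "serve the business whose usage is lowest relative to its share" are illustrative choices, not something the slides specify.

```python
from fractions import Fraction


def next_business(shares, usage, pending):
    """Pick the business to serve next under a simple fair-share rule.

    shares  : business -> entitled share (any positive integer weight)
    usage   : business -> resources consumed so far
    pending : business -> number of queued tasks
    The business with the lowest usage-to-share ratio gets priority;
    ties are broken by name so the result is deterministic.
    """
    candidates = [b for b, n in pending.items() if n > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (Fraction(usage.get(b, 0), shares[b]), b))


# Hypothetical figures: shares of 2, 2 and 3 out of 7 resources.
shares = {"B1": 2, "B2": 2, "B3": 3}
usage = {"B1": 3, "B2": 1, "B3": 3}
pending = {"B1": 2, "B2": 1, "B3": 4}
print(next_business(shares, usage, pending))
```

Because priorities are recomputed from usage, no resource is statically reserved, which improves utilization but can let a business temporarily consume more than it invested.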
  18. 18. 18/40 Partitioning Individual Nodes ◉ Requires isolation among tasks – A task must not access resources allocated to another task ◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...) + Low-level partitioning inducing a low overhead => good performance – Inflexible since not easy to handle dynamically ◉ Isolation with virtual machines (VMs) + High-level partitioning => High flexibility in terms of automation – Possible performance overhead ―Several optimizations (e.g. HVM, paravirtualization, PCI passthrough...) Background on resource management
  19. 19. 19/40 Synthesis on Partitioning Resources ◉ Virtual Machines enable interesting features – To partition each individual node with strong isolation – To allocate and free up resources dynamically – To suspend/restart best-effort tasks ◉ Powerful and proven VM management tools – Handle VMs on individual nodes • Xen, KVM, ESXi, Hyper-V... – Handle VMs on distributed environments • OpenNebula, Eucalyptus, OpenStack... ―Target IaaS clouds
  20. 20. 20/40 Problems to Address With VMs ◉ Deal with performance overhead – Generic optimizations • HVM, PCI Passthrough – Solution-specific optimizations • Paravirtualization (Xen, Hyper-V) • Virtio (KVM, Xen) ◉ Allocate custom VMs dynamically on distributed environments – Contextualization enables interesting features (OpenNebula)
  21. 21. 21/40 Gaps in Existing Approaches with Respect to Our Aims ◉ On-demand HPC services on a mutualized cluster – Existing SaaS clouds focus on collaborative or office applications • Resources owned by a single organization ◉ Existing resource sharing strategies don't suit our needs => Necessity to design new approaches ◉ Contributions – Scheduling strategy for sharing mutualized resources – Architecture for on-demand HPC services – Prototyping for evaluation Background on resource management
  22. 22. 22/40 Outline ◉ Introduction ◉ Problem statement ◉ Background – Existing SaaS clouds and their related RM issues – Survey on existing resource sharing techniques ◉ Contributions – Overview : Scheduling Approach and Execution Model – Architecture Model and Scheduling Strategy – Prototyping ◉ Experimental evaluation – Evaluation Protocol – Results ◉ Conclusion & perspectives
  23. 23. 23/40 Ideas for the resource sharing strategy ◉ Combines the advantages... – of static sharing, where fairness is easy to maintain – and those of a fair-sharing strategy, which improves utilization ◉ Enables elasticity in resource sharing – A business may use more resources than its investment entitles it to: • When the task causing this situation has a duration no greater than an acceptable duration threshold, denoted D • Or when the task is of best-effort type => Limits the impact of selfish behaviors from certain businesses Contributions : Overview
  24. 24. 24/40 Handling Requests Dynamically ◉ Encapsulate each task within a virtual machine (VM) – Eases the partitioning of nodes and enables dynamicity ◉ Enable a Specific SaaS Manager – Implements the scheduling strategy to address the resource sharing issues – Takes charge of allocating and destroying VMs ◉ Exploit the Contextualization of VMs – VM created, customized and started dynamically • VM suitably set to launch the task once started – VM automatically destroyed once the task is completed
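The create, run, destroy lifecycle described on this slide can be sketched as follows. Everything here is hypothetical: `FakeVIM`, `task_vm` and `run_request` are illustrative names, and the class is an in-memory stand-in rather than the OpenNebula API. The point is only that a VM exists for exactly the lifetime of one task and is released even if the task fails.

```python
from contextlib import contextmanager


class FakeVIM:
    """Hypothetical in-memory stand-in for a virtual infrastructure manager."""

    def __init__(self):
        self.active = set()
        self._next_id = 0

    def create(self, image, context):
        # A real manager would boot a VM from `image` and pass `context`
        # (the task to launch and its data) to it; here we only track an id.
        vm_id = self._next_id
        self._next_id += 1
        self.active.add(vm_id)
        return vm_id

    def destroy(self, vm_id):
        self.active.discard(vm_id)


@contextmanager
def task_vm(vim, image, context):
    """Allocate a VM for one task and always destroy it afterwards."""
    vm_id = vim.create(image, context)
    try:
        yield vm_id
    finally:
        vim.destroy(vm_id)


def run_request(vim, image, command, input_path):
    """Run one request inside its own short-lived VM (simulated)."""
    context = {"command": command, "input": input_path}
    with task_vm(vim, image, context) as vm_id:
        return f"vm {vm_id} ran {command} on {input_path}"


vim = FakeVIM()
print(run_request(vim, "base-image", "simulate", "/data/in"))
print(vim.active)  # no VM is left allocated after the request
```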
  25. 25. 25/40 Architecture Model ◉ The SaaS Manager on top of the cluster – Relies on a virtual infrastructure manager (VIM) – VIM relies on hypervisors ◉ Possibility of reusing existing tools – Avoids rewriting existing features – Benefits from the features of powerful, proven tools Contributions : Architecture Model
  26. 26. 26/40 Design Driven by Openness, Performance and Interoperability ◉ OpenNebula provides support for handling the VMs – Featuring contextualization ◉ Xen manages VMs on each individual node – Exploits paravirtualization for better performance ◉ The different components are coupled through open APIs – Ensures better interoperability Contributions : Architecture Model
  27. 27. 27/40 Resource Sharing Strategy : Case study ◉ A situation with three businesses B1, B2 and B3 – B1 (with green tasks) invested for 2/7 of resources (R1, R2...R7) – B2 (with red tasks) invested for 2/7 – B3 (with blue tasks) for 3/7 ◉ On the figure, think of tasks as the related VMs [Figure: tasks t1 to t6 placed on the resources, with queued tasks] Contributions : Resource Management Strategy
  28. 28. 28/40 Resource Sharing Strategy : Example 1 ◉ Assumes the durations of t1 and t5 are <= D (the chosen duration threshold) – B1 and B3 are using ratios of resources greater than their investments – This represents an additional ratio of 1/14 for each of them [Figure: tasks t1 to t6 placed on the resources, with queued tasks] Contributions : Resource Management Strategy
  29. 29. 29/40 Resource sharing strategy : Example 2 ◉ None of the tasks has a duration <= D, but the task t2 is of best-effort type – B1 is using a ratio of resources 1/7 greater than its investment – t2 can be suspended at any time [Figure: tasks t1 to t6 placed on the resources, with queued tasks] Contributions : Resource Management Strategy
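The elastic sharing rule illustrated by the case study and its two examples can be sketched as an admission test. This is a minimal sketch under stated assumptions, not the prototype's algorithm: the `Task` fields, the `can_start` function, and the choice of 14 allocation slots (so that 1/14 is one slot) are all hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Task:
    business: str
    slots: int               # resource slots requested (hypothetical unit)
    duration: float          # expected duration, in seconds
    best_effort: bool = False


def can_start(task, used, shares, total_slots, free_slots, threshold):
    """Return True if `task` may start now.

    A business may exceed its invested share only when the task is short
    (duration <= threshold, the D of the slides) or of best-effort type.
    """
    if task.slots > free_slots:
        return False
    entitled = shares[task.business] * total_slots
    within_share = used.get(task.business, 0) + task.slots <= entitled
    return within_share or task.duration <= threshold or task.best_effort


# Shares from the case study; 14 slots is an assumption made for the example.
shares = {"B1": Fraction(2, 7), "B2": Fraction(2, 7), "B3": Fraction(3, 7)}
used = {"B1": 4}  # B1 already uses its full 2/7 of 14 slots
D = 600

print(can_start(Task("B1", 1, 300), used, shares, 14, 5, D))         # short task
print(can_start(Task("B1", 1, 3600), used, shares, 14, 5, D))        # long task
print(can_start(Task("B1", 2, 3600, True), used, shares, 14, 5, D))  # best-effort
```

In this sketch the short task corresponds to Example 1 (an extra 1/14 for B1) and the best-effort task to Example 2 (an extra 1/7 that can be reclaimed by suspension).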
  30. 30. 30/40 About Implementation ◉ Relies on principles of resource leasing – A lease consists in allocating a virtual machine for running a task – The duration of a lease depends on the related task • Its duration and its type (best-effort or not) ◉ Two kinds of leases handled specifically – Non-preemptive leases • Assigned to tasks related to the customers ―Non-preemptive tasks => Resources only freed up at completion – Preemptive leases • Assigned to best-effort tasks ―VMs can be suspended and restarted later => No guarantee of completion Contributions : Resource Management Strategy
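The two kinds of leases can be sketched as a small state model. The names `Lease`, `LeaseState` and `reclaim` are illustrative assumptions, and each lease is taken to hold exactly one VM; the slides do not describe the prototype's data structures.

```python
from dataclasses import dataclass
from enum import Enum


class LeaseState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


@dataclass
class Lease:
    task_id: str
    preemptive: bool  # True for best-effort tasks
    state: LeaseState = LeaseState.RUNNING

    def suspend(self):
        if not self.preemptive:
            raise ValueError("non-preemptive leases run until completion")
        if self.state is LeaseState.RUNNING:
            self.state = LeaseState.SUSPENDED

    def resume(self):
        if self.state is LeaseState.SUSPENDED:
            self.state = LeaseState.RUNNING


def reclaim(leases, needed):
    """Suspend running preemptive leases until `needed` VMs are freed.

    Returns the suspended leases; fewer than `needed` may be returned if
    there are not enough best-effort tasks to preempt.
    """
    suspended = []
    for lease in leases:
        if len(suspended) >= needed:
            break
        if lease.preemptive and lease.state is LeaseState.RUNNING:
            lease.suspend()
            suspended.append(lease)
    return suspended


leases = [Lease("customer-1", False), Lease("best-1", True), Lease("best-2", True)]
freed = reclaim(leases, 1)
print([l.task_id for l in freed])  # only a best-effort lease is suspended
```

Customer leases are never touched by `reclaim`, which mirrors the slide's rule that their resources are only freed at completion.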
  31. 31. 31/40 Prototyping and Overview on Integration ◉ SVMSched (Smart Virtual Machine Scheduler) – Drop-in replacement for OpenNebula's default scheduler – Proper interfaces that provide the SaaS abstraction – Deals with allocating and freeing up VMs dynamically – Implements the resource sharing strategy – Supports contextualization data stored on Network File Systems Contributions : Prototyping
  32. 32. 32/40 Outline ◉ Introduction ◉ Problem statement ◉ Background – Existing SaaS clouds and their related RM issues – Survey on existing resource sharing techniques ◉ Contributions – Overview : Scheduling Approach and Execution Model – Architecture Model and Scheduling Strategy – Prototyping ◉ Experimental evaluation – Evaluation Protocol – Results ◉ Conclusion & perspectives
  33. 33. 33/40 Evaluation Protocol ◉ Evaluation of the performance of an application – Time to set up the VM – Performance overhead induced by the virtualization ◉ Study of the scheduling strategy – Does it behave well regarding fairness and utilization? – If not, how can it be improved? ◉ Experimental conditions – Nodes from Grid'5000 : each having 2x4 cores, 2.27 GHz, 8 GB of RAM – Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500 MB – Applications from the Parsec Benchmark (BodyTrack, Blackscholes, Freqmine) Evaluation
  34. 34. 34/40 Performance of the virtualization ◉ Full VMs perform better than contextualized ones => slight difference ◉ High overhead : applications requiring high disk IO ◉ VMs perform better than native machines => concurrent tasks requiring high memory IO ◉ Contextualized VMs : require constant and low setup time – ~15s (<5% of the duration of a task of 5 mins) with an image of 500 MB ◉ Full VMs : times grow linearly Evaluation
  35. 35. 35/40 Analyzing the scheduling strategy ◉ Better choice of the threshold – Businesses can benefit from the mutualization – Prevents the temptation for selfish behaviors – Best-effort tasks would allow better utilization ◉ Mutualization is not relevant – The threshold is not suitably chosen – There are no best-effort tasks – The strategy leads to a static sharing Evaluation
  36. 36. 36/40 Outline ◉ Introduction ◉ Problem statement ◉ Background – Existing SaaS clouds and their related RM issues – Survey on existing resource sharing techniques ◉ Contributions – Overview : Scheduling Approach and Execution Model – Architecture Model and Scheduling Strategy – Prototyping ◉ Experimental evaluation – Evaluation Protocol – Results ◉ Conclusion & perspectives
  37. 37. 37/40 Conclusion ◉ We studied and set up an environment for enabling HPC SaaS services on shared computing resources – Designing an architecture model that relies on virtualization for executing on-demand requests – Designing resource management algorithms that share the resources fairly while maximizing their use ◉ A prototype has been developed to evaluate our contributions experimentally – Results showed the feasibility of our approach – Prototype integrated in the deliverables of the Ciloe Project ◉ We have thus opened the way to addressing the cost problem that heavily constrains SMBs needing HPC resources for their applications Conclusion & Perspectives
  38. 38. 38/40 Perspectives ◉ Model for predicting the duration of each task – Envisioning an approximation model based on reinforcement learning ◉ Economic model of billing – Which parameters can the invoicing take into account? • Per-use costs of software licenses and computing resources + earnings ◉ Dimensioning the platform – To allow each business to have a suitable view of its needs in terms of resources Conclusion & Perspectives
  39. 39. 39/40 About this Work ◉ Awards – 1st Prize Grid'5000 Challenge, Reims 2011 ◉ Book Chapter – Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a Shared Virtual Cluster. In Cloud Computing and Services Science. Pages 259 – 276. ISBN 978-1-4614-2325-6, Springer-Verlag, April 2012. ◉ International conferences – Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure for On-demand SaaS Services. In CLOSER2011 - International conference on Cloud Computing and Service Science. Pages 352 – 361. Netherlands, May 2011. – Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand: Sharing and Mutualizing Clusters. In AINA'10 - IEEE International Conference on Advanced Information Networking and Applications. Pages 126 – 133. Australia, April 2010. ◉ National conferences – Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de calcul à la demande. In Renpar'20: les actes des Rencontres francophones du Parallélisme, édition 2011. Saint-Malo, France, May 2011. ◉ Other publications (in the cloud community) – Rodrigue Chakode. SVMSched : A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula. In OpenNebula Official Blog, http://blog.opennebula.org/?p=1646. – Link on the OpenNebula Software Ecosystem : http://opennebula.org/software:ecosystem:svmsched
  40. 40. 40/40 Thanks for your attention !