
Partitioning CCGrid 2012



A workflow partitioning and resource provisioning approach to support the execution of large-scale workflows.



  1. Integration of Workflow Partitioning and Resource Provisioning
     Weiwei Chen, Ewa Deelman {wchen,deelman}
     Information Sciences Institute, University of Southern California
     CCGrid 2012, Ottawa, Canada
  2. Outline
     • Introduction
     • System Overview
     • Solution
       – Heuristics
       – Genetic Algorithms
       – Ant Colony Optimization
     • Evaluation
       – Heuristics
     • Related Work
     • Q&A
  3. Introduction
     • Scientific Workflows
       – A set of jobs and the dependencies between them.
       – A DAG (Directed Acyclic Graph), where nodes represent computation and directed edges represent data flow dependencies.
       – Example DAG in the slide: Job1 → {Job2, Job3, Job4} → Job5 (see the sketch below).
     • Pegasus Workflow Management System
       – Workflow Planner: Pegasus
         • Abstract Workflow: portable, execution-site independent
         • Concrete Workflow: bound to specific sites
       – Workflow Engine: DAGMan
       – Resource Provisioner: Wrangler
       – Execution/Scheduling System: Condor/Condor-G
       – Environment: Grids, Clouds, Clusters, many-cores
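     A minimal sketch (not from the slides) of how such a workflow DAG can be represented and its dependencies checked; the job names match the example above, everything else is illustrative.

        from collections import defaultdict, deque

        # Example workflow DAG from the slide: Job1 fans out to Job2-Job4, which fan in to Job5.
        edges = {
            "Job1": ["Job2", "Job3", "Job4"],
            "Job2": ["Job5"],
            "Job3": ["Job5"],
            "Job4": ["Job5"],
            "Job5": [],
        }

        def topological_order(dag):
            """Return jobs in an order that respects data-flow dependencies (Kahn's algorithm)."""
            indegree = defaultdict(int)
            for parent, children in dag.items():
                indegree.setdefault(parent, 0)
                for child in children:
                    indegree[child] += 1
            ready = deque(job for job, deg in indegree.items() if deg == 0)
            order = []
            while ready:
                job = ready.popleft()
                order.append(job)
                for child in dag.get(job, []):
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)
            if len(order) != len(indegree):
                raise ValueError("cycle detected: not a valid workflow DAG")
            return order

        print(topological_order(edges))  # e.g. ['Job1', 'Job2', 'Job3', 'Job4', 'Job5']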
  4. Introduction
     • Background
       – Large-scale workflows require multiple execution sites to run.
       – The entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has ~24,000 jobs and requires ~58GB of data.
       – A Montage workflow covering an 8-degree square of sky has ~10,000 jobs and requires ~57GB of data. The Galactic Plane mosaic covers 360 degrees along the plane and +/-20 degrees on either side of it.
     Figure 1.1: Output of the Montage workflow; the image was recently created to verify a bar in the spiral galaxy M31.
     Figure 1.2: CyberShake workflow and example output for the Southern California area.
  5. Single Site
     Architecture diagram for a single execution site: the Workflow Planner turns a DAX into a DAG, which is handed to the Workflow Engine and Job Scheduler, supported by a VM Provisioner and a Data Staging service.
  6. Single Site
     • Constraints/Concerns
       – Storage systems
       – File systems
       – Data transfer services
       – Data constraints
       – Services constraints
  7. Multiple Sites, No Partitioning
     Architecture diagram: a single Workflow Planner / Workflow Engine / Job Scheduler pipeline (DAX → DAG → jobs) dispatches jobs to multiple sites, each with its own VM Provisioner and Data Staging service.
  8. Multiple Sites, No Partitioning
     • Constraints/Concerns
       – Job migration
       – Load balancing
       – Overhead
       – Cost
       – Deadline
       – Resource utilization
  9. Multiple Sites, Partitioning
     Architecture diagram: a Workflow Partitioner splits the DAX into sub-workflow DAXes; each sub-workflow goes through its own Workflow Planner → Workflow Engine → Job Scheduler pipeline at its target site, with a VM Provisioner and Data Staging service per site.
  10. Solution
      • A hierarchical workflow (see the sketch below)
        – It contains workflows (sub-workflows) as its jobs.
        – Sub-workflows are planned at the execution sites and matched to the resources available there.
      • Workflow Partitioning vs. Job Grouping/Clustering
        – Heterogeneous environments: MPIDAG, Condor DAG, etc.
        – Data placement services: bulk data transfer
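     A minimal sketch of the hierarchical-workflow idea above: the top-level DAG's "jobs" are themselves sub-workflows, each planned at its own execution site. The sub-workflow contents, site names, and field names are illustrative only.

        sub_workflow_1 = {"site": "site-A", "jobs": ["J10"]}
        sub_workflow_2 = {"site": "site-B", "jobs": ["J8", "J2", "J3", "J6"]}
        sub_workflow_3 = {"site": "site-A", "jobs": ["J1", "J4", "J5", "J7", "J9"]}

        top_level = {
            "nodes": {"SW1": sub_workflow_1, "SW2": sub_workflow_2, "SW3": sub_workflow_3},
            # Edges between sub-workflows correspond to cross-site data transfers.
            "edges": [("SW3", "SW2"), ("SW2", "SW1")],
        }

        for name, sw in top_level["nodes"].items():
            print(f"{name}: plan {len(sw['jobs'])} jobs at {sw['site']}")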
  11. Solution
      • Resource Provisioning
        – Virtual cluster provisioning.
        – The number of resources and the types of VM instances (worker node, master node, and I/O node) are the parameters that indicate the storage and computational capability of a virtual cluster (see the sketch below).
        – The topology and structure of a virtual cluster: balance the load across the different services (scheduling service, data transfer service, etc.) and avoid bottlenecks.
        – On grids, the data transfer service is usually already available and does not need further configuration.
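     A hypothetical description of a virtual-cluster provisioning request capturing the parameters named above; the field names are illustrative, not Wrangler's actual API.

        from dataclasses import dataclass

        @dataclass
        class VirtualClusterRequest:
            site: str                   # target execution site
            worker_nodes: int           # computational capability
            master_nodes: int = 1       # scheduling service (e.g. a Condor master)
            io_nodes: int = 1           # data transfer / shared-storage service
            storage_gb: float = 0.0     # storage capability of the cluster

            def total_nodes(self) -> int:
                return self.worker_nodes + self.master_nodes + self.io_nodes

        # Example: provision a larger I/O tier for an I/O-intensive sub-workflow.
        request = VirtualClusterRequest(site="site-A", worker_nodes=32, io_nodes=2, storage_gb=60.0)
        print(request.total_nodes())  # 35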
  12. Data Transfer across Sites
      • A pre-script to transfer data before and after the job execution
      • A single data transfer job on demand
      • A bulk data transfer job
        – Merges individual transfers into one data transfer job (see the sketch below)
      Figure legend: computation jobs vs. data transfer jobs.
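     A minimal sketch (not the actual Pegasus implementation) of merging per-file cross-site transfers into one bulk data transfer job per site pair, as described above.

        from collections import defaultdict

        def merge_transfers(transfers):
            """transfers: list of (src_site, dst_site, file) tuples.
            Returns one bulk transfer job per (src_site, dst_site) pair."""
            bulk = defaultdict(list)
            for src, dst, f in transfers:
                bulk[(src, dst)].append(f)
            return [{"src": src, "dst": dst, "files": files} for (src, dst), files in bulk.items()]

        jobs = merge_transfers([
            ("site-A", "site-B", "f1.dat"),
            ("site-A", "site-B", "f2.dat"),
            ("site-B", "site-A", "f3.dat"),
        ])
        print(len(jobs))  # 2 bulk transfer jobs instead of 3 individual ones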
  13. Backward Search Algorithm
      • Targets a workflow with a fan-in-fan-out structure.
      • The search operation involves three steps. It starts from the sink job and proceeds backward.
        – First, check whether it is safe to add the whole fan structure into the sub-workflow (aggressive search).
        – If not, a cut is issued between this fan-in job and its parents to avoid cyclic dependencies and increase parallelism.
        – Second, a neutral search is performed on its parent jobs, which includes all of their predecessors until the search reaches a fan-out job.
        – If this partition is still too large, a conservative search is performed that includes all of the predecessors until it reaches a fan-in job or a fan-out job.
      Figure 2.3: Search operation. (A sketch of the three search levels follows below.)
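     A simplified sketch of the three search levels, based on my reading of the slide rather than the authors' code; the exact boundary handling follows the paper, here only the stopping criteria differ between levels.

        def predecessors_until(dag_parents, start, stop_at):
            """Walk backward from `start`, collecting predecessors until a job
            satisfying `stop_at` is reached (that job is still included)."""
            found, stack = {start}, list(dag_parents.get(start, []))
            while stack:
                job = stack.pop()
                if job in found:
                    continue
                found.add(job)
                if not stop_at(job):
                    stack.extend(dag_parents.get(job, []))
            return found

        def search_candidates(dag_parents, fan_in_job, is_fan_in, is_fan_out, level):
            """level: 'aggressive' (whole fan structure), 'neutral' (stop at fan-out jobs),
            or 'conservative' (stop at fan-in or fan-out jobs)."""
            if level == "aggressive":
                return predecessors_until(dag_parents, fan_in_job, stop_at=lambda j: False)
            if level == "neutral":
                return predecessors_until(dag_parents, fan_in_job, stop_at=is_fan_out)
            return predecessors_until(dag_parents, fan_in_job,
                                      stop_at=lambda j: is_fan_in(j) or is_fan_out(j))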
  14. Heuristics (Storage Constraints)
      • Heuristic I
        – Dependencies between sub-workflows should be reduced, since they represent data transfer between sites.
        – Jobs that have parent-child relationships usually share a lot of data, so it is reasonable to schedule such jobs into the same sub-workflow.
        – Heuristic I only checks three types of nodes (the fan-out job, the fan-in job, and the parents of the fan-in job) and searches for the potential candidate jobs that have parent-child relationships with them.
        – A check operation checks whether one job and its potential candidate jobs can be added to a sub-workflow without violating constraints (see the sketch below).
        – Our algorithm reduces the time complexity of check operations by a factor of n, where n equals the average depth of the fan-in-fan-out structure.
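     A minimal sketch of the check operation under a storage constraint. It assumes each job has a known data footprint in GB; the job names follow the worked example on the next slide, but the sizes here are made up for illustration.

        def check(job, candidates, partition, data_gb, storage_limit_gb):
            """Return True if `job` plus its candidate jobs fit into `partition`
            without exceeding the site's storage constraint."""
            jobs = set(candidates) | {job} | set(partition)
            total = sum(data_gb[j] for j in jobs)
            return total <= storage_limit_gb

        data_gb = {"J8": 5, "J9": 5, "J10": 10, "J2": 8, "J3": 8, "J6": 9}
        print(check("J10", [], [], data_gb, 35))                       # True: J10 alone fits
        print(check("J8", ["J2", "J3", "J6"], ["J10"], data_gb, 35))   # False: 40GB exceeds the 35GB limit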
  15. Heuristic I: worked example
      Diagram of the search operation (aggressive and less aggressive) and the check operation on an example workflow J1-J10. Partitions P1-P4 are built incrementally; the check operation sums the sizes of the candidate list (CL), the job being examined (J), and the current partition (P) against the constraint, e.g. Sum(CL+J+P) = 10, 40, 50, 80, 100. The final result shown is P1={J10}.
      Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.
  16. Heuristics/Hints
      • Two other heuristics (see the sketch below)
        – Heuristic II adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow.
        – For a job with multiple children, Heuristic III adds it to a sub-workflow only when all of its children have been scheduled.
      Figure 2.4: Heuristics I, II, and III (from left to right) partition an example workflow into different sub-workflows.
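     A compact sketch paraphrasing the two admission rules above (not the authors' code). `fits` is any constraint check over a set of jobs, e.g. the storage check from the earlier sketch; `partition` and `scheduled` are sets of job ids and `children` maps a job to its children.

        def heuristic_ii_admits(job, partition, children, scheduled, fits):
            """Heuristic II: admit `job` if all of its unscheduled children could be added
            to `partition` without violating the constraint check `fits`."""
            unscheduled = [c for c in children[job] if c not in scheduled]
            return fits(partition | {job} | set(unscheduled))

        def heuristic_iii_admits(job, partition, children, scheduled, fits):
            """Heuristic III: admit `job` only once all of its children have already been scheduled."""
            return all(c in scheduled for c in children[job]) and fits(partition | {job})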
  17. Heuristic II: check unscheduled children (worked example)
      The first step is the same as in Heuristic I and puts J10 into P1; similar steps put J8, J2, J3, and J6 into P2. The check operation again sums the candidate list, the examined job, and the partition against the constraint.
      Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.
      Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.
  18. Heuristic III: all children should be examined (worked example)
      The first steps are similar to Heuristics I and II: J10 goes into P1 and J8, J2, J3, J6 into P2. In the check operation, J1 still has a non-examined child (J4) while J6 has none, so J1 is deferred; the remaining jobs J1, J4, J5, J7, J9 end up in P3.
      Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.
      Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.
  19. Genetic Algorithm
      Chromosome encoding diagram: one gene per job (Job1-Job5) and per VM (VM1-VM6); each gene's value is the site (1 or 2) that the job or VM is assigned to, e.g. [1, 2, 2, 1, 2 | 2, 2, 2, 1, 1, 1]. (A minimal encoding sketch follows below.)
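     A minimal sketch of this chromosome encoding with one-point crossover and point mutation; population size, selection scheme, and rates are illustrative choices, and the fitness function is plugged in separately (the candidates appear on the next slide).

        import random

        JOBS = ["Job1", "Job2", "Job3", "Job4", "Job5"]
        VMS = ["VM1", "VM2", "VM3", "VM4", "VM5", "VM6"]
        SITES = [1, 2]

        def random_chromosome():
            # One site assignment per job and per VM, as in the slide's encoding.
            return [random.choice(SITES) for _ in JOBS + VMS]

        def crossover(a, b):
            point = random.randrange(1, len(a))
            return a[:point] + b[point:]

        def mutate(chromosome, rate=0.1):
            return [random.choice(SITES) if random.random() < rate else gene for gene in chromosome]

        def evolve(fitness, population=20, generations=50):
            pop = [random_chromosome() for _ in range(population)]
            for _ in range(generations):
                pop.sort(key=fitness)                        # lower fitness is better
                parents = pop[: population // 2]             # simple truncation selection
                children = [mutate(crossover(*random.sample(parents, 2))) for _ in parents]
                pop = parents + children
            return min(pop, key=fitness)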
  20. Fitness Functions
      • With constraints:
        – Min(Makespan/Deadline + Cost/Budget)
        – Min(Makespan), subject to Cost <= Budget
        – Min(Cost), subject to Makespan <= Deadline
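     A minimal sketch of the three fitness functions above. A large penalty stands in for the hard constraints, which is an assumption of this sketch rather than necessarily the authors' choice.

        PENALTY = 1e9

        def fitness_combined(makespan, cost, deadline, budget):
            return makespan / deadline + cost / budget

        def fitness_min_makespan(makespan, cost, deadline, budget):
            return makespan if cost <= budget else makespan + PENALTY

        def fitness_min_cost(makespan, cost, deadline, budget):
            return cost if makespan <= deadline else cost + PENALTY

        print(fitness_combined(makespan=3600, cost=50, deadline=7200, budget=100))  # 1.0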
  21. Ant Colony Optimization
      Diagram: global optimization works on the full assignment (Job1-Job5 and VM1-VM6 across both sites), while local optimization refines the assignment within each site (Job1-Job3 and VM1-VM3 on site 1; Job4-Job5 and VM4-VM6 on site 2).
  22. Scheduling Sub-workflows
      • Estimating the overall runtime of sub-workflows (see the sketch below)
        – Critical Path
        – Average CPU Time: the cumulative CPU time of all jobs divided by the number of available resources.
        – Earliest Finish Time: the moment the last sink job completes.
      • Provisioning resources based on the estimation results
      • Scheduling sub-workflows on sites
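     A minimal sketch of two of the estimators named above, assuming each job has a known runtime and the sub-workflow DAG is given as a parent-to-children map; job names and runtimes are illustrative.

        from functools import lru_cache

        def average_cpu_time(runtimes, num_resources):
            """Cumulative CPU time of all jobs divided by the number of available resources."""
            return sum(runtimes.values()) / num_resources

        def critical_path(children, runtimes):
            """Length of the longest runtime-weighted path through the DAG."""
            @lru_cache(maxsize=None)
            def longest_from(job):
                kids = children.get(job, [])
                return runtimes[job] + (max(longest_from(c) for c in kids) if kids else 0)
            roots = set(runtimes) - {c for kids in children.values() for c in kids}
            return max(longest_from(r) for r in roots)

        children = {"Job1": ["Job2", "Job3"], "Job2": ["Job4"], "Job3": ["Job4"], "Job4": []}
        runtimes = {"Job1": 10, "Job2": 30, "Job3": 20, "Job4": 5}
        print(average_cpu_time(runtimes, num_resources=2))  # 32.5
        print(critical_path(children, runtimes))            # 45 (Job1 -> Job2 -> Job4)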
  23. Evaluation: Heuristics
      • In this example, we aim to reduce data movement and makespan under storage constraints.
      • Workflows used:
        – Montage: an astronomy application, I/O-intensive, ~24,000 tasks and 58GB of data.
        – CyberShake: a seismology application, memory-intensive, ~10,000 tasks and 57GB of data.
        – Epigenomics: a bioinformatics application, CPU-intensive, ~1,500 tasks and 23GB of data.
        – Each workflow was run five times.
  24. Performance: CyberShake
      • Heuristic II produces 5 sub-workflows with 10 dependencies between them; Heuristic I produces 4 sub-workflows and 3 dependencies; Heuristic III produces 4 sub-workflows and 5 dependencies.
      • Heuristics II and III simply add a job if it does not violate the storage or cross-dependency constraints.
      • Heuristic I performs better in terms of both runtime reduction and disk usage because it tends to put the whole fan structure into the same sub-workflow.
  25. Performance: CyberShake
      • Storage constraints.
      • With more sites and partitions, data movement increases even though computational capability improves.
      • The CyberShake workflow across two sites with a storage constraint of 35GB performs best.
  26. Performance of Estimator and Scheduler
      • Three estimators and two schedulers are evaluated with the CyberShake workflow.
      • The combination of the EFT estimator and the HEFT scheduler (EFT+HEFT) performs best (by about 10%).
      • The HEFT scheduler is slightly better than the MinMin scheduler with all three estimators.
  27. Publications
      • Integration of Workflow Partitioning and Resource Provisioning, Weiwei Chen, Ewa Deelman, accepted, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Doctoral Symposium, Ottawa, Canada, May 13-15, 2012.
      • Improving Scientific Workflow Performance using Policy Based Data Placement, Muhammad Ali Amer, Ann Chervenak and Weiwei Chen, accepted, 2012 IEEE International Symposium on Policies for Distributed Systems and Networks, Chapel Hill, NC, July 2012.
      • Fault Tolerant Clustering in Scientific Workflows, Weiwei Chen, Ewa Deelman, accepted, IEEE International Workshop on Scientific Workflows (SWF), in conjunction with the 8th IEEE World Congress on Services, Honolulu, Hawaii, June 2012.
      • Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th Workshop on Workflows in Support of Large-Scale Science, in conjunction with Supercomputing 2011, Seattle, November 2011.
      • Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Poland, September 2011.
  28. Future Work
      • GA and ACO: efficiency
      • Provisioning algorithms
      • Other algorithms
  29. Q&A
      Thank you! For further info: