A science-gateway workload archive application to the self-healing of workflow incidents

Presentation held at Mesogrilles 2012 - Paris - France

Abstract. Information about the execution of distributed workloads is important for studies in computer science and engineering, but workloads acquired at the infrastructure level reportedly lack information about users and application-level middleware. Meanwhile, workloads acquired at the science-gateway level contain detailed information about users, pilot jobs, task sub-steps, bags of tasks and workflow executions. In this work, we present a science-gateway archive, illustrate its possibilities on a few case studies, and use it for the autonomic handling of workflow incidents.

  1. A science-gateway workload archive application to the self-healing of workflow incidents
     Rafael FERREIRA DA SILVA, Tristan GLATARD (University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France); Frédéric DESPREZ (INRIA, University of Lyon, LIP, ENS Lyon, Lyon, France)
     Journées Scientifiques Mésocentres et France Grilles, October 1st-3rd, 2012
     Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
  2. Context: Workload Archives
     • Information produced by grid workflow executions: exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name
     • Useful for: assumptions validation, computational activity modeling, methods evaluation (simulation or experimental)
  3. Science-gateway architecture
     • Components: User, Web Portal, Workflow Engine, Storage Element, Pilot Manager, Meta-Scheduler, Computing site
     • Steps: 0. Login; 1. Send input data; 2. Transfer input files; 3. Launch workflow; 4. Generate and submit task; 5. Submit pilot jobs; 6. Schedule pilot jobs; 7. Get task; 8. Get files; 9. Execute; 10. Upload results
  4. State of the Art: Grid Workload Archives
     • Information gathered at infrastructure-level (tasks)
     • Fields shown: exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name
     • Examples:
        • Parallel Workloads Archive (http://www.cs.huji.ac.il/labs/parallel/workload/)
        • Grid Workloads Archive (http://gwa.ewi.tudelft.nl/pmwiki/)
     • Lack of critical information:
        • Dependencies among tasks
        • Task sub-steps
        • Application-level scheduling artifacts
        • User
  5. At infrastructure-level
     [Figure: the science-gateway architecture diagram of slide 3, with the same components and steps 0 to 10]
  6. Outline
     • A science-gateway workload archive
     • Case studies
        • Pilot Jobs
        • Accounting
        • Task analysis
        • Bag of tasks
     • Workflow Self-Healing
     • Conclusions
  7. Our approach: Science-Gateway Workload Archive
     • Information gathered at science-gateway level (workflow executions)
     • Fields shown: exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name
     • Advantages:
        • Fine-grained information about tasks
        • Dependencies among tasks
        • Workflow characterization
        • Accounting
  8. At science-gateway level
     [Figure: the science-gateway architecture diagram of slide 3, with the same components and steps 0 to 10]
  9. Virtual Imaging Platform
     • Virtual Imaging Platform (VIP)
        • Medical imaging science-gateway
        • Grid of 129 sites (EGI – http://www.egi.eu)
     • Significant usage
        • Registered users: 244 from 26 countries
        • Applications: 18
        • Consumed 32 CPU years in 2011
     • VIP – http://vip.creatis.insa-lyon.fr
     [Figures: VIP portal screenshot (Applications, File transfer); VIP usage in 2011: CPU consumption of VIP and related platforms on EGI]
  10. SGWA
     • Science Gateway Workload Archive (SGWA)
        • Archive is extracted from VIP
     • Science-gateway archive model
        • Task, Site and Workflow Execution: acquired from databases populated by the workflow engine at runtime
        • File and Pilot Job: extracted from the parsing of task standard output and error files
  11. Workload for Case Studies
     • Based on the workload of VIP, January 2011 to April 2012
     • 112 users, 2,941 workflow executions, 680,988 tasks, 339,545 pilot jobs
     • Tasks by status: 338,989 completed; 138,480 error; 105,488 aborted; 15,576 aborted replicas; 48,293 stalled; 34,162 queued
  12. Pilot Jobs
     • A single pilot can wrap several tasks and users
     • At infrastructure-level
        • Assimilates pilot jobs to tasks and users
        • Valid for only 62% of the tasks
        • Valid for 95% of user-task associations
     • At science-gateway level
        • Users and tasks are correctly associated to pilots
     • Tasks per pilot (number of pilots): 1: 282,331; 2: 28,121; 3: 11,885; 4: 6,721; 5: 10,487
     • Users per pilot (number of pilots): 1: 323,214; 2: 15,178; 3: 1,079; 4: 70; 5: 4
  13. Accounting: Users
     • Authentications based on login and password are mapped to X.509 robot certificates
     • At infrastructure-level
        • All VIP users are reported as a single user
     • At science-gateway level
        • Maps task executions to VIP users
     [Figure: Number of reported EGI and VIP users, per month (months 1 to 16)]
  14. Accounting: CPU and Wall-clock Time
     • Huge discrepancy of values
        • Pilot jobs do not register to the pilot system
        • Absence of workload
        • Outputs unretrievable
        • Pilot setup time
        • Lost tasks (a.k.a. stalled)
     • Undetectable at infrastructure-level
     [Figures: Number of submitted pilot jobs by EGI and VIP, per month; Consumed CPU and wall-clock time (years) by EGI and VIP, per month]
  15. Task Analysis
     • At infrastructure-level
        • Limited to task exit codes
     • At science-gateway level
        • Fine-grained information
        • Steps in task life
        • Error causes
        • Replicas per task
     [Figures: Number of tasks per error cause (application, input, stalled, output, folder); Different steps in task life: CDF of time (s) for download, execution and upload]
  16. Bag of Tasks: at Infrastructure level
     • Evaluation of the accuracy of the Iosup et al. [8] method to detect bags of tasks (BoT)
     • Two successively submitted tasks are in the same BoT if the time interval between their submission times is lower than or equal to Δ
     • Example: tasks 1, 2 and 3 are submitted at times t1, t2 and t3. Since Δ1,2 = |t1 – t2| ≤ Δ, tasks 1 and 2 belong to BoT 1; since Δ2,3 = |t2 – t3| > Δ, task 3 starts BoT 2
     [8] Iosup, A., Jan, M., Sonmez, O., Epema, D.: The characteristics and performance of groups of jobs in grids. In: Euro-Par. (2007) 382-393
  17. Bag of Tasks: Size and Duration (infrastructure vs science-gateway)
     • 90% of Batch BoTs have a size between 2 and 10 tasks, while this range represents 50% of Real Batch BoTs
     • Non-Batch duration is overestimated by up to 400%
     • Legend: Real Batch = ground-truth BoT; Real Non-Batch = ground-truth non-BoT; Batch = Iosup et al. BoT; Non-Batch = Iosup et al. non-BoT
     [Figures: CDF of size (number of tasks) for Real Batch and Batch; CDF of duration (s) for Real Batch, Real Non-Batch, Batch and Non-Batch]
  18. Bag of Tasks: Inter-arrival Time and Consumed CPU Time
     • Batch and Non-Batch inter-arrival times are underestimated by about 30%
     • CPU times are underestimated by 25% for Non-Batch and by about 20% for Batch
     • Legend: Real Batch = ground-truth BoT; Real Non-Batch = ground-truth non-BoT; Batch = Iosup et al. BoT; Non-Batch = Iosup et al. non-BoT
     [Figures: CDF of inter-arrival time (s); CDF of consumed CPU time (KCPUs); both for Real Batch, Real Non-Batch, Batch and Non-Batch]
  19. Outline
     • A science-gateway workload archive
     • Case studies
        • Pilot Jobs
        • Accounting
        • Task analysis
        • Bag of tasks
     • Workflow Self-Healing
     • Conclusions
  20. Workflow Self-Healing
     • Problem: costly manual operations
        • Rescheduling tasks, restarting services, killing misbehaving experiments or replicating data files
     • Objective: automated platform administration
        • Autonomous detection of operational incidents
        • Perform the appropriate set of actions
     • Assumptions: online and non-clairvoyant
        • Only partial information available
        • Decisions must be fast
        • Production conditions, no prediction of user activity or workloads
  21. General MAPE-K loop
     • Monitoring: triggered by an event (job completion and failures) or a timeout; produces monitoring data
     • Analysis: computes a degree η for each incident and maps it to a level (1 to 3); example: Incident 1 with η = 0.8, Incident 2 with η = 0.4, Incident 3 with η = 0.1
     • Planning: roulette wheel selection, where incident i is selected with probability ηi / Σj=1..n ηj (0.61, 0.30 and 0.07 for the three incidents of the example)
     • Planning (continued): roulette wheel selection based on association rules for the selected incident (Incident 1 in the example), each rule being weighted by ρ × η: rule 2 → 1 (confidence ρ = 0.8, ρ × η = 0.32); rule 3 → 1 (ρ = 0.2, ρ × η = 0.02); rule 1 → 1 (ρ = 1.0, ρ × η = 0.80)
     • Execution: applies the selected set of actions
     • Knowledge: shared by the four steps
     [Figure: histogram of the degree estimated by median (frequency vs degree in [0, 1])]
  22. Incident: Activity Blocked
     • An invocation is late compared to the others
     • Possible causes
        • Longer waiting times
        • Lost tasks (e.g. killed by site due to quota violation)
        • Resources with poor performance
     [Figures: Invocations completion rate for a simulation (FIELD-II/pasa, workflow-9SIeNv; completed jobs vs time (s)); Job flow for a simulation]
  23. Activity blocked: degree
     • Degree computed from all completed jobs of the activity
     • Job phases: setup → inputs download → execution → outputs upload
     • Assumption: bag-of-tasks (all jobs have equal durations)
     • Median-based estimation (one row per job phase, in order):
        • setup: median duration of jobs phases 50s; estimated job duration 42s; real job duration 42s (completed)
        • inputs download: median 250s; estimated 300s; real 300s (completed)
        • execution: median 400s; estimated 400s*; real 20s (current)
        • outputs upload: median 15s; estimated 15s; real ? (not started)
        • Totals: Mi = 715s (medians), Ei = 757s (estimated)
        • *: max(400s, 20s) = 400s
     • Incident degree: job performance w.r.t. median: d = Ei / (Mi + Ei) ∈ [0, 1]; in the example, d = 757 / (715 + 757) ≈ 0.51
  24. Activity blocked: levels and actions
     • Levels: identified from the platform logs
        • A threshold τ1 on the degree d separates Level 1 (no actions) from Level 2 (action: replicate jobs)
     • Actions
        • Job replication
        • Cancel replicas with bad performance
        • Replicate only if all active replicas are running
     [Figures: histogram of d estimated by median (frequency vs d in [0, 1]) with threshold τ1; Replication process for one task]
  25. Experiments
     • Goal: Self-Healing vs No-Healing
        • Cope with recoverable errors
     • Metrics
        • Makespan of the activity execution
        • Resource waste: w = (CPU + data)self-healing / (CPU + data)no-healing − 1
        • For w < 0: self-healing consumed less resources
        • For w > 0: self-healing wasted resources
  26. Experiment Conditions
     • Software
        • Virtual Imaging Platform
        • MOTEUR workflow engine
        • DIRAC pilot job system
     • Infrastructure
        • European Grid Infrastructure (EGI): production, shared
        • Self-Healing and No-Healing launched simultaneously
     • Experiment parameters
        • Task and file replication limited to 5
        • Failed task resubmission limited to 5
  27. Applications
     • FIELD-II/pasa
        • Ultrasound imaging simulation
        • 122 invocations
        • CPU Time: 15 min
        • ~210 MB
        • Data-intensive
        • Image courtesy of ANR project US-Tagging, http://www.creatis.insa-lyon.fr/us-tagging/news (O. Bernard, M. Alessandrini)
     • Mean-Shift/hs3
        • Image denoising
        • 250 invocations
        • CPU Time: 1 hour
        • ~182 MB
        • CPU-intensive
        • Image courtesy of Ting Li, http://www.creatis.insa-lyon.fr
  28. Results
     • Experiment: tests if recoverable errors are detected
     • FIELD-II/pasa: Self-Healing speeds up execution by a factor of up to 4
        • Resource waste w per repetition: 1: -0.10; 2: -0.15; 3: -0.09; 4: 0.05; 5: -0.26
     • Mean-Shift/hs3: Self-Healing speeds up execution by a factor of up to 2.6
        • Resource waste w per repetition: 1: -0.02; 2: -0.20; 3: -0.02; 4: -0.02; 5: -0.01
     • The Self-Healing process reduced resource consumption by up to 26% when compared to the No-Healing execution
     [Figures: makespan (s) of No-Healing and Self-Healing over 5 repetitions, for FIELD-II/pasa and Mean-Shift/hs3]
  29. Conclusions
     • Science-gateway model of workload archive
        • Illustrated using traces of VIP from 2011/2012
     • Added value when compared to infrastructure-level traces
        • Exactly identifies tasks and users
        • Distinguishes additional workload artifacts from real workload
        • Fine-grained information about tasks
        • Ground truth of bags of tasks
     • Self-healing of workflow incidents
        • Implements a generic MAPE-K loop
        • Incident degrees computed online
        • Speeds up execution by a factor of up to 4
        • Reduces resource consumption by up to 26%
        • Successful example of a self-healing loop deployed in production
     • VIP is openly available at http://vip.creatis.insa-lyon.fr
     • Traces are available to the community in the Grid Observatory: http://www.grid-observatory.org
  30. Thank you for your attention. Questions?
     • Acknowledgments: VIP users and project members; French National Agency for Research (ANR-09-COSI-03); European Grid Initiative (EGI); France-Grilles
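
The 62% and 95% figures on slide 12 can be re-derived from the histogram counts printed on that slide. The Python sketch below is only a consistency check, not the authors' analysis code. It assumes that the last bin of each histogram means exactly 5 (the slide does not say whether it is "5 or more") and that the 95% figure is the share of pilots serving a single user.

```python
# Histogram counts read from slide 12: number of pilots per bin.
# Assumption: the last bin is treated as exactly 5 tasks (or users).
tasks_per_pilot = {1: 282331, 2: 28121, 3: 11885, 4: 6721, 5: 10487}
users_per_pilot = {1: 323214, 2: 15178, 3: 1079, 4: 70, 5: 4}


def single_task_share(hist):
    """Fraction of tasks that ran in a pilot holding exactly one task."""
    total_tasks = sum(k * n for k, n in hist.items())
    return hist[1] / total_tasks


def single_user_share(hist):
    """Fraction of pilots that served exactly one user."""
    return hist[1] / sum(hist.values())


total_pilots = sum(tasks_per_pilot.values())
task_share = single_task_share(tasks_per_pilot)
user_share = single_user_share(users_per_pilot)

print(total_pilots)          # 339545, the pilot job count of slide 11
print(round(task_share, 2))  # 0.62
print(round(user_share, 2))  # 0.95
```

Both histograms sum to 339,545 pilots, which matches the number of pilot jobs reported on slide 11.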
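
The detection rule quoted on slide 16 (two successively submitted tasks share a BoT when their submission times differ by at most Δ) can be sketched as follows. This is a minimal illustration of the time-interval rule only; it ignores any other criterion the method of Iosup et al. may apply, and the function name and sample times are hypothetical.

```python
def group_bags_of_tasks(submit_times, delta):
    """Group submission times into bags of tasks (BoTs).

    Consecutive submissions at most `delta` seconds apart share a bag;
    a larger gap starts a new bag.
    """
    bags = []
    for t in sorted(submit_times):
        if bags and t - bags[-1][-1] <= delta:
            bags[-1].append(t)  # gap to the previous task is within delta
        else:
            bags.append([t])    # first task, or gap larger than delta
    return bags


# Hypothetical example mirroring the slide: tasks 1 and 2 are close,
# task 3 arrives after a gap larger than delta.
bags = group_bags_of_tasks([0, 50, 300], delta=120)
print(bags)  # [[0, 50], [300]]
```

Because the rule only looks at submission times, unrelated tasks submitted close together end up in the same bag, which is one way the infrastructure-level estimates of slides 17 and 18 can deviate from the ground truth.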
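
The incident selection of slide 21 picks incident i with probability ηi / Σ ηj. The sketch below shows one way to implement such a roulette wheel selection in Python. The function names are hypothetical, and the standard library call `random.choices` stands in for whatever selection code the platform actually uses.

```python
import random


def selection_probabilities(degrees):
    """Normalise incident degrees into roulette wheel probabilities."""
    total = sum(degrees.values())
    if total <= 0:
        raise ValueError("at least one incident degree must be positive")
    return {name: eta / total for name, eta in degrees.items()}


def roulette_wheel_select(degrees, rng=random):
    """Draw one incident, with probability proportional to its degree."""
    names = list(degrees)
    weights = [degrees[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Degrees of the three incidents shown on slide 21.
degrees = {"incident 1": 0.8, "incident 2": 0.4, "incident 3": 0.1}
probs = selection_probabilities(degrees)
for name, p in probs.items():
    print(name, round(p, 3))

selected = roulette_wheel_select(degrees, rng=random.Random(0))
print("selected:", selected)
```

The computed probabilities are about 0.615, 0.308 and 0.077; the slide's 0.61, 0.30 and 0.07 appear to be the same values truncated to two decimals. Incidents with a low degree are still selected occasionally, so no incident is starved.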
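
The median-based estimation of slide 23 can be reproduced with a few lines of Python. The sketch assumes, as the slide's example suggests, that completed phases contribute their real duration, the current phase contributes max(median, elapsed time), and phases not yet started contribute their median. Function and variable names are hypothetical.

```python
PHASES = ("setup", "inputs download", "execution", "outputs upload")


def estimate_job_duration(medians, completed, current=None):
    """Estimate the total duration Ei of a job, in seconds.

    medians:   phase -> median duration over completed jobs
    completed: phase -> real duration of the phases this job finished
    current:   optional (phase, elapsed) pair for the running phase
    """
    total = 0.0
    for phase in PHASES:
        if phase in completed:
            total += completed[phase]
        elif current is not None and phase == current[0]:
            total += max(medians[phase], current[1])
        else:
            total += medians[phase]
    return total


def incident_degree(median_total, estimated_total):
    """d = Ei / (Mi + Ei), which lies in [0, 1] for non-negative durations."""
    return estimated_total / (median_total + estimated_total)


# Values from the example on slide 23.
medians = {"setup": 50, "inputs download": 250,
           "execution": 400, "outputs upload": 15}
m_i = sum(medians.values())
e_i = estimate_job_duration(
    medians,
    completed={"setup": 42, "inputs download": 300},
    current=("execution", 20),
)
d = incident_degree(m_i, e_i)
print(m_i, e_i, round(d, 3))  # 715 757.0 0.514
```

A job that behaves exactly like the median yields d = 0.5, and d grows towards 1 as the estimated duration exceeds the median by a larger margin.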
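
The resource waste metric w of slide 25 compares the resources consumed with and without self-healing. The sketch below assumes that CPU and data are both expressed in the same unit (for instance seconds of CPU time and of data transfer time), which the slide does not state. The input numbers are invented for illustration and are not measurements from the experiments.

```python
def resource_waste(cpu_sh, data_sh, cpu_nh, data_nh):
    """w = (CPU + data) with self-healing / (CPU + data) without it, minus 1.

    w < 0: self-healing consumed less resources than no-healing.
    w > 0: self-healing wasted resources.
    """
    return (cpu_sh + data_sh) / (cpu_nh + data_nh) - 1


# Hypothetical values: 740 s consumed with self-healing, 1000 s without.
w = resource_waste(cpu_sh=700, data_sh=40, cpu_nh=900, data_nh=100)
print(round(w, 2))  # -0.26
```

With these made-up inputs w is about -0.26, the same magnitude as the best repetition reported on slide 28, where self-healing used 26% less resources.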
