Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
Workflow Fairness Control
on Online and Non-Clairvoyant
Distributed Computing Platforms
Rafael FERREIRA DA SILVA, Tristan GLATARD
University of Lyon, CNRS, INSERM, CREATIS
Villeurbanne, France
Frédéric DESPREZ
INRIA, University of Lyon, LIP, ENS Lyon
Lyon, France
Euro-Par 2013
August 26-30, 2013
Outline
  Context
  The Virtual Imaging Platform
  Problem definition
  Fairness among workflow executions
  Self-healing of workflow executions on grids
  Fairness control process
  Experiments and results
  Conclusion
Context
  Virtual Imaging Platform (VIP)
  Medical imaging science-gateway
  Grid of ~180 sites (EGI – http://www.egi.eu)
  Significant usage
  452 registered users from 50 countries
  Consumed 472 CPU years from August 2012 to July 2013
http://dirac.france-grilles.fr
VIP consumption since August 2012
Workflow Execution
  1. Input data upload
  2. User launches a simulation
  3. MOTEUR generates invocations
  4. GASW generates grid jobs
  5. Jobs are submitted to DIRAC
  6. Pilot jobs are submitted to EGI
  7. Pilot jobs fetch grid jobs
  8. Inputs download
  9. Execution
  10. Results upload
  11. Download results
Fairness among workflow executions
  Under resource contention, workflows are unequally slowed down by concurrent executions
  Example: 3 identical workflows submitted sequentially ($t_{i,j} = 10$ s)
  [Figure: Gantt chart of tasks $t_{i,j}$ over time on resources R1, R2 and R3]
  Slowdown:
  $$s = \frac{M_{multi}}{M_{own}}$$
  where $M_{multi}$ is the makespan with concurrent executions and $M_{own}$ is the makespan without concurrent executions
  $$s_1 = \frac{20}{20} = 1.0, \quad s_2 = \frac{40}{20} = 2.0, \quad s_3 = \frac{50}{20} = 2.5$$
  Identical workflow executions do not experience the same slowdown
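The slowdown values of this example can be reproduced with a small simulation. This is only a sketch under assumptions that are not stated on the slide: tasks are treated as independent, they are dispatched first-come first-served to the earliest free resource, and all workflows are submitted at time 0. The function names are hypothetical.

```python
import heapq


def fifo_completion_times(workflows, n_resources):
    """Return the completion time of each workflow under a greedy FIFO schedule.

    workflows: list of lists of task durations, in submission order.
    Assumption: tasks are independent and take the earliest free resource.
    """
    free_at = [0.0] * n_resources  # min-heap of times at which resources become free
    heapq.heapify(free_at)
    completion = []
    for tasks in workflows:
        finish = 0.0
        for duration in tasks:
            start = heapq.heappop(free_at)  # earliest available resource
            end = start + duration
            heapq.heappush(free_at, end)
            finish = max(finish, end)
        completion.append(finish)
    return completion


def slowdowns(workflows, n_resources):
    """s = M_multi / M_own for every workflow."""
    m_multi = fifo_completion_times(workflows, n_resources)
    # M_own: makespan of each workflow when it runs alone on the same resources
    m_own = [fifo_completion_times([w], n_resources)[0] for w in workflows]
    return [multi / own for multi, own in zip(m_multi, m_own)]


# Three identical workflows of five 10 s tasks on three resources
example = [[10.0] * 5 for _ in range(3)]
print(slowdowns(example, n_resources=3))
```

Under these assumptions the three makespans are 20 s, 40 s and 50 s, which matches the slowdowns listed for this example.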
Fairness among workflow executions
  Under resource contention, workflows are unequally slowed down by concurrent executions
  Example: 2 identical workflows submitted sequentially ($t_{i,j} = 10$ s), followed by a very short workflow ($t = 2$ s)
  [Figure: Gantt chart of tasks $t_{i,j}$ over time on resources R1, R2 and R3]
  Slowdown:
  $$s = \frac{M_{multi}}{M_{own}}$$
  $$s_1 = \frac{20}{20} = 1.0, \quad s_2 = \frac{40}{20} = 2.0, \quad s_3 = \frac{36}{6} = 6.0$$
  Very short workflow executions are extremely slowed down
Workflow Self-Healing
  Problem: costly manual operations
  Rescheduling tasks, restarting services or replicating data files
  In this work: fairly allocating computing resources among
workflow executions
  Objective: automated platform administration
  Autonomous detection of unfairness among workflow executions
  Perform appropriate set of actions
  Assumptions: online and non-clairvoyant
  Only partial information available
  Decisions must be fast
  Production conditions: no prediction of user activity or workloads
General MAPE-K loop
  Phases: Monitoring, Analysis, Planning, Execution, built around shared Knowledge (monitoring data)
  An iteration is triggered by an event (job completion and failures) or a timeout
  Incident degrees (each quantized into levels 1 to 3):
    Incident 1: degree η = 0.8
    Incident 2: degree η = 0.4
    Incident 3: degree η = 0.1
  Roulette wheel selection: incident i is selected with probability
  $$\frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$$
  Example: Incident 1 selected
  Association rules for incident 1:
    Rule     Confidence (ρ)   ρ × η
    2 → 1    0.8              0.32
    3 → 1    0.2              0.02
    1 → 1    1.0              0.80
  Roulette wheel selection based on association rules: Incident 2 selected
  Set of actions for the selected incident
R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents
on distributed computing infrastructures, Future Generation Computer Systems
(FGCS), in press, 2013.
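The two selection steps of this slide can be sketched as follows. This is an illustration rather than the platform's implementation: the function names are hypothetical, and the only data used are the incident degrees and rule confidences shown on the slide.

```python
import random


def selection_probabilities(weights):
    """Normalize weights so that key i gets weight_i / sum of all weights."""
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def roulette_wheel(weights, rng=random):
    """Draw one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


# Incident degrees from the slide
degrees = {1: 0.8, 2: 0.4, 3: 0.1}

# Association rules "source -> 1" with their confidence rho
rules_to_incident_1 = {2: 0.8, 3: 0.2, 1: 1.0}

# Second wheel: each rule is weighted by rho times the degree of its source incident
rule_weights = {src: rho * degrees[src] for src, rho in rules_to_incident_1.items()}

first = roulette_wheel(degrees)        # incident selected from its degree
second = roulette_wheel(rule_weights)  # incident selected through the rules
print(selection_probabilities(degrees), rule_weights, first, second)
```

The weights of the second wheel come out as 0.32, 0.02 and 0.80, the ρ × η column of the association rules for incident 1.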
Incident Levels and Actions
  Incident degrees are quantized into discrete incident levels
  Thresholds are determined from visual mode clustering or K-means
  Thresholds cluster platform configurations into groups
  [Figure: incident levels separated by thresholds; regions labeled "No actions are triggered" and "Triggers a set of actions"]
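A minimal sketch of how a threshold between two incident levels could be derived with K-means, here reduced to a one-dimensional 2-means. The sample degrees and the function name are hypothetical, and the choice of two clusters is an assumption made for illustration.

```python
def two_means_threshold(values, iterations=100):
    """Split 1-D values into a low and a high cluster and return the midpoint
    between the two cluster centers as the level threshold."""
    lo, hi = min(values), max(values)  # initial centers: the extreme values
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values fall in one cluster, nothing to separate
            break
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centers are stable
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Hypothetical incident degrees observed in platform logs
observed_degrees = [0.05, 0.10, 0.15, 0.80, 0.90]
threshold = two_means_threshold(observed_degrees)
levels = [1 if d <= threshold else 2 for d in observed_degrees]
print(threshold, levels)
```

Degrees at or below the threshold fall into level 1 and the others into level 2; more levels would need more clusters.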
Fairness control: degree
  Unfairness degree: max difference between the fractions of pending work
  $$\eta_u = W_{max} - W_{min}$$
  where:
  $$W_i = \max_{j \in [1, n_i]} \left\{ \frac{Q_{i,j}}{Q_{i,j} + R_{i,j} \cdot P_{i,j}} \cdot T_{i,j} \right\}$$
  i = workflow, j = activity, $n_i$ = number of active activities of workflow i
  $Q_{i,j}$ = number of waiting tasks
  $R_{i,j}$ = number of running tasks
  Relative observed duration ($\tilde{t}$: median task phase durations):
  $$T_{i,j} = \frac{\tilde{t}_{i,j}}{\max_{v \in [1,m],\, w \in [1,n_i^*]} (\tilde{t}_{v,w})}$$
  Performance:
  $$P_{i,j} = 2 \cdot \left( 1 - \max_{u \in [1,k_j]} \left\{ \frac{t_u}{\tilde{t}_{i,j} + t_u} \right\} \right)$$
  A low $P_{i,j}$ indicates that the resources allocated to the activity perform badly for the activity
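The degree defined on this slide can be sketched in a few lines. Several points are assumptions made for illustration: the estimated durations $t_u$ of the running tasks are taken as given inputs, an activity with no running task gets $P = 1$, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    waiting: int                    # Q: number of waiting tasks
    running: int                    # R: number of running tasks
    median_duration: float          # median task duration of the activity
    running_estimates: List[float]  # t_u of each running task (assumed given)


def performance(a: Activity) -> float:
    """P = 2 * (1 - max_u t_u / (median + t_u)); equals 1 when t_u == median."""
    if not a.running_estimates:
        return 1.0  # assumption: no running task, so no sign of bad performance
    worst = max(t / (a.median_duration + t) for t in a.running_estimates)
    return 2.0 * (1.0 - worst)


def pending_fraction(a: Activity, max_median: float) -> float:
    """Q / (Q + R * P) * T, with T the median duration relative to the largest one."""
    denom = a.waiting + a.running * performance(a)
    if denom == 0:
        return 0.0  # assumption: an activity without tasks has no pending work
    return a.waiting / denom * (a.median_duration / max_median)


def unfairness_degree(workflows: List[List[Activity]]) -> float:
    """eta_u = W_max - W_min, where W_i is the max pending fraction of workflow i."""
    max_median = max(a.median_duration for wf in workflows for a in wf)
    w = [max(pending_fraction(a, max_median) for a in wf) for wf in workflows]
    return max(w) - min(w)


# One workflow holds all resources, the other only has waiting tasks
busy = [Activity(waiting=0, running=5, median_duration=10.0, running_estimates=[10.0] * 5)]
starved = [Activity(waiting=5, running=0, median_duration=10.0, running_estimates=[])]
print(unfairness_degree([busy, starved]))
```

In this extreme case the first workflow has no pending work and the second has all of its work pending, so the degree reaches its maximum of 1.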
Fairness control: task estimation
  Estimation of task durations
  Job phases: setup → inputs download → execution → outputs upload
  Assumption: bag of tasks (all jobs have equal durations)
  Median-based estimation:
    Phase             Median duration   Real job duration   Estimated job duration
    setup             50s               42s (completed)     42s
    inputs download   250s              300s (completed)    300s
    execution         400s              20s (current)       400s*
    outputs upload    15s               ?                   15s
    Total             $\tilde{t} = 715$s                    $t_{i,j} = 757$s
  *: max(400s, 20s) = 400s
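The median-based estimation of this example can be written as a short function. This is a sketch with hypothetical names; it assumes that phases run in a fixed order and that the index of the phase currently running is known.

```python
def estimate_job_duration(median_phases, observed_phases, current_index):
    """Estimate the total duration of a job from per-phase medians.

    median_phases: median duration of each phase, in execution order.
    observed_phases: real durations of completed phases, followed by the
        elapsed time of the current phase.
    current_index: index of the phase currently running.
    """
    total = 0.0
    for k, median in enumerate(median_phases):
        if k < current_index:
            total += observed_phases[k]               # completed: real duration
        elif k == current_index:
            total += max(median, observed_phases[k])  # current: max(median, elapsed)
        else:
            total += median                           # not started: median
    return total


# setup, inputs download, execution, outputs upload
medians = [50, 250, 400, 15]
observed = [42, 300, 20]  # execution has been running for 20 s

print(sum(medians))                                    # median job duration
print(estimate_job_duration(medians, observed, 2))     # estimate for this job
```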
Fairness control: levels and actions
  Levels: identified from the platform logs
  Threshold $\tau_u$ separates Level 1 (no actions) from Level 2 (action: task prioritization)
  Actions
  Task prioritization
  Task priority is an integer initialized to 1
  Increase priority of $\Delta_{i,j}$ tasks:
  $$\Delta_{i,j} = Q_{i,j} - \left\lfloor \frac{(\tau_u + W_{min})(Q_{i,j} + R_{i,j} P_{i,j})}{T_{i,j}} \right\rfloor$$
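The number of tasks to prioritize follows directly from the formula on this slide. A minimal sketch with hypothetical names and input values; clamping the result at zero is an assumption added here so that an activity already below the threshold gets no prioritized task.

```python
import math


def tasks_to_prioritize(q, r, p, t, tau_u, w_min):
    """Delta = Q - floor((tau_u + W_min) * (Q + R * P) / T).

    q: waiting tasks, r: running tasks, p: performance,
    t: relative observed duration, tau_u: unfairness threshold,
    w_min: smallest fraction of pending work among workflows.
    """
    allowed_waiting = math.floor((tau_u + w_min) * (q + r * p) / t)
    return max(0, q - allowed_waiting)  # assumption: never a negative count


# Hypothetical activity: 10 waiting tasks, 2 running tasks on nominal resources
delta = tasks_to_prioritize(q=10, r=2, p=1.0, t=1.0, tau_u=0.25, w_min=0.25)
print(delta)
```

The floor term is the largest number of waiting tasks for which the fraction of pending work of the activity stays at or below $\tau_u + W_{min}$; the tasks beyond it are the ones whose priority is increased.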
Workload for Case Studies
  Based on the workload of VIP
  January 2011 to April 2012
  Case Studies on:
  Pilot Jobs
  User accounting
  Task analysis
  Bag of tasks
  Workflows
  112 users, 2,941 workflow executions, 339,545 pilot jobs
  680,988 tasks:
    338,989 completed
    138,480 error
    105,488 aborted
    15,576 aborted replicas
    48,293 stalled
    34,162 queued
R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs,
user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM
Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
Experiment Conditions
  Experiment 1
  Tests whether unfairness among identical workflows is properly addressed
  Experiment 2
  Tests whether the performance of very short workflow executions is improved by the fairness mechanism
  Experiment 3
  Tests whether unfairness among different workflows is detected and properly handled
  Workflow characteristics
  The experiments are performed on the Virtual Imaging Platform
Experiments: metrics
  Unfairness
  Is the area under the curve $\eta_u$ during the execution:
  $$\mu = \sum_{i=2}^{M} \eta_u(t_i) \cdot (t_i - t_{i-1})$$
  This metric measures if the fairness process can indeed minimize its own criterion $\eta_u$
  Slowdown
  $$s = \frac{M_{multi}}{M_{own}}$$
  where:
  $$M_{own} = \max_{p \in \Omega} \sum_{u \in p} t_u$$
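Both metrics are easy to compute from recorded data. The sketch below uses hypothetical function names and sample values, and it assumes that Ω is the set of paths through the workflow graph, so that $M_{own}$ is the length of the longest path.

```python
from functools import lru_cache


def unfairness_area(times, eta):
    """mu = sum over i >= 2 of eta_u(t_i) * (t_i - t_{i-1}).

    times: measurement instants t_1..t_M; eta: eta_u measured at each instant.
    """
    return sum(eta[i] * (times[i] - times[i - 1]) for i in range(1, len(times)))


def own_makespan(durations, successors):
    """M_own = max over paths p of the sum of t_u for u in p.

    durations: task -> duration; successors: task -> tasks that depend on it.
    Assumption: the workflow is a directed acyclic graph.
    """
    @lru_cache(maxsize=None)
    def longest_from(task):
        tail = max((longest_from(s) for s in successors.get(task, ())), default=0.0)
        return durations[task] + tail

    return max(longest_from(task) for task in durations)


# Hypothetical measurements of eta_u at 0 s, 10 s and 20 s
mu = unfairness_area([0.0, 10.0, 20.0], [0.5, 0.4, 0.2])

# Hypothetical diamond-shaped workflow: a -> (b, c) -> d
durations = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
successors = {"a": ("b", "c"), "b": ("d",), "c": ("d",)}
m_own = own_makespan(durations, successors)
print(mu, m_own)
```

The slowdown is then the measured makespan of the workflow under concurrent executions divided by `m_own`.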
Results: identical workflows
  Makespans and unfairness degree values are significantly reduced
  Standard deviation of the makespan (σm) reduced by up to a factor of 15, standard deviation of the slowdown (σs) by up to a factor of 7, and µ by about a factor of 2
Results: very short workflows
  Makespans of very short workflow executions are significantly reduced
  σs reduced by up to a factor of 5.9, and µ by up to a factor of 1.9
Results: very short workflows (2)
  Speeds up executions by up to a factor of 2.9, reduces average task waiting time by up to a factor of 4.4, and reduces slowdown by up to a factor of 5.9
Results: different workflows
  σs reduced by up to a factor of 3.8, and µ by up to a factor of 1.9
Concluding remarks
  Context
  Autonomous handling of unfairness among workflow executions
  No strong assumptions on resource characteristics and workload
  Summary of the proposed method
  Implements a generic MAPE-K loop
  Quantifies unfairness based on the fraction of pending work:
  Ratio of queuing tasks, relative durations, and performance
  Controlling fairness among workflow executions
  Properly detects and handles unfairness among workflow executions
  Significantly reduced the standard deviation of the slowdown and
unfairness metric for:
  Identical workflows
  Very short workflow executions
  Different workflows
Thank you for your attention.
Questions?
Acknowledgments:
VIP users and project members
French National Agency for Research (ANR-09-COSI-03, ANR-11-LABX-0063)
EC FP7 Programme (312579 ER-flow)
European Grid Initiative (EGI)
France-Grilles

 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsLeveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
 

Workflow fairness control on online and non-clairvoyant distributed computing platforms

  • 1. Workflow Fairness Control on Online and Non-Clairvoyant Distributed Computing Platforms. Rafael FERREIRA DA SILVA, Tristan GLATARD (University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France); Frédéric DESPREZ (INRIA, University of Lyon, LIP, ENS Lyon, Lyon, France). Euro-Par 2013, August 26-30, 2013. Contact: Rafael Ferreira da Silva, rafael.silva@creatis.insa-lyon.fr
  • 2. Outline: Context; The Virtual Imaging Platform; Problem definition; Fairness among workflow executions; Self-healing of workflow executions on grids; Fairness control process; Experiments and results; Conclusion.
  • 4. Context. Virtual Imaging Platform (VIP): a medical imaging science-gateway running on a grid of ~180 sites (EGI, http://www.egi.eu). Significant usage: 452 registered users from 50 countries; 472 CPU years consumed from August 2012 to July 2013. Figure: VIP consumption since August 2012 (source: http://dirac.france-grilles.fr).
  • 5. Workflow Execution. 1. Input data upload. 2. User launches a simulation. 3. MOTEUR generates invocations. 4. GASW generates grid jobs. 5. Jobs are submitted to DIRAC. 6. Pilot jobs are submitted to EGI. 7. Pilot jobs fetch grid jobs. 8. Inputs download. 9. Execution. 10. Results upload. 11. Download results.
  • 6. Fairness among workflow executions. Under resource contention, workflows are unequally slowed down by concurrent executions. Example on 3 resources (R1, R2, R3): 3 identical workflows of 5 tasks each, submitted sequentially ($t_{i,j} = 10$ s). The slowdown is $s = \frac{M_{multi}}{M_{own}}$, where $M_{multi}$ is the makespan with concurrent executions and $M_{own}$ is the makespan without concurrent executions. Here $s_1 = \frac{20}{20} = 1.0$, $s_2 = \frac{40}{20} = 2.0$, $s_3 = \frac{50}{20} = 2.5$. Identical workflow executions do not experience the same slowdown.
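The slowdown values on slide 6 can be reproduced with a small simulation. This is only a sketch under assumptions that the slide does not state: tasks are treated as independent, all three workflows are queued at time 0 in submission order, and resources serve the queue first-in first-out. The function names are hypothetical.

```python
import heapq


def fifo_makespans(workflows, n_resources):
    """Completion time of each workflow under FIFO list scheduling.

    workflows: list of lists of task durations, in submission order.
    Assumption: tasks are independent and all queued at time 0.
    """
    free_at = [0.0] * n_resources  # min-heap of resource release times
    heapq.heapify(free_at)
    finish = []
    for tasks in workflows:
        end = 0.0
        for duration in tasks:
            start = heapq.heappop(free_at)  # earliest available resource
            heapq.heappush(free_at, start + duration)
            end = max(end, start + duration)
        finish.append(end)
    return finish


def slowdowns(workflows, n_resources):
    """s = M_multi / M_own for each workflow."""
    multi = fifo_makespans(workflows, n_resources)
    own = [fifo_makespans([w], n_resources)[0] for w in workflows]
    return [m / o for m, o in zip(multi, own)]


# Three identical workflows of five 10 s tasks on three resources.
identical = [[10.0] * 5 for _ in range(3)]
print(fifo_makespans(identical, 3))  # [20.0, 40.0, 50.0]
print(slowdowns(identical, 3))       # [1.0, 2.0, 2.5]
```

Under these assumptions the first workflow is not slowed down at all, while the last one waits behind the other two, which is the unfairness the slide illustrates.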
  • 7. Fairness among workflow executions (continued). Under resource contention, workflows are unequally slowed down by concurrent executions. Example on 3 resources (R1, R2, R3): 2 identical workflows submitted sequentially ($t_{i,j} = 10$ s), followed by a very short workflow ($t = 2$ s). With $s = \frac{M_{multi}}{M_{own}}$: $s_1 = \frac{20}{20} = 1.0$, $s_2 = \frac{40}{20} = 2.0$, $s_3 = \frac{36}{6} = 6.0$. Very short workflow executions are extremely slowed down.
  • 8. Workflow Self-Healing. Problem: costly manual operations, such as rescheduling tasks, restarting services or replicating data files. In this work: fairly allocating computing resources among workflow executions. Objective: automated platform administration, with autonomous detection of unfairness among workflow executions and execution of an appropriate set of actions. Assumptions: online and non-clairvoyant. Only partial information is available; decisions must be fast; production conditions apply, with no prediction of user activity or workloads.
  • 9. General MAPE-K loop (Monitoring, Analysis, Planning, Execution, Knowledge). The loop is triggered by an event (job completion and failures) or a timeout and works on monitoring data. Each incident has a degree η that is quantified into levels 1 to 3; in the example, incident 1 has η = 0.8, incident 2 has η = 0.4 and incident 3 has η = 0.1. A roulette wheel selection weights incident $i$ by $\frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$; here incident 1 is selected. A second roulette wheel selection based on association rules then selects incident 2, whose set of actions is performed. Association rules for incident 1 (rule: confidence ρ, ρ × η): 2 → 1: 0.8, 0.32; 3 → 1: 0.2, 0.02; 1 → 1: 1.0, 0.80. Reference: R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents on distributed computing infrastructures, Future Generation Computer Systems (FGCS), in press, 2013.
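The two roulette wheel selections on slide 9 can be sketched as follows. This is an illustration rather than the platform's implementation: the function names are hypothetical, and the reading that each rule's weight is its confidence ρ multiplied by the degree η of the incident on the left-hand side of the rule is inferred from the numbers on the slide (0.8 × 0.4 = 0.32, 0.2 × 0.1 = 0.02, 1.0 × 0.8 = 0.80).

```python
import random


def selection_probabilities(weights):
    """Normalise weights so that they sum to 1 (the roulette wheel sectors)."""
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def roulette_select(weights, rng):
    """Draw one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


# Incident degrees from the slide.
degrees = {1: 0.8, 2: 0.4, 3: 0.1}

# Association rules (antecedent, consequent) -> confidence rho, from the slide.
confidence = {(2, 1): 0.8, (3, 1): 0.2, (1, 1): 1.0}
rule_weights = {rule: rho * degrees[rule[0]] for rule, rho in confidence.items()}

rng = random.Random(0)  # seeded only to make the sketch repeatable
incident = roulette_select(degrees, rng)
rule = roulette_select(rule_weights, rng)
print(selection_probabilities(degrees), incident, rule)
```

Because the draw is proportional rather than greedy, low-degree incidents are still selected occasionally, which keeps the controller from always acting on the same incident.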
  • 10. Incident Levels and Actions. Incident degrees are quantified into discrete incident levels. Thresholds are determined from visual mode clustering or K-means, and cluster platform configurations into groups. Depending on the level, either no actions are triggered or a set of actions is triggered.
  • 12. Fairness control: degree. The unfairness degree is the maximum difference between the fractions of pending work: $\eta_u = W_{max} - W_{min}$, where $W_i = \max_{j \in [1, n_i]} \left\{ \frac{Q_{i,j}}{Q_{i,j} + R_{i,j} \cdot P_{i,j}} \cdot T_{i,j} \right\}$. Here $i$ is the workflow execution, $j$ the activity, $n_i$ the number of active activities, $Q_{i,j}$ the number of waiting tasks and $R_{i,j}$ the number of running tasks. Relative observed duration: $T_{i,j} = \frac{\tilde{t}_{i,j}}{\max_{v \in [1,m],\, w \in [1, n_i^{*}]} (\tilde{t}_{v,w})}$, where $\tilde{t}_{i,j}$ is built from the median task phase durations. Performance: $P_{i,j} = 2 \cdot \left( 1 - \max_{u \in [1, k_j]} \left\{ \frac{t_u}{\tilde{t}_{i,j} + t_u} \right\} \right)$. A low $P_{i,j}$ indicates that the resources allocated to the activity perform badly for that activity.
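The formulas on slide 12 translate fairly directly into code. The sketch below is a hedged illustration: the `Activity` container and function names are hypothetical, $T_{i,j}$ is passed in already normalised, and returning 0 when the denominator is 0 is an assumption added to keep the example safe, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    waiting: int         # Q_{i,j}: number of waiting tasks
    running: int         # R_{i,j}: number of running tasks
    performance: float   # P_{i,j}
    rel_duration: float  # T_{i,j}: relative observed duration


def performance(median_estimate, task_durations):
    """P = 2 * (1 - max_u t_u / (t_median + t_u))."""
    worst = max(t / (median_estimate + t) for t in task_durations)
    return 2.0 * (1.0 - worst)


def pending_fraction(activities):
    """W_i: largest fraction of pending work over the workflow's active activities."""
    fractions = []
    for a in activities:
        denom = a.waiting + a.running * a.performance
        # Assumption: an activity with a zero denominator contributes 0.
        fractions.append(a.waiting / denom * a.rel_duration if denom else 0.0)
    return max(fractions)


def unfairness_degree(workflows):
    """eta_u = W_max - W_min over all workflows."""
    w = [pending_fraction(acts) for acts in workflows]
    return max(w) - min(w)


wf_a = [Activity(waiting=5, running=5, performance=1.0, rel_duration=1.0)]
wf_b = [Activity(waiting=0, running=3, performance=1.0, rel_duration=1.0)]
print(unfairness_degree([wf_a, wf_b]))  # 0.5
print(performance(10.0, [10.0, 10.0]))  # 1.0
```

In this toy case one workflow has half of its work still waiting while the other has everything running, so the degree is 0.5; tasks that run exactly as long as the median estimate give a performance of 1.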
  • 13. Fairness control: task estimation. Estimation of task durations. Job phases: setup → inputs download → execution → outputs upload. Assumption: bag of tasks (all jobs have equal durations). Median-based estimation, example by phase (setup, inputs download, execution, outputs upload): median duration of job phases: 50s, 250s, 400s, 15s, giving $\tilde{t} = 715$ s; real job duration: 42s (completed), 300s (completed), 20s (current), ? (not started); estimated job duration: 42s, 300s, 400s*, 15s, giving $t_{i,j} = 757$ s. *: max(400s, 20s) = 400s.
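The median-based estimation on slide 13 can be written as a short function. This is a sketch based on the worked example only: completed phases keep their real duration, the current phase takes the larger of its median and the time already elapsed, and phases not yet started take their median. The function and argument names are hypothetical.

```python
def estimate_job_duration(medians, observed, current):
    """Estimate the total duration of a job that is still running.

    medians:  median duration of each phase, in phase order
    observed: real durations of the completed phases, followed by the
              time elapsed so far in the current phase
    current:  index of the phase the job is currently in
    """
    total = 0.0
    for k, median in enumerate(medians):
        if k < current:
            total += observed[k]               # completed phase: real duration
        elif k == current:
            total += max(median, observed[k])  # current phase: at least the median
        else:
            total += median                    # future phase: median
    return total


# Phases: setup, inputs download, execution, outputs upload (values from the slide).
medians = [50.0, 250.0, 400.0, 15.0]
observed = [42.0, 300.0, 20.0]  # execution has been running for 20 s

print(sum(medians))                                  # 715.0
print(estimate_job_duration(medians, observed, 2))   # 757.0
```

Taking the maximum for the current phase means the estimate only grows once a phase overruns its median, which fits the non-clairvoyant setting where nothing is known about the remaining duration.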
  • 14. Fairness control: levels and actions. Levels are identified from the platform logs: a threshold $\tau_u$ separates level 1 (no actions) from level 2 (action: task prioritization). Action: task prioritization. Task priority is an integer initialized to 1. The priority of $\Delta_{i,j}$ tasks is increased, with $\Delta_{i,j} = Q_{i,j} - \left\lfloor \frac{(\tau_u + W_{min})(Q_{i,j} + R_{i,j} P_{i,j})}{T_{i,j}} \right\rfloor$.
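The number of tasks to prioritize on slide 14 can be computed as below. This is a minimal sketch: the function name and the sample values are hypothetical, and clamping negative results to 0 is an assumption added here for activities that are already within the threshold.

```python
import math


def tasks_to_prioritize(waiting, running, performance, rel_duration,
                        threshold, w_min):
    """Delta = Q - floor((tau_u + W_min) * (Q + R * P) / T)."""
    allowed = math.floor(
        (threshold + w_min) * (waiting + running * performance) / rel_duration
    )
    # Assumption: a negative Delta means nothing needs to be prioritized.
    return max(0, waiting - allowed)


# Hypothetical activity: 10 waiting tasks, 5 running, P = 1, T = 1.
print(tasks_to_prioritize(10, 5, 1.0, 1.0, threshold=0.25, w_min=0.25))  # 3
```

With these values at most floor(0.5 × 15) = 7 tasks may stay waiting for the activity to remain within $\tau_u$ of $W_{min}$, so the remaining 3 waiting tasks get their priority increased.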
  • 15. Workload for Case Studies. Based on the workload of VIP from January 2011 to April 2012: 112 users; 2,941 workflow executions; 680,988 tasks (338,989 completed, 138,480 error, 105,488 aborted, 15,576 aborted replicas, 48,293 stalled, 34,162 queued); 339,545 pilot jobs. Case studies on: pilot jobs, user accounting, task analysis, bag of tasks, workflows. Reference: R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
  • 17. Experiment Conditions. The experiments are performed in the Virtual Imaging Platform. Experiment 1 tests whether unfairness among identical workflows is properly addressed. Experiment 2 tests whether the performance of very short workflow executions is improved by the fairness mechanism. Experiment 3 tests whether unfairness among different workflows is detected and properly handled. (Table: workflow characteristics.)
  • 18. Experiments: metrics. Unfairness is the area under the curve $\eta_u$ during the execution: $\mu = \sum_{i=2}^{M} \eta_u(t_i) \cdot (t_i - t_{i-1})$. This metric measures whether the fairness process can indeed minimize its own criterion $\eta_u$. Slowdown: $s = \frac{M_{multi}}{M_{own}}$, where $M_{own} = \max_{p \in \Omega} \sum_{u \in p} t_u$.
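The unfairness metric µ on slide 18 is a right-endpoint sum over the sampled values of $\eta_u$. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def unfairness_area(times, degrees):
    """mu = sum_{i=2}^{M} eta_u(t_i) * (t_i - t_{i-1}).

    times:   measurement instants t_1 .. t_M, in increasing order
    degrees: eta_u sampled at those instants
    """
    return sum(
        degrees[i] * (times[i] - times[i - 1])
        for i in range(1, len(times))  # 0-based index 1 is t_2
    )


# Hypothetical samples: eta_u rises to 0.5, drops to 0.25, then returns to 0.
times = [0.0, 10.0, 20.0, 40.0]
degrees = [0.0, 0.5, 0.25, 0.0]
print(unfairness_area(times, degrees))  # 7.5
```

Each term weights the degree observed at $t_i$ by the time elapsed since the previous measurement, so long periods of high unfairness dominate the metric.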
  • 19. Results: identical workflows. Makespans and unfairness degree values are significantly reduced: σm is reduced by up to a factor of 15, σs by up to a factor of 7, and µ by a factor of about 2.
  • 20. Results: very short workflows. Makespans of very short workflow executions are significantly reduced: σs is reduced by up to a factor of 5.9, and µ by up to a factor of 1.9.
  • 21. Results: very short workflows (2). The fairness mechanism speeds up executions by up to a factor of 2.9, and reduces the average task waiting time by up to a factor of 4.4 and the slowdown by up to a factor of 5.9.
  • 22. Results: different workflows. σs is reduced by up to a factor of 3.8, and µ by up to a factor of 1.9.
  • 24. Concluding remarks. Context: autonomous handling of unfairness among workflow executions, with no strong assumptions on resource characteristics and workload. Summary of the proposed method: it implements a generic MAPE-K loop and quantifies unfairness based on the fraction of pending work (ratio of queuing tasks, relative durations, and performance). Controlling fairness among workflow executions: the method properly detects and handles unfairness among workflow executions, and significantly reduced the standard deviation of the slowdown and the unfairness metric for identical workflows, very short workflow executions, and different workflows.
  • 25. Thank you for your attention. Questions? Acknowledgments: VIP users and project members; French National Agency for Research (ANR-09-COSI-03, ANR-11-LABX-0063); EC FP7 Programme (312579 ER-flow); European Grid Initiative (EGI); France-Grilles.