SpeQuloS: A QoS Service for BoT Applications Using
Best Effort Distributed Computing Infrastructures
Simon Delamare (1), Gilles Fedak (2), Derrick Kondo (3), Oleg Lodygensky (4)
(1) LIP/CNRS, Univ. Lyon, France
(2) LIP/INRIA, Univ. Lyon, France
(3) LIG/INRIA, Univ. Grenoble, France
(4) LAL/CNRS, Univ. Paris XI, France
High-Performance Parallel and Distributed Computing, 2012
S. Delamare, G. Fedak, D. Kondo and O. Lodygensky (LIP/CNRS, Univ. Lyon, France, LIP/INRIA, Univ. Lyon, France, LIG/INRIA, Univ. Grenoble, France, LAL/SpeQuloS HPDC’12 1 / 18
Introduction
BE-DCI = “Best-Effort” Distributed Computing Infrastructure
→ Large computing power at low cost, Avoid wasting resources
→ No availability guarantee
Desktop Grids
→ BOINC projects: PetaFLOPS for free
Grids used in Best-Effort mode
→ ≈ 40% of utilization in Grid5000@Lyon
Cloud “Spot” Instances
→ c1.large instance price: 0.12$/h (spot) vs. 0.32$/h (regular)
Relevant for BoT execution ...
Bag of Tasks: Set of independent tasks to compute
→ but Low QoS level
Especially compared to regular infrastructures
Performance Problem Addressed
BoT completion rate decreases at the end of execution
→ Tail Effect
[Figure: BoT completion ratio vs. time, showing the BoT completion curve and the tail part of the BoT. The ideal time is obtained by continuing the curve from 90% of completion; the tail duration is the gap between the ideal time and the actual completion time.]
Slowdown = (Tail Duration + Ideal Time) / Ideal Time
Measured by Slowdown:
$S = \frac{\text{RealCompletionTime}}{\text{IdealCompletionTime}}$
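The slowdown metric above can be illustrated with a short Python sketch. The function names and the input checks are illustrative choices, not part of SpeQuloS; times are in any consistent unit.

```python
def tail_duration(ideal_time: float, actual_time: float) -> float:
    """Extra time spent beyond the ideal completion time."""
    if ideal_time <= 0:
        raise ValueError("ideal_time must be positive")
    if actual_time < ideal_time:
        raise ValueError("actual_time cannot be shorter than ideal_time")
    return actual_time - ideal_time


def tail_slowdown(ideal_time: float, actual_time: float) -> float:
    """Slowdown S = (tail duration + ideal time) / ideal time."""
    tail = tail_duration(ideal_time, actual_time)
    return (tail + ideal_time) / ideal_time


if __name__ == "__main__":
    # A BoT that would ideally finish in 100 s but actually took 130 s.
    print(tail_slowdown(100.0, 130.0))
```

A slowdown of 1 means no tail at all; the slides report S > 2 for 25% to 33% of executions.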
Slowdown by Tail Effect
Slowdown reported on BoT execution
[Figure: Cumulative distribution of the tail slowdown S (observed completion time divided by ideal completion time) for BOINC and XWHEP; y-axis: fraction of executions where tail slowdown < S.]
Best 50% ⇒ S < 1.3
25% to 33% ⇒ S > 2
Worst 5% ⇒ S > 4 to 10
                     Avg. % of BoT in tail    Avg. % of time in tail
BE-DCI Trace         BOINC      XWHEP         BOINC      XWHEP
Desktop Grids        4.65       5.11          51.8       45.2
Best Effort Grids    3.74       6.40          27.4       16.5
Spot Instances       2.94       5.19          22.7       21.6
→ Caused by no more than the last 7% of the BoT
How to improve the situation?
Better scheduling
QoS in Grid scheduling ([12], [20], [38])
→ Requires heavy modification of the middleware
→ No satisfactory solution for unreliable infrastructures ([7])
Addressing the tail effect
→ e.g. in MapReduce ([3], [39]), but this requires precise information from compute nodes, which is hard to obtain in large DCIs.
Building Hybrid DCIs
Grid & Desktop Grid ([35],[36])
→ Mostly to offload Grid usage
Using Cloud computing ([10],[28],[37])
→ To address peak demands
SpeQuloS Service
→ Improving the QoS perceived by BE-DCI users
Speeding up BoT execution
Providing information on the expected BoT execution time
By dynamic provisioning of Cloud resources
→ Monitoring BoT execution
→ Executing the tail on the Cloud
Features:
1 Our context: existing BE-DCIs and Clouds that we do not administer, treated as black boxes
2 Interface with users: QoS requests, state of completion, prediction of the remaining time
3 Careful utilization of Cloud resources, with billing and accounting of usage
Framework
SpeQuloS modules:
Information: collects QoS-related information from DGs
Oracle: strategies to use Cloud resources appropriately / QoS prediction for users
Scheduler: starts and stops Cloud resources, usage accounting
Credit System: bills Cloud usage to the user, who buys Cloud resource time (cpu.h) with “credits”
Implementation
Independent modules using Python & MySQL
Supported Clouds: EC2, OpenNebula, etc.
Supported DG middleware: BOINC & XtremWeb-HEP
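A minimal sketch of how the credit accounting described for the Credit System module might look. The class name, the in-memory balance (the slides mention MySQL for storage), and the default exchange rate of one credit per cpu.h are all illustrative assumptions, not the actual SpeQuloS implementation.

```python
class CreditAccount:
    """Hypothetical per-user credit balance used to pay for Cloud cpu.h."""

    def __init__(self, credits: float, credits_per_cpu_hour: float = 1.0):
        if credits < 0 or credits_per_cpu_hour <= 0:
            raise ValueError("invalid balance or rate")
        self.credits = credits
        self.rate = credits_per_cpu_hour  # assumed exchange rate
        self.spent = 0.0

    def affordable_cpu_hours(self) -> float:
        """Cloud cpu.h that the remaining credits can still pay for."""
        return self.credits / self.rate

    def bill(self, cpu_hours: float) -> float:
        """Charge the account for Cloud usage and return the cost in credits."""
        cost = cpu_hours * self.rate
        if cost > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= cost
        self.spent += cost
        return cost


if __name__ == "__main__":
    account = CreditAccount(credits=10.0)
    account.bill(4.0)
    print(account.credits, account.spent)
```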
Cloud Provisioning Strategies
When to start Cloud resources?
At 90% of BoT completion (9C)
At 90% of BoT assignment (9A)
When the tail appears, detected by monitoring execution time variance (V)
How many Cloud resources to start (for a given amount of credits)?
Greedy: as many as possible, for 1 hour of Cloud usage (G)
Conservative: a number such that there are enough credits to run the Cloud resources up to the estimated completion time (C)
How to use Cloud resources?
Flat: Cloud workers are not differentiated from BE-DCI workers (F)
Reschedule: the scheduler reschedules tasks executed on the BE-DCI to the Cloud (R)
Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
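The strategy choices above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SpeQuloS code: the function names are hypothetical, one credit is assumed to pay for one Cloud cpu.h, and the variance-based trigger (V) is left out because the slides do not give its detection rule.

```python
import math
from itertools import product

# Every combination label, e.g. "9C-C-R" = start at 90% completion,
# conservative sizing, reschedule deployment.
ALL_COMBINATIONS = ["-".join(c) for c in product(
    ["9C", "9A", "V"], ["G", "C"], ["F", "R", "D"])]


def should_start_cloud(trigger: str, completed: int, assigned: int, total: int) -> bool:
    """Threshold triggers 9C and 9A."""
    if trigger == "9C":
        return completed / total >= 0.9
    if trigger == "9A":
        return assigned / total >= 0.9
    raise ValueError("only 9C and 9A are sketched here")


def workers_to_start(sizing: str, credits_cpu_hours: float,
                     est_remaining_hours: float) -> int:
    """Number of Cloud workers the credits can pay for (assumed 1 credit = 1 cpu.h)."""
    if sizing == "G":  # Greedy: spend everything on 1 hour of usage
        return math.floor(credits_cpu_hours / 1.0)
    if sizing == "C":  # Conservative: credits must last until estimated completion
        if est_remaining_hours <= 0:
            raise ValueError("estimated remaining time must be positive")
        return math.floor(credits_cpu_hours / est_remaining_hours)
    raise ValueError("unknown sizing strategy")


if __name__ == "__main__":
    print(len(ALL_COMBINATIONS))
    print(should_start_cloud("9C", completed=900, assigned=950, total=1000))
    print(workers_to_start("G", 10, 4), workers_to_start("C", 10, 4))
```

With 10 cpu.h of credits and 4 hours estimated to remain, the greedy rule starts 10 workers for one hour while the conservative rule starts 2 workers that can run for the whole estimated duration.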
Experimentation Setup (1)
Simulations using availability traces of real BE-DCI infrastructures, various BoT workloads, and the BOINC and XWHEP middleware
BE-DCI availability traces:
Desktop Grids: seti, nd (SETI@Home & NotreDame traces from FTA)
Best Effort Grids: g5klyo, g5kgre (available resources in the Grid5000 Lyon & Grenoble clusters in December 2010)
Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of 10 or 100 $ per hour; fluctuates according to the market price)
trace    length  mean    deviation  min    max    av. quartiles (s)  unav. quartiles (s)  avg. power  power
         (days)                                                                           (nops/s)    std. dev.
seti     120     24391   6793       15868  31092  61, 531, 5407      174, 501, 3078       1000        250
nd       413.87  180     4.129      77     501    952, 3840, 26562   640, 960, 1920       1000        250
g5klyo   31      90.573  105.4      6      226    21, 51, 63         191, 236, 480        3000        0
g5kgre   31      474.69  178.7      184    591    5, 182, 11268      23, 547, 6891        3000        0
spot10   90      82.186  3.814      29     87     4415, 5432, 17109  4162, 5034, 9976     3000        300
spot100  90      823.95  4.945      196    877    1063, 5566, 22490  383, 1906, 10274     3000        300
Experimentation Setup (2)
BoT workloads:
          Size                        nops / task                    Arrival time
SMALL     1000                        3600000                        0
BIG       10000                       60000                          0
RANDOM    norm(µ = 1000, σ² = 200)    norm(µ = 60000, σ² = 10000)    weib(λ = 91.98, k = 0.57)
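As an illustration of the RANDOM workload row, the following Python sketch draws one BoT from the distributions in the table. It assumes that nops and arrival time are drawn independently for each task, that λ is the Weibull scale and k its shape, and that sizes are rounded to integers; the function name and these choices are not taken from the slides.

```python
import math
import random


def random_bot(rng: random.Random) -> list[tuple[float, float]]:
    """Return a list of (arrival_time, nops) tuples for one RANDOM BoT."""
    # Size ~ norm(mu=1000, sigma^2=200); normalvariate expects the std. dev.
    size = max(1, round(rng.normalvariate(1000, math.sqrt(200))))
    tasks = []
    for _ in range(size):
        # nops / task ~ norm(mu=60000, sigma^2=10000)
        nops = max(1.0, rng.normalvariate(60000, math.sqrt(10000)))
        # Arrival time ~ weib(lambda=91.98, k=0.57): scale first, then shape
        arrival = rng.weibullvariate(91.98, 0.57)
        tasks.append((arrival, nops))
    return tasks


if __name__ == "__main__":
    bot = random_bot(random.Random(42))  # fixed seed for reproducibility
    print(len(bot), sum(nops for _, nops in bot))
```

Seeding the generator keeps executions reproducible, which matters when the same BoT is replayed with and without SpeQuloS.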
Simulations methodology:
Reproducible executions without and with SpeQuloS
SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)
→ 25000 BoT execution traces
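The credit provisioning rule (10% of the BoT workload, expressed in Cloud cpu.hour equivalent) can be worked through as below. The Cloud worker power of 3000 nops/s and the rate of one credit per cpu.h are assumptions made only for this example; the slides do not state them for the Cloud workers.

```python
def provisioned_credits(total_nops: float, cloud_nops_per_s: float,
                        fraction: float = 0.10) -> float:
    """Credits equal to `fraction` of the BoT workload, in Cloud cpu.h."""
    workload_cpu_hours = total_nops / cloud_nops_per_s / 3600.0
    return fraction * workload_cpu_hours


if __name__ == "__main__":
    # SMALL BoT: 1000 tasks of 3,600,000 nops each.
    small_nops = 1000 * 3_600_000
    # Assumed Cloud worker power: 3000 nops/s (hypothetical).
    print(provisioned_credits(small_nops, cloud_nops_per_s=3000.0))
```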
Strategies Comparison
Tail Removal Efficiency
→ Tail duration with SpeQuloS vs. tail duration without SpeQuloS
[Figure: Three plots, one per deployment strategy (Flat, Reschedule, Cloud duplication), showing the fraction of BoTs where tail removal efficiency > P against tail removal efficiency (percentage P). Curves in each plot: 9C-G, 9A-G, V-G, 9C-C, 9A-C and V-C, combined with -F (Flat), -R (Reschedule) or -D (Cloud duplication).]
Best strategies are able to
Suppress the tail for 50% of executions
Halve the tail for 80% of executions
Flat (F) performs worse than Reschedule (R) and Cloud Duplication (D)
Tail Detection (V) triggers the Cloud too late
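One plausible reading of the tail removal efficiency metric, sketched in Python: the percentage by which the tail duration shrinks with SpeQuloS relative to the same execution without it. The exact formula is an assumption inferred from the slide's description, and the function names are hypothetical.

```python
def tail_removal_efficiency(tail_without: float, tail_with: float) -> float:
    """Percentage reduction of the tail duration (assumed definition)."""
    if tail_without <= 0:
        raise ValueError("execution without SpeQuloS has no tail to remove")
    return 100.0 * (1.0 - tail_with / tail_without)


def fraction_above(efficiencies: list[float], p: float) -> float:
    """Fraction of BoT executions whose efficiency exceeds P (the plotted curve)."""
    return sum(1 for e in efficiencies if e > p) / len(efficiencies)


if __name__ == "__main__":
    runs = [(100.0, 0.0), (80.0, 32.0), (50.0, 30.0), (40.0, 40.0)]
    effs = [tail_removal_efficiency(wo, w) for wo, w in runs]
    print(effs, fraction_above(effs, 50.0))
```

Under this reading, an efficiency of 100% means the tail was suppressed and 50% means it was halved.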
Cloud Resources Consumption
Percentage of credits spent vs. credits provisioned (= 10% of BoT workload).
10% to 25% of the provisioned credits are actually spent on Cloud resources
[Figure: Percentage of credits used (y-axis, 0 to 50) for each combination of SpeQuloS strategies (x-axis): 9C-G-F, 9C-G-R, 9C-G-D, 9C-C-F, 9C-C-R, 9C-C-D, 9A-G-F, 9A-G-R, 9A-G-D, 9A-C-F, 9A-C-R, 9A-C-D, V-G-F, V-G-R, V-G-D, V-C-F, V-C-R, V-C-D.]
→ ≈2.5% of the BoT workload is executed on the Cloud (up to 25% of the 10% provisioned)
Completion Time
Combination of strategies used: 9C-C-R
[Figure: Six bar charts of completion time (s) per BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100), comparing No SpeQuloS and SpeQuloS. Panels: BOINC & SMALL BoT, BOINC & BIG BoT, BOINC & RANDOM BoT, XWHEP & SMALL BoT, XWHEP & BIG BoT, XWHEP & RANDOM BoT.]
→ Up to 9x speedup
→ Depends on the middleware used and on BE-DCI volatility
Completion Time Prediction
→ Users can request a prediction at any point of the BoT execution
Predicted completion time:
$t_p = \alpha \times \frac{t(r)}{r}$
Current completion ratio: r
Time elapsed since submission: t(r)
α: adjustment factor, depends on the execution environment:
DG server & middleware
Application & BoT size
→ Adjusted after each BoT execution to minimize the difference with the observed completion time
Statistical uncertainty (±x%): success rate of predictions against previous executions
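The prediction formula can be sketched in Python as follows. The formula itself comes from the slide; the least-squares fit of α and the success test (observed time within ±x% of the prediction) are assumptions, since the slides only say that α is adjusted to minimize the difference with observed completion times. All function names are hypothetical.

```python
def predict_completion(elapsed: float, ratio: float, alpha: float = 1.0) -> float:
    """t_p = alpha * t(r) / r, with r the current completion ratio in (0, 1]."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("completion ratio must be in (0, 1]")
    return alpha * elapsed / ratio


def fit_alpha(history: list[tuple[float, float, float]]) -> float:
    """Least-squares alpha over (elapsed, ratio, observed_completion) records.

    Assumed fitting method: minimizes sum((alpha * t(r)/r - observed)^2).
    """
    xs = [elapsed / ratio for elapsed, ratio, _ in history]
    ys = [observed for _, _, observed in history]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def is_successful(predicted: float, observed: float, uncertainty: float = 0.20) -> bool:
    """Assumed success test: observed time falls within +/- uncertainty of prediction."""
    return abs(observed - predicted) <= uncertainty * predicted


if __name__ == "__main__":
    past = [(500.0, 0.5, 1200.0), (1000.0, 0.5, 2400.0)]
    alpha = fit_alpha(past)
    prediction = predict_completion(elapsed=600.0, ratio=0.5, alpha=alpha)
    print(alpha, prediction, is_successful(prediction, observed=1500.0))
```

With α = 1 the prediction is a plain linear extrapolation; a fitted α above 1 accounts for the tail making real executions longer than the extrapolation suggests.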
Prediction Results
Completion Time Prediction:
Made at 50% of BoT execution
Uncertainty: ± 20%
α adjusted after 30 executions with the same BE-DCI, middleware and BoT workload
BoT category & Middleware
          SMALL            BIG              RANDOM
BE-DCI    BOINC   XWHEP    BOINC   XWHEP    BOINC   XWHEP    Mixed
seti      100     100      100     82.8     100     87.0     94.1
nd        100     100      100     100      100     96.0     99.4
g5klyo    88.0    89.3     96.0    87.5     75      75       85.6
g5kgre    96.3    88.5     100     92.9     83.3    34.8     83.3
spot10    100     100      100     100      100     100      100
spot100   100     100      100     100      76      3.6      78.3
Mixed     97.6    96.1     99.2    93.5     89.6    65.3     90.2
→ Successful prediction in 9 cases out of 10
→ Lower results with heterogeneous BoT
→ Needs a learning phase with the same BoT (or at least the same application), executed on the same BE-DCI.
SpeQuloS Deployment in European Desktop Grid Initiative
EDGI project: Bringing European Desktop Grids computing resources to scientific
communities.
Conclusion
BE-DCIs: “Low-cost” solution but poor QoS (tail effect)
SpeQuloS: uses Cloud resources to improve the QoS delivered to BE-DCI users
Efficiently removes the tail problem
→ Speeds up BoT execution
→ Only requires a few % of the workload to be executed on the Cloud
Enables completion time prediction for users
→ A step towards BE-DCI usability in the computing landscape?
Future work:
Better strategies to anticipate problems (tail effect)
Analysis of user feedback from SpeQuloS deployments
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 

SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures

  • 1. SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures
      Simon Delamare (1), Gilles Fedak (2), Derrick Kondo (3), Oleg Lodygensky (4)
      (1) LIP/CNRS, Univ. Lyon, France
      (2) LIP/INRIA, Univ. Lyon, France
      (3) LIG/INRIA, Univ. Grenoble, France
      (4) LAL/CNRS, Univ. Paris XI, France
      High-Performance Parallel and Distributed Computing (HPDC’12), 2012
  • 2. Introduction
      BE-DCI = “Best-Effort” Distributed Computing Infrastructure
        → Large computing power at low cost; avoids wasting resources
        → No availability guarantee
      Desktop Grids
        → BOINC projects: PetaFLOPS for free
      Grids used in Best-Effort mode
        → ≈ 40% utilization in Grid5000@Lyon
      Cloud “Spot” Instances
        → c1.large instance price: $0.12/h (spot) vs. $0.32/h (regular)
      Relevant for BoT execution ...
        Bag of Tasks (BoT): set of independent tasks to compute
        → ... but low QoS level, especially compared to regular infrastructures
  • 3. Performance Problem Addressed
      BoT completion slows down at the end of execution
        → Tail Effect
      [Figure: BoT completion ratio vs. time, marking the ideal time, the actual completion time and the tail duration (the tail part of the BoT). The ideal time is obtained by continuing the completion curve observed at 90% of completion. Slowdown = (Tail Duration + Ideal Time) / Ideal Time.]
      Measured by Slowdown: S = RealCompletionTime / IdealCompletionTime
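The slowdown defined on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ideal time is assumed to be a linear continuation of the completion rate observed at 90% of completion, which is one reading of the figure annotation.

```python
def ideal_time_from_90_percent(time_at_90_percent: float) -> float:
    """Ideal completion time, ASSUMING the average completion rate
    observed up to 90% of completion stays constant until 100%."""
    if time_at_90_percent <= 0:
        raise ValueError("time_at_90_percent must be positive")
    return time_at_90_percent / 0.9


def slowdown(ideal_time: float, tail_duration: float) -> float:
    """Slowdown = (tail duration + ideal time) / ideal time."""
    if ideal_time <= 0:
        raise ValueError("ideal_time must be positive")
    if tail_duration < 0:
        raise ValueError("tail_duration must not be negative")
    return (tail_duration + ideal_time) / ideal_time


if __name__ == "__main__":
    # Hypothetical numbers: 90% of the BoT done after 90 s, all of it after 150 s.
    ideal = ideal_time_from_90_percent(90.0)
    tail = 150.0 - ideal
    print(f"ideal={ideal:.1f}s tail={tail:.1f}s slowdown={slowdown(ideal, tail):.2f}")
```

A slowdown of 1 means there is no tail; the values S > 2 reported on the next slide mean the tail alone lasts longer than the ideal execution.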
  • 4. Slowdown by Tail Effect
      Slowdown reported on BoT execution
      [Figure: fraction of executions where tail slowdown < S, as a function of the tail slowdown S (observed completion time divided by ideal completion time), for BOINC and XWHEP.]
      Best 50% ⇒ S < 1.3
      25% to 33% ⇒ S > 2
      Worst 5% ⇒ S > 4 to 10

      BE-DCI trace        Avg. % of BoT in tail    Avg. % of time in tail
                          BOINC      XWHEP         BOINC      XWHEP
      Desktop Grids       4.65       5.11          51.8       45.2
      Best Effort Grids   3.74       6.40          27.4       16.5
      Spot Instances      2.94       5.19          22.7       21.6

      → Caused by no more than the last 7% of BoT
  • 5. How to improve the situation?
      Better scheduling
        QoS in Grid scheduling ([12], [20], [38])
          → Requires heavy modification of middleware
          → No satisfactory solution for unreliable infrastructure ([7])
        Addressing the tail effect
          → e.g. in MapReduce ([3], [39]), but requires precise information from compute nodes, which is hard in large DCIs
      Building Hybrid DCIs
        Grid & Desktop Grid ([35], [36])
          → Mostly to offload Grid usage
        Using Cloud computing ([10], [28], [37])
          → To address peak demands
  • 6. SpeQuloS Service
      → Improving the QoS perceived by BE-DCI users
        Speeding up BoT execution
        Providing information on the expected BoT execution time
      By dynamic provisioning of Cloud resources
        → Monitoring BoT execution
        → Executing the tail on the Cloud
      Features:
        1 Our context: existing BE-DCIs and Clouds that we do not administer, treated as black boxes
        2 Interface with users: QoS requests, state of completion, prediction of remaining time
        3 Careful utilization of Cloud resources, with billing & accounting of usage
  • 7. Framework
      SpeQuloS modules:
        Information: collects QoS-related information from DGs
        Oracle: strategies to appropriately use Cloud resources / QoS prediction for users
        Scheduler: starts/stops Cloud resources, usage accounting
        Credit System: bills Cloud usage to users, using “credits” to buy Cloud resource cpu.h
      Implementation
        Independent modules using Python & MySQL
        Supported Clouds: EC2, OpenNebula, etc.
        Supported DG middleware: BOINC & XtremWeb-HEP
  • 10. Cloud Provisioning Strategies
      When to start Cloud resources?
        At 90% of BoT completion (9C)
        At 90% of BoT assignment (9A)
        When the tail appears, by monitoring execution time variance (V)
      How many Cloud resources to start (for a given amount of credits)?
        Greedy: as many as possible, for 1 hour of Cloud usage (G)
        Conservative: ensure that there will be enough credits to run the Cloud up to an estimated completion time (C)
      How to use Cloud resources?
        Flat: Cloud workers are not differentiated from BE-DCI workers (F)
        Reschedule: the scheduler reschedules tasks executed on the BE-DCI to the Cloud (R)
        Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
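The first two questions on this slide can be illustrated with a short Python sketch. It is not the SpeQuloS implementation: the function names, the `credits_per_cpu_hour` parameter and the assumption that one worker consumes one cpu.h of credits per hour are all hypothetical. The variance-based trigger (V) is left out because the slides do not describe how the variance is monitored.

```python
import math


def should_start_cloud(strategy: str, completed: int, assigned: int, total: int) -> bool:
    """Trigger check for the 9C and 9A strategies (90% threshold)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if strategy == "9C":  # 90% of the BoT tasks are completed
        return completed / total >= 0.9
    if strategy == "9A":  # 90% of the BoT tasks are assigned to workers
        return assigned / total >= 0.9
    raise ValueError(f"unsupported strategy: {strategy}")


def workers_greedy(credits: float, credits_per_cpu_hour: float = 1.0) -> int:
    """Greedy (G): as many workers as the credits can pay for during 1 hour."""
    return math.floor(credits / credits_per_cpu_hour)


def workers_conservative(credits: float, remaining_hours: float,
                         credits_per_cpu_hour: float = 1.0) -> int:
    """Conservative (C): workers that the credits can pay for until the
    estimated completion time (ASSUMED to be given in hours from now)."""
    if remaining_hours <= 0:
        raise ValueError("remaining_hours must be positive")
    return math.floor(credits / (credits_per_cpu_hour * remaining_hours))


if __name__ == "__main__":
    # Hypothetical BoT: 1000 tasks, 905 completed, 960 assigned, 15 credits left.
    print(should_start_cloud("9C", completed=905, assigned=960, total=1000))
    print(workers_greedy(15.0), workers_conservative(15.0, remaining_hours=2.5))
```

Under these assumptions the greedy choice starts more workers but may run out of credits before the BoT completes, while the conservative choice starts fewer workers that can all run until the estimated completion time.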
  • 11. Experimentation Setup (1)
      Simulations using availability traces of real BE-DCIs, various BoT workloads, and the BOINC and XWHEP middleware
      BE-DCI availability traces:
        Desktop Grids: seti, nd (SETI@Home & NotreDame traces from FTA)
        Best Effort Grids: g5klyo, g5kgre (available resources in the Grid5000 Lyon & Grenoble clusters in December 2010)
        Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of $10 or $100 per hour; fluctuates according to market price)

      trace    length  mean    deviation  min    max    av. quartiles (s)  unav. quartiles (s)  avg. power  power
               (days)                                                                           (nops/s)    std. dev.
      seti     120     24391   6793       15868  31092  61,531,5407        174,501,3078         1000        250
      nd       413.87  180     4.129      77     501    952,3840,26562     640,960,1920         1000        250
      g5klyo   31      90.573  105.4      6      226    21,51,63           191,236,480          3000        0
      g5kgre   31      474.69  178.7      184    591    5,182,11268        23,547,6891          3000        0
      spot10   90      82.186  3.814      29     87     4415,5432,17109    4162,5034,9976       3000        300
      spot100  90      823.95  4.945      196    877    1063,5566,22490    383,1906,10274       3000        300
  • 12. Experimentation Setup (2)
      BoT workloads:

      Workload  Size                       nops / task                    Arrival time
      SMALL     1000                       3600000                        0
      BIG       10000                      60000                          0
      RANDOM    norm(µ = 1000, σ² = 200)   norm(µ = 60000, σ² = 10000)    weib(λ = 91.98, k = 0.57)

      Simulation methodology:
        Reproducible executions without and with SpeQuloS
        SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)
        → 25000 BoT execution traces
  • 14. Strategies Comparison
      Tail Removal Efficiency
        → Tail duration with SpeQuloS vs. tail duration without SpeQuloS
      [Figure: fraction of BoT where tail removal efficiency > P, as a function of the tail removal efficiency (percentage P). Three panels: Flat deployment strategy (9C-G-F, 9A-G-F, V-G-F, 9C-C-F, 9A-C-F, V-C-F), Reschedule deployment strategy (9C-G-R, 9A-G-R, V-G-R, 9C-C-R, 9A-C-R, V-C-R), Cloud duplication deployment strategy (9C-G-D, 9A-G-D, V-G-D, 9C-C-D, 9A-C-D, V-C-D).]
      The best strategies are able to:
        Suppress the tail for 50% of executions
        Halve the tail for 80% of executions
      Flat (F) < Reschedule (R) & Cloud Duplication (D)
      Tail Detection (V) triggers the Cloud too late
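The slide only states that tail removal efficiency compares the tail duration with and without SpeQuloS. A minimal Python sketch follows, assuming the metric is the percentage of the original tail that was removed, so that suppressing the tail gives 100% and halving it gives 50%. The function name and the exact formula are assumptions, not taken from the slides.

```python
def tail_removal_efficiency(tail_without: float, tail_with: float) -> float:
    """Percentage of the tail removed (ASSUMED definition):
    100 * (1 - tail duration with SpeQuloS / tail duration without)."""
    if tail_without <= 0:
        raise ValueError("tail_without must be positive")
    if tail_with < 0:
        raise ValueError("tail_with must not be negative")
    return 100.0 * (1.0 - tail_with / tail_without)


if __name__ == "__main__":
    # Hypothetical tails, in seconds.
    print(tail_removal_efficiency(200.0, 0.0))    # tail suppressed
    print(tail_removal_efficiency(200.0, 100.0))  # tail halved
```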
  • 16. Cloud Resources Consumption
      Percentage of credits spent vs. credits provisioned (= 10% of BoT workload)
      10% to 25% of the provisioned credits are actually used by Cloud resources
      [Figure: percentage of credits used for each combination of SpeQuloS strategies (9C-G-F, 9C-G-R, 9C-G-D, 9C-C-F, 9C-C-R, 9C-C-D, 9A-G-F, 9A-G-R, 9A-G-D, 9A-C-F, 9A-C-R, 9A-C-D, V-G-F, V-G-R, V-G-D, V-C-F, V-C-R, V-C-D).]
      → ≈ 2.5% of the BoT workload is executed on the Cloud
  • 17. Completion Time
      Combination of strategies used: 9C-C-R
      [Figure: completion time (s) per BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100), without and with SpeQuloS. Six panels: BOINC & SMALL BoT, BOINC & BIG BoT, BOINC & RANDOM BoT, XWHEP & SMALL BoT, XWHEP & BIG BoT, XWHEP & RANDOM BoT.]
      → Up to 9x speedup
      → Depends on the middleware used and on BE-DCI volatility
  • 18. Completion Time Prediction
      → Users can ask for a prediction at any moment of the BoT execution
      Predicted completion time: tp = α × t(r) / r
        r: current completion ratio
        t(r): time elapsed since submission
        α: adjustment factor, depends on the execution environment:
           DG server & middleware
           Application & BoT size
        → Adjusted after BoT execution to minimize the difference with the observed completion time
      Statistical uncertainty (±x%): success rate of prediction vs. previous executions
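The prediction formula on this slide translates directly into Python. In the sketch below the function names are hypothetical, and the success criterion (the observed completion time falls within ±x% of the predicted one) is an assumed reading of the statistical uncertainty line, not something the slides spell out.

```python
def predict_completion_time(elapsed: float, ratio: float, alpha: float = 1.0) -> float:
    """tp = alpha * t(r) / r, with t(r) the time elapsed since submission
    and r the current completion ratio (0 < r <= 1)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return alpha * elapsed / ratio


def prediction_succeeded(predicted: float, observed: float,
                         uncertainty: float = 0.20) -> bool:
    """ASSUMED criterion: the observed completion time lies within
    +/- uncertainty (20% by default) of the predicted completion time."""
    return abs(observed - predicted) <= uncertainty * predicted


if __name__ == "__main__":
    # Hypothetical run: 50% of the BoT done after 1800 s, alpha learned as 1.1.
    tp = predict_completion_time(elapsed=1800.0, ratio=0.5, alpha=1.1)
    print(f"predicted completion time: {tp:.0f} s")
    print(prediction_succeeded(tp, observed=4500.0))
```

With α = 1 the prediction is a plain linear extrapolation of the progress so far; a learned α > 1 lengthens the estimate for environments where the end of the execution is known to be slower.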
  • 19. Prediction Results
      Completion time prediction:
        Made at 50% of BoT execution
        Uncertainty: ± 20%
        α adjusted after 30 executions with the same BE-DCI, middleware and BoT workload

      Prediction success rate (%), by BoT category & middleware:

      BE-DCI    SMALL            BIG              RANDOM           Mixed
                BOINC   XWHEP    BOINC   XWHEP    BOINC   XWHEP
      seti      100     100      100     82.8     100     87.0     94.1
      nd        100     100      100     100      100     96.0     99.4
      g5klyo    88.0    89.3     96.0    87.5     75      75       85.6
      g5kgre    96.3    88.5     100     92.9     83.3    34.8     83.3
      spot10    100     100      100     100      100     100      100
      spot100   100     100      100     100      76      3.6      78.3
      Mixed     97.6    96.1     99.2    93.5     89.6    65.3     90.2

      → Successful prediction in 9 cases out of 10
      → Lower results with heterogeneous BoT
      → Needs a learning phase, with the same BoT (at least the same application), executed on the same BE-DCI
  • 20. SpeQuloS Deployment in European Desktop Grid Initiative
      EDGI project: bringing European Desktop Grid computing resources to scientific communities
  • 21. Conclusion
      BE-DCIs: “low-cost” solution but poor QoS (tail effect)
      SpeQuloS: uses Cloud resources to improve the QoS delivered to BE-DCI users
        Efficiently removes the tail problem
          → Speeds up BoT execution
          → Only requires a few % of the workload to be executed on the Cloud
        Enables completion time prediction for users
      → A step towards BE-DCI usability in the computing landscape?
      Future work:
        Better strategies to anticipate problems (tail effect)
        Analysis of user feedback from SpeQuloS deployments