SLALOM Webinar
George Kousiouris, Andreas Menychtas,
Dimosthenis Kyriazis
ICCS/NTUA
Outline
• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions
2
SLALOM Project
• EU-funded project started in January 2015
• Aims at developing a core SLA specification
– As a basis for service interactions between providers and customers
– Considering current SLA landscape and research outcomes from
various projects and initiatives
• Current status
– Analysis of SLA landscape, standardization efforts, relevant research
outcomes
– Development of SLA specification (v1) including main blocks and
components for each block
– Development of abstract metric function / definition (v1) applicable to
different metrics
– Submission of our work to ISO SLA WG for standardization
• SLALOM is officially accepted as liaison body to ISO SLA WG
• Attended and presented our work in the last ISO SLA WG meeting (Dublin)
3
Our Background
• European Commission Expert Group on SLAs,
http://ec.europa.eu/digital-agenda/en/news/cloud-computing-
service-level-agreements-exploitation-research-results
• Real time SLAs (IRMOS FP7 project, 2008-2011)
– Real Time Cloud for enabling performance guarantees on soft real-
time applications via SLAs
• Admission Control and Legal Aspects (OPTIMIS FP7 project, 2010-
2013)
– Risk, Eco-efficiency and Cost as parameters
– Data Location considerations, Contractual terms
• Abstracted Auditing/Monitoring of SLAs (ARTIST FP7 project, 2012-
2015)
– 3ALib abstracted library implementation
– CloudML@ARTIST definition of a UML profile for SLA descriptions
4
Outline
• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions
5
SLA Specification Contribution (1/2)
• The proposed specification / reference model takes into
account
– Standardization approaches and working groups outcomes
– Current SLAs offered by commercial cloud providers
– Expressed views by cloud providers and adopters
– Research outcomes
• With respect to ISO SLA WG outcomes
– Follows ISO 19086-2: Core blocks (i.e. metric definition,
parameters definition, rule definition) and the corresponding
elements (e.g. ID, name, unit, scale, etc) of the SLA
– Follows ISO 3534-2: Metric in different scales (such as interval,
ratio, nominal or ordinal)
6
SLA Specification Contribution (2/2)
• With respect to ISO WG outcomes (cont.)
– Suggests changes to ISO 19086-2: Naming of specific
elements (e.g. referenceId is used both for metricId
for parameterId and for ruleId, while SLALOM
proposes the use of different identifiers)
– Suggests changes to ISO 19086-2: Inclusion of
additional elements and blocks:
• Dependencies with other metrics - e.g. availability of storage
service and dependency to latency or response time and
dependency to bandwidth
• Importance of a metric (i.e. gradeOfImportance)
7
SLA Specification / Reference model:
Main blocks
• Follows ISO 19086-2 SLA specification
• Metric
– Corresponds to the service metric / objective (e.g. availability)
• Parameter
– Links the metric with a parameter to express how it can be
monitored and validated (e.g. time to provide resources
following an elasticity trigger)
• Rule
– Refers to metric “constraints” (e.g. number of concurrent
connections for a number of users metric)
• SLALOM proposed the addition of the Dependency block
– Captures dependencies between metrics (e.g. response time
and bandwidth)
8
Components
• Specific components are used for all building blocks
– ID
– Name
– Definition / Expression
– Unit
– Notes
• SLALOM proposed the following additions to the ISO
specification
– gradeOfImportance component for the Metric definition, to
define the metric's importance within an SLA
– consequenceOfViolation component for the Rule definition, to
define the potential consequence of violation on the service
provisioning
9
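The blocks and components above can be sketched as simple data structures. This is an illustrative sketch only, under our own naming conventions; the class and field names are not taken from ISO 19086-2 or the SLALOM deliverable, and only the fields discussed on this slide are modelled.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Metric:
    # Common components shared by all blocks
    id: str
    name: str
    expression: str
    unit: str
    notes: str = ""
    # SLALOM addition: importance of this metric within the SLA
    grade_of_importance: int = 1

@dataclass
class Rule:
    id: str
    name: str
    expression: str
    unit: str
    notes: str = ""
    # SLALOM addition: potential consequence of violating the rule
    consequence_of_violation: str = ""

@dataclass
class Dependency:
    # SLALOM-proposed block: links metrics that depend on each other,
    # e.g. response time and bandwidth
    id: str
    metric_ids: List[str] = field(default_factory=list)

availability = Metric(id="m1", name="availability",
                      expression="valid periods / all periods", unit="%",
                      grade_of_importance=3)
dep = Dependency(id="d1", metric_ids=["m1", "m_bandwidth"])
```

Note that, per the SLALOM proposal, each block carries its own identifier rather than a shared referenceId.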
Outline
• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions
10
Abstract metric function / definition: primary goal
• Have a standard that forces ambiguities to be clarified
• Help in the measurement/auditing process of an SLA
– Especially by 3rd party providers
– What is the purpose of having SLAs if one is not able to measure them
non-repudiably?
• Abstract the process
11
Example of ambiguity in the measurement process: AWS EC2 SLA
• ““Unavailable” and “Unavailability” mean:
– For Amazon EC2, when all of your running instances have no
external connectivity.”
• Determination of external connectivity. How?
– E.g. pinging (ICMP)?
• Security threat
– Application layer (endpoint checking)?
• Includes application downtime (Not the responsibility of AWS EC2)
12
SLALOM 3-Layer Definition
Metric (Ratio) Layer
– Final ratio calculation of the defined metric
– Correlates individual periods to the overall metric
Period (Time) Layer
– Size of period: limits on the base period and on the error rate needed
– Aggregates samples at a time-interval granularity, deciding at this level
Sample (Measurement) Layer
– Boolean expression based on concrete measurement constraints
– Dictates success or failure of a sample
13
Sample Layer
• Sample Condition - sc: the condition stating whether a sample has been successful.
– operator: the operator can either be a boolean one (i.e. AND, OR, NOT) or a comparison
operator (<, >, <=, >=, ==, !=).
– value: the actual value of the condition that can be arithmetic, non-arithmetic (e.g. a
string such as “exception”) or an enumeration (e.g. HTTP response code == 200).
– unit: the unit for the value of the condition.
• Sample - s: the sample used to evaluate a parameter against the condition sc.
• Successful Sample - ss: the sample satisfying the condition sc.
• Unsuccessful Sample - us: the sample not satisfying the condition sc.
• Type of Operation field: defines the sampling process, e.g. a protocol/response pair
Sample definition
For a given type of operation as specified in the corresponding field (described
previously)
sc = operator + value + unit
ss = s if (sc is true)
us = s if (sc is false)
14
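The Sample layer definitions above can be sketched as follows. This is a hedged illustration under our own naming, not the project's reference implementation; boolean composition of conditions (AND, OR, NOT) is omitted for brevity.

```python
import operator as op

# sc = operator + value (+ unit); each sample s is classified as a
# successful sample (ss) or an unsuccessful sample (us).
OPS = {"<": op.lt, ">": op.gt, "<=": op.le, ">=": op.ge,
       "==": op.eq, "!=": op.ne}

def sample_ok(sample_value, sc_operator, sc_value):
    """True if the sample satisfies sc (it is an ss), False for a us."""
    return OPS[sc_operator](sample_value, sc_value)

# Enumeration-style condition from the slide: HTTP response code == 200
assert sample_ok(200, "==", 200)      # ss
assert not sample_ok(503, "==", 200)  # us
```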
Period Layer
• Boundary Period - bp: the period for which the analysis of a parameter (through samples) should be
taken into account. Any sample not meeting this criterion (i.e. not falling within the period) is
excluded even if it is successful (i.e. an ss according to the sample definition).
– operator: a comparison operator (<, >, <=, >=, ==, !=).
– value: the actual arithmetic value of the condition.
– unit: the unit in this case is always a time unit (e.g. seconds, minutes, etc).
• Error Condition - ec: the error condition ratio for which the analysis of a parameter (through samples)
should be taken into account. The ratio is always expressed in a percentage (%) format.
– operator: a comparison operator (<, >, <=, >=, ==, !=).
– value: the actual arithmetic value of the condition.
• Error Ratio - er: the error ratio calculated based on the total set of samples and the successful
samples.
• Period - p: the period in which samples (ss and us) are examined according to the boundary period
and the error condition.
• Valid Period - vp: the period for which the error ratio value meets the error condition ratio and the
boundary period condition is also satisfied.
• Non-valid Period - np: the period for which the error ratio value does not meet the error condition
ratio (while the boundary period condition is satisfied).
Boundary period and error definitions
bp = operator + value + unit
ec = operator + value + %
er = Σus/Σs ∀us ∈ p
vp = p if ((er <= ec) && (p >= bp))
np = p if ((er > ec) && (p >= bp))
15
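A minimal sketch of the Period layer, again with our own naming rather than the SLALOM reference code: er = Σus/Σs over a period p, which is then classified as a valid period (vp) or non-valid period (np). Here ec and bp are assumed to be simple thresholds (an error-ratio ceiling and a minimum period length).

```python
def error_ratio(samples):
    """er: fraction of unsuccessful samples (False) among all samples."""
    return sum(1 for ok in samples if not ok) / len(samples)

def classify_period(samples, period_seconds, ec, bp_seconds):
    """'vp' if er <= ec, 'np' otherwise; None if p does not meet bp."""
    if period_seconds < bp_seconds:
        return None  # boundary-period condition not satisfied
    return "vp" if error_ratio(samples) <= ec else "np"
```

With GAE-style parameters (ec = 10%, bp = 300 s), for example, a five-minute period with 4 failures out of 10 samples would be classified as np.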
Metric Layer
• Metric Condition - mc: the condition regarding a specific metric. The
condition is always expressed in a percentage (%) format to enable its
evaluation as proposed through the metric evaluation.
– operator: a comparison operator (<, >, <=, >=, ==, !=).
– value: the actual arithmetic value of the condition.
• Metric Evaluation - me: the evaluation of the metric based on the valid
and non-valid periods. The evaluation should be smaller than the
condition (i.e. me < mc).
Abstract metric definition
mc = operator + value + %
me = Σnp / (Σvp+Σnp)
16
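The Metric layer closes the loop: me = Σnp / (Σvp + Σnp), compared against mc. In this sketch mc is assumed to be the maximum allowed ratio of non-valid periods (e.g. 0.0005 for a 99.95% availability objective); the names are ours, for illustration only.

```python
def metric_evaluation(periods):
    """me over a list of 'vp'/'np' labels produced by the Period layer."""
    return sum(1 for p in periods if p == "np") / len(periods)

def sla_met(periods, mc):
    """The objective holds when me < mc."""
    return metric_evaluation(periods) < mc
```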
Mapping of SLALOM @ AWS EC2
Amazon EC2
• Sample definition
– Expression: sc: UNDEFINED (assumed ‘ping’ -> ICMP)
– Notes: The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
– Expression: Type of operation: ping
– Notes: It is not defined how the condition of connectivity can actually be measured (e.g. the ping operation mentioned previously).
• Boundary period and error definitions
– Expression: bp > 60 sec
– Notes: The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
– Expression: ec = 100%
– Notes: Error condition reflecting that for the entire bp the resource must be continuously “unavailable”.
• Abstract metric definition
– Expression: availability < 99.95 %
– Notes: Availability metric definition given the boundary period and error condition.
17
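The EC2 mapping can be exercised end to end. This sketch assumes ping success as the sample condition (the SLA itself leaves it undefined): a minute (bp = 60 s) is non-valid only when all of its samples fail (ec = 100%), and availability is the percentage of valid minutes, checked against the 99.95% objective.

```python
def minute_is_unavailable(ping_results):
    """ec = 100%: the whole minute must be continuously unavailable."""
    return all(not ok for ok in ping_results)

# 43200 one-minute periods in a 30-day month; 22 fully failed minutes
minutes = [[False, False]] * 22 + [[True, True]] * (43200 - 22)
unavailable = sum(minute_is_unavailable(m) for m in minutes)
availability = 100 * (1 - unavailable / len(minutes))
assert availability < 99.95  # below the objective, so a credit would apply
```

A minute with even one successful sample counts as available, which is exactly why the ec = 100% reading of the EC2 wording matters.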
Mapping of SLALOM @ GAE Datastore
Google AppEngine Datastore
• Sample definition
– Expression: sc: INTERNAL_ERROR
– Notes: Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
– Expression: Type of operation: API calls
– Notes: Several types of operation are defined; an example is provided here.
• Boundary period and error definitions
– Expression: bp > 300 sec
– Notes: The exact wording is “five consecutive minutes”.
– Expression: ec > 10%
– Notes: Error condition reflecting that the error ratio is (exact wording) a “ten percent Error Rate”.
• Abstract metric definition
– Expression: availability < 99.95 %
– Notes: Availability metric definition given the boundary period and error condition.
18
Mapping of SLALOM @ Microsoft Azure SLA
19
Microsoft Azure Storage
• Sample definition
– Expression: sc = 60 sec
– Notes: Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “Sixty (60) seconds” for PutBlockList and GetBlockList.
– Expression: Type of operation: PutBlockList and GetBlockList
– Notes: Several types of operation are defined; an example is provided here.
• Boundary period and error definitions
– Expression: bp > 3600 sec
– Notes: The exact wording is “given one-hour interval”.
– Expression: ec > 0%
– Notes: Error condition reflecting that all periods should be taken into account for the availability metric evaluation (exact wording: “is the sum of Error Rates for each hour”).
• Abstract metric definition
– Expression: availability < 99.9 %
– Notes: Availability metric definition given the boundary period and error condition.
Preconditions
• For any SLA to apply, a number of preconditions
typically exist per provider
• Examples
– Deployment: Number of Availability Zones used
– Deployment: Replication options used
– Usage/Measurement: Restarting of resources when
unavailable
– Usage/Measurement: Applied Throttling of requests
20
Outline
• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions
21
Snapshot of SLA contributions
• An SLA specification / reference model (captured in
SLALOM SLA Specification and Reference Model v1:
http://slalom-project.eu/content/d32-%E2%80%93-sla-
specification-and-reference-model)
– Follows and proposes changes to ISO 19086-2
– Follows ISO 3534-2
• An abstract metric function / definition that can be
exploited to specify any SLA metric (captured in
SLALOM SLA Metric Specification v1: http://slalom-
project.eu/content/slalom-sla-specification-v1-sep-
2015)
– Submitted as contribution to ISO 19086-2
22
Feedback
• Please provide us feedback on the presented
concepts at:
• https://docs.google.com/forms/d/1Ljnc2x2WS
aAXrWzHglDyy31xdCNC3qDiQqhC1FoxDZI/vie
wform
23
Any questions?
24
www.slalom-project.eu
3ALib (Availability SLA Benchmark and Auditor)
• Abstracted Availability Auditor Library (Java-based) Implementation:
– Based on the conceptual abstractions of different providers' SLAs
– Abstracted at the code level, enabling efficient replacement of processes
and inclusion of new providers
• Purpose:
– Align monitoring with specific provider definitions and adapt
availability calculations
– Check preconditions of SLA applicability for a specific deployment and
give feedback
– Adapt to dynamic Cloud Services user behavior
25
Details for GAE Datastore SLA
26
Details for Azure SLA
Transaction Type / Maximum Processing Time*
• PutBlob and GetBlob (includes blocks and pages); Get Valid Page Blob Ranges: Two (2) seconds multiplied by the number of MBs transferred in the course of processing the request
• Copy Blob: Ninety (90) seconds (where the source and destination blobs are within the same storage account)
• PutBlockList; GetBlockList: Sixty (60) seconds
• Table Query; List Operations: Ten (10) seconds (to complete processing or return a continuation)
• Batch Table Operations: Thirty (30) seconds
• All Single Entity Table Operations; All other Blob and Message Operations: Two (2) seconds
27
SLALOM Project Technical Webinar 20151111