Intelligent Cloud Automation
- A Research perspective on (semi-)autonomous
cloud management
Erik Elmroth
erik.elmroth@elastisys.com
Unavailable or slow Internet services …
Lost sale?
Lost customer?
Lost reputation?
Probably like everyone else…
• 82% of customers give up on a lost payment transaction*
• 25% of users leave if load time > 4 s**
• 1% reduced sale per 100 ms load time**
• 0.5 s longer load time → 20% reduced income***
Insufficient capacity costs money for service owners!
What do you do …
• if your hotel search takes 5 secs for each hotel?
• if the web session crashes during payment?
* JupiterResearch ** Amazon ***Google
Motivation: faults
Question: what is the probability of
a hard drive failure?
In my laptop?
Will happen every few years,
hopefully not right now…
In a large data center?
More than 100k nodes
Will happen during this talk!
Motivation: personnel costs
Question: How many servers can be handled by a system
administrator?
Very old question…
Some numbers:
10 - very complex systems
~300 - standard large-scale organization
Several 1000s – virtualized data center
26k (Facebook 2013)
Higher-level management and better abstractions are needed
Alternative: exponential increase in need for systems management
The autonomic approach
• Autonomic computing
– Named after autonomic nervous
system
– Systems manage themselves
according to admin goals
– Self-governing operation of entire
system, not just parts of it
– New components integrate
effortlessly - as a new cell
establishes itself in the body
Autonomic Computing
• IBM initiative in the early 2000s
• Landmark paper published 2003
in IEEE Computer by Kephart and Chess
@ IBM
• Active research field since,
during 2003-2013:
– 200 conferences/workshops
– 8000+ papers
• Lots of funding
– EC FP6, FP7, H2020
– WASP
• Industry uptake
– Many big IT vendors & startups
• Key point
– Self-management of IT systems
Self-management?
• Four aspects of self-management
– Self-configuration
• Configure themselves automatically
• High-level policies (what is desired, not how)
– Self-optimization
• Continually seek ways to improve their operation
• Hundreds of tunable parameters
– Self-healing
• Handle faults and errors
• Analyze information from logs and monitors
– Self-protection
• Malicious attacks
• Cascading failures
• Admin mistakes
The MAPE loop
• Fundamental architecture
– Managed element(s)
• Server, database, storage
system, etc.
– Autonomic manager
• Responsible for:
– Providing its service
– Managing behavior
according to goals
– Interacting with other
autonomic elements
Figure 2 (from Kephart & Chess, 2003). Structure of an autonomic element: a single autonomic manager (Monitor, Analyze, Plan, and Execute phases around shared Knowledge) controls and represents one or more managed elements. Elements interact with other elements and with human programmers via their autonomic managers.
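The MAPE-K cycle above can be made concrete with a minimal sketch. Everything here is hypothetical (class and field names, the one-server-at-a-time policy); it is not the architecture of any cited system, only an illustration of the four phases around shared knowledge:

```python
class AutonomicManager:
    """Minimal MAPE-K sketch: a shared knowledge base plus Monitor,
    Analyze, Plan, Execute phases managing the size of a server pool
    against a per-server load goal set by the administrator."""

    def __init__(self, target_load):
        self.knowledge = {"target_load": target_load, "servers": 1, "load": 0}

    def monitor(self, metrics):
        # Collect raw observations from the managed element.
        self.knowledge["load"] = metrics["load"]

    def analyze(self):
        # Compare observed state against the administrator's goal.
        k = self.knowledge
        return k["load"] / k["servers"] - k["target_load"]  # >0 means overloaded

    def plan(self, deviation):
        # Decide a corrective action; scale down only when the smaller pool
        # would still meet the goal, to avoid oscillating around the target.
        k = self.knowledge
        if deviation > 0:
            return +1
        if k["servers"] > 1 and k["load"] / (k["servers"] - 1) <= k["target_load"]:
            return -1
        return 0

    def execute(self, action):
        # Actuate on the managed element.
        self.knowledge["servers"] += action

    def step(self, metrics):
        # One turn of the loop: Monitor -> Analyze -> Plan -> Execute.
        self.monitor(metrics)
        self.execute(self.plan(self.analyze()))
        return self.knowledge["servers"]

mgr = AutonomicManager(target_load=100)
sizes = [mgr.step({"load": 250}) for _ in range(3)]  # grows to 3 servers, then holds
```

Note the guard in `plan`: without it, a pool of 3 servers at load 250 would shrink and immediately regrow, exactly the oscillation that autonomic controllers must avoid.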
Specifying goals (1/3)
• Rules
– Often simple condition-action pairs
• If something happens, do this
• If something else happens, do that
• …
– Can use more complex languages to express states,
context, etc.
– Explicit enumeration tedious
– Very limited ability to express complex actions
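A hypothetical example of the condition-action style (the thresholds and action names are invented for illustration) shows both its simplicity and why explicit enumeration quickly becomes tedious:

```python
# Condition-action rules: each entry pairs a predicate over observed
# state with an action name; the first matching rule wins.
rules = [
    (lambda s: s["cpu"] > 0.9, "add_server"),
    (lambda s: s["cpu"] < 0.2 and s["servers"] > 1, "remove_server"),
    (lambda s: True, "no_op"),  # default rule: do nothing
]

def decide(state, rules):
    """Return the action of the first rule whose condition holds."""
    for condition, action in rules:
        if condition(state):
            return action
```

Every new situation (memory pressure, a failing disk, a combination of both) needs yet another explicit rule, which is exactly the limitation noted above.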
Specifying goals (2/3)
• Utility functions
– Mathematical expressions
– Maps system state to scalar value
– Represents high-level objectives
– What parts of system state to include?
– What should function look like?
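A toy utility function makes the two open questions above concrete. The state fields and weights here are illustrative assumptions, not from any cited system:

```python
def utility(state, w_perf=1.0, w_cost=0.05):
    """Map system state to a scalar: reward the fraction of requests that
    meet their response-time target, penalize provisioned capacity.
    Weights are illustrative and encode the high-level objective."""
    perf = state["requests_within_target"] / state["requests_total"]
    return w_perf * perf - w_cost * state["servers"]

# Instead of following enumerated rules, the manager picks the candidate
# configuration with the highest utility:
candidates = [
    {"requests_within_target": 80, "requests_total": 100, "servers": 2},
    {"requests_within_target": 99, "requests_total": 100, "servers": 5},
    {"requests_within_target": 100, "requests_total": 100, "servers": 20},
]
best = max(candidates, key=utility)  # 99% goal attainment on 5 servers
```

The last candidate shows why the function's shape matters: perfect goal attainment on 20 servers scores worse than 99% on 5, because the cost term dominates.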
Specifying goals (3/3)
• Policies
– (higher-level) descriptions of goals and constraints
for operation
– How to map to lower-level behavior?
– Composition of multiple policies
– What high-level language to use?
• Turing-complete?
• No widely used languages available today
• Human operators used to explicit steering
– Not used to indirect goal specification
Autonomic management
techniques - requirements
• Robustness
– Keep things working
– Minimize oscillations or behavioral changes
• Scalability
– Internet-scale: millions of servers and networks,
even more autonomic agents (50 billion devices?)
• Adaptive to changing workloads
– Some methods reliable for certain load patterns, but unstable
once the load or system dynamics change
• Performance
– Need to make decisions fast enough to react timely
– Optimal solutions vs. approximations
• Simplicity
– Key to adoption
– Complex models vs. model-free?
– Learning phase required before deployment?
Gradual transition to autonomic?
1. Collect and aggregate information
– Input to human administrators’ decision-making
2. Decision-support systems suggesting possible
actions by humans
3. Autonomic systems entrusted with lower-level
decisions
4. Over time, less frequent and more high-level
decisions by operator
– Carried out by numerous autonomic actions
at lower level
The nature
of the challenge
Extreme scale
• Enormous buildings with servers,
storage equipment, networks, cooling
• A factory for IT services
Extreme load variations
Wikipedia:Michael Jackson’s wiki page at
the time of his death and funeral service
[Plot: Requests vs. Date, May–September 2009; load peaks near 1,000,000 requests around the funeral; series: Load, Collateral]
Challenges for today’s Clouds
• Extreme scale
• Extreme load variations
• Low level of determinism and predictability
• No hard performance guarantees
• Data centers consume a lot of energy
• Data centers have low utilization
Need for better resource management
Resource management challenge
• Robustness & performance
• Cost- & energy efficiency
Approach
Autonomic resource management based on
control, analytics, learning, and optimization
Analyze Plan
Monitor Execute
Knowledge
Sensors and
actuators of
managed object
How much and what type
of resources to allocate and
when and where to deploy them?
Anomalies vs. bottlenecks
O. Ibidunmoye, F. Hernandez-Rodriguez, and E. Elmroth. Performance Anomaly Detection and
Bottleneck Identification, ACM Computing Surveys, Vol. 48, No. 1, Article no. 4, 2015.
O. Ibidunmoye, A. Rezaie, and E. Elmroth. Adaptive Anomaly Detection in Performance Metric Streams.
IEEE Transactions on Network and Service Management, Accepted, 2017.
O. Ibidunmoye, E.B. Lakew, and E. Elmroth. A Black-box Approach for Detecting Systems Anomalies in
Virtualized Environments. The 2017 International Conference on Cloud and Autonomic Computing (ICCAC
2017), IEEE Computer Society, Accepted, 2017.
Datacenter Landscape Graphs and Coloring
O. Ibidunmoye, T. Metsch, V. Bayon-Molino, E. Elmroth. Performance Anomaly Detection using
Datacenter Landscape Graphs, IWQoS, 2016.
T. Metsch, O. Ibidunmoye, V. Bayon-Molino, J. Butler, F. Hernández-Rodriguez, and E. Elmroth. "Apex
Lake: A Framework for Enabling Smart Orchestration." In Proceedings of the Industrial Track of the 16th
International Middleware Conference, paper 1, ACM, 2015.
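To give a flavor of adaptive detection on a performance-metric stream, here is a generic sketch (an exponentially weighted baseline with a k-sigma threshold). It is not the algorithm of any of the papers above; all parameters are illustrative:

```python
class EwmaDetector:
    """Generic adaptive anomaly detector for a metric stream: flag points
    that deviate from an exponentially weighted moving baseline by more
    than k standard deviations. Anomalous points are excluded from the
    baseline update, so they do not drag the threshold towards themselves."""

    def __init__(self, alpha=0.2, k=3.0, warmup=5):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean, self.var, self.n = None, 0.0, 0

    def observe(self, x):
        self.n += 1
        if self.mean is None:
            self.mean = x  # first sample bootstraps the baseline
            return False
        std = self.var ** 0.5
        anomalous = self.n > self.warmup and abs(x - self.mean) > self.k * std
        if not anomalous:
            # Online update of mean and variance of the "normal" behavior.
            self.mean = self.alpha * x + (1 - self.alpha) * self.mean
            self.var = self.alpha * (x - self.mean) ** 2 + (1 - self.alpha) * self.var
        return anomalous

d = EwmaDetector()
flags = [d.observe(x) for x in [100, 102, 99, 101, 100, 500, 101]]
# Only the spike to 500 is flagged; the baseline then resumes adapting.
```

Because the baseline adapts, the same detector keeps working when the metric drifts slowly, which a fixed threshold would not.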
Workload Decomposition
[Figure: Wikipedia workload, January 2013 (daily seasonality): requests per day over the month, 0–100M, decomposed additively into trend + seasonality + residuals]
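The additive decomposition shown in the figure can be sketched in a few lines. This is a simplified version with a centred moving-average trend, not a production method (tools such as STL are far more robust), and the workload numbers are synthetic:

```python
def decompose(series, period):
    """Additive decomposition sketch: series = trend + seasonality + residuals.
    Trend: centred moving average (window shrinks at the edges);
    seasonality: mean detrended value at each position within the period;
    residuals: whatever remains."""
    n, half = len(series), period // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    detrended = [x - t for x, t in zip(series, trend)]
    season_mean = [
        sum(detrended[p::period]) / len(detrended[p::period])
        for p in range(period)
    ]
    seasonality = [season_mean[i % period] for i in range(n)]
    residuals = [x - t - s for x, t, s in zip(series, trend, seasonality)]
    return trend, seasonality, residuals

# Synthetic workload: rising trend plus a 4-step cycle, standing in for
# the daily pattern in the Wikipedia trace.
workload = [50 + i + (10 if i % 4 < 2 else -10) for i in range(16)]
trend, season, resid = decompose(workload, period=4)
```

The components sum back to the original series by construction, which is exactly the property the four panels in the figure illustrate.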
Sample control theoretic model
G/G/N queue with variable N (#VMs)
Horizontal Capacity Autoscaling
38
A. Ali-Eldin, M. Kihl, J. Tordsson, and E. Elmroth. Efficient Provisioning of Bursty Scientific
Workloads on the Cloud Using Adaptive Elasticity Control, In Proceedings of the 3rd Workshop
on Scientific Cloud Computing (ScienceCloud 2012), ACM New York, pp. 31-40, 2012.
A. Ali-Eldin, J. Tordsson, and E. Elmroth. An Adaptive Hybrid Elasticity Controller for Cloud
Infrastructures, The 13th IEEE/IFIP Network Operations and Management Symposium
(NOMS 2012), IEEE, pp. 204-212, 2012.
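A back-of-the-envelope version of sizing the pool in a G/G/N-style model with variable N: pick the smallest N that keeps utilization below a target. The rates and the 0.7 target are illustrative assumptions; the cited controllers adapt far more carefully to workload dynamics:

```python
import math

def required_vms(arrival_rate, service_rate, target_util=0.7):
    """Smallest N such that rho = arrival_rate / (N * service_rate)
    stays at or below target_util. Keeping rho well under 1 bounds
    queueing delay; the headroom absorbs burstiness."""
    if arrival_rate <= 0:
        return 1
    return math.ceil(arrival_rate / (service_rate * target_util))

# 300 req/s arriving, one VM serves 50 req/s, keep utilization <= 70%:
n_vms = required_vms(arrival_rate=300, service_rate=50)  # 9 VMs
```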
Several Autoscaling Methods + Auto selection
A. Ali-Eldin, J. Tordsson, M. Kihl, and E. Elmroth. WAC: A Workload Analysis and
Classification Tool for On-line Selection of Cloud Auto-scaling Methods, submitted.
Controlling Average Response Time
through Vertical Scaling
42
E.B. Lakew, A.V. Papadopoulos, M. Maggio, C. Klein, and E. Elmroth. KPI-agnostic Control for Fine-Grained
Vertical Elasticity. In Proceedings of The 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid 2017), pp. 589-598, 2017.
E.B. Lakew, C. Klein, F. Hernandez-Rodriguez and E. Elmroth. Towards Faster Response Time Models for
Vertical Elasticity. In The 6th Cloud Control Workshop, part of the Proceedings of the 2014 IEEE Conference on
Utility and Cloud Computing (UCC 2014), pp. 560-565, 2014.
Control interval: a couple of seconds
Controlling Tail Response Time
Response Time Controller:
f(tail response time) → average response time
Then
Capacity Controller:
f(average response time) → capacity
E.g., ensuring 95% of requests meet target response time
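The two-stage cascade above can be sketched as follows. This is only an illustration of the structure: the mapping f here is a crude observed-ratio model and the second stage a crude proportional rule, not the published controllers, and all numbers are made up:

```python
def tail_to_average_target(tail_target, observed_tail, observed_avg):
    """First stage: translate a tail (e.g. 95th percentile) response-time
    target into an average response-time target, using the currently
    observed tail/average ratio as the mapping f."""
    return tail_target * observed_avg / observed_tail

def average_to_capacity(avg_target, observed_avg, current_capacity):
    """Second stage: scale capacity in proportion to how far the observed
    average response time is from its target."""
    return max(1, round(current_capacity * observed_avg / avg_target))

# Observed: p95 = 800 ms, average = 200 ms, on 4 VMs; goal: p95 <= 400 ms.
avg_target = tail_to_average_target(400, observed_tail=800, observed_avg=200)
capacity = average_to_capacity(avg_target, observed_avg=200, current_capacity=4)
```

Cascading via the average is attractive because average response time is far less noisy than a measured percentile, so the inner capacity loop sees a smoother signal.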
Unifying CPU and Memory Control
S. Farokhi, P. Jamshidi, E.B. Lakew, I. Brandic, and E. Elmroth. A Hybrid Cloud Controller for Vertical Memory
Elasticity: A Control-theoretic Approach. Future Generation Computer Systems, Elsevier, Vol. 65, pp. 57-72, 2016.
S. Farokhi, E.B. Lakew, C. Klein, I. Brandic, and E. Elmroth. Coordinating CPU and Memory Elasticity Controllers to
Meet Service Response Time Constraints, The 2015 International Conference on Cloud and Autonomic Computing
(ICCAC), IEEE Computer Society, pp. 69-80, 2015.
Autoscaler subsystems
Core subsystems (required, but pluggable for replacement):
Metronome: drives the execution: periodic resize iterations set the new desired size on the cloudpool endpoint.
Monitoring subsystem: a metric streamer collecting data from a metric store (such as OpenTSDB) and a system
historian (capturing monitoring and performance data from the autoscaler itself in a (configurable) metric store).
Prediction subsystem: predicts the machine pool size needed.
Cloudpool proxy: local proxy for sending commands to a remote cloudpool endpoint over the cloudpool REST API.
Alerter: notifies the outside world about interesting events raised on the autoscaler's event bus.
Supports additional add-on subsystems, e.g., for accounting or high availability.
www.cloudresearch.org
VM placement
• Map VMs to resources
• After admission
• After scaling
• To reconsolidate
• Across datacenters
(Geo-placement)
• e.g., linear programming problem
• Within datacenter
• Load mixing
• Multi-dimensional multi-knapsack problem
VM Geo-Placement
Modeling (Cost Goals)
Minimize   TIP = H · Σ_{j=1}^{l} Σ_{k=1}^{m} P_jk( Σ_{i=1}^{n} x_ijk )
Subject to
  Total cost:
    TIC = H · Σ_{j=1}^{l} C_j( Σ_{i=1}^{n} Σ_{k=1}^{m} x_ijk )
  Capacity constraints:
    Σ_{i=1}^{n} ( …_i · …_i ) > Threshold                                  (1)
    ∀i ∈ [1..n]:  Σ_{j=1}^{l} Σ_{k=1}^{m} x_ijk = 1                        (2)
  Load balance constraints:
    ∀k ∈ [1..m]:  LOC_min ≤ ( Σ_{i=1}^{n} Σ_{j=1}^{l} x_ijk ) / n ≤ LOC_max  (3)
W. Li, J. Tordsson, E. Elmroth. Modelling for Dynamic Cloud Scheduling via Migration of Virtual
Machines, 2011 Third IEEE International Conference on Cloud Computing Technology and Science
(Cloudcom 2011), IEEE Computer Society, pp. 163-171, 2011.
D. Espling, L. Larsson, W. Li, J. Tordsson, and E. Elmroth. Modeling and Placement of Structured
Cloud Services, IEEE Transactions on Cloud Computing, Vol. 4, No. 4, pp. 429-439, 2016.
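A toy brute-force instance gives intuition for the model: assign each VM to a provider so that total price is minimized while the load-balance constraint holds. The prices and bounds are invented; real instances are solved as (integer) linear programs, as the slide notes:

```python
from itertools import product

def geo_place(n_vms, prices, loc_min=0.0, loc_max=1.0):
    """Enumerate all assignments of n_vms VMs to providers (indices into
    prices), keep those where each provider hosts between loc_min and
    loc_max of the VMs, and return the cheapest (analogue of TIP)."""
    best, best_cost = None, float("inf")
    for assign in product(range(len(prices)), repeat=n_vms):
        shares = [assign.count(j) / n_vms for j in range(len(prices))]
        if any(not (loc_min <= s <= loc_max) for s in shares):
            continue  # violates the load-balance constraint (3)
        cost = sum(prices[j] for j in assign)
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost

# 4 VMs, three providers charging 3, 1, 2 per VM; no provider may host
# more than half of the VMs:
placement, cost = geo_place(4, prices=[3, 1, 2], loc_max=0.5)
```

Without the load-balance bound, everything would land on the cheapest provider; with it, the optimum splits the VMs across the two cheapest, which is the effect constraint (3) exists to force.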
Intra Datacenter Placement
• Workload mixing (time & space)
• Multi-dimensional, multi-knapsack
• Application Specific
• Heterogeneous
hardware
W. Li, J. Tordsson, and E. Elmroth. Virtual Machine Placement for Predictable and Time-
Constrained Peak Loads. In Proceedings of the 8th International Workshop on Economics of
Grids, Clouds, Systems, and Services (GECON 2011), Lecture notes of Computer Science,
Springer-Verlag, Vol. 7150, pp. 120-134, 2012.
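A simple greedy heuristic illustrates the multi-dimensional packing flavor of intra-datacenter placement. This first-fit-decreasing sketch (with invented demands in whole cores and GB) is a baseline, not the optimizer of the cited paper:

```python
def first_fit_decreasing(vms, capacity, n_hosts):
    """Place VMs with (cpu, mem) demands on the first host with room in
    both dimensions, largest VMs first -- a classic bin-packing heuristic."""
    hosts = [{"cpu": 0, "mem": 0, "vms": []} for _ in range(n_hosts)]
    order = sorted(range(len(vms)), key=lambda i: -(vms[i][0] + vms[i][1]))
    for i in order:
        cpu, mem = vms[i]
        for h in hosts:
            if h["cpu"] + cpu <= capacity["cpu"] and h["mem"] + mem <= capacity["mem"]:
                h["cpu"] += cpu
                h["mem"] += mem
                h["vms"].append(i)
                break
        else:
            raise ValueError(f"VM {i} does not fit on any host")
    return hosts

# Mixing a CPU-heavy with a memory-heavy VM fills a host in both
# dimensions, so these four VMs pack onto two hosts instead of three:
vms = [(8, 2), (2, 8), (5, 5), (4, 4)]
hosts = first_fit_decreasing(vms, {"cpu": 10, "mem": 10}, n_hosts=3)
```

The example shows the point of workload mixing: complementary resource profiles raise utilization without raising contention in any single dimension.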
Relaxed box model virtualization
For enhanced workload mixing (space)
P. Svärd, J. Tordsson, B. Hudzia, E. Elmroth. Hecatonchire: Towards Multi-Host
Virtual Machines by Server Disaggregation. In Euro-Par 2014: Parallel Processing
Workshops, Lecture Notes in Computer Science, Vol. 8806, pp 519-529, 2014.
Decentralized Placement
M. Sedaghat, F. Hernandez-Rodriguez, E. Elmroth, and G. Sarunas. Divide the Task, Multiply the
Outcome: Cooperative VM Consolidation, In Proceedings of The 6th IEEE International
Conference on Cloud Computing Technology and Science (CloudCom 2014), pp. 300-305, 2014.
M. Sedaghat, F. Hernandez-Rodriguez, and E. Elmroth. Autonomic Resource Allocation for Cloud
Data Centers: A Peer to Peer Approach. The ACM Cloud and Autonomic Computing Conference
(CAC'14), pp. 131-140, 2014.
Replication control for fault tolerance
• Multi-task jobs in presence of correlated failures
• Ensure that specified number of tasks complete
with certain probability
• Both
• #replicas
• placement
M. Sedaghat, E. Wadbro, J. Wilkes, S. De Luna, O. Seleznjev, and E. Elmroth. Die-Hard:
Reliable Scheduling to Survive Correlated Failures in Cloud Data Centers, IEEE/ACM Inter-
national Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, pp. 52-59, 2016.
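Under the simplifying assumption of independent failures, the required replica count has a closed form. Independence is a deliberate simplification here: the cited Die-Hard work addresses precisely the correlated case, where placement matters as much as the count:

```python
def replicas_needed(failure_prob, target_success, max_replicas=20):
    """Smallest number of replicas of a task, each failing independently
    with probability failure_prob, such that at least one completes with
    probability >= target_success (P = 1 - failure_prob**r)."""
    for r in range(1, max_replicas + 1):
        if 1 - failure_prob ** r >= target_success:
            return r
    raise ValueError("target not reachable within max_replicas")

# A task failing 10% of the time needs 3 replicas for 99.5% success:
r = replicas_needed(0.1, 0.995)
```

With correlated failures (e.g. replicas sharing a rack or power domain), `failure_prob**r` badly overestimates reliability, which is why joint replica-count and placement decisions are needed.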
Live VM migration (without service interruption)
Pre-copy vs. post-copy vs. hybrid migration, compared on:
• Continuous service
• Resource usage
• Robustness
• Predictability
• Transparency
P. Svärd, S. Walsh, B. Hudzia, J. Tordsson, and E. Elmroth. Principles and Performance
Characteristics of Algorithms for Live VM Migration. ACM Operating Systems Review,
Vol. 49, No. 1, pp. 142-155, 2015.
P. Svärd, B. Hudzia, J. Tordsson, and E. Elmroth. Evaluation of Delta Compression
Techniques for Efficient Live Migration of Large Virtual Machines, ACM SIGPLAN Notices,
Vol. 46, No. 7, ACM New York, NY, USA, pp. 111-120, 2011.
Pre-copy migration Post-copy migration
Energy-efficient management
S. K. Tesfatsion, E. Wadbro, J. Tordsson, A Combined Frequency Scaling and Application Elasticity
Approach for Energy-Efficient Clouds, IEEE Transactions on Cloud Computing, 2014.
Z. Li, S. Tesfatsion, S. Bastani, A. Hassan, E. Elmroth, M. Kihl, and R. Ranjan, A Survey on Modeling
Energy Consumption of Cloud Applications: Deconstruction, State of the Art, and Trade-off Debates.
IEEE Transactions on Sustainable Computing, Accepted, 2017.
Energy-efficient management
Performance-power trade-off
[Figure 5: achieved throughput (fps) vs. target and power usage (watt) over time for four policies — Frequency, VM, Core, and Combined — and for two power-saving settings (lower savings: α=0.1, γ=0.9; higher savings: α=0.9, γ=0.1)]
Dynamic Resource Rationing
Where to cut when resources are insufficient?
Two approaches
1. Strict QoS-level
adherence
2. Overall cost-benefit
with QoS-level weights
• Constrained optimization
• Substantial dependency
on KPI-type (e.g. response vs. throughput)
• System feedback on KPI and dimmer effect
• Ideally combined with brownout and self-driven capping
E.B. Lakew, C. Klein, F. Hernandez-Rodriguez and E. Elmroth. Performance-Based Service Differentiation in Clouds,
In Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid
2015), IEEE Computer Society, pp. 505-514, 2015.
L. Tomas, E.B. Lakew, and E. Elmroth. Service Level and Performance Aware Dynamic Resource Allocation in
Overbooked Data Centers, The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
(CCGrid 2016), pp. 42-51, 2016.
A.V. Papadopoulos, J. Krzywda, E. Elmroth, and M. Maggio. Power-Aware Cloud Brownout: response time and power
consumption control, In Proceedings of the 56th IEEE Conference on Decision and Control, Accepted, 2017.
M. Shahrad, C. Klein, L. Zheng, M. Chiang, E. Elmroth, and David Wentzlaff. Incentivizing Self-Resource-Capping with
Graceful Degradation, in Proceedings of the ACM Symposium on Cloud Computing 2017 (SoCC '17), Accepted, 2017.
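A sketch of the cost-benefit approach (2): when demand exceeds capacity, give each service a share proportional to its demand weighted by its QoS level. Names, weights, and numbers are illustrative; the cited work instead solves a constrained optimization driven by KPI feedback:

```python
def ration(total_capacity, services):
    """If total demand fits, everyone gets their demand; otherwise each
    service receives capacity proportional to demand * QoS weight, so
    higher QoS levels take the smaller cut."""
    demand = sum(s["demand"] for s in services)
    if demand <= total_capacity:
        return {s["name"]: s["demand"] for s in services}  # no cut needed
    weighted = sum(s["demand"] * s["weight"] for s in services)
    return {
        s["name"]: total_capacity * s["demand"] * s["weight"] / weighted
        for s in services
    }

services = [
    {"name": "gold", "demand": 60, "weight": 3.0},
    {"name": "bronze", "demand": 60, "weight": 1.0},
]
alloc = ration(80, services)  # the gold service takes the smaller cut
```

With equal demands and a 3:1 weight ratio, gold keeps its full 60 units while bronze is cut to 20, which is the "where to cut" decision the slide asks about.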
Single datacenters and edge clouds
W. Tärneberg, A. Mehta, E. Wadbro, J. Tordsson, J. Eker, M. Kihl, and E. Elmroth. Dynamic Application
Placement in the Telco-cloud, Future Generation Computer Systems, Elsevier, Vol. 70, pp. 163-177, 2017.
J. Krzywda, W. Tärneberg, P-O. Östberg, M. Kihl, and E. Elmroth. Telco Clouds: Modelling and Simulation,
Proceedings of the 5th International Conference on Cloud Computing and Services Science (CLOSER 2015),
SCITEPRESS, pp. 597-609, 2015.
A. Mehta, W. Tärneberg, C. Klein, J. Tordsson, M. Kihl, E. Elmroth. How beneficial are intermediate layer Data
Centers in Mobile Edge Networks? In Foundations and Applications of Self* Systems (FAS* 2016), 2016.
A. Mehta, R. Baddour, H. Gustafsson, F. Svensson, and E. Elmroth. Calvin Constrained - A Framework for IoT
Applications in Heterogeneous Environments, The 37th IEEE International Conference on Distributed
Computing (ICDCS 2017), pp. 1063-1073, 2017.
Controlling end-user performance
and network load
Assume these are self-driving cars,
supported by on-line traffic control
Future Computer
Systems will be
defined in Software,
not in Hardware
Software-Defined Infrastructures
• Massive scale disaggregated hardware
• Dynamic definition (and redefinition) of virtual system
• Arbitrarily large “imbalance” between virtual systems’ CPU-memory-network
• Fewer constraints in resource management optimization
• Higher density
• Greater flexibility
• Allows for easier programming models
G. Goumas, K. Nikas, E.B. Lakew, C. Kotselidis, A. Attwood, E. Elmroth, M. Flouris, N. Foutris, J.
Goodacre, D. Grohmann, V. Karakostas, P. Koutsourakis, M. Kersten, M. Lujàn, E. Rustad, J.
Thomson, L. Tomás, A. Vesterkjaer, J. Webber, Y. Zhang, and N. Koziris. ACTiCLOUD: Enabling the
Next Generation of Cloud Applications. The 37th IEEE International Conference on Distributed
Computing (ICDCS 2017), pp. 1836-1845, 2017.
Additional challenges for SDIs
• All “traditional” resource allocation problems still relevant
• Vertical scaling can be performed on much larger scale!
• Enhanced by non-uniform performance characteristics
• Additional resource management for applications’ virtual
systems (VSys) after resources are assigned
• Hide latencies, move compute to data or data to
compute, trade-offs for performance – consistency
• Feedback between VSys management and the outer SDI
resource allocation
This training material is part of the FogGuru project that has
received funding from the European Union’s Horizon 2020
research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 765452. The
information and views set out in this material are those of the
author(s) and do not necessarily reflect the official opinion of
the European Union. Neither the European Union institutions
and bodies nor any person acting on their behalf may be held
responsible for the use which may be made of the information
contained therein.