Adopting Autonomic Computing to Improve Existing Systems Performance

Adopting Autonomic Computing Capabilities in Existing Large-Scale Systems
Heng Li, Tse-Hsun (Peter) Chen, Ahmed E. Hassan, Mohamed Nasser, Parminder Flora

Manually configuring large-scale software systems is costly and error-prone
Diagram labels: Software system; Workload; Performance; Configuration; Unsatisfied perf.?
Workloads are constantly evolving, requiring constant human intervention to ensure optimal performance

Autonomic computing aims to reduce manual effort
Diagram labels: Business logic; Autonomic computing; System

Adding autonomic computing capabilities to existing large-scale systems
Diagram labels: Original system; Self-monitoring; Self-configuring; Self-optimizing; Optimized parameter values; Perf. measures

Our subject system was designed without an autonomic computing mindset
- Millions of lines of code
- Actively evolving (many full-time developers)
- Fast-paced agile development
- Thousands of organizations use it for mission-critical operations
Adding autonomic computing capabilities to existing mission-critical systems is challenging

Adopting autonomic computing capabilities in existing large-scale systems
- Understanding runtime behavior
- Minimizing footprint
- Minimizing risk
- Statistics & test automation
- Separation from original business logic
- Getting developers involved
- Expecting failures & full control
- Proof of concept & transparency
These challenges generally apply to many other existing systems that need autonomic computing capabilities

The runtime behaviour of a large-scale system is usually not well understood
Diagram labels: Environment; Workload; Configuration; Performance
Developers may not fully understand the performance impact of configurations

Studying the relationship between configuration parameters and performance
- Asking domain experts: perf.-related parameters
- Running tests: perf. data (very costly!)
- Statistical analysis: perf.-critical parameters
Only a few of the many candidate parameters significantly impact system performance
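
The slide does not say which statistical analysis was used. As a minimal sketch of the idea, the snippet below ranks candidate parameters by the absolute Pearson correlation between their values and a measured KPI across test runs. The parameter names, the data, and the 0.5 threshold are all hypothetical, and correlation is only a stand-in for whatever analysis the authors actually applied.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def rank_parameters(runs, perf_key="kpi", threshold=0.5):
    """Return (name, |r|) pairs whose |r| meets the threshold, strongest first."""
    params = [key for key in runs[0] if key != perf_key]
    perf = [run[perf_key] for run in runs]
    scores = {p: abs(pearson([run[p] for run in runs], perf)) for p in params}
    critical = [(p, s) for p, s in scores.items() if s >= threshold]
    return sorted(critical, key=lambda item: item[1], reverse=True)

# Hypothetical test results: one dict per test run (made-up names and values).
runs = [
    {"cache_size": 1, "thread_pool": 4, "kpi": 0.30},
    {"cache_size": 2, "thread_pool": 8, "kpi": 0.50},
    {"cache_size": 3, "thread_pool": 4, "kpi": 0.70},
    {"cache_size": 4, "thread_pool": 8, "kpi": 0.90},
]
critical = rank_parameters(runs)
```

With this made-up data only `cache_size` passes the threshold, which mirrors the slide's observation that few candidate parameters matter. A linear correlation misses non-linear and interaction effects, so a real study would likely need a richer model.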

Using test automation to minimize experimental overhead
Steps of the automated test cycle:
- New configuration
- Stop system
- Cleanup system (removing the impact of the previous test)
- Configuring
- Start system (the original system only read configs at startup)
- Test run
Test automation is critical for minimizing manual effort and for using testing resources while machines would otherwise sit idle
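
The cycle above can be sketched as a small driver loop. This is an illustration under assumptions, not the authors' tooling: the hook names (`stop`, `cleanup`, `apply_config`, `start`, `run_test`) and the `FakeSystem` stand-in are hypothetical, and the step order is one plausible reading of the slide.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, int]

def run_experiments(
    configs: Iterable[Config],
    stop: Callable[[], None],
    cleanup: Callable[[], None],
    apply_config: Callable[[Config], None],
    start: Callable[[], None],
    run_test: Callable[[], float],
) -> List[Tuple[Config, float]]:
    """Run one performance test per configuration, unattended."""
    results = []
    for config in configs:
        stop()                # configs are only read at startup, so restart
        cleanup()             # remove the impact of the previous test
        apply_config(config)  # write the new configuration
        start()
        results.append((config, run_test()))
    stop()  # leave the system stopped once the batch is done
    return results

class FakeSystem:
    """Hypothetical stand-in that records which hooks were called."""
    def __init__(self):
        self.events = []
        self.config = {}
    def stop(self):
        self.events.append("stop")
    def cleanup(self):
        self.events.append("cleanup")
    def apply_config(self, config):
        self.events.append("configure")
        self.config = dict(config)
    def start(self):
        self.events.append("start")
    def run_test(self):
        self.events.append("test")
        return 1.0 / self.config["pool_size"]  # made-up performance figure

system = FakeSystem()
results = run_experiments(
    [{"pool_size": 2}, {"pool_size": 4}],
    system.stop, system.cleanup, system.apply_config, system.start, system.run_test,
)
```

Passing the hooks in as callables keeps the loop independent of how a particular system is stopped, cleaned, and started, so the same driver can run overnight batches against different deployments.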

Separating autonomic computing concerns from the original business logic
Diagram labels: Original system; Self-monitoring; parameter values; Perf. measures; Remote management; Streamed log data (readily available); Separate!
- Only a few hundred lines of code changed in the original system
- Negligible performance overhead

Leveraging readily available logs to monitor system runtime behavior
Diagram labels: Original system; Log files; Log streams; Real-time perf. measures; Perf. opt.
Fast response to workload changes (within seconds)
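
To illustrate how streamed logs can yield real-time performance measures, here is a minimal sketch that keeps a sliding window over parsed log lines and reports throughput and mean latency. The log line format, the `SlidingWindowMonitor` class, and the 10-second window are assumptions made for this example; the slides do not describe the actual log format or measures.

```python
import re
from collections import deque

# Hypothetical log line format: "<epoch seconds> request_done latency_ms=<int>"
LINE_RE = re.compile(r"^(\d+) request_done latency_ms=(\d+)$")

class SlidingWindowMonitor:
    def __init__(self, window_seconds=10):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, latency_ms), oldest first

    def feed(self, line):
        """Consume one streamed log line; unrelated lines are ignored."""
        match = LINE_RE.match(line.strip())
        if match is None:
            return
        timestamp, latency = int(match.group(1)), int(match.group(2))
        self.events.append((timestamp, latency))
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def measures(self):
        """Return (requests per second, mean latency in ms) over the window."""
        if not self.events:
            return 0.0, 0.0
        count = len(self.events)
        mean_latency = sum(latency for _, latency in self.events) / count
        return count / self.window_seconds, mean_latency

monitor = SlidingWindowMonitor(window_seconds=10)
for line in [
    "100 request_done latency_ms=100",
    "101 heartbeat",
    "105 request_done latency_ms=200",
    "112 request_done latency_ms=300",
]:
    monitor.feed(line)
throughput, mean_latency = monitor.measures()
```

Because the monitor only reads lines the system already writes, it needs no changes to the business logic, and a short window lets the measures react to workload changes within seconds.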

We need to drive developers to change their code
Diagram labels: Original system; Self-monitoring; parameter values; Remote management; Streamed log data (readily available); Perf. measures; (Involving developers)

Motivate developers by proving the concept of autonomic computing
Diagram labels: Original system; Self-monitoring; parameter values; Remote management; Streamed log data (readily available); Perf. measures; (Involving developers); Stop and re-config

Proving the concept of autonomic computing
Iterative cycle: prove the concept, developers change code, integrate code changes & prove the concept, developers change code, …
We regularly meet with the stakeholders to present and demo our work, ensuring they are on the same page as us

Visualizing the dynamics of autonomic computing (transparency)
Dashboard: monitoring autonomic computing progress
Providing real-time debugging support to developers

Expecting failures
In case of a problem (e.g., a crash):
- Ability to automatically unplug the autonomic computing capabilities;
- Ability to recover to a default state.
Failures of the autonomic computing capabilities will not interrupt the normal execution of the original system
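
The two abilities listed above can be sketched as a guard around the optimization step. This is a hedged illustration only: `GuardedAutonomicLayer`, `flaky_optimizer`, and the `pool_size` parameter are hypothetical names, and the slides do not describe how the real system implements unplugging or recovery.

```python
DEFAULT_CONFIG = {"pool_size": 8}  # hypothetical default state

class GuardedAutonomicLayer:
    def __init__(self, optimizer, apply_config, defaults):
        self.optimizer = optimizer        # callable: perf measures -> new config
        self.apply_config = apply_config  # callable that pushes a config to the system
        self.defaults = dict(defaults)
        self.plugged_in = True

    def step(self, perf_measures):
        """Run one optimization step without letting a failure propagate."""
        if not self.plugged_in:
            return
        try:
            self.apply_config(self.optimizer(perf_measures))
        except Exception:
            self.plugged_in = False           # unplug automatically
            self.apply_config(self.defaults)  # recover to the default state

def flaky_optimizer(perf_measures):
    """Made-up optimizer that fails on an invalid measurement."""
    if perf_measures["kpi"] < 0:
        raise ValueError("bad measurement")
    return {"pool_size": 16}

applied = []  # records every configuration pushed to the (imaginary) system
layer = GuardedAutonomicLayer(flaky_optimizer, applied.append, DEFAULT_CONFIG)
layer.step({"kpi": 0.4})   # normal step: the optimized value is applied
layer.step({"kpi": -1.0})  # optimizer fails: unplug and restore defaults
layer.step({"kpi": 0.4})   # ignored, because the layer is unplugged
```

The sketch assumes that restoring the defaults cannot itself fail; a production version would also need to guard that call and report the failure to operators.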

Providing full manual control
Dashboard: full manual control
Practitioners can turn the autonomic computing capabilities on or off, or manually change configurations, at any time
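
One way to picture the control described above is a small state object in which manual settings always take precedence over automatic proposals. The `ControlPanel` class and the parameter names below are assumptions for illustration; the slides do not describe the dashboard's internals.

```python
class ControlPanel:
    """Manual control state: an on/off switch plus optional manual overrides."""

    def __init__(self):
        self.autonomic_enabled = True
        self.manual_overrides = {}

    def turn_on(self):
        self.autonomic_enabled = True

    def turn_off(self):
        self.autonomic_enabled = False

    def set_manual(self, name, value):
        self.manual_overrides[name] = value

    def clear_manual(self, name):
        self.manual_overrides.pop(name, None)

    def effective_config(self, current, proposed):
        """Merge configs: proposals apply only when enabled; manual values win."""
        config = dict(current)
        if self.autonomic_enabled:
            config.update(proposed)
        config.update(self.manual_overrides)
        return config

panel = ControlPanel()
current = {"pool_size": 8, "cache_mb": 64}     # hypothetical running values
proposed = {"pool_size": 16, "cache_mb": 128}  # hypothetical optimizer output
panel.set_manual("cache_mb", 256)
with_auto = panel.effective_config(current, proposed)
panel.turn_off()
without_auto = panel.effective_config(current, proposed)
```

Applying the manual overrides last means a practitioner's choice is never silently overwritten, whether or not the autonomic capabilities are switched on.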

The autonomic computing capabilities significantly improve system performance
[Figure: Key Performance Indicator (KPI, axis 0.1 to 1.0) over running time (0 to 300 minutes), comparing the autonomic computing capabilities turned off and turned on]
First build annotations: Low workload, KPI is optimal; high workload, KPI drops
Second build annotations: KPI is always optimal; adapting quickly to the changing workloads

Thank you!
http://hengli.org
hengli@cs.queensu.ca
@henglli
