6. Adding autonomic computing capabilities
to existing large-scale systems
[Figure: Original system with Self-monitoring, Self-optimizing, and Self-configuring components; labels: Perf. measures, Optimized parameter values]
7. Our subject system is designed without
an autonomic computing mindset
Millions of lines of code
Actively evolving (many full-time developers)
Fast-paced agile development
Thousands of organizations use it for mission-critical operations
8. Adding autonomic computing capabilities to
existing mission-critical systems is challenging
9. Adopting autonomic computing
capabilities in existing large-scale systems
Understanding runtime behavior: statistics & test automation
Minimizing footprint: separation from original business logic
Getting developers involved: proof of concept & transparency
Minimizing risk: expecting failures & full control
These challenges generally apply to many other existing systems that need autonomic computing capabilities
11. The runtime behaviour of a large-scale
system is usually not well understood
[Figure: Environment, Workload, and Configuration, each linked to Performance]
12. Developers may not fully understand the
performance impact of configurations
13. Studying the relationship between
configuration parameters and performance
Asking domain experts -> Perf. related parameters -> Running tests (very costly!) -> Perf. data -> Statistical analysis -> Perf. critical parameters
Only a few of the many candidate parameters significantly impact system performance
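The screening step above can be sketched as follows. The slides do not name the statistical method used, so this sketch assumes eta squared (the share of a performance measure's variance explained by one parameter) as a stand-in. The run records, the parameter names `cache_size` and `log_level`, and the `throughput` metric are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def eta_squared(runs, param, metric="throughput"):
    """Share (0..1) of the metric's variance explained by one parameter."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["config"][param]].append(run[metric])
    values = [v for group in groups.values() for v in group]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # the metric never varied, so nothing to explain
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups.values()
    )
    return ss_between / ss_total


def rank_parameters(runs, params, metric="throughput"):
    """Sort candidate parameters from most to least influential."""
    scores = {p: eta_squared(runs, p, metric) for p in params}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical test results: one record per automated test run.
runs = [
    {"config": {"cache_size": 64, "log_level": "info"}, "throughput": 100.0},
    {"config": {"cache_size": 64, "log_level": "debug"}, "throughput": 102.0},
    {"config": {"cache_size": 256, "log_level": "info"}, "throughput": 180.0},
    {"config": {"cache_size": 256, "log_level": "debug"}, "throughput": 178.0},
]
ranking = rank_parameters(runs, ["cache_size", "log_level"])
print(ranking)
```

In this toy data `cache_size` explains almost all of the variance and `log_level` none, which mirrors the observation that only a few candidates matter. A real study would need many more runs per setting and a significance test.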
14. Using test automation to minimize
experimental overhead
Test cycle: New configuration -> Configuring -> Start system -> Test run -> Stop system -> Cleanup system
Stop/start: the original system only read configs at startup
Cleanup: removing the impact of the previous test
Test automation is critical for minimizing manual effort and for using testing resources while machines are otherwise idle
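The test cycle above can be sketched as a small driver loop. This is a minimal illustration, not the authors' tooling: `FakeSystem` and its methods (`configure`, `start`, `run_test`, `stop`, `cleanup`) are hypothetical stand-ins for whatever scripts control the real system.

```python
class FakeSystem:
    """Hypothetical stand-in for the system under test; records each step."""

    def __init__(self):
        self.steps = []
        self.config = None

    def configure(self, config):
        self.config = dict(config)
        self.steps.append("configure")

    def start(self):
        self.steps.append("start")

    def run_test(self):
        self.steps.append("test")
        # Made-up performance model so the example produces numbers.
        return {"throughput": 10.0 * self.config["threads"]}

    def stop(self):
        self.steps.append("stop")

    def cleanup(self):
        self.steps.append("cleanup")


def run_experiments(system, configs):
    """Run one full test cycle per configuration and collect the measures."""
    results = []
    for config in configs:
        system.configure(config)  # configs are only read at startup
        system.start()
        try:
            measures = system.run_test()
            results.append({"config": dict(config), **measures})
        finally:
            system.stop()
            system.cleanup()  # remove the impact of this test on the next
    return results


fake = FakeSystem()
results = run_experiments(fake, [{"threads": 2}, {"threads": 4}])
print(results)
```

Putting stop and cleanup in a `finally` clause means a failed test run still leaves the machine in a clean state for the next configuration, which matters when the loop runs unattended.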
17. Separating autonomic computing
concerns from the original business logic
[Figure: Original system with Self-monitoring, Self-optimizing, and Self-configuring components; labels: Perf. measures, Optimized parameter values]
20. Separating autonomic computing
concerns from the original business logic
[Figure: Original system, with Self-monitoring, Self-optimizing, and Self-configuring drawn as a separate part ("Separate!"); labels: Streamed log data (readily available), Perf. measures, Optimized parameter values, Remote management]
Only a few hundred lines of code changed in the original system
Negligible performance overhead
21. Leveraging readily available logs to
monitor system runtime behavior
Original system -> Log files -> Log streams -> Real-time perf. measures -> Perf. opt.
Fast response to workload changes (within seconds)
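Deriving real-time performance measures from a log stream can be sketched with a sliding time window. The log line format, the `latency_ms` field, and the five-second window are assumptions made for this illustration; the slides do not describe the actual log format or measures.

```python
import re
from collections import deque

# Hypothetical log format: "<unix time> request done latency_ms=<number>"
LINE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?) request done latency_ms=(?P<lat>\d+(?:\.\d+)?)$"
)


class SlidingWindowMonitor:
    """Keeps only recent samples so measures track workload changes quickly."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, latency_ms), oldest first

    def feed(self, line):
        match = LINE.match(line.strip())
        if match is None:
            return  # ignore log lines that carry no performance data
        ts = float(match["ts"])
        self.samples.append((ts, float(match["lat"])))
        # Evict samples that fell out of the time window.
        while self.samples and self.samples[0][0] <= ts - self.window:
            self.samples.popleft()

    def measures(self):
        if not self.samples:
            return None
        latencies = [lat for _, lat in self.samples]
        return {
            "requests": len(latencies),
            "mean_latency_ms": sum(latencies) / len(latencies),
        }


monitor = SlidingWindowMonitor(window_seconds=5.0)
for line in [
    "100.0 request done latency_ms=20",
    "101.0 heartbeat",
    "103.0 request done latency_ms=40",
    "107.0 request done latency_ms=60",
]:
    monitor.feed(line)
print(monitor.measures())
```

Because the monitor only reads logs the system already writes, it needs no changes to the original code path, and a short window keeps the reaction time within seconds.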
24. We need to drive developers to change
their code
[Figure: Original system; Self-monitoring, Self-optimizing, Self-configuring; labels: Streamed log data (readily available), Perf. measures, Optimized parameter values, Remote management; annotation: "(Involving developers)"]
26. Motivate developers by proving the
concept of autonomic computing
[Figure: Original system; Self-monitoring, Self-optimizing, Self-configuring; labels: Streamed log data (readily available), Perf. measures, Optimized parameter values, Remote management; annotations: "(Involving developers)", "Stop and re-config"]
27. Proving the concept of autonomic
computing
Prove the concept -> Developers change code -> Integrate code changes & prove the concept -> Developers change code -> …
We regularly meet with the stakeholders to present and demo our work, ensuring they are on the same page as us
28. Visualizing the dynamics of autonomic
computing (transparency)
Dashboard: monitoring autonomic computing progress
Providing real-time debugging support to developers
31. Expecting failures
In case of a problem (e.g., crash):
- Ability to automatically unplug the autonomic computing capabilities;
- Ability to recover to a default state.
Failures of the autonomic computing capabilities will not interrupt the normal execution of the original system
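The two abilities listed above can be sketched as a guard around the tuning logic. This is an illustration under assumptions, not the authors' implementation: `GuardedTuner`, `tuning_step`, `apply_config`, and the `threads` parameter are hypothetical names.

```python
class GuardedTuner:
    """Wraps a tuning step so that its failures never reach the host system."""

    def __init__(self, tuning_step, apply_config, default_config):
        self.tuning_step = tuning_step      # measures -> new config (may raise)
        self.apply_config = apply_config    # pushes a config to the system
        self.default_config = dict(default_config)
        self.enabled = True

    def tick(self, measures):
        if not self.enabled:
            return  # unplugged: the original system keeps running untouched
        try:
            self.apply_config(self.tuning_step(measures))
        except Exception:
            # Unplug the autonomic part and recover to the default state.
            self.enabled = False
            self.apply_config(dict(self.default_config))


applied = []


def flaky_step(measures):
    # Hypothetical tuning logic that crashes on unexpected input.
    if measures["latency_ms"] > 100:
        raise RuntimeError("optimizer failed")
    return {"threads": 8}


tuner = GuardedTuner(flaky_step, applied.append, {"threads": 4})
tuner.tick({"latency_ms": 50})    # applies the tuned config
tuner.tick({"latency_ms": 500})   # fails: unplug and restore defaults
tuner.tick({"latency_ms": 50})    # no-op, the tuner stays unplugged
print(applied, tuner.enabled)
```

Once unplugged, the tuner stays off until someone re-enables it, so a repeated fault cannot keep flipping the configuration.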
32. Providing full manual control
Dashboard: full manual control
Practitioners can turn autonomic computing capabilities on or off, or manually change configurations, at any time
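A minimal sketch of such a control switch, assuming a hypothetical `ManualControl` class sitting between the autonomic logic and the system's configuration (the names and the `threads` parameter are illustrative only):

```python
class ManualControl:
    """Gate through which every configuration change must pass."""

    def __init__(self, apply_config):
        self.apply_config = apply_config
        self.autonomic_enabled = True

    def set_enabled(self, enabled):
        """Operator switch: turn the autonomic capabilities on or off."""
        self.autonomic_enabled = bool(enabled)

    def manual_change(self, config):
        """Operator changes are always applied."""
        self.apply_config(dict(config), source="manual")

    def autonomic_change(self, config):
        """Automated changes are applied only while the switch is on."""
        if not self.autonomic_enabled:
            return False
        self.apply_config(dict(config), source="autonomic")
        return True


log = []
control = ManualControl(lambda config, source: log.append((source, config)))

first = control.autonomic_change({"threads": 8})    # accepted
control.set_enabled(False)                          # operator takes over
second = control.autonomic_change({"threads": 16})  # rejected
control.manual_change({"threads": 4})               # operator's own setting
print(log)
```

Recording the source of each change also gives a dashboard something to display, so practitioners can see whether a setting came from the tuner or from a person.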
36. Adopting autonomic computing
capabilities in existing large-scale systems
Understanding runtime behavior: statistics & test automation
Minimizing footprint: separation from original business logic
Getting developers involved: proof of concept & transparency
Minimizing risk: expecting failures & full control
Thank you!
http://hengli.org
hengli@cs.queensu.ca
@henglli