Autonomic Computing and Self-Healing Systems
William Chipman
Spring 2011
Colorado State University
Dr. France / Dr. Georg
Self-adapting systems were developed to address the need for systems that manage
themselves, and they are an important part of many critical systems. While the ideas and
models for self-adapting systems have been around for quite a while, current research and
development has made many strides towards true self-healing, auto-modifying systems.
These systems can be designed to adapt at run-time to the needs of the underlying system,
based on changes in the system as a whole. Self-adapting systems have both advantages
and disadvantages, some of which are general to all self-adapting systems and others that
are specific to implementations. This paper includes a thorough description of several
systems as well as their specific advantages, disadvantages and known issues, along with
suggested solutions.
There are four characteristics of self-managing systems: self-configuration, self-
optimization, self-healing and self-protection [1]. For the system to be complete in its
implementation all four of the characteristics must be satisfied. In the current
environment this has proven to be a difficult endeavor. Addressing all four points can
make a system grow to an unwieldy size and level of complication. There have been
many advances in the design of tools and implementations in recent years that have made
these autonomic, self-healing systems close to a reality instead of a pipe dream.
The beginnings of the development of these self-managing systems are built
around Model Driven Engineering. “In MDE, a model is an abstraction or reduced
representation of a system that is built for specific purposes” [2]. This reduced abstraction
simplifies the system design so that the overall behavior and interactions can be mapped
out. Once this overview is completed, appropriate self-representations become important
to the continuation of the development. “It is critical that such representations be causally
connected” [2]. This is important because (1) “the model as interrogated should provide
up-to-date and exact information about the system to drive subsequent adaptation
decisions; and (2) if the model is causally connected, then adaptations can be made at the
model level rather than at the system level” [2].
In order to achieve these systems, developers have had to both develop the tools
for designing the systems and use the tools to build the actual systems. This paper will
begin with a thorough description of autonomic computing and self-healing systems. It
will then describe the tools utilized and will follow with detailed descriptions of several
past and current systems that incorporate the self-managing ideals.
Autonomic Computing
“The term autonomic computing was first used by IBM in 2001 to describe
computing systems that are said to be self-managing” [10]. Autonomic computing is
centered on the idea of removing human intervention in the system. The main goal is to
design and then develop systems that will adapt to changes in their environment on their
own. The description of autonomic computing from IBM compared it to the complexity
of the human body and the autonomic nervous system because of its self-managing nature.
“A system with autonomic capabilities installs, configures, tunes and maintains its
own components at runtime” [3]. IBM describes the four main properties of self-
management as self-configuration, self-optimization, self-healing and self-protection.
Self-configuration means that a system can reconfigure itself based on high-level goals and
modeling. Self-optimization means that a system will make the best use of its resources.
Self-healing is the ability to detect and diagnose issues or problems. Self-protection
means that a system can protect itself from malicious attacks and from unintended or
inadvertent changes.
There are five levels of autonomicity proposed by IBM as the Autonomic
Computing Adoption Model Levels. Level 1 is the basic level. At this level, the system
elements are managed by highly skilled staff and changes are made manually. Level 2 is
referred to as the Managed level. At this level, the system monitors itself and is
intelligent enough to reduce some of the burden on the system administrators. The
Predictive level, level 3, is characterized by the system using behavioral modeling to
recognize system-wide behavioral patterns and suggest fixes to the IT staff. Level 4 is
the Adaptive level. At this level, human interaction is minimized and the tools that were
used at level 3 are automated further so that the burden on the IT staff is minimal. The
fully autonomic level is the final level. At level 5, systems are able to self-manage almost
all functionality related to the needs of the system.
The basic building block in an autonomic system is the Autonomic Element (AE).
An autonomic element is a software-based component that is responsible for managing
sub-systems. “Autonomic elements may cooperate to achieve a common goal … [such
as] servers in a cluster optimizing the allocation of resources to application to minimize
the overall response time or execution time of the applications” [10]. Autonomic
elements implement the MAPE-K loop as the control loop for managing the sub-
system. A complete description of the MAPE-K loop appears in the Tools section of
this paper.
Variability models can be used to build autonomic elements and systems at
runtime. “The use of variability models at runtime brings new opportunities for
autonomic capabilities by reutilizing the efforts invested at design time” [3]. Systems
leveraging the variability model can use knowledge of the design to attain autonomic
modifications at compile-time and can further use system modeling to self-modify at
runtime. Standards such as the meta-data exchange allow the models that were used at
design-time to additionally be used at run-time. The negative aspect of variability models
is the potential for exponential explosion of the possible state transitions. “In order to
manage variability and avoid the combinatorial explosion of artifacts needed to support
this variability, [software tools] focus on variation points and variants instead of focusing
on whole configurations” [16]. A very specific type of autonomic system is the self-
healing system. These systems are often designed using the variability model.
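The difference between focusing on whole configurations and focusing on variation points
can be made concrete with a small count. The sketch below is only an illustration: the
variation points and variants are hypothetical and are not taken from any of the cited
systems. It assumes the variation points are independent of one another.

```python
# Hypothetical variation points, each with its possible variants (illustrative only).
variation_points = {
    "lighting": ["off", "manual", "presence-based"],
    "heating": ["off", "schedule", "adaptive"],
    "security": ["disarmed", "armed"],
    "blinds": ["open", "closed", "auto"],
}


def count_whole_configurations(points):
    """Number of complete configurations: the product of the variant counts."""
    total = 1
    for variants in points.values():
        total *= len(variants)
    return total


def count_variants(points):
    """Number of artifacts when each variant is modeled once: the sum of the counts."""
    return sum(len(variants) for variants in points.values())


def build_configuration(points, choices):
    """Compose one configuration on demand from one chosen variant per variation point."""
    config = {}
    for name, variant in choices.items():
        if variant not in points[name]:
            raise ValueError(f"{variant!r} is not a variant of {name!r}")
        config[name] = variant
    return config


if __name__ == "__main__":
    print(count_whole_configurations(variation_points))  # 54
    print(count_variants(variation_points))              # 11
```

Every additional variation point multiplies the first number but only adds to the second,
which is why modeling variants and composing configurations on demand scales better than
enumerating whole configurations.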
Self-healing Systems
“A self-healing system is one that: replaces traditional error messages with robust
error detection, handling and correction that produces telemetry for automated diagnosis,
provides automated diagnosis and response from the error telemetry for hardware and
software entities, provides recursive fine-grained restart of services based upon
knowledge of their dependencies, presents simplified administrative interactions for
diagnosed problems and their effects on services and resources” [17]. There are several
ways that self-healing systems can be implemented. The simplest implementation is
through redundancy. Adding duplicate components for critical systems all the way up to
duplication of the entire system can allow for fail-safe operations. The issue with this
approach is that it is inherently wasteful of resources: the redundant components could be
used productively with a more innovative implementation. In addition, redundancy only
addresses total failures of components in the system. It does not address degradation in
the system. To heal these types of issues, a more robust solution was needed.
In many implementations of self-healing systems, a multi-faceted approach is
needed. “Two distinct elements are required for the development of self-healing systems.
First, an automated or semi-automated agent must be present to make the decision of
when and how to affect repair on a system. Second, an infrastructure for actually
executing the repair strategy must be available to that agent” [7]. The use of managers is
the favored approach. Solaris 10, for example, used managers to address faults and service
issues. The fault manager uses a system-level model to determine when a failure or
degradation has occurred and searches through a dynamic list of solutions to determine
the most opportune solution. The service manager allows for restarting of services and
applications that have failed or degraded below a pre-determined threshold. This pre-
determined threshold is given by the application to the service manager when it enters use
on the system.
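The threshold-driven restart described above can be sketched in a few lines. This is a
minimal illustration and not the Solaris 10 implementation or its API: the `Service` and
`ServiceManager` classes, the `health` scale from 0.0 to 1.0 and all method names are
assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    restart_threshold: float  # health below this value triggers a restart
    health: float = 1.0       # assumed scale: 1.0 = fully healthy, 0.0 = failed
    restarts: int = 0

    def restart(self):
        # A restart is assumed to return the service to full health.
        self.health = 1.0
        self.restarts += 1


@dataclass
class ServiceManager:
    services: dict = field(default_factory=dict)

    def register(self, service):
        # The service supplies its own threshold when it enters use on the system.
        self.services[service.name] = service

    def report_health(self, name, health):
        self.services[name].health = health

    def check(self):
        """Restart every service whose health fell below its threshold; return their names."""
        restarted = []
        for service in self.services.values():
            if service.health < service.restart_threshold:
                service.restart()
                restarted.append(service.name)
        return restarted
```

Because each service carries its own threshold, the same health reading can leave one
service running while causing another, more sensitive one to be restarted.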
A very popular approach to developing self-healing systems is architecture-
centric. “An architectural style is a collection of constraints on components, connectors
and their configurations targeted towards a family of systems with shared characteristics”
[13]. In an architecture-based self-healing system, the changes needed to repair a running
system have to be machine readable by the underlying system as well as by the describing
systems. This machine-readable change instruction is referred to as an architectural
difference, or diff. A diff describes the difference in the system before and after the
repair. A diff is composed of components, links, connectors and interfaces.
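A diff of this kind can be sketched as a set difference between two architecture
descriptions. The example below is a simplification under stated assumptions: it models
only components and links (connectors and interfaces are left out for brevity), and the
`Architecture` type and the dictionary keys are hypothetical names, not the format used
in [7].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    components: frozenset
    links: frozenset  # each link is a (source, target) pair of component names


def architectural_diff(before, after):
    """Describe, in machine-readable form, what changes between two architectures."""
    return {
        "add_components": after.components - before.components,
        "remove_components": before.components - after.components,
        "add_links": after.links - before.links,
        "remove_links": before.links - after.links,
    }


if __name__ == "__main__":
    before = Architecture(
        components=frozenset({"ui", "cache", "db"}),
        links=frozenset({("ui", "cache"), ("cache", "db")}),
    )
    # Hypothetical repair: bypass a failed cache and connect the UI to the database.
    after = Architecture(
        components=frozenset({"ui", "db"}),
        links=frozenset({("ui", "db")}),
    )
    for key, value in architectural_diff(before, after).items():
        print(key, sorted(value))
```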
According to Mikic-Rakic, “self-healing systems should satisfy:
adaptability, dynamicity, awareness, autonomy, robustness, distributability, mobility and
traceability” [13].
1. Adaptability: The system must allow changes to both the static
and dynamic portions of the system.
2. Dynamicity: Addresses the run-time changes that a system is
able to make.
3. Awareness: The architectural style must allow for self-
monitoring in the system.
4. Autonomy: Autonomy is achieved when the system is
able to address the anomalies discovered.
5. Robustness: The architectural style should allow for the system
to respond to unforeseen conditions.
6. Distributability: The system must have good performance in
different distributions.
7. Mobility: The architecture should allow for modifications to
the location of components in the system.
8. Traceability: A system should allow for a direct correlation
between the model and the run-time execution. [13]
Some of these requirements are basic building blocks for systems and are used to enforce
a system level hierarchy on the data-flow and basic structure of the system while others
administer the dynamic changes to the system based on the data-flows. These dynamic
indicators analyze specific aspects of the system and address and implement the needed
changes. “The ability to dynamically repair a system at runtime based on its architecture
requires several capabilities: the ability to describe the current architecture of the system;
the ability to express an arbitrary change to that architecture that will serve as a repair
plan; the ability to analyze the result of the repair to gain confidence that the change is
valid; and the ability to execute the repair plan on a running system without restarting the
system” [7].
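The four capabilities in this quotation map onto a small pipeline: a description of the
architecture, a change expressed against it, an analysis of the result, and an execution
step that only goes ahead when the analysis succeeds. The sketch below is an assumption
about how these pieces could fit together, not the mechanism from [7]; the dictionary
layout, the function names and the single validity rule (no dangling links) are all
hypothetical.

```python
import copy


def apply_change(architecture, change):
    """Return a new architecture description with the change applied.

    The input description is left untouched so a rejected plan has no effect.
    """
    result = copy.deepcopy(architecture)
    result["components"] -= set(change.get("remove_components", ()))
    result["components"] |= set(change.get("add_components", ()))
    result["links"] -= set(change.get("remove_links", ()))
    result["links"] |= set(change.get("add_links", ()))
    return result


def is_valid(architecture):
    """Analysis step: every link must connect two components that exist."""
    return all(
        src in architecture["components"] and dst in architecture["components"]
        for src, dst in architecture["links"]
    )


def repair(running, change):
    """Adopt the repaired description only if analysis of the candidate succeeds."""
    candidate = apply_change(running, change)
    if not is_valid(candidate):
        return running, False  # plan rejected; the running description is unchanged
    return candidate, True
```

A plan that removes a component but forgets to remove its links is rejected by the
analysis step, while a plan that also rewires the affected links is accepted.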
While self-healing systems are an innovative idea, there are still many issues in
their design and implementation that must be overcome in order for them to develop into
an integral part of a computer system. “Self-healing functionality for users and
administrators of a modern operating system [must] provide fine-grained fault isolation
and restart where possible of any component –hardware or software – that experiences a
problem” [17]. Without this fine-grained fault isolation, the fix is overkill in most
situations, because every fault is addressed with the same approach: restart the component
or application. This is not always an optimal or applicable solution. Real-time and critical
systems often cannot afford the downtime that such an overzealous solution requires. With
a finer level of fault isolation, small problems can be fixed in a way that does not cripple
the system even for a short period.
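One way to picture fine-grained restart is to restart only the failed component and the
components that depend on it, rather than the whole system. The sketch below assumes a
hypothetical dependency map supplied by the caller; the function name and the example
services are illustrative and do not come from [17].

```python
def restart_set(dependencies, failed):
    """Return the failed component plus everything that transitively depends on it.

    `dependencies` maps each component to the list of components it depends on.
    """
    # Invert the map: component -> components that depend on it.
    dependents = {name: set() for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Walk the inverted graph from the failed component.
    to_restart, stack = set(), [failed]
    while stack:
        current = stack.pop()
        if current in to_restart:
            continue
        to_restart.add(current)
        stack.extend(dependents.get(current, ()))
    return to_restart


if __name__ == "__main__":
    deps = {
        "network": [],
        "database": ["network"],
        "web": ["database"],
        "mail": ["network"],
        "logger": [],
    }
    print(sorted(restart_set(deps, "database")))  # ['database', 'web']
```

In this example a database fault leaves the network, mail and logger services running,
which is the kind of isolation that avoids crippling the system for a small problem.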
A second issue is tool integration. Seamless integration is “especially important in
the context of self-healing systems since no human can be involved in manually
transforming tool outputs or invoking tools” [7]. This integration has to be performed
with multiple tools. Current self-healing systems are so complex that no single tool
encompasses all the needs. With multiple tools added to an already complex system, the
system can become unwieldy. To ameliorate this complexity, using tools that have been
thoroughly tested is of the utmost importance. In addition, the complexity of the fix
models can grow exponentially as the system grows. Predetermined solutions and exhaustive
solution searches can also lead to problems: the solution space can grow exponentially, and
determining the best solution to a problem can be subjective, whereas computing systems
are only capable of making objective decisions.
The third issue facing self-healing systems follows from a potential solution to the
previously described problem. Building solutions at runtime, or at component/application
integration time, based on detailed models of the system is one option, but the main issue
is that “in an open system, upfront system analysis is at best of limited, heuristic
usefulness” [8]. Because open
systems tend to be dynamic and are molded by their environment, design-time models
become less relevant as the system grows and changes during runtime. While these
design-time models are important, a combination of all types of solutions is needed to
build robust systems that can grow and morph into what is truly needed.
Tools
There are many tools designed for building autonomic and self-healing systems.
The first and most widely used tool is the MAPE-K loop. MAPE-K stands for monitor,
analyze, plan, execute and knowledge. The monitor “collates and aggregates
information received into the system and attempts to characterize any symptoms relating
to the way the system is running” [12]. The analyze phase processes the symptoms to
determine whether the issues at hand need to be addressed. At the plan step, the system
decides which changes to make and how to carry them out. The execute stage
is where the plan is implemented and activated. After a completed execute phase, the
knowledge phase of the cycle attempts to determine whether the implementation of the plan
was successful in instigating the necessary changes.
Figure 1. MAPE-K Loop Design [12]
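The stages of the loop can be sketched as methods of a single manager that share a
knowledge store. This is a minimal illustration under stated assumptions, not IBM's
implementation: the managed system is a hypothetical pool of servers described by a
dictionary, the only symptom is load, and the only repair is resizing the pool. All names
are invented for the example.

```python
import math


class AutonomicManager:
    """One pass = monitor -> analyze -> plan -> execute, all sharing `knowledge`."""

    def __init__(self, target_load=0.75):
        # Knowledge: goals and history that every phase can consult.
        self.knowledge = {"target_load": target_load, "history": []}

    def monitor(self, system):
        """Collect readings and characterize them as a symptom (here: the load)."""
        capacity = system["servers"] * system["capacity_per_server"]
        return {"load": system["requests"] / capacity}

    def analyze(self, symptom):
        """Decide whether the symptom needs to be addressed."""
        return symptom["load"] > self.knowledge["target_load"]

    def plan(self, system):
        """Work out how many servers bring the load back to the target."""
        per_server = system["capacity_per_server"] * self.knowledge["target_load"]
        return {"servers": math.ceil(system["requests"] / per_server)}

    def execute(self, system, plan):
        """Apply the plan to the managed system."""
        system["servers"] = plan["servers"]

    def run_once(self, system):
        symptom = self.monitor(system)
        if not self.analyze(symptom):
            return False
        plan = self.plan(system)
        self.execute(system, plan)
        # Record the outcome so later passes can judge whether the plan worked.
        outcome = self.monitor(system)
        self.knowledge["history"].append(
            {"before": symptom["load"], "after": outcome["load"], "plan": plan}
        )
        return True


if __name__ == "__main__":
    system = {"servers": 2, "capacity_per_server": 100, "requests": 300}
    manager = AutonomicManager()
    print(manager.run_once(system), system["servers"])  # True 4
```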
The MAPE-K loop design was first proposed by IBM as a solution to autonomic
computing design. Systems built with the MAPE-K ideal tend to be robust. “The MAPE-
K loop is controlled by a manager, an embedded part of the autonomic element that
coordinates the individual activities” [1]. This manager is an integrated part of the
underlying system. Often systems will have several managers running the MAPE-K loop
simultaneously, autonomously and in a distributed fashion. The way that the autonomic
managers interact with each other is determined by the autonomic computing architecture.
This integration of several managers leads to the next logical step of developing
autonomic software product lines (ASPLs). ASPLs can self-manage a large and complex
system and interact with other systems both local and distributed in order to deal with
product variations and dynamic system changes. There has been a good deal of research
into the ASPL concept and the Software Engineering Institute (SEI) has developed a
framework. This framework divides systems into three general categories: core assets
development, product development and management. The core assets are the basic
components in the software product line (SPL). They can range from business artifacts to
reference architectures. Product development covers what is built with the core assets:
the larger portions of the system that perform the objectives of the design. The final
portion is management, which provides maintenance to all products and monitors the system
to determine, based on models, where potential problems are likely to occur.
“Fractal is an advanced component model and associated on-growing(sic)
programming and management support devised initially by France Telecom and INRIA
since 2001” [5]. The Fractal model is based on component-based software engineering. It
makes use of components, interfaces, which are interaction points between those
components, and bindings which are the communication channels between components.
Fractal also uses the concepts of membranes and contents. “The membrane exercises an
arbitrary reflexive control over its content” [5]. A membrane is made up from a set of
controllers. “The model is recursive with sharing at arbitrary levels” [5]. The model is
programming language independent. It is extensible. Bindings are controlled through the
specific programming. “The Fractal project targets the development of a reflective
component technology for the construction of highly adaptable and reconfigurable
distributed systems” [5]. Fractal enforces a limited number of architectural structures.
This allows the systems to be more robust as the specified component need not exist at
runtime in order for successful management.
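The vocabulary of the Fractal model (components, interfaces, bindings, membrane and
content) can be illustrated with a toy class. The sketch below is not the Fractal API:
the class and method names are hypothetical, and the membrane is reduced to two
controllers (content control and binding control) implemented as plain methods.

```python
class Component:
    """Toy component with provided/required interfaces, bindings and content."""

    def __init__(self, name, provided=(), required=()):
        self.name = name
        self.provided = set(provided)  # interfaces this component offers
        self.required = set(required)  # interfaces this component needs
        self.content = []              # sub-components (the model is recursive)
        self.bindings = {}             # required interface -> (provider, interface)

    # Membrane: the methods below play the role of controllers over the content.

    def add_subcomponent(self, child):
        """Content controller: a child may be added to several parents (sharing)."""
        self.content.append(child)

    def bind(self, interface, provider, provided_interface):
        """Binding controller: connect a required interface to a provided one."""
        if interface not in self.required:
            raise ValueError(f"{self.name} does not require {interface!r}")
        if provided_interface not in provider.provided:
            raise ValueError(f"{provider.name} does not provide {provided_interface!r}")
        self.bindings[interface] = (provider, provided_interface)

    def unbind(self, interface):
        self.bindings.pop(interface, None)

    def unbound_interfaces(self):
        """Introspection: required interfaces that still lack a binding."""
        return self.required - set(self.bindings)


if __name__ == "__main__":
    logger = Component("logger", provided=["log"])
    client = Component("client", required=["log"])
    app, admin = Component("app"), Component("admin")
    app.add_subcomponent(client)
    app.add_subcomponent(logger)
    admin.add_subcomponent(logger)  # the same logger is shared by two composites
    client.bind("log", logger, "log")
    print(client.unbound_interfaces())  # set()
```

Because bindings are ordinary data held by the component, they can be inspected and
changed while the system runs, which is what makes reconfiguration possible.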
The final tool analyzed is SmartAdapters. SmartAdapters are used to decrease the
complexity of dynamically adaptive systems. The first step in the use of SmartAdapters is
the maintenance of a high-level representation model of the running systems. Maintaining
the high-level model allows for a quicker and more thorough response to issues as they
arise. “SmartAdapters automatically generate an extensible Aspect-Oriented Modeling
framework specific to [the] metamodel” [16]. These metamodels are used to control the
potentially exponential growth of solutions. “Using Aspect-Oriented weavers, whole
configurations can be built on-demand by selecting a set of aspects in practice [using]
SmartAdapters” [16]. Components that occur in all configurations are the base models
and are used to weave the aspects of the system.
“In SmartAdapters, an aspect is composed of three parts: i) an advice model,
representing what [to] weave, ii) a pointcut model, representing where [to] weave the
aspect and iii) weaving directives specifying how to weave the advice model at the join
points matching the pointcut model” [16]. The advice model is a portion of the model that
is potentially having an issue. The pointcut model also represents a portion of the model
but it is described by the roles in the system. The weaving directives specify how to
weave an aspect building from the advice model to the pointcut model using the domain-
specific language of the system.
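The three parts of an aspect can be illustrated on a very small model. The sketch below
is not SmartAdapters and does not use its domain-specific language: the model is a plain
dictionary of elements with a role and a set of used elements, the pointcut is a predicate
on an element, and the weaving directive is a function. All of these representations are
assumptions made for the example.

```python
def weave(base_model, aspect):
    """Return a new model with the aspect woven at every matching join point."""
    # Copy the base model so weaving never modifies it in place.
    woven = {
        name: {"role": elem["role"], "uses": set(elem["uses"])}
        for name, elem in base_model.items()
    }
    # Pointcut model: where to weave (here, a predicate over base-model elements).
    join_points = [name for name, elem in base_model.items() if aspect["pointcut"](elem)]
    if not join_points:
        return woven
    # Advice model: what to weave (elements added to the model).
    for name, elem in aspect["advice"].items():
        woven[name] = {"role": elem["role"], "uses": set(elem["uses"])}
    # Weaving directives: how to connect the advice to each join point.
    for join_point in join_points:
        aspect["directive"](woven, join_point)
    return woven


# Hypothetical base model and aspect.
base = {
    "door_sensor": {"role": "sensor", "uses": set()},
    "lamp": {"role": "actuator", "uses": set()},
}
logging_aspect = {
    "advice": {"logger": {"role": "service", "uses": set()}},
    "pointcut": lambda elem: elem["role"] == "sensor",
    "directive": lambda model, join_point: model[join_point]["uses"].add("logger"),
}

if __name__ == "__main__":
    woven = weave(base, logging_aspect)
    print(sorted(woven))                 # ['door_sensor', 'lamp', 'logger']
    print(woven["door_sensor"]["uses"])  # {'logger'}
```

Selecting a different set of aspects and weaving them into the same base model yields a
different configuration on demand, without storing each configuration separately.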
Early self-managing projects were funded by DARPA for the military. The first
was the Situational Awareness System (SAS). It was created to aid in communication
between soldiers on the battlefield. The communication devices had to be durable and
able to deal with harsh conditions and potential jammers of the communication channels.
The design of the system was a distributed peer-to-peer system with self-healing
communication channels.
“The DARPA Self-Regenerative Systems program started in 2004 is a project that
aims to develop technology for building military computing systems that provide critical
functionality at all times, in spite of damage caused by unintentional errors or attacks”
[10]. The four aspects of this project are (1) software made resistant to errors and attacks,
(2) binary code that is modifiable to make attacks harder when trying to exploit
vulnerabilities, (3) a scalable architecture that is intrusion tolerant and (4) the ability to
build systems that can attempt to detect malicious inside users and block attempts to
attack the system.
NASA, in 2005, began work on the Autonomous NanoTechnology Swarm
(ANTS). The project was designed to launch a swarm of 1000 small spacecraft into an
asteroid belt and use the information gathered to determine which asteroids were
interesting enough for further investigation. The ships would be required to use autonomic
techniques to continually elect a leader and rebuild communication channels.
MAPE-K implementations include the Autonomic toolkit, ABLE, Kinesthetics
eXtreme and Self-Management Tightly Coupled with Application [10]. The Autonomic
toolkit is a prototype implementation of the MAPE-K loop built in Java but able to
communicate through XML to other applications. ABLE is also a toolkit designed by
IBM but it is intended for use in multi-agent systems that need self-management
implementations. Kinesthetics eXtreme is a complete autonomic loop designed mainly in
Java that is focused on adding autonomic abilities to legacy systems that may not have
been designed with autonomic capabilities. Finally, Self-Management Tightly Coupled
with Application is a project with the goal of developing middleware frameworks that
offer self-management functionality to applications.
There are currently eight platforms that support Fractal components in multiple
programming languages. “Julia was historically (2002) the first Fractal implementation”
[5]. It was developed and used by France Telecom. It is written in Java and was
developed to show that component-based systems need not perform inefficiently.
“Think is a C implementation of Fractal” [5]. Think is also available through France
Telecom but has development assistance through STMicroelectronics. Think is used to
build kernels of all sorts. The kernels range from exo-kernels and micro-kernels to low
memory complete operating systems. “ProActive is a distributed and asynchronous
implementation of Fractal targeting grid computing” [5]. France Telecom developed
ProActive as middleware for parallel, concurrent and distributed computing grids. It is
object based and allows for asynchronous deployment and management. “AOKell is a
Java implementation by INRIA Jacquard” [5]. It is similar to Julia but based on AOP
using membranes for load time weaving. Its performance is similar to that of Julia.
“FractNet is a .Net implementation of the Fractal component model developed by the
LSR laboratory” [5]. It is a port of AOKell to the .Net platform and is similar in design
and performance. “Flone is a Java implementation of the Fractal component model
developed by INRIA Sardes for teaching purposes” [5]. It is not a full implementation but
instead is a group of APIs that simplify the Fractal model so that it is more easily
understood by students. “FracTalk is an experimental Smalltalk implementation of the
Fractal component model developed at Ecole des Mines de Douai” [5]. FracTalk focuses
on dynamic elements in component-based programming. “Plasma is a C++ experimental
implementation of Fractal developed at INRIA Sardes” [5]. It is dedicated to building
multimedia applications that are self-adaptive. Fractal also has a complete repertoire of
open components for middleware and operating systems.
The smart home feature model is an autonomic computing solution described by
Cetina [3]. The design is such that a smart home can be fully automated and yet still
dynamically adjust to the changing patterns of the residents and the influx and removal of
components. The goal of autonomic computing for smart homes is “to reduce …
configuration effort, [so that] smart homes can provide the following autonomic
capabilities: Self-configuration. New kinds of devices can be incorporated into the
system; Self-healing. When a device is removed or fails, the system should adapt itself to
offer its services using alternative components; Self-adaptation. Users’ needs differ and
change over time. The system should adjust its services to fulfill user preferences” [3].
This behavior is similar to context adaptation. The Model-Based Reconfiguration Engine
(MoRE) is used to implement the management of the models used in the system. The
operations of the engine are used to determine how to evolve the system to meet future
needs and reconfigurations. Reconfiguration actions fall into three categories:
1. Component actions: components that must be installed, uninstalled or
2. Channel actions: Communications for active components
3. Model actions: updates to the MoRE model after the component and channel
actions occur. [3]
Figure 2. Smart Home Model [3]
Because any change to the system can trigger a need for a change to the model,
the high-level models must be maintained and updated. This allows the system to
continually gather and process information about the dynamicity of the system and
affords the MoRE the ability to develop and implement solutions.
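The three categories of reconfiguration action can be sketched as a plan computed from
the current and the target configuration. This is a hypothetical illustration, not the
MoRE implementation from [3]: the configuration layout, the action names and the ordering
of the steps are assumptions, and the device names are invented.

```python
def reconfiguration_plan(current, target):
    """Compute an ordered list of (category, action, subject) steps.

    Both arguments are dicts with a set of "components" and a set of "channels",
    where each channel is a (source, destination) pair of component names.
    """
    plan = []
    # Channel actions: disconnect channels that are no longer wanted.
    for channel in sorted(current["channels"] - target["channels"]):
        plan.append(("channel", "disconnect", channel))
    # Component actions: uninstall, then install.
    for component in sorted(current["components"] - target["components"]):
        plan.append(("component", "uninstall", component))
    for component in sorted(target["components"] - current["components"]):
        plan.append(("component", "install", component))
    # Channel actions: connect the channels the target configuration needs.
    for channel in sorted(target["channels"] - current["channels"]):
        plan.append(("channel", "connect", channel))
    # Model action: bring the runtime model in line with the new configuration.
    if plan:
        plan.append(("model", "update", None))
    return plan


if __name__ == "__main__":
    # Hypothetical self-healing case: the alarm fails, so the lamp takes over.
    current = {
        "components": {"presence_sensor", "alarm", "lamp"},
        "channels": {("presence_sensor", "alarm")},
    }
    target = {
        "components": {"presence_sensor", "lamp"},
        "channels": {("presence_sensor", "lamp")},
    }
    for step in reconfiguration_plan(current, target):
        print(step)
```

Placing the model action last reflects the point made above: the high-level model is
updated after the component and channel actions so that it keeps describing the running
system.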
Self-healing and autonomic systems have begun to be integrated into
many more generalized computing systems. The tools that are used to build these systems
have been developed and optimized to make the best use of the underlying models and
component architectures. The MAPE-K loop and the Fractal modeling system are two of
the most accepted and widely used tools for developing these autonomic systems.
The MAPE-K loop, proposed by IBM, is used to develop manager driven systems
that interact to build solutions. These solutions are then integrated into the running
system. Many active self-healing systems are based on MAPE-K. These include the
Autonomic toolkit, ABLE, Kinesthetics eXtreme and Self-Management Tightly Coupled
with Application. All of these implementations have well-documented success.
The Fractal modeling system, based on component models, was designed and
implemented by France Telecom and has been used to implement multiple systems
across many different languages. The Fractal modeling system is much more complicated
than the MAPE-K loop, but the systems appear to be more straightforward to implement.
These and other tools have been used to build multiple systems ranging from
smart homes to deep-space multi-object space probes. The unifying aspect of all the
systems built to be autonomic and self-healing is that they tend to be complicated. This
level of complexity can grow exponentially as the size of the system grows. To overcome
this complexity, models and metamodels are often used to describe the runtime systems.
These models are built to speed the processing of solutions but as often as not just add
more complexity and sprawl to the system.
While all of the tools described and used to build the implementations of self-
healing systems are useful and well-developed, they would likely work best in
combination. By merging the best aspects of the tools and design models,
functionality will grow at a faster rate than the complexity of the implementation.
Managers used in the MAPE-K loop could be utilized in the MoRE to build more
cohesive systems. These systems and managers using the Fractal model management
ideals would be more likely to have lower reaction time to changes in the environment
and additionally would suffer from less model creep when describing the potential
solution space. In addition to this merger of the best portions from multiple tools, there is
a potential to use database-style storage of known good configurations as they are
implemented along with new advanced and streamlined searching technologies to
expedite the implementation of changes as needed.
Autonomic systems and self-healing systems are going to become a part of most
systems as the computing industry embraces the ideas and realizes the inherent good in
the design. These systems will become more complex and all-encompassing as they
become more commonplace. New tools will have to be developed to aid in the
implementation of these new systems and a better understanding of system modeling will
be needed by all individuals who interact with the development of these systems. Industry
will embrace the usefulness of these systems, but for the implementations to succeed,
steps must be taken to merge the best aspects of the tools and to train
engineers on modeling and best practices.
References
[1] Abbas, N.; Andersson, J.; Loewe, W.; “Autonomic Software Product Lines (ASPL).”
Proceeding of the 7th international conference on Autonomic computing (ICAC
'10). ACM, New York, NY, USA, pp.324-331. 2010
[2] Blair, G.; Bencomo, N.; France, R.B.; “Models@ run.time.” Computer, vol.42,
no.10, pp.22-27, Oct. 2009
[3] Cetina, C.; Giner, P.; Fons, J.; Pelechano, V.; “Autonomic computing through reuse
of variability models at runtime: the case of the smart homes.” Computer, vol.42,
no.10, pp.37-43, Oct. 2009
[4] Cheung-Foo-Wo, D.; Tigli, J.; Lavirotte, S.; Riveill, M.; “Self-adaptation of event-
driven component-oriented middleware using aspects of assembly.” Proceedings
of the 5th international workshop on Middleware for pervasive and ad-hoc
computing. ACM, New York, NY, USA, pp.31-36. 2006
[5] Coupaye, T.; Stefani, J-B.; “Fractal component-based software engineering.”
Proceedings of the 2006 conference on Object-oriented technology: ECOOP
2006 workshop reader (ECOOP'06), Springer-Verlag, Berlin, Heidelberg, pp.117-
129. 2006
[6] Dabrowski, C.; Mills, K.; “Understanding self-healing in service-discovery
systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02),
ACM, New York, NY, USA, pp.15-20. 2002
[7] Dashofy, E.; Hoek, A.; Taylor, R.; “Towards architecture-based self-healing
systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02),
ACM, New York, NY, USA, pp.21-26. 2002
[8] Fickas, S.; Hall, R.; “Self-Healing Open Systems.” Proceedings of the first
workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.99-
101. 2002
[9] George, S.; Evans, D.; Davidson, L.; “A biologically inspired programming model
for self-healing systems.” Proceedings of the first workshop on Self-healing
systems (WOSS '02), ACM, New York, NY, USA, pp.102-104. 2002
[10] Huebscher, M.; McCann, J.; “A survey of autonomic computing—degrees, models,
and applications.” ACM Comput. Surv. 40, 3, Article 7 (August 2008), 28 pages.
[11] Maoz, S.; “Using Model-Based Traces as Runtime Models.” Computer, vol.42,
no.10, pp.28-36, Oct. 2009
[12] Mengusoglu, E.; Pickering, B.; “Automated management and service provisioning
model for distributed devices.” Proceedings of the 2007 workshop on Automating
service quality: Held at the International Conference on Automated Software
Engineering (ASE). ACM, New York, NY, USA, pp.38-41. 2007
[13] Mikic-Rakic, M.; Mehta, N.; Medvidovic, N.; “Architectural style requirements for
self-healing systems.” Proceedings of the first workshop on Self-healing
systems (WOSS '02), ACM, New York, NY, USA, pp.49-54. 2002
[14] Morin, B.; Barais, O.; Jezequel, J.-M.; Fleurey, F.; Solberg, A.; “Models@
Run.time to Support Dynamic Adaptation.” Computer, vol.42, no.10, pp.44-51,
Oct. 2009
[15] Morin, B.; Fleurey, F.; Bencomo, N.; Jezequel, J.-M.; Solberg, A.; Dehlen, V.; Blair,
G.; “An Aspect-Oriented and Model-Driven Approach for Managing Dynamic
Variability.” Proceedings of the 11th international conference on Model Driven
Engineering Languages and Systems (MoDELS '08), Springer-Verlag, Berlin,
Heidelberg, pp.782-796. 2008
[16] Morin, B.; Barais, O.; Nain, G.; Jezequel, J.; “Taming Dynamically Adaptive
Systems using models and aspects.” Proceedings of the 31st International
Conference on Software Engineering (ICSE '09). IEEE Computer Society,
Washington, DC, USA, pp.122-132. 2009
[17] Shapiro, M.; “Self-Healing in Modern Operating Systems.” Queue 2, 9, pp.66-75,
[18] Weyns, D.; Malek, S.; Andersson, D.; “FORMS: a formal reference model for self-
adaptation.” Proceeding of the 7th international conference on Autonomic
computing (ICAC '10). ACM, New York, NY, USA, pp.205-214. 2010

Autonomic Computing and Self Healing Systems

  • 1. Autonomic Computing and Self-Healing Systems William Chipman Spring 2011 Colorado State University Dr. France / Dr. Georg
  • 2. Introduction Self-adapting systems have been developed to address the needs and are an important part of many critical systems. While the ideas and models for self-adapting systems have been around for quite a while, the current research and development has made many strides towards true self-healing, auto-modifying systems. These systems can be designed to adapt to the needs of the underlying system at run-time or while running based on changes in the system as a whole. Self-adapting systems have both advantages and disadvantages; some of which are general to all self-adapting systems and others that are specific to implementations. This paper will include a thorough description of several systems as well as the specific advantages, disadvantages and known issues along with suggested solutions. There are four characteristics of self-managing systems: self-configuration, self- optimization, self-healing and self-protection [1]. For the system to be complete in its implementation all four of the characteristics must be satisfied. In the current environment this has proven to be a difficult endeavor. Addressing all four points can make a system grow to an unwieldy size and level of complication. There have been many advances in the design of tools and implementations in recent years that have made these autonomic, self-healing systems close to a reality instead of a pipe dream. The beginnings of the development of these self-managing systems are built around Model Driven Engineering. “In MDE, a model is an abstraction or reduced representation of a system that is built for specific purposes” [2]. This reduced abstraction simplifies the system design so that the overall behavior and interactions can be mapped out. Once this overview is completed, appropriate self-representations become important
  • 3. to the continuation of the development. “It is critical that such representations be causally connected” [2]. This is important because (1) “the model as interrogated should provide up-to-date and exact information about the system to drive subsequent adaptation decisions; and (2) if the model is causally connected, then adaptations can be made at the model level rather than at the system level” [2]. In order to achieve these systems, developers have had to both develop the tools for designing the systems and use the tools to build the actual systems. This paper will begin with a thorough description of autonomic computing and self-healing systems. It will then describe the tools utilized and will follow with detailed descriptions of several past and current systems that incorporate the self-managing ideals. Autonomic Computing “The term autonomic computing was first used by IBM in 2001 to describe computing systems that are said to be self-managing” [10]. Autonomic computing is centered on the idea of removing human intervention in the system. The main goal is to design and then develop systems that will adapt to changes in their environment on their own. The description of autonomic computing from IBM compared it to the complexity of the human body and the autonomic nervous system because of its self-managing ability. “A system with autonomic capabilities installs, configures, tunes and maintains its own components at runtime” [3]. IBM describes the four main properties of self- management as self-configuration, self-optimization, self-healing and self-protection. Self-configuration says that a system can reconfigure itself based on high-level goals and
  • 4. modeling. Self-optimization says that a system will make the best use of its resources. Self-healing is the ability to detect and diagnose issues or problems and self-protection in a system means that system can protect itself from malicious attacks and from unintended or inadvertent changes. There are five levels of autonomicity proposed by IBM as the Autonomic Computing Adoption Model Levels. Level 1 is the basic level. At this level, the system elements are managed by highly skilled staff and changes are made manually. Level 2 is referred to as the Managed level. At this level, the system monitors itself and is intelligent enough to reduce some of the burden on the system administrators. The Predictive level, level 3, is characterized by the system using modeling of behavior to recognize system-wide behavioral patterns and suggests fixes to the IT staff. Level 4 is the Adaptive level. At this level, human interaction is minimized and the tools that were used at level 3 and automated more so that the burden on the IT staff is minimal. The fully autonomic level is the final level. At level 5, systems are able to self-manage almost all functionality related to the needs of the system. The basic building block in an autonomic system is the Autonomic Element (AE). An autonomic element is a software-based component that is responsible for managing sub-systems. “Autonomic elements may cooperate to achieve a common goal … [such as] servers in a cluster optimizing the allocation of resources to application to minimize the overall response time or execution time of the applications” [10]. Autonomic elements implement the MAPE-K loop as the control loop for the managing of the sub- system. A complete description of the MAPE-K loop will be seen in the Tools section of this paper.
  • 5. Variability models can be used to build autonomic elements and systems at runtime. “The use of variability models at runtime brings new opportunities for autonomic capabilities by reutilizing the efforts invested at design time” [3]. Systems leveraging the variability model can use knowledge of the design to attain autonomic modifications at compile-time and can further use system modeling to self-modify at runtime. Standards such as the meta-data exchange allow the models that were used at design-time to additionally be used at run-time. The negative aspect of variability models is the potential for exponential explosion of the possible state transitions. “In order to manage variability and avoid the combinatorial explosion of artifacts needed to support this variability, [software tools] focus on variation points and variants instead of focusing on whole configurations” [16]. A very specific type of autonomic system is the self- healing system. These systems are often designed using the variability model. Self-healing Systems “A self-healing system is one that: replaces traditional error messages with robust error detection, handling and correction that produces telemetry for automated diagnosis, provides automated diagnosis and response from the error telemetry for hardware and software entities, provides recursive fine-grained restart of services based upon knowledge of their dependencies, presents simplified administrative interactions for diagnosed problems and their effects on services and resources” [17]. There are several ways that self-healing systems can be implemented. The simplest implementation is through redundancy. Adding duplicate components for critical systems all the way up to duplication of the entire system can allow for fail-safe operations. The issue with this is that this is inherently wasteful of resources. The redundant components could be used
  • 6. productively with a more innovative implementation. In addition to the wastefulness of this, it only addresses total failures of components in the system. It does not address degradation in the system. To heal these types of issues, a more robust solution was needed. In many implementations of self-healing systems, a multi-faceted approach is needed. “Two distinct elements are required for the development of self-healing systems. First, an automated or semi-automated agent must be present to make the decision of when and how to affect repair on a system. Second, an infrastructure for actually executing the repair strategy must be available to that agent” [7]. The use of managers is the favored approach. In Solaris 10, managers to address faults and service issues were used. The fault manager uses a system level model to determine when a failure or degradation has occurred and searches through a dynamic list of solutions to determine the most opportune solution. The service manager allows for restarting of services and applications that have failed or degraded below a pre-determined threshold. This pre- determined threshold is given by the application to the service manager when it enters use on the system. A very popular approach to developing self-healing systems is architecture- centric. “An architectural style is a collection of constraints on components, connectors and their configurations targeted towards a family of systems with shared characteristics” [13]. In architecture-based self-healing system, to repair a running system, the changes have to be machine readable by the underlying system as well as the describing systems. This machine readable change instruction is referred to as an architectural difference or
  • 7. diff. A diff describes the difference in the system before and after the repair. A diff is comprised of components, links, connectors and interfaces. According to Mikic-Rakic, “self-healing systems should satisfy: adaptability, dynamicity, awareness, autonomy, robustness, distributability, mobility and traceability” [13]. 1. Adaptability: The system must allow changes to both the static and dynamic portions of the system. 2. Dynamicity: Addresses the run-time changes that a system is able to make. 3. Awareness: The architectural style must allow for self- monitoring in the system. 4. Autonomy: Autonomy is completed through the system being able to address the anomalies discovered. 5. Robustness: The architectural style should allow for the system to respond to unforeseen conditions. 6. Distributability: The system must have good performance in different distributions. 7. Mobility: The architecture should allow for modifications to the location of components in the system. 8. Traceability: A system should allow for a direct correlation between the model and the run-time execution. [13]
  • 8. Some of these requirements are basic building blocks for systems and are used to enforce a system level hierarchy on the data-flow and basic structure of the system while others administer the dynamic changes to the system based on the data-flows. These dynamic indicators analyze specific aspects of the system and address and implement the needed changes. “The ability to dynamically repair a system at runtime based on its architecture requires several capabilities: the ability to describe the current architecture of the system; the ability to express an arbitrary change to that architecture that will serve as a repair plan; the ability to analyze the result of the repair to gain confidence that the change is valid; and the ability to execute the repair plan on a running system without restarting the system” [7]. While self-healing systems are an innovative idea, there are still many issues in their design and implementation that must be overcome in order for them to develop into an integral part of a computer system. “Self-healing functionality for users and administrators of a modern operating system [must] provide fine-grained fault isolation and restart where possible of any component –hardware or software – that experiences a problem” [17]. Without this fine-grain fault isolation, fixing the problem becomes overkill in most situations. Any general fault will be addressed with a similar approach: restart the component or application. This is not always an optimal or applicable solution. With systems that are considered real-time and critical, often the downtime required for such an overzealous solution is not available. With the finer level of fault isolation, small problems can be fixed in a way that does not cripple the system even for a short period. A second issue is tool integration. Seamless integration is “especially important in the context of self-healing systems since no human can be involved in manually
  • 9. transforming tool outputs or invoking tools” [7]. This integration has to be performed with multiple tools. Current self-healing systems are so complex that no single tool encompasses all the needs. With multiple tools added to an already complex system, the system can become unwieldy. To ameliorate this complexity, using tools that have been thoroughly tested is of the utmost importance. In addition to this, managing the growing complexity of fix models can grow exponentially as the system grows. Predetermined solutions and exhaustive solution searches can also lead to problems. The solution space can grow exponentially and determining the best solution to a problem can be subjective with computing systems obviously only capable of making objective decisions. The third issue facing self-healing systems follows from a potential solution to the previously described solution. Building solutions to problems at runtime or at component/application integration time based on detailed models of the system can be a potential solution but the main issue with this is that “in an open system, upfront system analysis is at best of limited, heuristic usefulness” [8]. Because of the fact that open systems tend to be dynamic and are molded by their environment, design-time models become less relevant as the system grows and changes during runtime. While these design-time models are important, a combination of all types of solutions is needed to build robust systems that can grow and morph into what is truly needed. Tools There are many tools designed for building autonomic and self-healing systems. The first and most widely used tool is the MAPE-K loop. The MAPE-K loop stands for monitor, analyze, plan, execute and knowledge. The monitor “collates and aggregates
  • 10. information received into the system and attempts to characterize any symptoms relating to the way the system is running” [12]. The analyze phase processes the symptoms to determine if the issues at hand need to be addressed. At the plan step, the system decides what and how changes can be handled for a successful implementation. The execute stage is where the plan is implemented and activated. After a completed execute phase, the knowledge phase of the cycle attempts to determine if the implementation of the plan was successful in instigating the necessary changes. Figure 1. MAPE-K Loop Design [12]
  • 11. The MAPE-K loop design was first proposed by IBM as a solution to autonomic computing design. Systems built with the MAPE-K ideal tend to be robust. “The MAPE- K loop is controlled by a manager, an embedded part of the autonomic element that coordinates the individual activities” [1]. This manager is an integrated part of the underlying system. Often systems will have several managers running the MAPE-K loop simultaneously, autonomously and distributed. The way that the autonomic managers interact with each other is determined by the autonomic computing architecture. This integration of several managers leads to the next logical step of developing autonomic software product lines (ASPLs). ASPLs can self-manage a large and complex system and interact with other systems both local and distributed in order to deal with product variations and dynamic system changes. There has been a good deal of research into the ASPL concept and the Software Engineering Institute (SEI) has developed a framework. This framework divides systems into three general categories: core assets development, product development and management. The core assets are the basic components in the SPL. They can range from business artifacts to reference architectures. The product development is what is built with the core assets. They comprise the larger portions of the system that perform the objectives of the design. The final portion is the management. The management is what provides maintenance to all product developments as well as monitors the system to determine where potential problems will likely occur based on models. “Fractal is an advanced component model and associated on-growing(sic) programming and management support devised initially by France Telecom and INRIA since 2001” [5]. The Fractal model is based on component-based software engineering. It
  • 12. makes use of components; interfaces, which are the interaction points between components; and bindings, which are the communication channels between components. Fractal also uses the concepts of membranes and contents. “The membrane exercises an arbitrary reflexive control over its content” [5]. A membrane is made up of a set of controllers. “The model is recursive with sharing at arbitrary levels” [5]. The model is extensible and independent of any programming language. Bindings are controlled through implementation-specific programming. “The Fractal project targets the development of a reflective component technology for the construction of highly adaptable and reconfigurable distributed systems” [5]. Fractal enforces a limited number of architectural structures. This makes systems more robust, as a specified component need not exist at runtime in order to be managed successfully.
The final tool analyzed is SmartAdapters. SmartAdapters are used to decrease the complexity of dynamically adaptive systems. The first step in using SmartAdapters is to maintain a high-level representation model of the running systems. Maintaining the high-level model allows for a quicker and more thorough response to issues as they arise. “SmartAdapters automatically generate an extensible Aspect-Oriented Modeling framework specific to [the] metamodel” [16]. These metamodels are used to control the potentially exponential growth of solutions. “Using Aspect-Oriented weavers, whole configurations can be built on-demand by selecting a set of aspects in practice [using] SmartAdapters” [16]. Components that occur in all configurations form the base models and are used to weave the aspects of the system. “In SmartAdapters, an aspect is composed of three parts: i) an advice model, representing what [to] weave, ii) a pointcut model, representing where [to] weave the
  • 13. aspect and iii) weaving directives specifying how to weave the advice model at the join points matching the pointcut model” [16]. The advice model is the portion of the model that may be having an issue. The pointcut model also represents a portion of the model, but it is described by the roles in the system. The weaving directives specify how to weave an aspect, building from the advice model to the pointcut model, using the domain-specific language of the system.
Implementations
Early self-managing projects were funded by DARPA for the military. The first was the Situational Awareness System (SAS), created to aid communication between soldiers on the battlefield. The communication devices had to be durable and able to cope with harsh conditions and with potential jamming of the communication channels. The system was designed as a distributed peer-to-peer system with self-healing communication channels.
“The DARPA Self-Regenerative Systems program started in 2004 is a project that aims to develop technology for building military computing systems that provide critical functionality at all times, in spite of damage caused by unintentional errors or attacks” [10]. The four aspects of this project are (1) software made resistant to errors and attacks, (2) binary code that can be modified to make vulnerabilities harder to exploit, (3) a scalable, intrusion-tolerant architecture and (4) the ability to build systems that can detect malicious inside users and block their attempts to attack the system.
  • 14. NASA, in 2005, began work on the Autonomous NanoTechnology Swarm (ANTS). The project was designed to launch a swarm of 1000 small spacecraft into an asteroid belt and use the information gathered to determine which asteroids were interesting enough for further investigation. The ships would be required to use autonomic techniques to continually elect a leader and rebuild communication channels.
MAPE-K implementations include the Autonomic toolkit, ABLE, Kinesthetics eXtreme and Self-Management Tightly Coupled with Application [10]. The Autonomic toolkit is a prototype implementation of the MAPE-K loop, built in Java but able to communicate with other applications through XML. ABLE is also a toolkit designed by IBM, but it is intended for multi-agent systems that need self-management implementations. Kinesthetics eXtreme is a complete autonomic loop, written mainly in Java, that focuses on adding autonomic abilities to legacy systems that were not designed with autonomic capabilities. Finally, Self-Management Tightly Coupled with Application is a project with the goal of developing middleware frameworks that offer self-management functionality to applications.
There are currently eight platforms that support Fractal components in multiple programming languages. “Julia was historically (2002) the first Fractal implementation” [5]. It was developed and used by France Telecom. It is based in Java and was developed to prove that component-based systems did not have to perform inefficiently. “Think is a C implementation of Fractal” [5]. Think is also available through France Telecom but has development assistance from STMicroelectronics. Think is used to build kernels of all sorts, ranging from exo-kernels and micro-kernels to complete low-memory operating systems. “ProActive is a distributed and asynchronous
  • 15. implementation of Fractal targeting grid computing” [5]. France Telecom developed ProActive as middleware for parallel, concurrent and distributed computing grids. It is object based and allows for asynchronous deployment and management. “AOKell is a Java implementation by INRIA Jacquard” [5]. It is similar to Julia but based on AOP, using membranes for load-time weaving. Its performance is similar to that of Julia. “FractNet is a .Net implementation of the Fractal component model developed by the LSR laboratory” [5]. It is a port of AOKell to the .Net platform and is similar to it in design and performance. “Flone is a Java implementation of the Fractal component model developed by INRIA Sardes for teaching purposes” [5]. It is not a full implementation but a group of APIs that simplify the Fractal model so that students can understand it more easily. “FracTalk is an experimental Smalltalk implementation of the Fractal component model developed at Ecole des Mines de Douai” [5]. FracTalk focuses on dynamic elements in component-based programming. “Plasma is a C++ experimental implementation of Fractal developed at INRIA Sardes” [5]. It is dedicated to building self-adaptive multimedia applications. Fractal also has a complete repertoire of open components for middleware and operating systems.
The smart home feature model is an autonomic computing solution described by Cetina [3]. The design is such that a smart home can be fully automated and yet still adjust dynamically to the changing patterns of the residents and to the addition and removal of components. The goal of autonomic computing for smart homes is “to reduce … configuration effort, [so that] smart homes can provide the following autonomic capabilities: Self-configuration. New kinds of devices can be incorporated into the system; Self-healing. When a device is removed or fails, the system should adapt itself to
  • 16. offer its services using alternative components; Self-adaptation. Users’ needs differ and change over time. The system should adjust its services to fulfill user preferences” [3]. This behavior is similar to context adaptation.
The Model-Based Reconfiguration Engine (MoRE) is used to implement the management of the models used in the system. The operations of the engine are used to determine how to evolve the system to meet future needs and reconfigurations. Reconfiguration actions fall into three categories [3]:
1. Component actions: components that must be installed, uninstalled or reconfigured
2. Channel actions: changes to the communications for active components
3. Model actions: updates to the MoRE model after the component and channel actions occur
Figure 2. Smart Home Model [3]
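The three categories of reconfiguration actions can be illustrated by comparing a current configuration with a target configuration. The sketch below is one plausible way to derive such a plan and is not MoRE's actual API or algorithm; the function `reconfiguration_plan`, the action names (`install`, `uninstall`, `bind`, `unbind`, `update_model`) and the device names are hypothetical, and component reconfiguration is left out for brevity.

```python
def reconfiguration_plan(current, target):
    """Derive component, channel and model actions from two configurations.

    Each configuration is a dict with:
      'components': a set of component names
      'channels':   a set of (source, destination) name pairs
    """
    # 1. Component actions: what must be uninstalled or installed.
    component_actions = (
        [("uninstall", c) for c in sorted(current["components"] - target["components"])]
        + [("install", c) for c in sorted(target["components"] - current["components"])]
    )
    # 2. Channel actions: communication links to drop or establish.
    channel_actions = (
        [("unbind", ch) for ch in sorted(current["channels"] - target["channels"])]
        + [("bind", ch) for ch in sorted(target["channels"] - current["channels"])]
    )
    # 3. Model actions: refresh the runtime model only if something changed.
    model_actions = []
    if component_actions or channel_actions:
        model_actions.append(("update_model", target))
    return component_actions, channel_actions, model_actions


if __name__ == "__main__":
    # Hypothetical self-healing case: the motion sensor fails and a camera
    # takes over its role of triggering the alarm.
    current = {
        "components": {"alarm", "motion_sensor"},
        "channels": {("motion_sensor", "alarm")},
    }
    target = {
        "components": {"alarm", "camera"},
        "channels": {("camera", "alarm")},
    }
    for actions in reconfiguration_plan(current, target):
        print(actions)
```

Ordering the model action last mirrors the description above, in which the model is updated after the component and channel actions occur.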
  • 17. Because any change to the system can trigger a need for a change to the model, the high-level models must be maintained and updated. This allows the system to continually gather and process information about the dynamicity of the system and affords the MoRE the ability to develop and implement solutions.
Conclusion
Self-healing and autonomic systems have begun to integrate themselves into many more generalized computing systems. The tools that are used to build these systems have been developed and optimized to make the best use of the underlying models and component architectures. The MAPE-K loop and the Fractal modeling system are two of the most accepted and widely used tools for developing these autonomic systems. The MAPE-K loop, proposed by IBM, is used to develop manager-driven systems that interact to build solutions. These solutions are then integrated into the running system. Many active self-healing systems are based on MAPE-K. These include the Autonomic toolkit, ABLE, Kinesthetics eXtreme and Self-Management Tightly Coupled with Application. All of these implementations have well-documented success. The Fractal modeling system, based on component models, was designed and implemented by France Telecom and has been used to implement multiple systems across many different languages. The Fractal modeling system is much more complicated than the MAPE-K loop, but the systems appear to be more straightforward to implement.
These and other tools have been used to build multiple systems ranging from smart homes to deep-space multi-object space probes. The unifying aspect of all the systems built to be autonomic and self-healing is that they tend to be complicated. This
  • 18. level of complexity can grow exponentially as the size of the system grows. To overcome this complexity, models and metamodels are often used to describe the runtime systems. These models are built to speed the processing of solutions but, as often as not, just add more complexity and sprawl to the system.
While all of the tools described and used to build the implementations of self-healing systems are useful and well-developed, they would likely work best in combination. By merging the best aspects of the tools and design models, functionality will grow at a faster rate than the complexity of the implementation. Managers used in the MAPE-K loop could be utilized in the MoRE to build more cohesive systems. These systems and managers, using the Fractal model management ideals, would be more likely to react quickly to changes in the environment and would also suffer from less model creep when describing the potential solution space. In addition to this merger of the best portions of multiple tools, there is potential to use database-style storage of known good configurations as they are implemented, along with new advanced and streamlined searching technologies, to expedite the implementation of changes as needed.
Autonomic systems and self-healing systems are going to become a part of most systems as the computing industry embraces the ideas and realizes the inherent good in the design. These systems will become more complex and all-encompassing as they become more commonplace. New tools will have to be developed to aid in the implementation of these new systems, and a better understanding of system modeling will be needed by everyone involved in their development. Industry will embrace the usefulness of these systems but in order for there to be success in the
  • 19. implementations, steps must be taken to merge the best aspects of the tools and to train engineers in modeling and best practices.
  • 20. References
[1] Abbas, N.; Andersson, J.; Loewe, W., “Autonomic Software Product Lines (ASPL).” Proceeding of the 7th international conference on Autonomic computing (ICAC '10). ACM, New York, NY, USA, pp.324-331. 2010
[2] Blair, G.; Bencomo, N.; France, R.B., “Models@ run.time,” Computer, vol.42, no.10, pp.22-27, Oct. 2009
[3] Cetina, C.; Giner, P.; Fons, J.; Pelechano, V., “Autonomic computing through reuse of variability models at runtime: the case of the smart homes,” Computer, vol.42, no.10, pp.37-43, Oct. 2009
[4] Cheung-Foo-Wo, D.; Tigli, J.; Lavirotte, S.; Riveill, M., “Self-adaptation of event-driven component-oriented middleware using aspects of assembly.” Proceedings of the 5th international workshop on Middleware for pervasive and ad-hoc computing. ACM, New York, NY, USA, pp.31-36. 2006
[5] Coupaye, T.; Stefani, J-B., “Fractal component-based software engineering.” Proceedings of the 2006 conference on Object-oriented technology: ECOOP 2006 workshop reader (ECOOP'06), Springer-Verlag, Berlin, Heidelberg, pp.117-129. 2006
[6] Dabrowski, C.; Mills, K., “Understanding self-healing in service-discovery systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.15-20. 2002
  • 21. [7] Dashofy, E.; Hoek, A.; Taylor, R., “Towards architecture-based self-healing systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.21-26. 2002
[8] Fickas, S.; Hall, R., “Self-Healing Open Systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.99-101. 2002
[9] George, S.; Evans, D.; Davidson, L., “A biologically inspired programming model for self-healing systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.102-104. 2002
[10] Huebscher, M.; McCann, J., “A survey of autonomic computing—degrees, models, and applications.” ACM Comput. Surv. 40, 3, Article 7 (August 2008), 28 pages. 2008
[11] Maoz, S., “Using Model-Based Traces as Runtime Models,” Computer, vol.42, no.10, pp.28-36, Oct. 2009
[12] Mengusoglu, E.; Pickering, B., “Automated management and service provisioning model for distributed devices.” Proceedings of the 2007 workshop on Automating service quality: Held at the International Conference on Automated Software Engineering (ASE). ACM, New York, NY, USA, pp.38-41. 2007
[13] Mikic-Rakic, M.; Mehta, N.; Medvidovic, N., “Architectural style requirements for self-healing systems.” Proceedings of the first workshop on Self-healing systems (WOSS '02), ACM, New York, NY, USA, pp.49-54. 2002
  • 22. [14] Morin, B.; Barais, O.; Jezequel, J.-M.; Fleurey, F.; Solberg, A., “Models@ Run.time to Support Dynamic Adaptation,” Computer, vol.42, no.10, pp.44-51, Oct. 2009
[15] Morin, B.; Fleurey, F.; Bencomo, N.; Jezequel, J.-M.; Solberg, A.; Dehlen, V.; Blair, G., “An Aspect-Oriented and Model-Driven Approach for Managing Dynamic Variability,” Proceedings of the 11th international conference on Model Driven Engineering Languages and Systems (MoDELS '08), Springer-Verlag, Berlin, Heidelberg, pp.782-796. 2008
[16] Morin, B.; Barais, O.; Nain, G.; Jezequel, J., “Taming Dynamically Adaptive Systems using models and aspects.” Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, pp.122-132. 2009
[17] Shapiro, M., “Self-Healing in Modern Operating Systems.” Queue 2, 9, pp.66-75, 2008
[18] Weyns, D.; Malek, S.; Andersson, D., “FORMS: a formal reference model for self-adaptation.” Proceeding of the 7th international conference on Autonomic computing (ICAC '10). ACM, New York, NY, USA, pp.205-214. 2010