S-CUBE LP: Self-healing in Mixed Service-oriented Systems



  1. S-Cube Learning Package. Self-* infrastructures: Self-healing in Mixed Service-oriented Systems. TU Wien (TUW). Harald Psaier, TUW. www.s-cube-network.eu
  2. Learning Package Categorization. S-Cube; Self-* Service Infrastructure and Discovery Support; Self-* Service Infrastructure; Self-healing in SOA. © Harald Psaier
  3. Learning Package Overview. Problem Description; Self-healing research; Example: Self-healing policies for Mixed Service-oriented Systems; Conclusions.
  4. Mixed Service-oriented Systems. An open, dynamic service environment for humans and services, with distributed coordination and communication, and with flexible compositions rather than predefined top-down ones. Interactions are ad hoc and dynamic, and usually stay within the boundaries of an activity. A Mixed System (MS) involves collaboration between two distinct main types of services. Human-Provided Services (HPS): humans provide knowledge, skills, and expertise as services; this closes the gap between the human expertise required and the difficulty of implementing it as software. Software-Based Services (SBS).
  5. Examples of mixed systems. Review services: shared reviewing activities around documents, code, and evaluations. Innovation services: foster various ideas for a new product design. Support services: provide solutions for questions and problems on multiple or selected subjects. Current platforms with massive use of MSs are crowdsourcing platforms, e.g., Amazon’s Mechanical Turk, Yahoo Answers, and uTest.
  6. Let’s Consider a Scenario (1). (Figure labels: human, service, activity scopes, invoke, Service Registry, Process Model.) Humans and services interact to perform work described by the activities in the process model.
  7. Let’s Consider a Scenario (2). (Figure labels: failed invoke, Run-Time Environment, Deployment with Dependency Management, Monitoring, Adaptation, Self-healing Policies, Process Model.) One of the services fails to complete an assigned activity. In a loop, self-healing monitors, recognizes, and adapts to the incident.
  8. Let’s Consider a Scenario (3). The reaction is controlled by policies connected to the process activities. The particular challenge for the autonomous system is the complexity of MSs (cf. the dynamicity of MSs). The goal of self-* properties is to support administrators in system management. In particular, the tasks of self-healing in MS include: avoiding errors in design; avoiding errors in configuration; replacing failing services at runtime; handling adaptation complexity transparently to keep the system healthy; supporting the need for service maintenance.
  9. Learning Package Overview. Problem Description; Self-healing research; Example: Self-healing policies for Mixed Service-oriented Systems; Conclusions.
  10. What is self-healing? A self-healing system should recover from an abnormal (“unhealthy”) state, return to the normative (“healthy”) state, and function as it did prior to the disruption. A system with self-healing properties can be identified as one that combines fault-tolerant, self-stabilizing, and survivable system capabilities and that, if needed, is supported by humans. The three common states are Normal, Broken, and Degraded. The challenge is to identify the Degraded state in time and to recover soundly.
  11. Self-healing origins. A fault-tolerant system continues working to a reasonable degree in the presence of faults. A self-stabilizing system continuously stabilizes itself after any perturbation. Survivable systems sustain the unexpected.
  12. Self-healing research: autonomic computing (1/2). IBM’s autonomic computing research envisions a layered structure that manages itself according to high-level objectives given by administrators. It is motivated by the cost of, and the overwhelming effort spent on, system maintenance. The research tries to cover all adaptable layers down to the network and the operating system. It defines four properties of a self-managing system (self-CHOP): self-configuring: the ability to readjust itself “on the fly”; self-healing: discover, diagnose, and react to disruptions; self-optimization: maximize resource utilization to meet end-user needs; self-protection: anticipate, detect, identify, and protect itself from attacks.
  13. Self-healing research: self-adaptive systems (2/2). Self-adaptive systems evaluate their own behavior and adapt when they detect system irregularities or when better functionality or performance is possible. The research primarily covers the application and middleware layers and focuses on the system as a whole. It also includes self-healing as a combination of self-diagnosing and self-repairing, with the capability to diagnose and recover from malfunctions.
  14. Self-healing characteristics. What: continuous availability by compensating for the dynamics of a running system. Why: maintenance of health momentarily and ... enduring continuity through resilience against unintended behavior. How: detect disruptions; diagnose the root cause; derive a recovery strategy.
  15. Self-healing requirements. A closed-loop design that integrates sufficient sensor and effector interfaces. A status knowledge database and logic for accurate state recognition. State recognition must include failure classification for an adequate handling of the problem. A collection of recovery policies in the format <trigger, rule, action>. Usually this collection is preconfigured, but it must also be configurable to obtain… fitness and evolutionary aspects. Self-* properties are generally applied to maintain long-term use of the system.
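
The <trigger, rule, action> policy format can be illustrated with a minimal Python sketch. The slide gives only the triple itself; the `RecoveryPolicy` class, the `queue_report` event, the `queue_length` status key, and the threshold of 10 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical status snapshot as delivered by the sensor interfaces.
Status = Dict[str, float]


@dataclass
class RecoveryPolicy:
    trigger: str                     # event name that activates the policy
    rule: Callable[[Status], bool]   # condition evaluated against the status
    action: Callable[[Status], str]  # adaptation to apply (described as text here)


def apply_policies(event: str, status: Status,
                   policies: List[RecoveryPolicy]) -> List[str]:
    """Run every policy whose trigger matches the event and whose rule holds."""
    return [p.action(status) for p in policies
            if p.trigger == event and p.rule(status)]


# A preconfigured collection; the threshold is an assumed, configurable value.
policies = [
    RecoveryPolicy(
        trigger="queue_report",
        rule=lambda s: s["queue_length"] > 10,
        action=lambda s: "redirect new tasks to an alternative service",
    ),
]
```

Calling `apply_policies("queue_report", {"queue_length": 12.0}, policies)` selects the redirect action, while a queue length of 3 selects nothing. Keeping the collection as plain data is one way to leave it configurable at runtime.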
  16. Self-healing loop. A self-healing loop comprises three common states: detecting, diagnosing, and recovering. These are connected to the sensors and effectors of the system. In the background, a knowledge base supports the states. Detecting: filters any suspicious status information. Diagnosing: performs root cause analysis and calculates an appropriate recovery. Recovering: carefully applies the planned adaptations.
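
One pass through the detect, diagnose, recover loop might look like the following Python sketch. It is not taken from the slides: the queue-length readings, the `KNOWLEDGE_BASE` threshold, and the `throttle_incoming` plan are hypothetical stand-ins for real sensors, knowledge, and effectors.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge base: here just one threshold for suspicious queues.
KNOWLEDGE_BASE = {"max_queue_length": 10}


def detect(readings: Dict[str, int]) -> List[str]:
    """Filter suspicious status information: nodes with an overlong queue."""
    limit = KNOWLEDGE_BASE["max_queue_length"]
    return sorted(node for node, queue in readings.items() if queue > limit)


def diagnose(suspects: List[str]) -> Optional[dict]:
    """Stand-in for root cause analysis: attribute the problem to overload."""
    if not suspects:
        return None
    return {"cause": "overload", "nodes": suspects, "plan": "throttle_incoming"}


def recover(diagnosis: dict) -> List[Tuple[str, str]]:
    """Turn the plan into effector commands, one per affected node."""
    return [(diagnosis["plan"], node) for node in diagnosis["nodes"]]


def heal_once(readings: Dict[str, int]) -> List[Tuple[str, str]]:
    """One iteration of the loop over a snapshot of sensor readings."""
    diagnosis = diagnose(detect(readings))
    return recover(diagnosis) if diagnosis else []
```

With readings `{"a": 2, "b": 14}` the loop emits a single command for node `b`; with all queues below the threshold it emits none. A real loop would run continuously and feed the outcome of each recovery back into the knowledge base.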
  17. Self-healing states. The most general states in self-healing research are the following. Normal: the system is in a “healthy” state; in particular, it signals intended functioning and all requirements are met as expected. Broken: this is an “unhealthy” system; it can generally be identified by an unacceptable response, which is most probably the result of a failure or error. Degraded: the system is in a fuzzy transition zone between the former two; behavior is expected to be unpredictable, and parts of the system will drift from an acceptable state to some failure state. In large-scale systems this is in many cases recognizable by considerable performance loss. If the system is redundant, its size in most cases provides it with additional recovery time.
  18. Failure classification: Failure types (1/2). The main goal of this classification is to assist root cause analysis and to find an adequate resolution for the failure. Common failure types are: crash failure: undetectable, malign service interruption; fail-stop: a detected failure caused a service interruption; transient: instantaneous, transparent interruption with measurable side effects; omission: message loss or transmission errors in the communication infrastructure; performance: violation of agreements on execution time; arbitrary: any type of failure with no specific pattern.
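
As a rough illustration of how monitoring data might be mapped to some of these failure types, consider the Python sketch below. The `Observation` record and the decision order are assumptions and not part of the learning package; transient and arbitrary failures are deliberately left unclassified because they need more context than a single observation provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureType(Enum):
    CRASH = "crash"
    FAIL_STOP = "fail-stop"
    TRANSIENT = "transient"
    OMISSION = "omission"
    PERFORMANCE = "performance"
    ARBITRARY = "arbitrary"


@dataclass
class Observation:
    """Hypothetical monitoring record for one service invocation."""
    responded: bool          # did the service answer at all?
    failure_signalled: bool  # did the service report its own failure?
    message_lost: bool       # did the infrastructure lose a message?
    elapsed_s: float         # measured execution time in seconds
    agreed_s: float          # agreed execution time in seconds


def classify(obs: Observation) -> Optional[FailureType]:
    """Return a failure type, or None if the observation looks healthy."""
    if obs.message_lost:
        return FailureType.OMISSION
    if not obs.responded:
        # A signalled interruption is fail-stop; a silent one is a crash.
        return FailureType.FAIL_STOP if obs.failure_signalled else FailureType.CRASH
    if obs.elapsed_s > obs.agreed_s:
        return FailureType.PERFORMANCE
    return None
```

For example, a response that arrives after 12 seconds under a 10 second agreement is classified as a performance failure, which a diagnosis step could then use to narrow down candidate recovery actions.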
  19. Failure classification: Policies (2/2). Policies provide configuration and settings for detection and recovery. There are three different types of policies. Action policies: reactive policies with a specialized trigger, for which an immediate response is expected. Goal policies: these define a set of desired states and also calculate the set of actions for the transition from the current (failure-affected) state to a desired state. Utility function policies: the set of states is connected to a utility function; problem solving involves extensive analysis, including history information, adaptation knowledge, and comprehensive system awareness. Common recovery actions include replacement, balancing, isolation, persistence, redirection, etc.
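
A utility function policy can be sketched as choosing, among candidate target states, the one with the highest utility. In the Python example below the utility formula, its weight of 5, and the predicted numbers are invented for illustration; only the recovery names (replacement, balancing, isolation) come from the slide.

```python
from typing import Dict

# Hypothetical predicted outcome of a recovery action.
State = Dict[str, float]


def utility(state: State) -> float:
    """Assumed utility: reward throughput, penalise missed deadlines."""
    return state["throughput"] - 5.0 * state["missed_deadlines"]


def choose_recovery(candidates: Dict[str, State]) -> str:
    """Return the recovery action whose predicted state has maximal utility."""
    return max(candidates, key=lambda name: utility(candidates[name]))


# Predicted states per recovery action (illustrative numbers only).
candidates = {
    "replacement": {"throughput": 8.0, "missed_deadlines": 1.0},
    "balancing": {"throughput": 10.0, "missed_deadlines": 0.0},
    "isolation": {"throughput": 6.0, "missed_deadlines": 0.0},
}
```

Here `choose_recovery(candidates)` returns `"balancing"`, since its utility of 10 exceeds 3 for replacement and 6 for isolation. An action policy, by contrast, would map a trigger directly to one fixed action without comparing states.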
  20. Fitness and evolution. Current large-scale systems, especially self-*-enhanced ones, must be designed for long-term service. This means they must be resilient to changes and allow any required future variations. The issues to keep in mind are: most arising requirements are not known a priori but emerge over time; interventions in and changes to the current system must respect the system’s essential functionality and avoid malicious failures at any cost; adaptation might reach the limits of its resources. The current solution is to create self-* systems with exposed configuration management and thus human-assisted adaptations.
  21. S-Cube contributions to Self-healing/-* research. Psaier, H., Dustdar, S. (2010). A survey on self-healing systems: approaches and systems. Computing. Springer Wien. Di Nitto, E., Ghezzi, C., Metzger, A., Papazoglou, M., Pohl, K. (2008). A journey to highly dynamic, self-adaptive service-based applications. Automated Software Engineering, 15(3), pp. 313-341. Springer. Hielscher, J., Kazhamiakin, R., Metzger, A., Pistore, M. (2008). A framework for proactive self-adaptation of service-based applications based on online testing. Towards a Service-Based Internet, pp. 122-133. Springer. Pernici, B. (2009). Self-healing Systems and Web Services: The WS-Diamond Approach. Business Process Management Workshops, pp. 440-442. Springer. Psaier, H., Skopik, F., Schall, D., Dustdar, S. (2010). Behavior Monitoring in Self-healing Service-oriented Systems. 34th Annual IEEE Computer Software and Applications Conference (COMPSAC), July 19-23, 2010, Seoul, South Korea. IEEE. Papazoglou, M., Pohl, K., Parkin, M., Metzger, A. (2010). S-Cube - Towards Engineering, Managing and Adapting Service-Based Systems. Springer. 1st Edition, 2010, XVIII, 374 p.
  22. Learning Package Overview. Problem Description; Self-healing research; Example: Self-healing policies for Mixed Service-oriented Systems; Conclusions.
  23. Mixed Service-oriented Systems: Challenges. Mixed Service-oriented Systems, a.k.a. Mixed Systems (MS), are open to humans and services. They inherit all properties of SOA, including distributed, ad hoc interactions along with a communication infrastructure and coordination … and the aforementioned properties … and examples. What are the challenges in MS? The “openness” of the system allows many, possibly unreliable, services to join. In particular, humans are unreliable because of, e.g., their different working hours, particular preferences, current mood, and context.
  24. Scenario: Expert Network. Includes two parties: the service consumer, with a request in the form of an activity, and the experts and resources in the service network. The network combines all the knowledge required to process the activity jointly. The key is to share the subtasks of the activity among the experts appropriate for each subtask. This is usually solved by delegation and re-delegation; however, it can fail because of individual misbehavior. Main challenge: how to guarantee that the activity is completed, and completed on time?
  25. Delegation and processing behavior. A model of the network helps to analyze a possible problem: HPS and SBS are represented as nodes; interactions are allowed over established channels; the current work load of nodes is indicated by the queues. At runtime the model additionally indicates: the delegation direction and frequency, by the arrow direction and the thickness of the connection; the current work load, by the queue fill state. With the model we can present two main patterns of misbehavior.
  26. 1st misbehavior pattern: Delegation Factory. The delegation factory misbehavior pattern: node a accepts and delegates particular tasks frequently; however, a processes few tasks and has a short task queue. The factory behavior impact: produces unusual amounts of task delegations; tasks miss their deadlines; leads to performance degradation of the entire network.
  27. 2nd misbehavior pattern: Delegation Sink. The delegation sink misbehavior pattern: node d accepts too many offered tasks; however, d processes them slowly (e.g., it overestimates its capability relative to the overload it receives). Sink behavior impact: produces unusual amounts of task delegations; tasks miss their deadlines; leads to performance degradation of the entire network.
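
The two misbehavior patterns can be expressed as simple checks over per-node counters. The Python sketch below is an assumption-laden illustration, not the metrics used by the authors: the `NodeStats` fields and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node counters derived from interaction logs."""
    accepted: int      # tasks the node accepted
    delegated: int     # tasks it passed on to neighbours
    processed: int     # tasks it completed itself
    queue_length: int  # tasks currently waiting in its queue


def is_factory(s: NodeStats, delegation_ratio: float = 0.8,
               max_queue: int = 2) -> bool:
    """Factory: delegates most accepted tasks while keeping a short queue."""
    if s.accepted == 0:
        return False
    return s.delegated / s.accepted >= delegation_ratio and s.queue_length <= max_queue


def is_sink(s: NodeStats, min_queue: int = 10,
            processing_ratio: float = 0.3) -> bool:
    """Sink: accumulates a long queue while processing only a small share."""
    if s.accepted == 0:
        return False
    return s.queue_length >= min_queue and s.processed / s.accepted <= processing_ratio


node_a = NodeStats(accepted=20, delegated=18, processed=2, queue_length=0)
node_d = NodeStats(accepted=30, delegated=0, processed=5, queue_length=25)
```

With these sample counters `node_a` matches the factory check and `node_d` the sink check. In practice the thresholds would have to be tuned per network, since a legitimate coordinator role may also delegate most of its tasks.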
  28. Observing and avoiding misbehavior. A successful self-healing architecture that can handle the misbehavior situations must: avoid unpredictable system behavior leading to faults; identify and handle degraded states. Degraded states here relate to poor progress in the activity process because of increasing factory/source behavior. Feasible adaptation actions must not include direct punishment of the misbehaving participating experts. Instead, a transparent, temporary decoupling from the system is considered. Also, the architecture must be aware of the side effects of the healing actions: a feedback loop informs about the success of the adaptation.
  29. The VieCure Framework. A monitoring and adaptation layer sits between the MS on top and the framework, and connects the two. Events are derived from the interaction logs and diagnosed. The Behavior Registry provides the metrics to identify the misbehavior patterns. During recovery the interaction channels are adjusted.
  30. Self-healing steps on misbehavior. The system is in perfect health. An overload in node b is detected. Assuming a causes most of the overload traffic, the recovery action regulates channel (i) between a and b. However, b remains overloaded, so an additional unknown cause is assumed. An alternative for b is found and channels to d are opened. Channels (ii) and (iii) are now available.
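
The channel adjustments in these steps can be mimicked with a small Python sketch. It assumes channels are directed node pairs with a numeric capacity; the slide does not say which nodes channels (ii) and (iii) connect, so the pairs `("a", "d")` and `("c", "d")` and the scaling factor 0.5 are purely hypothetical.

```python
from typing import Dict, Tuple

Channel = Tuple[str, str]  # (sender, receiver)


def regulate(channels: Dict[Channel, float], channel: Channel,
             factor: float) -> Dict[Channel, float]:
    """Return a copy of the channel map with one channel's capacity scaled."""
    updated = dict(channels)
    updated[channel] = channels[channel] * factor
    return updated


def open_channel(channels: Dict[Channel, float], channel: Channel,
                 capacity: float = 1.0) -> Dict[Channel, float]:
    """Return a copy of the channel map with an additional channel."""
    updated = dict(channels)
    updated[channel] = capacity
    return updated


# Healthy network: a and c may delegate to b at full capacity.
channels = {("a", "b"): 1.0, ("c", "b"): 1.0}

# Step 1: b is overloaded; regulate channel (i) between a and b.
step1 = regulate(channels, ("a", "b"), 0.5)

# Step 2: b stays overloaded; open channels to the alternative node d.
step2 = open_channel(open_channel(step1, ("a", "d")), ("c", "d"))
```

Each step returns a new map instead of mutating the old one, which makes it easy to compare the network before and after an adaptation or to roll back if the feedback loop reports that the recovery did not help.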
  31. Learning Package Overview. Problem Description; Self-healing research; Example: Self-healing policies for Mixed Service-oriented Systems; Conclusions.
  32. Summary. Self-healing research principles: a self-healing system should recover from an abnormal (“unhealthy”) state, return to the normative (“healthy”) state, and function as it did prior to the disruption; the three common states are Normal, Broken, and Degraded, and the challenge is to identify the Degraded state in time and to recover soundly; in order to recover, a self-healing loop is required that detects, diagnoses, and recovers the system. Self-healing in MS: the “openness” of the system and the generally unpredictable human behavior are sources of system degradation; the two presented misbehavior models are delegation factory and delegation sink: either a node delegates without respecting the capacity of its neighbors, or a node overestimates its own capacity; the VieCure Framework considers and resolves both cases.
  33. Further S-Cube Reading. Psaier, H., Juszczyk, L., Skopik, F., Schall, D., Dustdar, S. (2010). Runtime Behavior Monitoring and Self-Adaptation in Service-Oriented Systems. 4th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO), September 27 - October 1, 2010, Budapest, Hungary. IEEE. Psaier, H., Skopik, F., Schall, D., Juszczyk, L., Treiber, M., Dustdar, S. (2010). A Programming Model for Self-Adaptive Open Enterprise Systems. 5th Workshop of the 11th International Middleware Conference (MW4SOC), November 29 - December 3, 2010, Bangalore, India. ACM. Psaier, H., Skopik, F., Schall, D., Dustdar, S. (2011). Resource and Agreement Management in Dynamic Crowdcomputing Environments. 15th IEEE International EDOC Conference (EDOC), August 29 - September 2, 2011, Helsinki, Finland. IEEE. Dustdar, S., Schall, D., Skopik, F., Juszczyk, L., Psaier, H. (Eds.) (2011). Socially Enhanced Services Computing: Modern Models and Algorithms for Distributed Systems. (1) p. 37. Springer.
  34. Acknowledgements. The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).