Cognitive Behavior Analysis Framework for Fault Prediction in Cloud Computing
   (NoF’12, Nov 21st-23rd, 2012, Tunis, Tunisia)

Reza FARRAHI MOGHADDAM, Fereydoun FARRAHI MOGHADDAM,
             Vahid ASGHARI, Mohamed CHERIET

     Synchromedia Lab, ETS, University of Quebec, Montreal, Quebec, Canada




                Laboratory for Multimedia
              Communication in Telepresence
Outline

        Motivation for Behavior Analysis (BA) and
         Failure Prediction
            Proposed BA framework
              Probabilistic Behavior Analysis
              Simulated Probabilistic Behavior Analysis
              Behavior-Time Profile Modeling and Analysis

        Scalability of the Proposed BA framework
        Conclusions and Future Prospects

11/23/2012                         NoF’12                    2
Why Behavior Analysis (BA)?
            Benefits of BA for Failure Prediction
              Preventing Service-Layer or System-Level failures
              Enabling operation in “unallowable” states to save
              energy and cost, and also to reduce footprint
            Profiling the Actors
              Profiling end users, service providers, and other
              actors in a computing business (for example, a
              telecom business)
              The ensemble of these actors resembles more an
              ecosystem than a system
              Profiling helps in:
               • Smart management of resources
               • Building reputations and trust for actors
               • Identifying and isolating wrong-acting actors and threats
Why Failure Prediction?
A new failure source: Cyclic ElastoPlastic Operation (CEPO)



 Failure sources (diagram): cyclic elastoplastic operation, hardware factor,
 software factor, human factor, middleware factor, other factors



Cyclic elastoplastic operation (CEPO): in Civil and Mechanical Engineering

 Safe operation in plastic mode
 Repeatable transitions between elastic and plastic modes
 Cyclic operation is the key

 [Figure: elastic regime, plastic regime, plastic collapse point]




Cyclic elastoplastic operation (CEPO):
                     its counterparts in Computing Systems

     Carbon Enabling Effect and Green Push: Doing more with less
     1. PUE of data centers
             Increasing inlet air temperature (2-4% energy saving per 1°C increase)
                 For example: PUE = 1.5, 20% saving (5°C) → PUE = 1.2
             Reducing or eliminating fans
         Failure at the component level (servers) increases with temperature (ASHRAE TC 9.9, 2011)
         Failure prediction and behavior analysis can isolate component-level failures
           (even before they occur) to prevent system-level failures (which
           may violate SLO constraints)
         Again, cyclic operation is the key to success
     2. Can this be applied to bandwidth too?

     [Figure: stress on system vs. inlet temperature, showing the allowable elastic range,
      the elastic mode, the bearable stress level, and the plastic mode.
      Uncertainty increases with the length of stay in the plastic mode.]
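The PUE example in item 1 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that the saving applies to total facility energy, scales linearly with the temperature increase, and leaves the IT load unchanged. The function name `pue_after_raise` is illustrative and not from the slides.

```python
def pue_after_raise(pue, delta_c, saving_per_c):
    """Return the PUE after raising the inlet temperature by delta_c degrees C.

    Assumption: total facility energy drops linearly by saving_per_c per
    degree, while the IT energy stays constant.
    """
    total_saving = saving_per_c * delta_c   # fraction of total energy saved
    return pue * (1.0 - total_saving)       # PUE = total / IT, IT unchanged


# Slide example: PUE 1.5, 4% saving per degree, 5 degree increase (20% saving)
new_pue = pue_after_raise(1.5, 5, 0.04)
print(round(new_pue, 2))
```

With the lower bound of 2% per degree, the same 5°C increase would give a 10% saving under these assumptions.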
The Proposed BA framework

            An Ensemble-of-Experts approach:
              The sub-paradigms
               • Probabilistic Behavior Analysis
               • Simulated Probabilistic Behavior Analysis
               • Behavior-Time Profile Modeling and Analysis
            Two different pictures:
              Systemic picture
              Ecosystemic picture




BA Framework:
             Systemic picture




BA Framework:
             Ecosystemic picture




Multiple layers in BA framework

  Layers vs. (physical and non-physical) location: toward Location
  Intelligence in computing systems

  Various layers
      Hardware (Compute/Network)
      Hardware Drivers/Software
      Middleware/Protocols
      Virtualware
      Virtualware Drivers/Software
      Applications (Software)




Sub-paradigm 1:
                  Probabilistic Behavior Analysis
      Each layer of the system is modeled as a graph
      Sub-graphs constitute super-components of higher levels (vertical scaling)
      Behavior is modeled as the Probability of Availability (PoA)



            The PoA is related to CDF of failure:


            The Differential Density Function (DDF):


Sub-paradigm 1:
             Probabilistic Behavior Analysis
       An example of a 2-component system:
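
The worked figure for this slide is not part of the transcript. As a stand-in, the sketch below shows one common way to combine the failure CDFs of two components. It assumes independent components, assumes PoA(t) = 1 - CDF(t), and uses exponential failure times purely for illustration (the slides use a Tanh distribution whose formula is not given here). All function names are hypothetical.

```python
import math


def exponential_cdf(mean_hours):
    """Illustrative failure CDF: exponential with the given mean lifetime."""
    return lambda t: 1.0 - math.exp(-t / mean_hours)


def series_cdf(f1, f2):
    """Failure CDF of a system that fails when EITHER component fails."""
    return lambda t: 1.0 - (1.0 - f1(t)) * (1.0 - f2(t))


def parallel_cdf(f1, f2):
    """Failure CDF of a system that fails only when BOTH components fail."""
    return lambda t: f1(t) * f2(t)


def poa(cdf):
    """Assumed relation between availability and failure CDF: PoA = 1 - CDF."""
    return lambda t: 1.0 - cdf(t)


a = exponential_cdf(1000.0)   # hypothetical component A
b = exponential_cdf(2000.0)   # hypothetical component B
t = 500.0
print(poa(series_cdf(a, b))(t), poa(parallel_cdf(a, b))(t))
```

Under these assumptions the series arrangement is never more available than either component alone, and the parallel arrangement is never less available.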




Sub-paradigm 1:
                     Tanh distribution
             Tanh CDFs              Tanh DDFs




Sub-paradigm 1:
                 Probabilistic Behavior Analysis

             [Figure: Lanl05 database]

             Lanl05 database statistics
                 Duration: 9 years
                 Retrieved from the Failure Trace Archive (FTA)
                 Availability statistics:
                     19874 records
                     mean = 1777.99 hrs
                     std = 3462.33 hrs
                     skewness = 3.09
                     GoF p-value (Tanh) = 0.500
                     GoF p-value (Weibull) = 0.416
                 Unavailability statistics:
                     mean = 5.88 hrs
                     std = 78.39 hrs
                     skewness = 43.96
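
As a rough reading of these numbers, the two means can be combined into a long-run availability ratio. This is a sketch that treats the mean availability interval as an MTBF and the mean unavailability interval as an MTTR, which is an assumption and not a figure reported on the slide. The high skewness values suggest the means alone describe the data poorly.

```python
mean_up_hours = 1777.99    # mean availability interval (from the slide)
mean_down_hours = 5.88     # mean unavailability interval (from the slide)

# Assumed steady-state availability: uptime / (uptime + downtime)
availability = mean_up_hours / (mean_up_hours + mean_down_hours)
print(f"{availability:.4f}")   # 0.9967
```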
Sub-paradigm 2:
                 Simulated Probabilistic Behavior Analysis



      For highly complex system topologies, the CDFs of
      high-level sub-graphs and components are estimated
      by simulation from the CDFs of the basic components
      It can also be used to validate the calculations of the
      first sub-paradigm
      A Monte Carlo strategy is used
      In each run, the fault time of each basic component is
      drawn at random according to its CDF
      The cumulative behavior of the high-level sub-graph over
      all runs is used to estimate its CDF
      Simulations of 1000 runs have been used
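
The steps above can be sketched as follows. This is a minimal illustration, not the authors' simulator: the component fault times are drawn from Weibull distributions with made-up parameters, the sub-graph is assumed to be either a pure series or a pure parallel arrangement, and the helper names are hypothetical.

```python
import random
from bisect import bisect_right


def simulate_failure_times(samplers, combine, runs=1000, seed=0):
    """Draw one fault time per basic component per run and combine them.

    samplers: functions rng -> random fault time of one basic component
    combine:  min for a series sub-graph, max for a parallel one
    """
    rng = random.Random(seed)
    return sorted(combine(s(rng) for s in samplers) for _ in range(runs))


def empirical_cdf(sorted_times):
    """Return t -> fraction of runs in which the sub-graph failed by time t."""
    n = len(sorted_times)
    return lambda t: bisect_right(sorted_times, t) / n


# Hypothetical basic components: Weibull fault times (scale in hours, shape)
components = [
    lambda rng: rng.weibullvariate(1000.0, 0.8),
    lambda rng: rng.weibullvariate(2000.0, 0.8),
]

series_times = simulate_failure_times(components, min)
parallel_times = simulate_failure_times(components, max)
series_cdf = empirical_cdf(series_times)
parallel_cdf = empirical_cdf(parallel_times)
print(series_cdf(500.0), parallel_cdf(500.0))
```

Because both simulations reuse the same seed, each run draws the same component fault times, so the series estimate is never below the parallel one at any time point.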


Sub-paradigm 2:
              Simulated Probabilistic Behavior Analysis



  MC simulation: G_1,1            MC simulation: G_2,1




Sub-paradigm 2:
             Simulated Probabilistic Behavior Analysis



  MC simulation: CDFs            MC simulation: DDFs




Sub-paradigm 3:
                Behavior-Time Profile Modeling and Analysis


       Time profiles of component characteristics are collected
       by opportunistic agents across the system (or
       ecosystem)
       Time profiles of state transitions in components, and
       in higher-level sub-graphs at various layers, are
       collected or injected by the BSU
       Machine learning methods are used to match the
       state transitions with the characteristics
         Support Vector Machine (SVM)
         Bayesian networks
         Agent-based data mining
         Fuzzy logic
         ···
Sub-paradigm 3:
                      Behavior-Time Profile Modeling and Analysis


         Four motivations for behavior-time profile analysis:
              The share of spontaneous faults, relative to
               cause-and-effect faults, has dropped significantly
               • Fewer purely hardware-caused faults than
                 interaction-caused faults
              Patterns and cycles in fault occurrence and, more
              generally, in behavior
              Handling of faulty systems that do not have any
              faulty components
               • Context-sensitive diagnosis [Lamperti2011]
              Handling of gradual events

Sub-paradigm 3:
             Behavior-Time Profile Modeling and Analysis

     A simple example:




SLA and Service Grading
        Even without considering the elastoplastic use case, BA can help in
         upgrading a service (for example, to telco grade)
        Probability of Availability (PoA): Lease-based business models
              Predicting, isolating, and resolving failure events at the component or
               sub-system level before they reach the Service Layer
        Probability of Completion (PoC): Task-based business models
        Countermeasure options:
              Taking high-risk components out of service (maintenance tickets)
              Temporal redundancy
        But all of this depends on the ability to predict high risk or failure

        An example:
          No BA: Major fault mode with MTBF = 10 weeks, MTTR = 10
           minutes → 52:09 minutes downtime a year < 52:33 → four nines
          With BA: 90% of faults are detected 15 minutes before system
           failure → 5:13 minutes downtime a year < 5:15 → five nines
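
The arithmetic behind this example can be reproduced with a short script. It is a sketch under two assumptions not spelled out on the slide: a year has 365 days, and a fault detected in advance causes no service downtime. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60
WEEKS_PER_YEAR = 365 / 7


def yearly_downtime(mtbf_weeks, mttr_minutes, predicted_fraction=0.0):
    """Expected downtime in minutes per year.

    Assumption: a fault that is predicted in time causes no downtime.
    """
    failures_per_year = WEEKS_PER_YEAR / mtbf_weeks
    return failures_per_year * mttr_minutes * (1.0 - predicted_fraction)


def downtime_budget(nines):
    """Allowed downtime (minutes per year) for an availability of `nines` nines."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


def as_min_sec(minutes):
    """Format fractional minutes as M:SS, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


no_ba = yearly_downtime(10, 10)
with_ba = yearly_downtime(10, 10, predicted_fraction=0.9)
print(as_min_sec(no_ba), no_ba < downtime_budget(4))      # 52:09 True
print(as_min_sec(with_ba), with_ba < downtime_budget(5))  # 5:13 True
```

The margins are thin in both cases (about 52.14 against a 52.56 minute budget, and about 5.21 against 5.26), so a small change in MTBF or detection rate would change the grade.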

Countermeasures and cost savings

        An example: Full system
        Two alternative modes to save both energy (cost) and the life
        expectancy of components




Scalability

    Horizontal and Vertical scaling            Federated scaling




Conclusions and Future Prospects
        A multi-paradigm, multi-layer, multi-level cognitive behavior analysis
        framework is introduced
        Three sub-paradigms (cross-cover):
          Statistical inference
          Statistical inference by means of simulation
          Time-profile modeling and analysis
       Multiple granularity analysis and scalability:
         Horizontal, vertical and hierarchical scaling
       Including other layers in the analysis: virtualware and middleware
       Estimation of PoA to improve system dependability and its service grade
       A new distribution is introduced: Tanh distribution
          validated on a real database: lanl05 database
       Future Prospects:
          Large-scale operation of each sub-paradigm
          Cognitive Response: Multi-Expert Decision Making, Cognitive Models
          Integration of the framework with real computing systems:
             • OpenStack, Open GSN
          Machine learning techniques for the time-profile modeling sub-paradigm
          Development of more sophisticated distributions


Thank you! Any questions?
                                                      BATG




Reza FARRAHI MOGHADDAM, Eng., Ph.D., MIEEE, Research Associate
    imriss@ieee.org, rfarrahi@synchromedia.ca
Fereydoun FARRAHI MOGHADDAM, Eng., M.Sc., MIEEE, PhD Student
    farrahi@ieee.org, ffarrahi@synchromedia.ca
Vahid ASGHARI, Eng., Ph.D., MIEEE, Postdoctoral Fellow
    vahid@emt.inrs.ca
Mohamed CHERIET, Eng., Ph.D., SMIEEE, Director of Synchromedia Lab
    mohamed.cheriet@etsmtl.ca

                             http://www.synchromedia.ca/
                                                  NSERC

名前 です男
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 

Recently uploaded (20)

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 

Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing

  • 1. Cognitive Behavior Analysis framework for Fault Prediction in Cloud Computing (NoF’12, Nov 21st-23rd, 2012, Tunis, Tunisia)
      Reza FARRAHI MOGHADDAM, Fereydoun FARRAHI MOGHADDAM, Vahid ASGHARI, Mohamed CHERIET
      Synchromedia Lab, ETS, University of Quebec, Montreal, Quebec, Canada
      Laboratory for Multimedia Communication in Telepresence
  • 2. Outline
      • Motivation for Behavior Analysis (BA) and Failure Prediction
      • Proposed BA framework
          • Probabilistic Behavior Analysis
          • Simulated Probabilistic Behavior Analysis
          • Behavior-Time Profile Modeling and Analysis
      • Scalability of the Proposed BA framework
      • Conclusions and Future Prospects
      11/23/2012 NoF’12 2
  • 3. Why Behavior Analysis (BA)?
      • Benefits of BA for Failure Prediction
          • Preventing Service-Layer or System-Level failures
          • Enabling operation in “unallowable” states to save energy and cost, and also to reduce footprint
      • Profiling the Actors
          • Profiling end users, service providers, and other actors in a computing business (for example, a telecom business)
          • The ensemble of these actors resembles an ecosystem more than a system
          • Profiling helps in:
              • Smart management of resources
              • Building reputations and trust for actors
              • Identifying and isolating wrong-acting actors and threats
  • 4. Why Failure Prediction?
      • A new failure source: Cyclic ElastoPlastic Operation (CEPO)
      • [Diagram: failure sources: cyclic elastoplastic operation, hardware factor, software factor, human factors, middleware factor, other factor]
  • 5. Cyclic elastoplastic operation (CEPO): in Civil and Mechanical Engineering
      • Safe operation in plastic mode
      • Repeatable transitions between elastic and plastic modes
      • Cyclic operation is the key
      • [Figure labels: elastic regime, plastic regime, plastic collapse point]
  • 6. Cyclic elastoplastic operation (CEPO): its counterparts in Computing Systems
      • Carbon Enabling Effect and Green Push: doing more with less
      • 1. PUE of data centers
          • Increasing inlet air flow temperature (2-4% energy saving per 1°C increase)
          • For example: PUE = 1.5, 20% saving (5°C) → PUE = 1.2
          • Reducing or eliminating fans
          • Failure at component level (servers) increases with temperature (ASHRAE TC 9.9, 2011)
          • Failure Prediction and Behavior Analysis can isolate component-level failures (even before their occurrence) in order to prevent system-level failures (which may violate SLO constraints)
          • Again, cyclic operation is the key to success
      • 2. Can it be applied to bandwidth too?
      • Uncertainty increases with the length of stay in the plastic mode
      • [Figure: stress on system vs. inlet temperature; labels: allowable elastic range, elastic mode, plastic mode, bearable stress level]
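The PUE figures on slide 6 can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function name `pue_after_inlet_raise` is hypothetical, the IT load is assumed unchanged, the saving is assumed to apply to total facility energy, and the per-degree saving is assumed to accumulate linearly (which is what the slide's "5°C, 20%" figure implies).

```python
def pue_after_inlet_raise(pue_before, delta_celsius, saving_per_degree):
    """Estimate PUE after raising the inlet air temperature.

    Assumptions (not stated on the slide): IT energy stays constant, the
    saving applies to total facility energy, and the saving is linear in
    the temperature increase.
    """
    total_saving = delta_celsius * saving_per_degree
    if not 0.0 <= total_saving < 1.0:
        raise ValueError("total saving must be in [0, 1)")
    # PUE = facility energy / IT energy; with IT energy fixed, PUE scales
    # with facility energy.
    return pue_before * (1.0 - total_saving)


# Range quoted on the slide: 2-4% saving per 1 degree C, for a 5 degree C raise.
for rate in (0.02, 0.04):
    print(f"{rate:.0%} per degree: PUE = {pue_after_inlet_raise(1.5, 5, rate):.2f}")
```

At the upper end of the quoted range (4% per degree) this reproduces the slide's move from PUE = 1.5 to PUE = 1.2; at the lower end the same raise gives a smaller improvement.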
  • 7. The Proposed BA framework
      • An Ensemble-of-Experts approach:
          • The sub-paradigms:
              • Probabilistic Behavior Analysis
              • Simulated Probabilistic Behavior Analysis
              • Behavior-Time Profile Modeling and Analysis
      • Two different pictures:
          • Systemic picture
          • Ecosystemic picture
  • 8. BA Framework: Systemic picture [figure]
  • 9. BA Framework: Ecosystemic picture [figure]
  • 10. Multiple layers in BA framework
      • Layers vs (physical and non-physical) location: toward Location Intelligence in computing systems
      • Various layers:
          • Hardware (Compute/Network)
          • Hardware Drivers/Software
          • Middleware/Protocols
          • Virtualware
          • Virtualware Drivers/Software
          • Applications (Software)
  • 11. Sub-paradigm 1: Probabilistic Behavior Analysis
      • Each layer of the system is considered as a graph
      • Sub-graphs constitute super-components of higher levels (vertical scaling)
      • The behavior is modeled as the Probability of Availability (PoA)
      • The PoA is related to the CDF of failure: [equation not captured in transcript]
      • The Differential Density Function (DDF): [equation not captured in transcript]
  • 12. Sub-paradigm 1: Probabilistic Behavior Analysis
      • An example of a 2-component system: [figure]
  • 13. Sub-paradigm 1: Tanh distribution
      • [Figure panels: Tanh CDFs; Tanh DDFs]
  • 14. Sub-paradigm 1: Probabilistic Behavior Analysis
      • [Figure: Lanl05 database]
      • Lanl05 database statistics:
          • Duration: 9 years
          • Retrieved from FTA
          • Availability statistics:
              • 19874 records
              • mean = 1777.99 (hrs)
              • std = 3462.33
              • Skewness = 3.09
              • GoF p-value (Tanh) = 0.500
              • GoF p-value (Weibull) = 0.416
          • Unavailability statistics:
              • mean = 5.88 (hrs)
              • std = 78.39
              • Skewness = 43.96
  • 15. Sub-paradigm 2: Simulated Probabilistic Behavior Analysis
      • For highly complex system topologies, the CDFs of high-level sub-graphs and components are estimated by simulation, based on the CDFs of the basic components
      • It can also be used to validate the calculations of the first sub-paradigm
      • A Monte Carlo strategy is used:
          • In each run, the fault time of each basic component is drawn randomly based on its CDF
          • The cumulative behavior of all runs of the high-level sub-graph is used to estimate its CDF
          • 1000-run simulations have been used
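The Monte Carlo procedure on slide 15 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides' Tanh distribution is not specified in this transcript, so a Weibull distribution (the alternative named in the goodness-of-fit comparison) stands in for the component CDFs, and the two-component topology, the scale and shape values, and all function names are hypothetical. A series sub-graph is assumed to fail at the first component fault (`min`), and a redundant (parallel) one at the last (`max`).

```python
import random
from bisect import bisect_right


def sample_fault_time(scale_hours, shape, rng):
    # Stand-in component model: Weibull-distributed fault time.
    # (Assumption: the slides use a Tanh distribution that is not given here.)
    return rng.weibullvariate(scale_hours, shape)


def simulate_subgraph(components, combine, runs=1000, seed=0):
    """Return the sorted fault times of a sub-graph over `runs` simulations.

    components: list of (scale_hours, shape) pairs, one per basic component.
    combine:    min for a series sub-graph, max for a redundant one.
    """
    rng = random.Random(seed)
    times = [
        combine(sample_fault_time(scale, shape, rng) for scale, shape in components)
        for _ in range(runs)
    ]
    return sorted(times)


def empirical_cdf(sorted_times, t):
    # Fraction of runs in which the sub-graph has failed by time t.
    return bisect_right(sorted_times, t) / len(sorted_times)


# Hypothetical 2-component system, 1000 runs as on the slide.
components = [(2000.0, 0.7), (2000.0, 0.7)]
series = simulate_subgraph(components, min)
parallel = simulate_subgraph(components, max)

for t in (500, 1000, 2000, 4000):
    print(t, empirical_cdf(series, t), empirical_cdf(parallel, t))
```

Because both calls use the same seed, each run draws the same component fault times, so the series sub-graph never outlives the redundant one and its estimated CDF is never lower at any time.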
  • 16. Sub-paradigm 2: Simulated Probabilistic Behavior Analysis
      • [Figure panels: MC simulation: G_1,1; MC simulation: G_2,1]
  • 17. Sub-paradigm 2: Simulated Probabilistic Behavior Analysis
      • [Figure panels: MC simulation: CDFs; MC simulation: DDFs]
  • 18. Sub-paradigm 3: Behavior-Time Profile Modeling and Analysis
      • Time profile of component characteristics, collected by opportunistic agents across the system (or ecosystem)
      • Time profile of state transitions in components and in higher-level sub-graphs at various layers, collected or injected by BSU
      • Machine learning methods are used to match the state transitions with the characteristics:
          • Support Vector Machine (SVM)
          • Bayesian networks
          • Agent-based data mining
          • Fuzzy logic
          • ···
  • 19. Sub-paradigm 3: Behavior-Time Profile Modeling and Analysis
      • Four motivations for behavior-time profile analysis:
          • Spontaneous faults, compared to cause-and-effect faults, have been reduced significantly
              • Fewer pure hardware-caused faults compared to interaction-caused faults
          • Patterns and cycles in fault occurrence and, in general, in behavior
          • Handling of faulty systems that do not have any faulty components
              • Context-sensitive diagnosis [Lamperti2011]
          • Handling of gradual events
  • 20. Sub-paradigm 3: Behavior-Time Profile Modeling and Analysis
      • A simple example: [figure]
  • 21. SLA and Service Grading
      • Even without considering the elastoplastic use case, BA can help in upgrading a service (for example, to the telco grade)
      • Probability of Availability (PoA): lease-based business models
          • Predicting, isolating and resolving failure events at component or sub-system levels before they reach the Service Layer
      • Probability of Completion (PoC): task-based business models
      • Countermeasure options:
          • Take high-risk components out of service (maintenance tickets)
          • Temporal redundancy
      • But all this depends on the ability to predict high risk or failure
      • An example:
          • No BA: major fault mode with MTBF = 10 weeks, MTTR = 10 minutes → 52:09 minutes downtime a year < 52:33 → 4 nines
          • With BA: 90% of faults are detected 15 minutes before system failure → 5:13 minutes downtime a year < 5:15 → 5 nines
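The downtime figures in the slide 21 example can be reproduced with the short sketch below. It rests on assumptions that the slide only implies: a 365-day year, one outage of length MTTR per unpredicted fault, and no downtime at all for faults that are predicted in advance. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def annual_downtime_minutes(mtbf_weeks, mttr_minutes, predicted_fraction=0.0):
    """Expected downtime per year for a single fault mode.

    Assumption: a predicted fault is isolated before it reaches the
    service layer and therefore contributes no downtime.
    """
    faults_per_year = (365 / 7) / mtbf_weeks
    return faults_per_year * (1.0 - predicted_fraction) * mttr_minutes


def downtime_budget_minutes(nines):
    # Allowed downtime per year for an availability of 1 - 10^(-nines).
    return MINUTES_PER_YEAR * 10 ** (-nines)


def mmss(minutes):
    # Format a duration in minutes as minutes:seconds, rounded to the second.
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


no_ba = annual_downtime_minutes(mtbf_weeks=10, mttr_minutes=10)
with_ba = annual_downtime_minutes(mtbf_weeks=10, mttr_minutes=10, predicted_fraction=0.9)

print("No BA:  ", mmss(no_ba), "within 4 nines:", no_ba < downtime_budget_minutes(4))
print("With BA:", mmss(with_ba), "within 5 nines:", with_ba < downtime_budget_minutes(5))
```

Under these assumptions the sketch gives 52:09 and 5:13 minutes of downtime per year, matching the slide. The yearly budgets work out to 52.56 and 5.256 minutes, which the slide quotes truncated to the second as 52:33 and 5:15.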
  • 22. Countermeasures and cost savings
      • Two alternative modes to save both energy (cost) and life expectancy of components
      • An example: Full system [figure]
  • 23. Scalability
      • [Figure panels: Horizontal and Vertical scaling; Federated scaling]
  • 24. Conclusions and Future Prospects
      • A multi-paradigm, multi-layer, multi-level cognitive behavior analysis framework is introduced
      • Three sub-paradigms (cross-cover):
          • Statistical inference
          • Statistical inference by means of simulation
          • Time-profile modeling and analysis
      • Multiple granularity analysis and scalability:
          • Horizontal, vertical and hierarchical scaling
      • Including other layers in the analysis: virtualware and middleware
      • Estimation of PoA to improve system dependability and its service grade
      • A new distribution is introduced: Tanh distribution
          • Validated on a real database: lanl05 database
      • Future Prospects:
          • Large-scale operation of each sub-paradigm
          • Cognitive Response: Multi-Expert Decision Making, Cognitive Models
          • Integration of the framework with real computing systems:
              • OpenStack, Open GSN
          • Machine learning techniques for the time-profile modeling sub-paradigm
          • Development of more sophisticated distributions
  • 25. Thank you. Any questions?
      • Reza FARRAHI MOGHADDAM, Eng., Ph.D., MIEEE, Research Associate: imriss@ieee.org, rfarrahi@synchromedia.ca
      • Fereydoun FARRAHI MOGHADDAM, Eng., M.Sc., MIEEE, PhD Student: farrahi@ieee.org, ffarrahi@synchromedia.ca
      • Vahid ASGHARI, Eng., Ph.D., MIEEE, Postdoctoral Fellow: vahid@emt.inrs.ca
      • Mohamed CHERIET, Eng., Ph.D., SMIEEE, Director of Synchromedia Lab: mohamed.cheriet@etsmtl.ca
      • http://www.synchromedia.ca/
      • BATG, NSERC