Machine Learning for Automated Diagnosis of Distributed Systems Performance
Ira Cohen, HP-Labs, June 2006, http://www.hpl.hp.com/personal/Ira_Cohen
Intersection of systems and ML/data mining: a growing research area
SLIC project at HP-Labs: Statistical Learning, Inference and Control. Today I'll focus on performance diagnosis.
Intuition: Why is performance diagnosis hard?
Why care about performance?
Challenges today in diagnosing/forecasting IT performance problems
Translation to machine learning challenges
Outline
Example: A real distributed HP application architecture. A geographically distributed 3-tier application. Results shown today are from the last 19+ months of data collected from this service.
Application performance “management”: Service Level Objectives (SLOs). Unhealthy = SLO violation.
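The health label that the rest of the talk learns from can be sketched as follows. This is a minimal illustration assuming, hypothetically, that the SLO is a threshold on the average response time of each measurement epoch; real SLOs may be defined differently (for example, as a percentile or a fraction of slow requests).

```python
from typing import List


def label_epochs(response_times_s: List[float], slo_threshold_s: float) -> List[int]:
    """Label each epoch 1 = Unhealthy (SLO violated) or 0 = healthy.

    Assumption (hypothetical): the SLO is "average response time per epoch
    must not exceed slo_threshold_s seconds".
    """
    return [1 if t > slo_threshold_s else 0 for t in response_times_s]


# Hypothetical 5-minute epochs with average response times in seconds.
times = [0.8, 1.2, 4.5, 10.0, 0.9]
print(label_epochs(times, slo_threshold_s=4.0))  # [0, 0, 1, 1, 0]
```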
Detection is not enough…
Challenge 1: Transforming data to information… Where is the relevant information?
ML approach: Model using classifiers F(M, SLO) that separate healthy from Unhealthy (SLO-violation) periods.
But we need an explanation, not just classification accuracy...
Our approach: Learn the joint probability distribution P(M, SLO) (Bayesian network classifiers), then make inferences from P(M | SLO) (“metric attribution”):
Normal: the metric has a value associated with healthy behavior.
Abnormal: the metric has a value associated with unhealthy behavior.
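To make "learn P(M, SLO), then attribute" concrete, here is a minimal sketch. It substitutes naive Bayes, the simplest Bayesian network classifier, with Gaussian per-metric densities; the slides do not say which structure or densities were used. The attribution rule (a metric is abnormal when its value is more likely under Unhealthy than under healthy) is an assumption about how P(M | SLO) is compared, and the metric values are made up.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


class NaiveBayesClassifier:
    """P(M, S) = P(S) * prod_i P(m_i | S), with Gaussian P(m_i | S). Labels: 0 = healthy, 1 = Unhealthy."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NaiveBayesClassifier":
        self.classes = sorted(set(y))
        self.prior = {}
        self.params = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            # One (mean, std) pair per metric; floor the std to avoid zero variance.
            self.params[c] = [(mean(col), max(pstdev(col), 1e-3)) for col in zip(*rows)]
        return self

    @staticmethod
    def _log_pdf(v: float, mu: float, sd: float) -> float:
        return -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)

    def metric_loglik(self, x: Sequence[float], c: int) -> List[float]:
        """log P(m_i | S = c) for every metric i."""
        return [self._log_pdf(v, mu, sd) for v, (mu, sd) in zip(x, self.params[c])]

    def predict_proba_unhealthy(self, x: Sequence[float]) -> float:
        """P(S = Unhealthy | M = x) via Bayes' rule, computed stably in log space."""
        logp = {c: math.log(self.prior[c]) + sum(self.metric_loglik(x, c)) for c in self.classes}
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return math.exp(logp[1] - top) / z

    def attribute(self, x: Sequence[float]) -> List[bool]:
        """Metric attribution: True = abnormal (value more likely under Unhealthy)."""
        healthy, unhealthy = self.metric_loglik(x, 0), self.metric_loglik(x, 1)
        return [u > h for u, h in zip(unhealthy, healthy)]


# Hypothetical metrics per epoch: [cpu_util, queue_len]; only cpu_util differs between states.
X = [[20, 5], [25, 6], [22, 4], [80, 5], [85, 6], [90, 4]]
y = [0, 0, 0, 1, 1, 1]
clf = NaiveBayesClassifier().fit(X, y)
print(clf.attribute([88, 5]))  # [True, False]
```

The classifier answers "is this epoch unhealthy?", while `attribute` answers "which metrics look like the unhealthy state?", which is the explanation the slide asks for.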
Bayesian network classifiers: Results (figure: learned network over the SLO State node and metrics M3, M30, M32, M5, M8)
Additional issues
Challenge 2: Adaptation. Learning with “concept drift”: is this a different problem, or the same problem?
Adaptation: Possible approaches
Our approach: Managing an ensemble of models for classification. Construction and inference: use the Brier score to select models.
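The Brier score of a probabilistic classifier is the mean squared difference between its predicted probability of the Unhealthy state and the observed 0/1 label, so lower is better. A minimal sketch of Brier-based selection from an ensemble, assuming each model is a callable returning P(Unhealthy | metrics) and that selection uses a recent window of labeled epochs; the two threshold "models" and the data are hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]  # returns P(Unhealthy | metrics)


def brier_score(model: Model, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean squared difference between predicted P(Unhealthy) and the 0/1 label."""
    return sum((model(x) - label) ** 2 for x, label in zip(X, y)) / len(y)


def select_model(ensemble: List[Model], X_recent, y_recent) -> Tuple[int, float]:
    """Return (index, score) of the ensemble member with the lowest Brier score on recent data."""
    scores = [brier_score(m, X_recent, y_recent) for m in ensemble]
    best = min(range(len(ensemble)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical models: an older one (violations above 50% CPU) and a newer one (above 70%).
old_model: Model = lambda x: 0.9 if x[0] > 50 else 0.1
new_model: Model = lambda x: 0.9 if x[0] > 70 else 0.1

# After drift, violations only occur above ~70% CPU.
X_recent, y_recent = [[60], [65], [80], [90]], [0, 0, 1, 1]
print(select_model([old_model, new_model], X_recent, y_recent))
```

Here the older model scores 0.41 and the newer one about 0.01, so the newer model is selected. Keeping older models in the ensemble (rather than discarding them) lets a recurring workload regime reuse a model trained on it earlier.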
Adaptation: Results
Approach | Accuracy (%) | Total processing time (mins)
Single model: no adaptation | 61.4 | 0.2
Single model trained with all history (no forgetting) | 82.4 | 71.5
Ensemble of models | 90.7 | 7.1
Single model with sliding window | 84.2 | 0.9
Adaptation: Result
Additional issues
Challenge 3: Leveraging history ,[object Object],Diagnosis : Stuck thread due to insufficient   Database connections Repair : Increase connections to +6 Periods : : : : Severity : SLO time increases up to 10secs : : Location : Americas.  Not seen in Asia/Pacific
Leveraging history
Our approach to defining signatures:
1) Learn probabilistic classifiers, i.e., models of P(SLO, M) for Unhealthy periods.
2) Inferences: metric attribution, which yields the abnormal metrics (e.g., DB cpu util high, app active proc high, app alive proc high, app cpu util).
3) Define these as signatures of the problems.
Example: Defining a signature (figure: metric attribution)
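One way to turn metric attribution into a fixed-length signature is sketched below. It assumes a three-valued encoding per metric: +1 if the metric is used by the model and attributed as abnormal, -1 if it is used but looks normal, and 0 if the model does not use it at all. This encoding and the metric names are illustrative assumptions, not confirmed by the slides.

```python
from typing import List, Set


def build_signature(all_metrics: List[str], model_metrics: Set[str], abnormal: Set[str]) -> List[int]:
    """Encode one epoch's metric attribution as a vector aligned with all_metrics.

    +1: metric used by the model and attributed as abnormal
    -1: metric used by the model but its value looks normal
     0: metric not used by the model (irrelevant to this problem)
    """
    signature = []
    for metric in all_metrics:
        if metric not in model_metrics:
            signature.append(0)
        elif metric in abnormal:
            signature.append(1)
        else:
            signature.append(-1)
    return signature


# Hypothetical metric names loosely following the slide's example.
metrics = ["db_cpu_util", "app_active_proc", "app_alive_proc", "app_cpu_util", "gc_time"]
used = {"db_cpu_util", "app_active_proc", "app_alive_proc", "app_cpu_util"}
abnormal = {"db_cpu_util", "app_active_proc", "app_alive_proc"}
print(build_signature(metrics, used, abnormal))  # [1, 1, 1, -1, 0]
```

Because every epoch maps to a vector of the same length, signatures from different days can be clustered and compared with ordinary distance measures.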
Results: With signatures…
Results: Retrieval accuracy. Retrieval of the "Stuck Thread" problem: top 100 results, 92 vs. 51 (figure: precision-recall curve compared with the ideal P-R curve).
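Retrieval of similar past problems can be sketched as a nearest-neighbour search over stored signatures, scored by precision at k (the fraction of the top k results that carry the target label). If "92" counts the correct matches among the top 100, it corresponds to a precision of 0.92. The Euclidean distance, the stored signatures and the labels below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def retrieve(query: Sequence[int], db: List[Tuple[Sequence[int], str]], k: int) -> List[str]:
    """Return the labels of the k stored signatures closest to the query (Euclidean distance)."""
    ranked = sorted(db, key=lambda item: math.dist(query, item[0]))
    return [label for _, label in ranked[:k]]


def precision_at_k(retrieved: List[str], target: str) -> float:
    """Fraction of retrieved results whose label matches the target problem."""
    return sum(1 for label in retrieved if label == target) / len(retrieved)


# Hypothetical signature database (see the +1/-1/0 encoding sketched earlier).
db = [
    ([1, 1, 1, -1, 0], "stuck_thread"),
    ([1, 1, 0, -1, 0], "stuck_thread"),
    ([-1, -1, 1, 1, 0], "disk_full"),
    ([1, 1, 1, -1, 1], "stuck_thread"),
    ([-1, -1, -1, 1, 1], "disk_full"),
]
query = [1, 1, 1, -1, 0]
print(precision_at_k(retrieve(query, db, k=3), "stuck_thread"))  # 1.0
print(precision_at_k(retrieve(query, db, k=4), "stuck_thread"))  # 0.75
```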
Results: With signatures we can also…
Additional issues
Challenge 4: Combining multiple data sources
Properties of logs
Our approach: Processing application error logs
Sample application error log entries:
2006-02-26T00:00:06.461 ES_Domain:ES_hpat615_01:2257913:Thread43.ES82|commandchain.BaseErrorHandler.logException()|FUNCTIONAL|0||FatalException occurred type=com.hp.es.service.productEntitlement.knight.logic.access.KnightIOException, message=Connection timed out, class=com.hp.es.service.productEntitlement.knight.logic.RequestKnightResultMENUCommand
2006-02-26T00:00:06.465 ES_Domain:ES_hpat615_01:22579163:Thread-43.ES82|com.hp.es.service.productEntitlement.combined.errorhandling.DefaultAlwaysEIAErrorHandlerRed.handleException()|FATAL|2706||KNIGHT system unavailable: java.io.IOException
2006-02-26T00:00:06.465 ES_Domain:ES_hpat615_01:22579163:Thread-43.ES82|com.hp.es.service.productEntitlement.combined.errorhandling.DefaultAlwaysEIAErrorHandlerRed.handleException()|FATAL|0||com.hp.es.service.productEntitlement.knight.logic.RequestKnightResultMENUCommand message: Connection timed out causing exception type: java.io.IOException KNIGHT URL accessed: http://vccekntpro.cce.hp.com/knight/knightwarrantyservice.asmx
2006-02-26T00:00:06.466 ES_Domain:ES_hpat615_01:22579163:Thread-43.ES82|com.hp.es.service.productEntitlement.combined.errorhandling.DefaultAlwaysEIAErrorHandlerRed.handleException()|FATAL|0||com.hp.es.service.productEntitlement.knight.logic.access.KnightIOException: Connection timed out
2006-02-26T00:00:08.279 ES_Domain:ES_hpat615_01:22579163:ExecuteThread: '16' for 'weblogic.kernel.Default'.ES82|com.hp.es.service.productEntitlement.combined.MergeAllStartedThreadsCommand.setWaitingFinished()|WARNING|3709||
Over 4,000,000 error log entries contain 200,000+ distinct error messages. Similarity-based sequential clustering reduces these to 190 “feature messages”. The count of appearances of each feature message over 5-minute intervals is then used as a metric for learning.
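The log pipeline above (cluster messages, then count each cluster per 5-minute interval) can be sketched as follows. The slides do not specify the similarity measure or threshold; this sketch assumes token-set Jaccard similarity with a hypothetical threshold of 0.5, and compares each message against the first message of each existing cluster. The sample entries are simplified from the log excerpt, with one invented variant message to show two messages merging into one cluster.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set similarity in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 1.0


def sequential_cluster(messages: List[str], threshold: float = 0.5) -> List[int]:
    """One pass: join the first cluster whose representative is similar enough, else open a new one."""
    reps: List[Set[str]] = []
    assignment: List[int] = []
    for msg in messages:
        tokens = set(msg.split())
        for idx, rep in enumerate(reps):
            if jaccard(tokens, rep) >= threshold:
                assignment.append(idx)
                break
        else:  # no similar cluster found
            reps.append(tokens)
            assignment.append(len(reps) - 1)
    return assignment


def counts_per_window(entries: List[Tuple[str, str]], threshold: float = 0.5) -> Dict[datetime, Counter]:
    """entries: (ISO timestamp, message). Returns {5-minute window start: Counter(cluster id -> count)}."""
    clusters = sequential_cluster([msg for _, msg in entries], threshold)
    table: Dict[datetime, Counter] = {}
    for (ts_text, _), cid in zip(entries, clusters):
        ts = datetime.fromisoformat(ts_text)
        window = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
        table.setdefault(window, Counter())[cid] += 1
    return table


entries = [
    ("2006-02-26T00:00:06.461", "KNIGHT system unavailable: java.io.IOException"),
    ("2006-02-26T00:00:06.466", "KnightIOException: Connection timed out"),
    ("2006-02-26T00:03:10.000", "KNIGHT system unavailable: java.net.SocketException"),
    ("2006-02-26T00:07:00.000", "KnightIOException: Connection timed out"),
]
for window, counts in sorted(counts_per_window(entries).items()):
    print(window, dict(counts))
```

Each cluster id plays the role of a "feature message", and its per-window count becomes one more metric column next to the system metrics. In practice one would cluster the distinct messages once rather than every entry.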
Learning Probabilistic Models ,[object Object],# of appearances PDF
Results: Adding log-based metrics
From the operator incident report: "Diagnosis and Solution: Unable to start SWAT wrapper. Disk usage reached 100%. Cleaned up disk and restarted the wrapper…"
From the application error log: "CORBA access failure: IDL:hpsewrapper/SystemNotAvailableException:… com.hp.es.wrapper.corba.hpsewrapper.SystemNotAvailableException"
Additional issues
Challenge 5: Scaling up machine learning techniques (figure: instances A, B, C, D, E)
Challenge 5: Possible approaches
Example: Diagnosis with multiple instances (figure: instances A and B)
Diagnosis with multiple instances (figure: instance A and instances B through H)
Diagnosis with multiple instances (figure: instances A and B)
Diagnosis with multiple instances (figure: instance A and instances B through H)
Model exchange: Does it help? (Figure: online prediction over time epochs for Instance 1 and Instance 2, showing violation detection with and without model exchange, and false alarms.)
Model exchange: Does it help? (Figure: online prediction over time epochs, showing violation detection and false alarms with and without model exchange.) Models imported from other instances improve accuracy.
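A minimal sketch of how an instance might decide which imported models to use: score its own model and each model received from other instances on its own recent labeled epochs (Brier score, as in the ensemble approach), keep the imported models that do at least as well as the local one, and average their predictions. The keep-rule, the averaging step and the example models are assumptions for illustration; the slides only report that imported models improve accuracy.

```python
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]  # returns P(SLO violation | metrics)


def brier(model: Model, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def merge_models(local: Model, imported: Dict[str, Model], X_recent, y_recent) -> List[Model]:
    """Keep the local model plus every imported model that scores at least as well on local data."""
    baseline = brier(local, X_recent, y_recent)
    kept = [local]
    for model in imported.values():
        if brier(model, X_recent, y_recent) <= baseline:
            kept.append(model)
    return kept


def predict(models: List[Model], x: Sequence[float]) -> float:
    """Average the kept models' probabilities of an SLO violation."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical: instance A has rarely seen violations, so its local model is uninformative.
local_a: Model = lambda x: 0.2
imported = {
    "instance_B": lambda x: 0.9 if x[0] > 70 else 0.1,  # matches A's recent data
    "instance_C": lambda x: 0.9 if x[0] < 50 else 0.1,  # does not
}
X_recent, y_recent = [[30], [85]], [0, 1]
kept = merge_models(local_a, imported, X_recent, y_recent)
print(len(kept), round(predict(kept, [90]), 2))  # 2 0.55
```

Scoring imported models on the receiving instance's own data guards against adopting a model from an instance whose workload differs.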
Additional issues
Providing diagnosis as a web service: SLIC’s IT-Rover. Architecture components (figure): monitored services, metrics/SLO monitoring, signature construction engine, signature DB, clustering engine, retrieval engine, admin.
Discussion: Additional issues, opportunities, and challenges
Summary

More Related Content

What's hot

Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
Trideeb Kumar Das
 
ICELW Conference Slides
ICELW Conference SlidesICELW Conference Slides
ICELW Conference Slides
toolboc
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming Datacentric
Timothy Cook
 
Mis
MisMis
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
Ting Yuan, Ed.D.
 
Approach AI assurance
Approach AI assuranceApproach AI assurance
Approach AI assurance
Aviral Srivastava
 
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
Ting Yuan, Ed.D.
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Editor IJCATR
 
RDAP 15: The Role of Assessment in Research Data Services
RDAP 15: The Role of Assessment in Research Data ServicesRDAP 15: The Role of Assessment in Research Data Services
RDAP 15: The Role of Assessment in Research Data Services
ASIS&T
 
Using Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and developmentUsing Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and development
Eleanor Howe
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey
Doaa Mohey Eldin
 
Thesis Presentation
Thesis PresentationThesis Presentation
Thesis Presentation
nirvdrum
 
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection SolutionsBenchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
IJNSA Journal
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey
Doaa Mohey Eldin
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
László Kovács
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talk
Gabriel Hughes PhD
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Sri Ambati
 
Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_mohey
Doaa Mohey Eldin
 
Towards a pattern recognition approach for transferring knowledge in acm v4 f...
Towards a pattern recognition approach for transferring knowledge in acm v4 f...Towards a pattern recognition approach for transferring knowledge in acm v4 f...
Towards a pattern recognition approach for transferring knowledge in acm v4 f...
Thanh Tran
 

What's hot (19)

Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
 
ICELW Conference Slides
ICELW Conference SlidesICELW Conference Slides
ICELW Conference Slides
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming Datacentric
 
Mis
MisMis
Mis
 
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
 
Approach AI assurance
Approach AI assuranceApproach AI assurance
Approach AI assurance
 
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
 
RDAP 15: The Role of Assessment in Research Data Services
RDAP 15: The Role of Assessment in Research Data ServicesRDAP 15: The Role of Assessment in Research Data Services
RDAP 15: The Role of Assessment in Research Data Services
 
Using Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and developmentUsing Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and development
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey
 
Thesis Presentation
Thesis PresentationThesis Presentation
Thesis Presentation
 
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection SolutionsBenchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talk
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_mohey
 
Towards a pattern recognition approach for transferring knowledge in acm v4 f...
Towards a pattern recognition approach for transferring knowledge in acm v4 f...Towards a pattern recognition approach for transferring knowledge in acm v4 f...
Towards a pattern recognition approach for transferring knowledge in acm v4 f...
 

Viewers also liked

Automated Classification and Quantification of Verbatims via Machine...
         Automated Classification and Quantification of Verbatims via Machine...         Automated Classification and Quantification of Verbatims via Machine...
Automated Classification and Quantification of Verbatims via Machine...
Fabrizio Sebastiani
 
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
Michael Kopp
 
Application SLA - the missing part of complete SLA management
Application SLA - the missing part of complete SLA managementApplication SLA - the missing part of complete SLA management
Application SLA - the missing part of complete SLA management
Comarch
 
Enforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and MonascaEnforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and Monasca
Fabio Giannetti
 
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Spark Summit
 
Machine Learning for Automated Reasoning: An Overview
Machine Learning for Automated Reasoning: An OverviewMachine Learning for Automated Reasoning: An Overview
Machine Learning for Automated Reasoning: An Overview
Vincenzo Lomonaco
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
SLA-Ready Network
 
Self-Adaptive SLA-Driven Capacity Management for Internet Services
Self-Adaptive SLA-Driven Capacity Management for Internet ServicesSelf-Adaptive SLA-Driven Capacity Management for Internet Services
Self-Adaptive SLA-Driven Capacity Management for Internet Services
Bruno Abrahao
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
Saptak Sen
 
Autonomic SLA-driven Provisioning for Cloud Applications
Autonomic SLA-driven Provisioning for Cloud ApplicationsAutonomic SLA-driven Provisioning for Cloud Applications
Autonomic SLA-driven Provisioning for Cloud Applications
nbonvin
 
Hierarchical SLA-based Service Selection for Multi-Cloud Environments
Hierarchical SLA-based Service Selection for Multi-Cloud EnvironmentsHierarchical SLA-based Service Selection for Multi-Cloud Environments
Hierarchical SLA-based Service Selection for Multi-Cloud Environments
Soodeh Farokhi
 
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
Cisco Canada
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
Manish Gupta
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
Sudhir Tonse
 
BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7BDACA1617s2 - Lecture7

Viewers also liked (15)

Automated Classification and Quantification of Verbatims via Machine...
         Automated Classification and Quantification of Verbatims via Machine...         Automated Classification and Quantification of Verbatims via Machine...
Automated Classification and Quantification of Verbatims via Machine...
 
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
SLAs and Performance in the Cloud: Because There is More Than "Just" Availabi...
 
Application SLA - the missing part of complete SLA management
Application SLA - the missing part of complete SLA managementApplication SLA - the missing part of complete SLA management
Application SLA - the missing part of complete SLA management
 
Enforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and MonascaEnforcing Application SLA with Congress and Monasca
Enforcing Application SLA with Congress and Monasca
 
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
Automated Machine Learning Using Spark Mllib to Improve Customer Experience-(...
 
Machine Learning for Automated Reasoning: An Overview
Machine Learning for Automated Reasoning: An OverviewMachine Learning for Automated Reasoning: An Overview
Machine Learning for Automated Reasoning: An Overview
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
 
Self-Adaptive SLA-Driven Capacity Management for Internet Services
Self-Adaptive SLA-Driven Capacity Management for Internet ServicesSelf-Adaptive SLA-Driven Capacity Management for Internet Services
Self-Adaptive SLA-Driven Capacity Management for Internet Services
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Autonomic SLA-driven Provisioning for Cloud Applications
Autonomic SLA-driven Provisioning for Cloud ApplicationsAutonomic SLA-driven Provisioning for Cloud Applications
Autonomic SLA-driven Provisioning for Cloud Applications
 
Hierarchical SLA-based Service Selection for Multi-Cloud Environments
Hierarchical SLA-based Service Selection for Multi-Cloud EnvironmentsHierarchical SLA-based Service Selection for Multi-Cloud Environments
Hierarchical SLA-based Service Selection for Multi-Cloud Environments
 
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
Introduction to Network Performance Measurement with Cisco IOS IP Service Lev...
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7
 

Similar to Machine Learning for automated diagnosis of distributed ...AE

Situation Awareness In A Complex World
Situation Awareness In A Complex WorldSituation Awareness In A Complex World
Situation Awareness In A Complex World
vsorathia
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
butest
 
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
Dr Talaat Refaat
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
CCG
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
AlgoAnalytics Financial Consultancy Pvt. Ltd.
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
butest
 
Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
Soojung Hong
 
-linkedin
-linkedin-linkedin
The role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha KrishnanThe role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha Krishnan
sunanthakrishnan
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
Vedaj Padman
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
Machine learning
Machine learningMachine learning
Machine learning
Tushar Nikam
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
Jin Young Kim
 
Data science course in ameerpet Hyderabad
Data science course in ameerpet HyderabadData science course in ameerpet Hyderabad
Data science course in ameerpet Hyderabad
ShivaKanukuntla33
 
best data science course institutes in Hyderabad
best data science course institutes in Hyderabadbest data science course institutes in Hyderabad
best data science course institutes in Hyderabad
rajasrichalamala3zen
 
Data Science course in Hyderabad .
Data Science course in Hyderabad            .Data Science course in Hyderabad            .
Data Science course in Hyderabad .
rajasrichalamala3zen
 
Data Science course in Hyderabad .
Data Science course in Hyderabad         .Data Science course in Hyderabad         .
Data Science course in Hyderabad .
rajasrichalamala3zen
 
data science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabaddata science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabad
akhilamadupativibhin
 

Similar to Machine Learning for automated diagnosis of distributed ...AE (20)

Situation Awareness In A Complex World
Situation Awareness In A Complex WorldSituation Awareness In A Complex World
Situation Awareness In A Complex World
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
 
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
 
-linkedin
-linkedin-linkedin
-linkedin
 
The role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha KrishnanThe role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha Krishnan
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Machine learning
Machine learningMachine learning
Machine learning
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
Fairness in Search & RecSys 네이버 검색 콜로키움 김진영
 
Data science course in ameerpet Hyderabad
Data science course in ameerpet HyderabadData science course in ameerpet Hyderabad
Data science course in ameerpet Hyderabad
 
best data science course institutes in Hyderabad
best data science course institutes in Hyderabadbest data science course institutes in Hyderabad
best data science course institutes in Hyderabad
 
Data Science course in Hyderabad .
Data Science course in Hyderabad            .Data Science course in Hyderabad            .
Data Science course in Hyderabad .
 
Data Science course in Hyderabad .
Data Science course in Hyderabad         .Data Science course in Hyderabad         .
Data Science course in Hyderabad .
 
data science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabaddata science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabad
 

  • 1. Machine Learning for Automated Diagnosis of Distributed Systems Performance. Ira Cohen, HP-Labs, June 2006. http://www.hpl.hp.com/personal/Ira_Cohen
  • 9. Example: a real distributed HP application architecture. Geographically distributed, 3-tier application. Results shown today are from the last 19+ months of data collected from this service.
  • 10. Application performance "management": Service Level Objectives (SLOs). Unhealthy = SLO violation.
  • 14. But we need an explanation, not just classification accuracy. Our approach: learn the joint probability distribution P(M, SLO) with Bayesian network classifiers, which also gives the class-conditional distributions P(M|SLO) for unhealthy and normal periods. Inferences ("metric attribution"): Normal: the metric has a value associated with healthy behavior. Abnormal: the metric has a value associated with unhealthy behavior.
  • 25. Our approach to defining signatures: 1) Learn probabilistic classifiers (models of P(SLO, M)). 2) Inferences: metric attribution flags the abnormal metrics during unhealthy periods, e.g., DB CPU util high, app active proc high, app alive proc high, app CPU util. 3) Define these abnormal metrics as signatures of the problems.
  • 28. Results: retrieval accuracy. Retrieval of the "Stuck Thread" problem, top 100 results: 92 vs. 51. The ideal precision-recall (P-R) curve is shown for reference.
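Slides 10, 14, and 25 together describe a pipeline with three steps. First, label intervals by SLO compliance. Next, model P(M|SLO). Finally, encode which metrics look abnormal as a problem signature. What follows is a minimal sketch under stated assumptions, not the talk's implementation. It uses naive Bayes with one Gaussian per metric and class, the simplest Bayesian network classifier. The SLO is a hypothetical response-time threshold. Signatures use a +1 / -1 / 0 encoding taken from the editor's note on attributed, not-attributed, and irrelevant metrics. The metric names, threshold, and numbers are all made up. The `relevant` set is a hypothetical stand-in for whatever feature selection the real models perform.

```python
import math
from statistics import mean, pstdev


def slo_labels(response_times, threshold):
    """Label each interval: True = unhealthy (SLO violated), False = healthy."""
    return [rt > threshold for rt in response_times]


def fit_gaussian(values):
    """Fit a 1-D Gaussian; a tiny floor guards against zero variance."""
    return mean(values), (pstdev(values) or 1e-6)


def log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit_models(samples, labels):
    """Per-metric class-conditionals P(m | SLO state), assuming metrics are
    independent given the SLO state (naive Bayes). Each class needs >= 1 sample."""
    models = {}
    for metric in samples[0]:
        healthy = [s[metric] for s, bad in zip(samples, labels) if not bad]
        unhealthy = [s[metric] for s, bad in zip(samples, labels) if bad]
        models[metric] = (fit_gaussian(healthy), fit_gaussian(unhealthy))
    return models


def signature(sample, models, relevant):
    """Encode one interval: +1 = attributed (abnormal), -1 = not attributed
    (normal), 0 = irrelevant (metric not used by the model)."""
    sig = {}
    for metric, ((mu_h, sd_h), (mu_u, sd_u)) in models.items():
        if metric not in relevant:
            sig[metric] = 0
        elif log_pdf(sample[metric], mu_u, sd_u) > log_pdf(sample[metric], mu_h, sd_h):
            sig[metric] = 1
        else:
            sig[metric] = -1
    return sig


# Toy history: three healthy and three unhealthy intervals (made-up numbers).
samples = [
    {"db_cpu": 20, "app_cpu": 30, "disk": 5},
    {"db_cpu": 25, "app_cpu": 34, "disk": 6},
    {"db_cpu": 22, "app_cpu": 31, "disk": 5},
    {"db_cpu": 85, "app_cpu": 32, "disk": 6},
    {"db_cpu": 90, "app_cpu": 36, "disk": 5},
    {"db_cpu": 88, "app_cpu": 33, "disk": 7},
]
labels = slo_labels([0.4, 0.5, 0.6, 2.9, 3.1, 3.4], threshold=1.0)
models = fit_models(samples, labels)
sig = signature({"db_cpu": 87, "app_cpu": 30, "disk": 7}, models, relevant={"db_cpu", "app_cpu"})
print(sig)  # {'db_cpu': 1, 'app_cpu': -1, 'disk': 0}
```

Here `db_cpu` is attributed because its value is far more likely under the unhealthy model. `app_cpu` is not attributed, and `disk` is marked irrelevant. The resulting vector is what the sketch treats as the problem's signature.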

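Slide 28 reports how well past instances of the same problem can be retrieved. This sketch shows one way signature-based retrieval and a top-k precision score could be computed. It assumes signatures are dicts mapping each metric to +1 / -1 / 0, as in the editor's note on attribution. It also assumes past intervals are ranked by L1 distance to the query; the slides do not state the similarity measure actually used. The problem labels, metric names, and history below are hypothetical toy data, not the HP results.

```python
def distance(a, b):
    """L1 distance between two signatures defined over the same metrics."""
    return sum(abs(a[m] - b[m]) for m in a)


def rank_history(query, history):
    """Sort past (label, signature) pairs by distance to the query signature."""
    return sorted(history, key=lambda item: distance(query, item[1]))


def precision_at_k(ranked, target, k):
    """Fraction of the top-k retrieved items whose label matches the target."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for label, _ in top if label == target) / len(top)


# Hypothetical labeled history of problem signatures.
history = [
    ("stuck_thread", {"db_cpu": -1, "app_active_proc": 1, "app_alive_proc": 1}),
    ("stuck_thread", {"db_cpu": -1, "app_active_proc": 1, "app_alive_proc": 0}),
    ("db_overload", {"db_cpu": 1, "app_active_proc": -1, "app_alive_proc": -1}),
    ("db_overload", {"db_cpu": 1, "app_active_proc": 1, "app_alive_proc": -1}),
]
query = {"db_cpu": -1, "app_active_proc": 1, "app_alive_proc": 1}
ranked = rank_history(query, history)
print(precision_at_k(ranked, "stuck_thread", 2))  # 1.0
print(precision_at_k(ranked, "stuck_thread", 4))  # 0.5
```

Sweeping k and also computing recall (matches retrieved divided by all matches in the history) gives the points of a precision-recall curve like the one on slide 28.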
Editor's Notes

  1. Outline:
     1. SLIC general slide.
     2. HP management general slide: control + forecasting + diagnostics.
     3. Give examples of performance problems: the complexity of finding solutions.
     4. Challenges:
        A. Explanation, not just classification.
        B. Adaptation: concept drift.
        C. Using multiple data sources: semi-structured data.
        D. Information retrieval: leveraging past history.
        E. Distributed diagnosis: efficiency, transfer learning.
        F. Leveraging human feedback (human in the loop): semi-supervised learning (active learning, semi-supervised clustering).
        G. Anomaly detection: Amazon vs. other companies.
     5. Bringing it all together as a tool: providing diagnostic capabilities as a centrally managed service, and explaining why we need it. Talk about Splunk, our prototypes, etc.
  2. Complexity will get even higher with the use of virtual machines!
  3. Only a subset of the metrics captures the system/application state at a given time; the question is how to distill the best representation. This is analogous to documents, which contain many irrelevant words/phrases and a few very relevant ones.
  4. (Drift occurs in both violation and compliance periods.)
  5. Attributed: metric in "abnormal" state. Not attributed: metric in "normal" state. 0 = irrelevant: the model cannot distinguish the normal from the abnormal state.
  6. Allows prioritization