Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing | PRDC 2019

Presentation at PRDC 2019

Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing | PRDC 2019

  • 1. Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing
      PRDC 2019, December 1-3, 2019, Kyoto, Japan
      Satoshi Konno 1) 2) and Xavier Defago 3)
      1) Yahoo Japan Corporation, 2) Japan Advanced Institute of Science and Technology (JAIST), 3) School of Computing, Tokyo Institute of Technology
      Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved.
  • 2. Database Platforms in Yahoo! JAPAN: 300+ Systems, 100+ Services
  • 3. Major Services of Yahoo! JAPAN
      [Figure: US and JP services by category. Categories: Media, Search, Video, Answer, Mail, Membership, C2C Payment, C2C EC, B2C EC, Local. Service labels: Search, Knowledge search, Mail, News, Yahoo Auction, Premium, Loco]
  • 4. Demand on OSS Database Platforms
      [Figure: Yahoo Japan, 300+ Systems. RDB Team: MySQL, 200+ Systems, 2000+ DBs. NoSQL Team: Cassandra, 100+ Systems, 4000+ Nodes. Chart values: 30, 70, 60, 40. Both teams are marked "X : Autonomous Recovery".]
      • Demand for developing autonomous recovery systems
      • The number of nodes is increasing year by year.
      • Human resources are limited.
  • 5. Table of Contents
      • Background and Related Work
      • Proposed Autonomous Recovery Methods (μQoS and Shape-Root)
      • Evaluation Results
      • Conclusion and Future Plans
  • 6. Background
  • 7. Monitoring Studies for Cloud Computing
      Traditional (Storage) Monitoring Systems: X : Analysis, O : Aggregation
      In-memory Monitoring Systems: X : Root Cause, O : Analysis
      Tech Giant | Public | System          | OSS | Type                       | Capacity           | Period | Replacing (Legacy)
      Microsoft  | 2010   | DataGarage      | ×   | Distributed                | 4,000 nodes        | -      | TableStore
      Netflix    | 2014   | Atlas + Winston | △   | Centralized                | 2 billion records  | 6 h    | Epic
      Facebook   | 2015   | Gorilla         | △   | Centralized                | 10 billion records | 26 h   | HBase
      Google     | 2016   | Borgmon         | ×   | Distributed + Hierarchical | 10,000 nodes       | 12 h   |
  • 8. QoS Studies for Cloud Computing
      [Figures from [1]: map of focus areas in research on QoS approaches in cloud computing; distribution of primary studies by contribution type]
      • Models: Discusses concepts, makes comparisons, explores relationships, identifies challenges, or makes classifications.
      • Tools: Supports various aspects of QoS approaches in cloud computing.
      • Methods: Presents a model, algorithm, or approach describing the rules.
      [1] Abdelzahir Abdelmaboud, Dayang NA Jawawi, Imran Ghani, Abubakar Elsafi, and Barbara Kitchenham. Quality of service approaches in cloud computing: A systematic mapping study. Journal of Systems and Software, 101:159–179, 2015.
  • 9. QoS Studies for Cloud Computing
      • X : Resource expansion based on system failure, without finding the root cause
      • O : Resource extension based on increased demand
      [1] S Anithakumari and K Chandrasekaran. Monitoring and management of service level agreements in cloud computing. In Cloud and Autonomic Computing (ICCAC), 2015 International Conference on, pages 204–207. IEEE, 2015.
  • 10. Methods (μQoS and Shape-Root)
  • 11. μQoS: Reasoning Framework for Guaranteeing QoS
      Overall Autonomous Recovery Sequence
      [Figure: STEP1 to STEP4, covering QoS Monitoring, Root Cause Analysis, and QoS Rule Derivation, running on the Internal In-Memory Time-Series Monitoring System (Foreman)]
      • Expanding QoS monitoring and action rules without resource expansion
  • 12. STEP1: Separation of Monitoring Rules and Actions
      [Figure: QoS Monitoring Rules and Actions on the Internal In-Memory Time-Series Monitoring System (Foreman)]
  • 13. STEP3: μQoS Concept with Root Cause
      [Figure labels: Consumer, Provider; Service QoS Rule (with No Action), Unsatisfied Metrics; Operation QoS Rule (with Recovery Action), Root Cause Metrics; STEP1 to STEP4, including Generating a μQoS Rule and Execute the μQoS Rule]
      Mandatory requirements for fast autonomous recovery:
      • Reliable root cause
      • Fast root cause analysis
  • 14. Root Cause Analysis (Shape-Root)
  • 15. Root Cause Analysis Methods for Time-Series Data
      • Correlation Analysis: Traditional parametric statistics method
      • Clustering Analysis: Grouping a set of metrics in the same group
      • Recent Studies: BigRoots, Gorilla, etc.
  • 16. Root Cause: Correlation Analysis (PPMCC [1])
      • Pearson product-moment correlation coefficient
      • Traditional parametric statistics method
      • Some monitoring studies on cloud computing [2][3][4] mention using the general correlation algorithm, but they do not describe in detail how to find root causes with PPMCC.
      [1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
      [2] Charles Loboz, Slawek Smyl, and Suman Nath. Datagarage: Warehousing massive performance data on commodity servers. Proceedings of the VLDB Endowment, 3(1-2):1447–1458, 2010.
      [3] Abdullah Mueen, Suman Nath, and Jie Liu. Fast approximate correlation for massive time-series data. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 171–182. ACM, 2010.
      [4] Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12):1816–1827, 2015.
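To make the correlation approach concrete, here is a minimal sketch of PPMCC applied to metric ranking. The slides do not show how candidate root causes are ranked, so `rank_by_correlation`, the metric names, and the choice to score a constant series as 0.0 are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Hypothetical helper: rank candidate metrics by |r| against the target metric."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical metric names and values, for illustration only.
latency = [10, 12, 30, 45, 20]
candidates = {
    "cpu_load": [0.20, 0.25, 0.70, 0.90, 0.40],
    "disk_free": [5, 5, 5, 5, 5],
}
ranking = rank_by_correlation(latency, candidates)
```

A ranking like this only says which metrics move together with the unsatisfied metric. It does not separate causes from effects, which is the gap the Shape-Root slide addresses.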
  • 17. Root Cause: Clustering Analysis (k-Shape [1])
      • Scalable shape-based clustering algorithm for time-series data based on k-means clustering
      • Normalized version of the cross-correlation is used for measuring distances between metrics
      [1] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
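The distance that k-Shape builds on can be sketched as one minus the maximum normalized cross-correlation over all shifts. The version below is a plain O(n²) illustration in pure Python, assuming equal-length series. It is not the FFT-based implementation from the k-Shape paper, and the function names are chosen for this example.

```python
import math


def z_normalize(series):
    """Scale a series to zero mean and unit variance (all zeros if it is constant)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in series]


def shape_based_distance(x, y):
    """1 - max over shifts of the normalized cross-correlation of x and y."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length series")
    x, y = z_normalize(x), z_normalize(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0  # assumption: a constant series matches nothing
    n = len(x)
    best = -1.0
    for shift in range(-(n - 1), n):
        # Sum of products over the overlapping part of the two series.
        cc = sum(x[i] * y[i - shift] for i in range(n) if 0 <= i - shift < n)
        best = max(best, cc / norm)
    return 1.0 - best
```

The result lies between 0 (same shape, possibly shifted and rescaled) and 2 (opposite shape). k-Shape then uses this distance in place of Euclidean distance inside a k-means style loop.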
  • 18. Root Cause: Recent Studies (BigRoots [1])
      • Root-cause analysis for the underlying reasons for stragglers.
      • Analyzing the stragglers using general metrics such as shuffle read/write bytes and JVM garbage collection time, CPU, I/O, and network.
      [1] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
  • 19. Root Cause: Shape-Root for μQoS
      • Developed with an emphasis on precision and analysis speed to identify reliable root causes dynamically for μQoS
      • Based on a shape-based algorithm, Dynamic Time Warping (DTW), to measure the correlation distance between metrics
      • Analyzes all time-series metrics against the unsatisfied QoS metrics, unlike BigRoots
      • Excludes descendant metrics and confounding metrics based on their timestamps, unlike PPMCC
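The slide names Dynamic Time Warping as the distance behind Shape-Root but gives no further detail. The following is a sketch of textbook DTW with a hypothetical ranking helper. It is not the authors' implementation: the absolute-difference local cost and the names `dtw_distance` and `rank_by_dtw` are assumptions, and the timestamp-based exclusion of descendant and confounding metrics is not modeled.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of x with the first j of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumption: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_by_dtw(target, candidates):
    """Hypothetical helper: return (name, distance) pairs, closest shape first."""
    distances = {name: dtw_distance(target, s) for name, s in candidates.items()}
    return sorted(distances.items(), key=lambda kv: kv[1])
```

Because DTW may stretch or compress the time axis, a metric that shows the same spike slightly earlier or later than the unsatisfied QoS metric still scores as close. Raw DTW is sensitive to scale, so in practice the series would likely be normalized first.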
  • 20. Evaluation
  • 21. Purpose
      • Q1: Effectiveness of μQoS and Shape-Root in detecting candidate root causes
      • Q2: Correlation between analysis span and precision for the time-series data
      • Q3: Effectiveness of μQoS and Shape-Root for autonomous recovery of real services
  • 22. Evaluation Environment
      • Apache Cassandra v3.11.4: Distributed NoSQL database
      • Yahoo! Cloud Serving Benchmark (YCSB) v0.15.0: Benchmark program for distributed databases
      • Foreman v0.8.9: Internal distributed monitoring system in Yahoo! JAPAN (11,754 metric types collected in a 5-minute cycle)
      [Figure: Cassandra nodes]
  • 23. Root Cause Methods for μQoS
      • Shape-Root: Our proposed root cause method for μQoS
      • PPMCC [1]: Standard correlation analysis
      • k-Shape [2]: General time-series clustering analysis algorithm
      • BigRoots [3]: Root cause algorithm for stragglers
      [1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
      [2] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
      [3] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
  • 24. Evaluation Metrics
      Evaluating the following metrics based on leave-one-out cross-validation:
      • TP: Number of correct potential root causes
      • FP: Number of wrong potential root causes
      • FN: Number of undetected root causes
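The counts above are the usual inputs to precision, recall, and F1. The slides refer to precision elsewhere but do not spell out the formulas, so the sketch below uses the standard definitions. The helper name and the zero-division convention are assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0.0 when nothing was reported
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumption: 0.0 when there was nothing to find
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a method that reports 10 candidate root causes of which 8 are correct, while missing 2 real ones, has TP = 8, FP = 2, FN = 2, so precision and recall are both 0.8.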
  • 25. Experiment 1: Detecting root causes for injected faults
      • Workload: YCSB read-heavy workload with CPU stress (30 min cycle)
      • Rule11 was unsatisfied with CPU load (30 min cycle)
      [Figure: Consumer QoS Rules and per-method results, labeled Best, Good, Bad, ?]
  • 26. Experiment 2: Detecting root causes in a real system
      • Workload: YCSB anti-pattern workload (SSTables of Tombstone)
      [Figure: Consumer QoS Rules and per-method results, labeled Good, Bad, ?, Bad]
  • 27. Experiment 3: Comparing effective analysis period in a real system
      • Workload: YCSB anti-pattern workload
      [Figure: Consumer QoS Rules and per-method results, labeled Good, Good, Bad, Bad, Slow, Good]
  • 28. Experiment 4: Autonomous Recovery Effectiveness for a real system
      Initial QoS Rules: Consumer (Rule41), Provider (Rule42). Workload: YCSB anti-pattern workload.
      1. Rule41 is unsatisfied
      2. Root cause analysis for Rule41
      3. Derive Rule43
      4. Add Rule43 and execute Rule43
      5. Execute Rule43
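The five steps of Experiment 4 can be read as a small control loop. The sketch below is one hypothetical way to express that loop. The `QoSRule` class, the threshold semantics, the metric names, and the `analyze` and `derive` callbacks are all assumptions introduced for illustration and do not come from the slides or from Foreman.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QoSRule:
    """Hypothetical rule: satisfied while metrics[metric] <= threshold."""
    name: str
    metric: str
    threshold: float
    action: Optional[Callable[[], str]] = None  # None means a monitoring-only rule

    def is_satisfied(self, metrics: Dict[str, float]) -> bool:
        return metrics[self.metric] <= self.threshold


def recovery_cycle(rules: List[QoSRule], metrics, analyze, derive) -> List[str]:
    """One pass over the rules; returns the results of the executed actions."""
    executed = []
    for rule in list(rules):  # iterate over a copy because rules may be appended
        if rule.is_satisfied(metrics):
            continue                                  # step 1 did not trigger
        if rule.action is not None:
            executed.append(rule.action())            # rule already has a recovery action
            continue
        cause = analyze(rule, metrics)                # step 2: root cause analysis
        new_rule = derive(rule, cause)                # step 3: derive a rule for the cause
        rules.append(new_rule)                        # step 4: add the derived rule
        if new_rule.action and not new_rule.is_satisfied(metrics):
            executed.append(new_rule.action())        # step 5: execute the derived rule
    return executed
```

In this reading, a consumer-side rule such as Rule41 carries no action, so a violation triggers analysis and derivation rather than an immediate recovery step. How thresholds and actions for the derived rule are chosen is left to the caller here because the slides do not specify it.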
  • 29. Conclusion
  • 30. Summary and Future Plans
      μQoS: Event-driven monitoring rule derivation method based on case-based reasoning and root cause analysis for autonomous recovery
      • Good: μQoS demonstrated its effectiveness in a real system, with a high-precision, real-time root cause algorithm called Shape-Root.
      • Bad: The acausal root cause exclusion algorithm is oversimplified because it relies only on metric timestamps. In complex real systems that have many QoS rules, an acausal μQoS rule may be executed.
      The study focused only on root cause analysis for past failures. As the next step, we plan to expand the μQoS framework to future failures, using model-based reasoning or anomaly detection to prevent potential failures.
  • 31. Thank you