Managing and optimising cloud services is one of the main challenges faced by industry and academia. A possible solution is resorting to self-management, as fostered by autonomic computing. However, the abstraction layer provided by cloud computing obfuscates several details of the provided services, which, in turn, hinders the effectiveness of autonomic managers. Data-driven approaches, particularly those relying on service clustering based on machine learning techniques, can assist the autonomic management and support decisions concerning, for example, the scheduling and deployment of services. One aspect that complicates this approach is that the information provided by the monitoring contains both continuous (e.g. CPU load) and categorical (e.g. VM instance type) data. Current approaches treat this problem in a heuristic fashion. This paper, instead, proposes an approach, which uses all kinds of data and learns in a data-driven fashion the similarities and resource usage patterns among the services. In particular, we use an unsupervised formulation of the Random Forest algorithm to calculate similarities and provide them as input to a clustering algorithm. For the sake of efficiency and meeting the dynamism requirement of autonomic clouds, our methodology consists of two steps: (i) off-line clustering and (ii) on-line prediction. Using datasets from real-world clouds, we demonstrate the superiority of our solution with respect to others and validate the accuracy of the on-line prediction. Moreover, to show the applicability of our approach, we devise a service scheduler that uses the notion of similarity among services and evaluate it in a cloud test-bed.
1. Service Clustering for Autonomic Clouds Using Random Forest
Rafael Brundo Uriarte (IMT Lucca)
Sotirios Tsaftaris (IMT Lucca)
Francesco Tiezzi (University of Camerino)
CCGrid - 7th May 2015 - Shenzhen, China
10. Existing Approaches
Solutions that handle mixed data types are usually not scalable (e.g. HClustream)
Expert intervention is not feasible due to the dynamism of cloud environments
Distance metric learning approaches require labelled data or are computationally expensive
Requirements and Existing Solutions Uriarte, Tsaftaris and Tiezzi 9/29
12. Random Forest
Handles mixed (continuous and categorical) features
Handles a large number of features
Efficient and scales well
Easily parallelisable
13. Random Forest
Clustering with Random Forest
Originally Developed for Classification
On-Line Random Forest
Intrinsic Measure of Similarity
Clustering Algorithm (e.g. PAM)
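As a hedged sketch of the idea (not the authors' implementation), Random Forest's intrinsic similarity can be obtained in Breiman's unsupervised formulation: train a forest to discriminate the real observations from a column-permuted synthetic copy, then count how often two observations end up in the same leaf. The toy data below is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy monitoring data: two groups of services with different load profiles.
X = np.vstack([rng.normal(0.2, 0.05, (10, 3)),
               rng.normal(0.8, 0.05, (10, 3))])

# Unsupervised trick: synthetic data obtained by permuting each column
# independently, which destroys the correlations between features.
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

data = np.vstack([X, X_synth])
labels = np.r_[np.ones(len(X)), np.zeros(len(X_synth))]
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data, labels)

# Proximity of two observations = fraction of trees where they share a leaf.
leaves = rf.apply(X)                                   # (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
dissimilarity = 1.0 - proximity                        # input for the clustering step
```

The dissimilarity matrix `1 - proximity` is what a clustering algorithm such as PAM would consume.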
15. Problems
Similarity matrix has a large memory footprint
Re-clustering is required on every new observation
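To make the footprint concrete, a back-of-the-envelope estimate (mine, not a figure from the paper): a dense pairwise matrix over the 12,500 servers of the Google trace already costs more than a gigabyte.

```python
# Dense n x n float64 similarity matrix for n = 12,500 services.
n = 12_500
bytes_needed = n * n * 8            # 8 bytes per float64 entry
print(bytes_needed / 1e9)           # 1.25 GB for a single matrix
```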
16. Solution: RF+PAM
Off-line Training and On-line Prediction
Similarity Learning and Standard Clustering
17. Solution: RF+PAM
Build the forest, calculate similarities, cluster, select the medoids, and store references to the leaves.
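The clustering part of the off-line step can be sketched as follows, with a simplified Voronoi-iteration k-medoids standing in for PAM (an assumption made for brevity; PAM's swap phase is more thorough). The input `D` would be the Random Forest dissimilarity matrix; the toy 1-D data is invented.

```python
import numpy as np

def k_medoids(D, k, n_iter=20):
    """Simplified k-medoids on a precomputed dissimilarity matrix D."""
    medoids = np.arange(k)                          # naive initialisation
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # nearest-medoid assignment
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            # New medoid: the member minimising total dissimilarity
            # to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Toy example: two well-separated groups of five services each.
pts = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4])
D = np.abs(pts[:, None] - pts[None, :])
medoids, labels = k_medoids(D, 2)
```

Once the medoids are fixed, only their leaf references in the forest need to be kept for the on-line step; the full similarity matrix can be discarded.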
18. Solution: RF+PAM
Parse the service and assign it to the cluster of the most similar medoid.
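A minimal sketch of the on-line step: drop a new service down the stored forest and assign it to the cluster of the medoid it shares the most leaves with. The demo forest below is grown with stand-in labels purely so the example runs; in the paper the forest comes from the unsupervised off-line step, and all names here are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.05, (10, 2)),
               rng.normal(0.8, 0.05, (10, 2))])
y = np.r_[np.zeros(10), np.ones(10)]        # stand-in labels for the demo forest
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Off-line artefacts: one medoid per cluster and its stored leaf references.
medoids = np.array([0, 10])
medoid_leaves = rf.apply(X[medoids])        # (n_medoids, n_trees)

def predict_cluster(x_new):
    """Similarity to a medoid = fraction of trees where they share a leaf."""
    leaves = rf.apply(x_new.reshape(1, -1))[0]
    sims = (medoid_leaves == leaves).mean(axis=1)
    return int(np.argmax(sims))

cluster = predict_cluster(np.array([0.22, 0.18]))
```

Note that prediction only touches the stored leaf references, so no re-clustering is needed per observation.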
21. Cluster Quality
Clustering quality compared with two other approaches on the same dataset
Better results on all criteria
Connectivity - connectedness of the clusters
Dunn index - cluster density and separation
Silhouette - confidence in the cluster assignment
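For one of the three criteria, silhouette, a toy example (data invented; the Dunn index and connectivity would be computed from the same dissimilarities, but have no scikit-learn counterpart):

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two tight, well-separated toy clusters.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

score = silhouette_score(X, labels)   # in [-1, 1]; higher = more confident assignment
```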
22. On-line Prediction
On-Line vs Batch Mode
K-Fold Cross-Validation
Compared the Adjusted Rand Index (ARI) on 2 datasets:
Monitoring data from Google’s production clouds (12,500 servers)
Requests from a grid of the Dutch Universities Research Testbed (DAS-2, 200 servers)
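The ARI compares two partitions while ignoring the arbitrary label names, which is exactly what is needed to compare on-line predictions against batch-mode clusterings. A small illustration with invented labels:

```python
from sklearn.metrics import adjusted_rand_score

batch  = [0, 0, 1, 1, 2, 2]     # batch-mode cluster labels
online = [1, 1, 0, 0, 2, 2]     # same partition, permuted label names
perfect = adjusted_rand_score(batch, online)   # 1.0: label names do not matter

partly = [0, 0, 1, 1, 1, 2]     # one service placed differently
partial = adjusted_rand_score(batch, partly)   # strictly between 0 and 1
```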
23. Results: ARI
K Google DAS-2
100 0.81 (0.32) 0.70 (0.23)
50 0.75 (0.19) 0.68 (0.17)
20 0.73 (0.09) 0.67 (0.11)
10 0.70 (0.06) 0.63 (0.09)
5 0.69 (0.05) 0.61 (0.07)
24. Use Case
Schedules services according to their dissimilarity
Similar services are placed apart
Algorithms:
1. Random
2. Dissimilarity
3. Isolated
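A toy sketch of the "Dissimilarity" policy, under my reading of the slide (an assumption, not the authors' code): place an incoming service on the VM whose hosted services are most dissimilar to it, so that similar services end up apart. All names and the matrix below are hypothetical.

```python
import numpy as np

def schedule(new_service, vms, dissim):
    """Pick the VM maximising mean dissimilarity between the new service
    and the services already hosted there; empty VMs always win."""
    scores = []
    for hosted in vms:
        if not hosted:
            scores.append(np.inf)     # an empty VM causes no contention
        else:
            scores.append(np.mean([dissim[new_service][s] for s in hosted]))
    return int(np.argmax(scores))

# Hypothetical RF dissimilarities among three services.
D = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.8],
              [0.1, 0.8, 0.0]])
target_vm = schedule(2, [[0], [1]], D)   # service 2 avoids the VM hosting its twin 0
```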
25. Use Case
9 VMs
Arrival Rates
Types of Service
Services’ SLA
28. Summary
We propose RF+PAM to alleviate the problem of limited knowledge in autonomic clouds
Validated RF+PAM with 3 Experiments
Scheduling Algorithm
29. Future Work
More Use Cases
Better Implementation
31. Prune Trees
Parsing is very fast and efficient
Pruning would require analysis, which is time-consuming
32. Retraining
Ratio of predictions/training services (user defined):
Parallel training
Trade-off between updating/prediction
Other solutions:
Dissimilarity to Medoids
On-line clustering (current limitations and prediction speed)