Monalytics - Online Monitoring and Analytics for Managing Large Scale Data Centers

Mahendra Kutare*, Greg Eisenhauer*, Chengwei Wang*, Karsten Schwan*, Vanish Talwar# and Matthew Wolf*
(*Georgia Tech, # HP Labs)
Data Center Management
State of the Art
• Rich platform-level monitoring, incl. hardware counters.
• Monitoring and management systems:
– Dedicated firmware and infrastructure for monitoring at rack levels (e.g., HP iLO, IBM Director).
– Middleware-level tools and support for center-level (e.g., HP OpenView, IBM Tivoli):
  • statically configured, with standards-(XML)-based logging, and
  • centralized analysis and management.
Current Management Methods
Key Idea
Monalytics – for on-line management 'at scale':
• Combine monitoring with analysis for scalability and fast response.
• Lightweight, dynamic, and distributed.
• Enable 'local' control loops for fast actions on analyzed monitoring data.
Issues and Goals
• Scale to future datacenter systems:
– 'in space': e.g., large numbers of entities, even per node, due to consolidation – implies large monitoring data volumes;
– 'in time': e.g., fault localization made difficult by cascading effects of failures at scale – requires short response times.
• Dynamics in utility clouds:
– e.g., changed endpoints due to VM migration require re-deployment of capture, aggregation, analysis components;
– e.g., changing needs demand capture of more detailed or alternative metrics and analyses.
Monalytics - Design
• Monitoring:
– flexible and dynamic monalytics topologies;
– 'at source' lightweight data manipulation (e.g., filtering);
– distributed, concurrent, and supporting multiple administrative domains (zones).
• Analysis combined with monitoring:
– 'at source', 'during aggregation', and global;
– dynamic: whenever and wherever needed.
• Overhead proportionality of control loops with the processes being controlled.
Monalytics Topology
Monalytics Physical Topology
Monalytics Implementation
Technical Issues
• Elasticity: dynamic topologies/methods:
– adapting to different time and length scales, e.g., by dynamic use of alternate metrics and analyses, by 'zooming in' on select detail;
– dealing with VM migration and arrivals/departures.
• Overhead-proportional monalytics:
– dynamically and selectively local analysis/actions;
– limiting overheads by providing only summary and aggregate data to higher level brokers and zone leaders.
• Flexible, lightweight building blocks for monalytics topologies.
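The idea of sending only summary and aggregate data upward can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Summary` record, its fields, and the `summarize`/`merge` helpers are hypothetical names, not part of Monalytics. A node reduces a window of raw samples to a fixed-size summary, and a broker or zone leader merges the summaries it receives without ever seeing the raw samples.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Summary:
    """Fixed-size digest of one monitoring window (hypothetical record)."""
    count: int
    total: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize(samples: Sequence[float]) -> Summary:
    """Reduce a local window of raw samples to a summary at the source."""
    return Summary(len(samples), float(sum(samples)),
                   max(samples, default=float("-inf")))


def merge(summaries: Iterable[Summary]) -> Summary:
    """Combine summaries at a broker or zone leader; raw data never travels."""
    items = list(summaries)
    return Summary(sum(s.count for s in items),
                   sum(s.total for s in items),
                   max((s.maximum for s in items), default=float("-inf")))
```

The message size stays constant per window regardless of how many samples a node collects, which is one way monitoring overhead can be kept proportional to need.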
Achieving Overhead-Proportionality
[Figure: diagram of local and global control loops. Labels: Monitor, Collect, Local Window, Local Analyze, Local Loop, Summary, Aggregate, Global Window, Global Loop, Analyze, Action.]
Illustration - I
Dynamic local control loop
• Recreate an Apache bug that segfaults and finally stops all interactions between RUBiS components:
– behavior detection via run time monitoring;
– behavior diagnosis triggers corrective action by instantiating control loop to stop/re-start VM.

                          Total Requests   Unsuccessful Requests
Without Control Action    52535            13976
With Control Action       52535            5763
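A local control loop of this kind might look roughly like the sketch below. It is not the Monalytics implementation: the class name `LocalControlLoop`, the window size, the failure-ratio threshold, and the `restart` callback are all hypothetical. The slides only state that diagnosis instantiates a loop that stops and restarts the VM.

```python
from collections import deque
from typing import Callable, Deque


class LocalControlLoop:
    """Hypothetical local loop: watch recent request outcomes, restart on failure."""

    def __init__(self, restart: Callable[[], None],
                 window: int = 100, max_failure_ratio: float = 0.5) -> None:
        self.restart = restart                       # corrective action (assumed callback)
        self.outcomes: Deque[bool] = deque(maxlen=window)  # sliding window of outcomes
        self.max_failure_ratio = max_failure_ratio
        self.restarts = 0

    def observe(self, success: bool) -> None:
        """Record one request outcome and act once the window is full."""
        self.outcomes.append(success)
        if len(self.outcomes) < self.outcomes.maxlen:
            return                                   # not enough data to diagnose yet
        failures = self.outcomes.count(False)
        if failures / len(self.outcomes) > self.max_failure_ratio:
            self.restart()                           # act locally, no central round trip
            self.restarts += 1
            self.outcomes.clear()                    # start a fresh window after restart


def failure_rate(unsuccessful: int, total: int) -> float:
    """Fraction of requests that failed."""
    return unsuccessful / total
```

Applied to the table above, `failure_rate(13976, 52535)` is about 0.266 without the control action and `failure_rate(5763, 52535)` is about 0.110 with it.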
Illustration - II
Scalability through local analysis
• Trace the HTTP requests processed by the Apache web server:
– behavior detection by monitoring request processing time for abnormalities;
– behavior diagnosis using a predefined processing time threshold of 200 ms.

Total Request Trace Data Generated (10 min)
Without Filtering Operator    1.46 MB
With Filtering Operator       60.45 KB
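A filtering operator of this kind can be sketched in a few lines. The 200 ms threshold comes from the slide; the `RequestTrace` record and its fields are hypothetical, since the slides do not show the trace format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

THRESHOLD_MS = 200.0  # processing-time threshold stated on the slide


@dataclass
class RequestTrace:
    url: str               # hypothetical field
    processing_ms: float   # hypothetical field


def filter_slow(traces: Iterable[RequestTrace],
                threshold_ms: float = THRESHOLD_MS) -> Iterator[RequestTrace]:
    """Yield only traces whose processing time exceeds the threshold."""
    for trace in traces:
        if trace.processing_ms > threshold_ms:
            yield trace
```

Because the filter runs at the source, only abnormal requests leave the node. In the measurement above this cut the trace volume from 1.46 MB to 60.45 KB, roughly a 24-fold reduction.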
Illustration - III
Zoom In analysis
• Failure proportional scalability:
– behavior detection by monitoring CPU utilization for the application's multiple VMs;
– behavior diagnosis by using entropy-based statistical techniques.

Total Data Transferred (3 hours)
With Centralized Decision                     394.06 KB
With Local Decision and Zoom In Analysis      123.32 KB
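The slides do not spell out the entropy-based technique, so the following is only one plausible reading, with assumed names and parameters throughout: bin a window of CPU utilization samples, compute the Shannon entropy of the histogram, and 'zoom in' (request more detailed data) only when the entropy departs from a baseline window by more than a tolerance. The bin count and the `tolerance_bits` value are illustrative choices.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(samples: Sequence[float], bins: int = 10) -> float:
    """Shannon entropy (bits) of CPU utilization samples in [0, 100], after binning."""
    if not samples:
        return 0.0
    width = 100.0 / bins
    # Clamp the bin index so that a sample of exactly 100.0 lands in the last bin.
    counts = Counter(min(int(s // width), bins - 1) for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def should_zoom_in(baseline: Sequence[float], current: Sequence[float],
                   tolerance_bits: float = 0.5) -> bool:
    """Flag a window whose entropy departs from the baseline by more than the tolerance."""
    return abs(shannon_entropy(current) - shannon_entropy(baseline)) > tolerance_bits
```

With a local decision like this, detailed data is transferred only for flagged windows. In the measurement above, local decision with zoom-in moved 123.32 KB against 394.06 KB for the centralized decision, about 31% of the centralized volume.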
Lightweight Building Blocks: Event Tracing Overheads DomU
• Overheads grow with the increase in trace event size from 50 to 150 Kb.
• Low overheads for reasonably high request rates.
Lightweight Building Blocks: Trace Event Logging Overheads
• Logging overheads are similar to Apache native logging.
• Monalytics achieve better results on logging numerical versus string data.
Summary
• Lightweight monalytics enable flexible management.
• Illustrations demonstrate the importance of dynamic capabilities to attain overhead proportionality and to operate at larger scales.
• Statistical methods for behavior detection are key to effective monitoring.
Related Publications
• Online Detection of Utility Cloud Anomalies Using Metric Distributions – NOMS 2010
– Chengwei Wang, Vanish Talwar, Karsten Schwan, Partha Ranganathan
• Look Who's Talking: Discovering Dependencies between Virtual Machines Using CPU Utilization – HotCloud 2010
– Renuka Apte, Liting Hu, Karsten Schwan, Arpan Ghosh
Monalytics – Cloud Visibility
• Issues
– Service Provider/User separation.
– Multiple administrative domains between applications and infrastructure operators reduce system visibility.
– Cooperative vs. non-cooperative components.
• Problems
– How to deduce/infer from limited information?
– Answer questions such as:
  • What are the source-destination communication pairs for each VM?
  • Which VM is the most heavily interacting in the infrastructure?
Current Research
• Communication
– Current techniques work under simplistic communication patterns, hence are not generic enough.
– More complex communications arise with load balancing and multiple instances of the same application residing on the infrastructure.
• Techniques
– Time series analysis techniques based on a stationary model are less realistic and inaccurate.
Problem and Intended Contribution
• Problem
– Find all VMs (source-destination) that are communicating with a given VM.
– A VM can communicate with multiple VMs due to application design or load balancing.
• Contribution
– A model for communication between VMs using network-level metrics.
– Demonstration of the validity of the model for realistic cloud deployments.
Approach
Collect network traffic information

Build a profile for normal network traffic patterns

Monitor the VM operation based on the traffic profile
Issues
• Issues
– Differentiate traffic from multiple sources and destinations.
– Traffic data becomes high dimensional.
Practical Scenario
[Figure: practical scenario plotted against time.]
Key Idea
• Use a Gaussian distribution to model the relationship between incoming and outgoing traffic information.
• Issues
– Directly applying a Gaussian model is expensive and inaccurate.
  • Perform dimensionality reduction using PCA.
– In real operations, the relationship could be more complex due to different request types.
  • Build a mixture-of-Gaussians model.
• Use the Gaussian mixture model on features rather than on the original monitored data.
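To make the dimensionality-reduction step concrete, here is a minimal sketch that extracts the first principal component by power iteration and projects each row onto it. It is an illustration under assumptions, not the authors' pipeline: the function name `pca_first_component` is hypothetical, the input is assumed to be rows of traffic metrics with at least two rows, and a real deployment would more likely use a numerical library and keep several components.

```python
import math
from typing import List, Sequence, Tuple


def pca_first_component(rows: Sequence[Sequence[float]],
                        iterations: int = 200) -> Tuple[List[float], List[float]]:
    """Return (direction of greatest variance, 1-D score of each row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires n >= 2.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from an asymmetric, normalized start vector. It fails
    # only if the start happens to be orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero variance along v: keep the current direction
        v = [x / norm for x in w]
    # Project the centered rows onto the component to get 1-D features.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

The resulting scores are the kind of reduced 'features' on which a mixture model could then be fitted instead of the raw high-dimensional metrics.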
Example
[Figure: panels titled "profile on network traffic relationship", "Three Gaussian models", "Mixture of Gaussians", and "Probability distribution"; axes: incoming traffic, outgoing traffic.]
Model Building
• Model
– Use the EM algorithm to fit the Gaussian mixture model.
– For a given network-metrics data point, the EM algorithm indicates which Gaussian distribution in the mixture the point belongs to.
• Base case
– Each Gaussian distribution represents a VM, and the data points associated with that Gaussian are the source-destination network-metrics data points associated with that VM.
– From this we can find all the source-destination VMs that are interacting with a VM.
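The EM procedure can be sketched for the simplest case, a one-dimensional mixture over a single traffic feature. This is a teaching sketch under assumptions, not the model used in the work: the function names, the caller-supplied initial means, the unit initial variances, and the fixed iteration count are all illustrative, and real traffic features would be multi-dimensional.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data: Sequence[float], init_means: Sequence[float],
               iterations: int = 50, min_var: float = 1e-6
               ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            # If every density underflows to zero, fall back to equal shares.
            resp.append([pj / total for pj in p] if total > 0.0 else [1.0 / k] * k)
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # component owns no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances


def assign(x: float, weights: Sequence[float], means: Sequence[float],
           variances: Sequence[float]) -> int:
    """Index of the component most likely to have produced x."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the base case described above, the index returned by `assign` would stand for the VM whose Gaussian best explains the data point.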
Runtime Monitoring
• Check the system's current status by comparing it with the learned model.
– If a new VM starts interacting with a given VM, this can be detected by changes in the source-destination network-metrics data points for the Gaussian representing that VM.
– Similarly, we can detect when a VM's interaction with source or destination VMs has stopped.
– Still to be determined: whether we can rank all the source and destination VMs interacting with a VM. This would show which VM it has communicated with the most.
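One simple way to compare current observations with a learned model is sketched below. The rule used here (a point 'fits' if it lies within three standard deviations of some learned component) is an assumption chosen for illustration, as are the function names; the slides do not state the actual comparison criterion. Points that fit no component hint at a new interaction, and components that receive no points in a window hint at a stopped one.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[float, float]  # (mean, variance) of one learned 1-D Gaussian


def in_component(x: float, component: Component, max_sigma: float = 3.0) -> bool:
    """True if x lies within max_sigma standard deviations of the component mean."""
    mean, var = component
    return abs(x - mean) <= max_sigma * math.sqrt(var)


def fits_model(x: float, components: Sequence[Component],
               max_sigma: float = 3.0) -> bool:
    """True if at least one learned component explains x."""
    return any(in_component(x, c, max_sigma) for c in components)


def count_unexplained(window: Sequence[float], components: Sequence[Component],
                      max_sigma: float = 3.0) -> int:
    """Number of points no component explains (possible new interaction)."""
    return sum(1 for x in window if not fits_model(x, components, max_sigma))


def silent_components(window: Sequence[float], components: Sequence[Component],
                      max_sigma: float = 3.0) -> List[int]:
    """Indices of components with no points in the window (possible stopped interaction)."""
    return [i for i, c in enumerate(components)
            if not any(in_component(x, c, max_sigma) for x in window)]
```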
Experimental Evaluation
• Steps
– Set up a testbed with multiple instances of application and load balancer components.
– Collect the network in and out metrics.
– Plot the scatter plots.
– Describe the mixture model from the learned data.
• Evaluation
– Zero traffic
– Significantly delayed traffic
Thanks
