Online Memory Leak Detection in the Cloud-based
Infrastructures
Anshul Jindal*, Paul Staab† , Jorge Cardoso†, Michael Gerndt* and
Vladimir Podolskiy*
*Technical University of Munich (TUM), Germany
†Huawei Munich Research Center Germany
International Workshop on Artificial Intelligence for IT
Operations (AIOps 2020)
18th edition of the International Conference on Service
Oriented Computing (ICSOC 2020)
14th December 2020
Outline
• Introduction
• Goals
• Implementation
• Results
• Conclusion
• Future work
Anshul Jindal | Online Memory Leak Detection in the Cloud-based Infrastructures | AIOPS 2020
Introduction: What is memory leak?
• A memory leak is the loss of available memory that occurs when a program fails to return memory it obtained for temporary use.
[Figure: memory utilization (Mem. Util., up to 100%) over time. Potential memory leak case: memory utilization hits the ceiling.]
Introduction: Memory leak on VMs in Cloud
• Long-running software services are susceptible to memory leaks.
• Effects:
o Compromised performance and stability of the service.
o Catastrophic impact on critical infrastructure.
o VMs must be restarted.
• Challenges:
o Hard to trace during development (due to the microservices architecture).
o Applications are written in different programming languages.
o Hard to monitor thousands of simultaneously running VMs for memory leaks.
o Static code analysis requires source code modifications and adds performance overhead.
Introduction: Memory Leak Patterns
[Figure panels: Linear Increasing Pattern; Saw-Tooth Pattern]
Goals
1. Detection of memory leaks on VMs without any internal knowledge of the applications running on the VM.
2. Minimal overhead, with detection in a reasonable time.
3. Scalable approach.
Implementation: Methodology
• Fit a line to a window at the end of the metric time series. The line must approximate the data well (based on R²).
• Fit lines for windows of different sizes and pick the best and longest one.
[Figure: metric over time with two fitted windows. Window 1: good fit. Window 2: bad fit.]
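The window-fitting idea can be sketched in a few lines of Python. This is a minimal illustration, not the Precog implementation: the function names `fit_line` and `longest_good_window` are hypothetical, and "best and longest" is interpreted here as the longest window whose R² meets the threshold, which is an assumption.

```python
def fit_line(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (slope, intercept, r2)."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    syy = sum((v - y_mean) ** 2 for v in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R^2 of a simple linear regression; a constant series is treated as "no trend".
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def longest_good_window(values, window_sizes, min_r_squared=0.8):
    """Return (window, slope, intercept, r2) for the longest window, counted from
    the end of the series, whose line fit reaches min_r_squared; None if no window does.
    Assumption of this sketch: 'best and longest' means the longest passing window."""
    best = None
    for window in sorted(window_sizes):
        if window > len(values):
            break
        slope, intercept, r2 = fit_line(values[-window:])
        if r2 >= min_r_squared:
            best = (window, slope, intercept, r2)
    return best


# Synthetic example: 100 zigzag points followed by a 100-point linear ramp.
noisy = [40.0 if i % 2 == 0 else 60.0 for i in range(100)]
ramp = [50.0 + 0.1 * i for i in range(100)]
best = longest_good_window(noisy + ramp, window_sizes=[50, 100, 200])
print(best)
```

In this example the 50- and 100-point windows cover only the ramp and fit well, while the 200-point window also covers the zigzag section and fits poorly, so the 100-point window is selected.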
Implementation: Precog
[Figure: Precog workflow. Input: a VM's memory utilization time series data. Training path: Training Data → Data Pre-Processing → Trend Lines Fitting → Detect Trends (parameters: Minimum R² score, Critical Time C) → Training Trends → Save All Trends, Maximum Slope, Maximum Duration (Saved Values). Analysis path: New Data → New Data Trends → Analysis against the Saved Values → Result.]
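The workflow names two saved values, a maximum slope and a maximum duration, taken from the training trends and used when analysing new data. The sketch below shows one way such a comparison could look. The `Trend` class, the function names, and the rule that a new trend is flagged only when it exceeds both saved values are assumptions made for illustration; the slides do not spell out the exact decision rule.

```python
from dataclasses import dataclass


@dataclass
class Trend:
    slope: float    # change in utilization per point
    duration: int   # number of points the trend spans


def train(training_trends):
    """Save the largest slope and the largest duration seen in the training trends."""
    return {
        "max_slope": max((t.slope for t in training_trends), default=0.0),
        "max_duration": max((t.duration for t in training_trends), default=0),
    }


def is_anomalous(trend, saved):
    """Assumed rule: flag a new trend only if it is both steeper and longer
    than anything observed during training."""
    return trend.slope > saved["max_slope"] and trend.duration > saved["max_duration"]


saved = train([Trend(slope=0.01, duration=80), Trend(slope=0.02, duration=100)])
print(is_anomalous(Trend(slope=0.02, duration=100), saved))  # similar to training data
print(is_anomalous(Trend(slope=0.05, duration=150), saved))  # steeper and longer
```

Under this rule a trend that resembles the training data is not reported, which matches the results case in which a linearly increasing series with trends similar to the training data is correctly not detected.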
Implementation: Precog (Parameters)
Parameter | Description | Default
min_window | Minimum size of a window, in points. Only trends present for at least this period are detected. | 72 (= 6h)
max_window | Maximum size of a window, in points. | 288 (= 24h)
min_r_squared | Minimum R² required to consider a line a good fit to the data. Higher values require the time series to be very close to a line. | 0.8
max_threshold | Maximum threshold we care about. | 100
critical_time | If the time left until the ceiling is hit is less than this time (measured in points), an anomaly is detected. | 2016 (= 7d)
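The defaults can be read as a small configuration object, and the point-to-time conversions imply one data point every 5 minutes (72 points = 6h). The sketch below encodes the defaults and shows how `critical_time` could be checked against a fitted line. The class `PrecogParams` and the helper functions are hypothetical names, and extrapolating the line linearly to `max_threshold` is an assumption about how "time left until we hit the ceiling" is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecogParams:
    min_window: int = 72           # points (= 6h)
    max_window: int = 288          # points (= 24h)
    min_r_squared: float = 0.8
    max_threshold: float = 100.0   # utilization ceiling
    critical_time: int = 2016      # points (= 7d)


# Sampling interval implied by "72 (= 6h)": 360 minutes / 72 points.
MINUTES_PER_POINT = 6 * 60 / 72


def points_until_threshold(slope, current_value, max_threshold):
    """Points left until a line with this slope reaches max_threshold
    (infinite if the line is flat or decreasing)."""
    if slope <= 0:
        return float("inf")
    return max(0.0, (max_threshold - current_value) / slope)


def is_critical(slope, current_value, params=PrecogParams()):
    """True if the extrapolated line hits the ceiling in fewer than critical_time points."""
    remaining = points_until_threshold(slope, current_value, params.max_threshold)
    return remaining < params.critical_time


print(MINUTES_PER_POINT)
print(is_critical(slope=0.05, current_value=60.0))  # about 800 points left
print(is_critical(slope=0.01, current_value=60.0))  # about 4000 points left
```

With a 5-minute interval the other defaults are consistent: 288 points cover 24 hours and 2016 points cover 7 days.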
Evaluation Settings: Experimental Configurations
Dataset | Memory Leaks | No Memory Leaks
Real Cloud VMs Dataset* | 20 | 40
Synthetic Data | 90 | 90
• Evaluations were conducted on a machine with 4 physical cores (3.6 GHz Intel Core i7-4790 CPU), hyperthreading enabled, and 16 GB of RAM.
*Provided by Huawei Munich Research Center
Results: Memory Leak Detection
[Figure panels: (i) Linear Increasing; (ii) Sawtooth; (iii) Linear Increasing without trends in training data; (iv) Linear Increasing with similar trends as training data, correctly not detected.]
Results: Memory Leak Detection (Accuracy)
Dataset | Sub-case | +ve Cases | -ve Cases | F1 Score | Recall | Precision
Synthetic | Linearly Increasing | 30 | 30 | 0.933 | 0.933 | 0.933
Synthetic | Linearly Increasing (with noise) | 30 | 30 | 0.895 | 1.0 | 0.810
Synthetic | Sawtooth | 30 | 30 | 0.83 | 0.73 | 0.956
Synthetic | Overall | 90 | 90 | 0.9 | 0.9 | 0.91
Real | Overall | 20 | 60 | 0.857 | 0.75 | 1.0
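As a reading aid, the F1 scores in the table can be recomputed from the reported precision and recall using the standard definition, F1 = 2PR / (P + R). The short check below does this for every row; the dictionary layout and the 0.01 tolerance for rounding are choices made for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the table.
rows = {
    "Synthetic / Linearly Increasing": (0.933, 0.933, 0.933),
    "Synthetic / Linearly Increasing (with noise)": (0.810, 1.0, 0.895),
    "Synthetic / Sawtooth": (0.956, 0.73, 0.83),
    "Synthetic / Overall": (0.91, 0.9, 0.9),
    "Real / Overall": (1.0, 0.75, 0.857),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Every recomputed value agrees with the reported F1 score to within rounding.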
Results: Memory Leak Detection (Training and Predict Time)
Prediction time is within 1 second.
Results: Memory Leak Detection (Parameter Sensitivity)
[Figure panels: Critical Time; Minimum R² Score]
Conclusion & Future Work
Developed the Precog algorithm, suited to cloud-based infrastructures.
Precog is scalable to thousands of VMs.
Precog achieves an F1 score of 0.85 with less than half a second of prediction time.
Precog depends on some parameters, but beyond certain values it is insensitive to them.
Future Work
• Use Precog to detect memory leaks in serverless computing.
• Extend Precog to other metrics such as disk consumption and cost budget.
• Use other metrics such as CPU, network, and storage utilization to improve accuracy.
Edge-Cloud and IoT RG @ Chair of Computer Architecture and Parallel Systems
Technical University of Munich
• Research Areas:
o Self-adaptive Edge-Cloud Continuum (predictive autoscaling for VMs, offloading computation from edge to cloud).
o AI for smart Cloud operations (anomaly detection and failure predictions).
o Modelling of microservices applications and many more…
• Key Publications:
o [2020, WoSC@Middleware] Mohak Chadha, Anshul Jindal, and Michael Gerndt. "Towards Federated Learning Using FaaS Fabric".
o [2019, ICPE] A. Jindal, V. Podolskiy, M. Gerndt. "Performance Modeling for Cloud Microservice Applications".
o [2018, CLOUD] Vladimir Podolskiy, Anshul Jindal, and Michael Gerndt. "IaaS Reactive Autoscaling Performance Challenges". The 2018 IEEE International Conference on Cloud Computing (CLOUD 2018).
We are…
Prof. Dr. Michael Gerndt
Ph.D. Student Vladimir Podolskiy
Ph.D. Student Anshul Jindal
Ph.D. Student Mohak Chadha
+ Other Students
Contact
anshul.jindal@tum.de
/ansjin
/Anshul_Jindal4
Thank you for your attention!
Questions?

 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 

Online Memory Leak Detection in the Cloud-based Infrastructures

  • 1. Online Memory Leak Detection in the Cloud-based Infrastructures. Anshul Jindal*, Paul Staab†, Jorge Cardoso†, Michael Gerndt* and Vladimir Podolskiy* (*Technical University of Munich (TUM), Germany; †Huawei Munich Research Center, Germany). International Workshop on Artificial Intelligence for IT Operations (AIOps 2020), 18th edition of the International Conference on Service Oriented Computing (ICSOC 2020), 14th December 2020.
  • 2. Outline: Introduction; Goals; Implementation; Results; Conclusion; Future work.
  • 3. Introduction: What is a memory leak? A memory leak is the loss of available memory that occurs when a program fails to return memory it obtained for temporary use. [Figure: memory utilization (Mem. Util.) over time, rising until it hits the 100% ceiling; marked as a potential memory leak case.]
  • 4. Introduction: Memory leak on VMs in Cloud
      ◦ Long-running software services are susceptible to memory leaks.
      ◦ Effects: compromised performance and stability of the service; catastrophic impact on critical infrastructure; VMs must be restarted.
      ◦ Challenges: hard to trace during development (due to microservices architecture); applications are written in different programming languages; hard to monitor thousands of simultaneously running VMs for memory leaks; static code analysis requires source code modifications and adds performance overhead.
  • 5. Introduction: Memory Leak Patterns. [Figures: Linear Increasing Pattern; Saw-Tooth Pattern.]
  • 6. Goals: (1) Detection of memory leaks on VMs without any inside knowledge of the applications running on the VM. (2) Minimal overhead, with detection in a reasonable time. (3) Scalable approach.
  • 7. Implementation: Methodology. Fit a line to a window taken from the end of the metric time series, requiring that the line approximates the data well (based on R²). Fit lines for windows of different sizes and pick the best and longest one. [Figure: metric over time; Window 1: good fit; Window 2: bad fit.]
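The window-fitting idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `step` between window sizes, and the reading of "best and longest" as "the longest window whose R² clears the minimum" are all choices made here. The defaults for `min_window`, `max_window` and `min_r_squared` are taken from the parameter slide.

```python
import numpy as np


def fit_window(values, window):
    """Least-squares line over the last `window` points; returns (slope, intercept, r2)."""
    y = np.asarray(values[-window:], dtype=float)
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    # A constant window has no variance to explain; treat it as "no trend".
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(slope), float(intercept), r2


def best_trailing_trend(values, min_window=72, max_window=288, step=24,
                        min_r_squared=0.8):
    """Return (window, slope, r2) for the longest trailing window with a good fit.

    `step` is a hypothetical knob introduced for this sketch.
    Returns None when no window reaches `min_r_squared`.
    """
    best = None
    upper = min(max_window, len(values))
    for window in range(min_window, upper + 1, step):
        slope, _, r2 = fit_window(values, window)
        if r2 >= min_r_squared:
            best = (window, slope, r2)  # later iterations are longer windows
    return best


# Synthetic example: a flat phase followed by a steady rise.
rng = np.random.default_rng(0)
flat = 40 + rng.normal(0, 0.5, 200)
leak = 40 + 0.1 * np.arange(100) + rng.normal(0, 0.5, 100)
print(best_trailing_trend(np.concatenate([flat, leak])))
```

Windows that reach far back into the flat phase fit a single line poorly, so their R² drops and they are skipped, which corresponds to the "bad fit" window in the figure.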
  • 8. Implementation: Precog. [Diagram: the input is a VM's memory utilization time series data. Training path: Training Data → Data Pre-Processing → Trend Lines Fitting → Detect Trends (Minimum R² score, Critical Time C) → Save All Trends (Maximum Slope, Maximum Duration). Detection path: New Data → Analysis of New Data Trends against the Saved Values → Result.]
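The diagram labels mention saving the maximum slope and maximum duration of the training trends and comparing new trends against those saved values. The slide does not state the exact comparison rule, so the Python sketch below assumes one: a new trend stands out when it is steeper or longer than anything seen in training. The `Trend` class and both function names are hypothetical, and the real algorithm may combine the two conditions differently.

```python
from dataclasses import dataclass


@dataclass
class Trend:
    slope: float    # change in utilization per data point
    duration: int   # length of the fitted window, in points


def train(trends):
    """Save the largest slope and the longest duration among training trends."""
    max_slope = max((t.slope for t in trends), default=0.0)
    max_duration = max((t.duration for t in trends), default=0)
    return max_slope, max_duration


def exceeds_training(trend, max_slope, max_duration):
    """Assumed rule: flag a trend steeper or longer than any training trend."""
    return trend.slope > max_slope or trend.duration > max_duration


saved = train([Trend(0.01, 80), Trend(0.02, 100)])
print(exceeds_training(Trend(0.015, 90), *saved))  # similar to training trends
print(exceeds_training(Trend(0.05, 90), *saved))   # steeper than any training trend
```

Under this assumption, a rising trend that resembles normal behaviour in the training data is not reported, which matches the results panel described as "similar trends as training data and correctly not detected".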
  • 9. Implementation: Precog (Parameters)
      ◦ min_window: minimal size of a window in points; only trends present for at least this period are detected. Default: 72 (= 6h).
      ◦ max_window: maximal size of a window in points. Default: 288 (= 24h).
      ◦ min_r_squared: minimum R² required to consider a line a good fit to the data; higher values require the time series to be very close to a line. Default: 0.8.
      ◦ max_threshold: the ceiling of the metric that is of interest. Default: 100.
      ◦ critical_time: if the time left until the ceiling is hit is less than this time (measured in points), an anomaly is detected. Default: 2016 (= 7d).
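The defaults appear to imply one data point every 5 minutes (72 points = 6h, 288 points = 24h, 2016 points = 7d). The Python sketch below checks that arithmetic and shows one plausible reading of the `critical_time` rule: extrapolate the fitted line linearly and ask how many points remain until `max_threshold` is reached. The function names and the linear extrapolation are assumptions made for illustration; the slide only describes the rule in words.

```python
def points_to_hours(points, minutes_per_point=5):
    """Convert a number of data points to hours (5-minute sampling is inferred)."""
    return points * minutes_per_point / 60


def points_until_threshold(current, slope, max_threshold=100.0):
    """Points until a linear trend reaches max_threshold; None if it is not rising."""
    if slope <= 0:
        return None
    return max(0.0, (max_threshold - current) / slope)


def within_critical_time(current, slope, max_threshold=100.0, critical_time=2016):
    """True when the extrapolated trend hits the ceiling in fewer than critical_time points."""
    remaining = points_until_threshold(current, slope, max_threshold)
    return remaining is not None and remaining < critical_time


print(points_to_hours(72), points_to_hours(288), points_to_hours(2016) / 24)
print(within_critical_time(current=60.0, slope=0.05))  # about 800 points left
print(within_critical_time(current=60.0, slope=0.01))  # about 4000 points left
```

With these defaults, a VM at 60% utilization gaining 0.05 percentage points per sample would reach the ceiling in under three days and be flagged, while one gaining 0.01 per sample would not.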
  • 10. Evaluation Settings: Experimental Configurations
      ◦ Real Cloud VMs Dataset*: 20 cases with memory leaks, 40 without.
      ◦ Synthetic Data: 90 cases with memory leaks, 90 without.
      ◦ Evaluations were executed on a machine with 4 physical cores (3.6 GHz Intel Core i7-4790 CPU), hyperthreading enabled, and 16 GB of RAM.
      *Provided by Huawei Munich Research Center.
  • 11. Results: Memory Leak Detection. [Figure panels: (i) Linear Increasing; (ii) Sawtooth; (iii) Linear Increasing without trends in training data; (iv) Linear Increasing with similar trends as training data, correctly not detected.]
  • 12. Results: Memory Leak Detection (Accuracy)
      ◦ Synthetic, Linearly Increasing: 30 +ve cases, 30 -ve cases; F1 score 0.933, recall 0.933, precision 0.933.
      ◦ Synthetic, Linearly Increasing (with noise): 30 +ve cases, 30 -ve cases; F1 score 0.895, recall 1.0, precision 0.810.
      ◦ Synthetic, Sawtooth: 30 +ve cases, 30 -ve cases; F1 score 0.83, recall 0.73, precision 0.956.
      ◦ Synthetic, Overall: 90 +ve cases, 90 -ve cases; F1 score 0.9, recall 0.9, precision 0.91.
      ◦ Real, Overall: 20 +ve cases, 60 -ve cases; F1 score 0.857, recall 0.75, precision 1.0.
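The F1 scores in this table can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The short Python sketch below does this for every row; the dictionary layout and names are illustrative only, and the numbers are copied from the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given on the slide
rows = {
    "Synthetic, linearly increasing": (0.933, 0.933, 0.933),
    "Synthetic, linearly increasing (with noise)": (0.810, 1.0, 0.895),
    "Synthetic, sawtooth": (0.956, 0.73, 0.83),
    "Synthetic, overall": (0.91, 0.9, 0.9),
    "Real, overall": (1.0, 0.75, 0.857),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.3f}, reported = {reported}")
```

Each computed value agrees with the reported one up to rounding, so the table is internally consistent.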
  • 13. Results: Memory Leak Detection (Training and Predict Time). [Figure: training and prediction time.] Prediction time is within 1 second.
  • 14. Results: Memory Leak Detection (Parameter Sensitivity). [Figure panels: Minimum R² Score; Critical Time.]
  • 15. Conclusion & Future Work. Developed the Precog algorithm, relevant for cloud-based infrastructures. Precog is scalable to thousands of VMs. Precog achieves an F1-score of 0.85 with less than half a second of prediction time. Although Precog depends on some parameters, it is insensitive to them beyond certain values.
  • 16. Future Work. Use of Precog for detecting memory leaks in serverless computing. Extension of Precog to other metrics such as disk consumption and cost budget. Use of other metrics such as CPU, network and storage utilization to improve accuracy.
  • 17. We are… Edge-Cloud and IoT RG @ Chair of Computer Architecture and Parallel Systems, Technical University of Munich
      ◦ Research Areas: self-adaptive Edge-Cloud continuum (predictive autoscaling for VMs, offloading computation from edge to cloud); AI for smart cloud operations (anomaly detection and failure predictions); modelling of microservices applications; and many more…
      ◦ Key Publications:
          [2020, WoSC@Middleware] Mohak Chadha, Anshul Jindal and Michael Gerndt. "Towards Federated Learning Using FaaS Fabric".
          [2019, ICPE] Anshul Jindal, Vladimir Podolskiy and Michael Gerndt. "Performance Modeling for Cloud Microservice Applications".
          [2018, CLOUD] Vladimir Podolskiy, Anshul Jindal and Michael Gerndt. "IaaS Reactive Autoscaling Performance Challenges". The 2018 IEEE International Conference on Cloud Computing (CLOUD 2018).
      ◦ Team: Prof. Dr. Michael Gerndt; Ph.D. Student Vladimir Podolskiy; Ph.D. Student Anshul Jindal; Ph.D. Student Mohak Chadha; + other students.