Online Memory Leak Detection in the Cloud-based
Infrastructures
Anshul Jindal*, Paul Staab† , Jorge Cardoso†, Michael Gerndt* and
Vladimir Podolskiy*
*Technical University of Munich (TUM), Germany
†Huawei Munich Research Center Germany
International Workshop on Artificial Intelligence for IT
Operations (AIOps 2020)
18th edition of the International Conference on Service
Oriented Computing (ICSOC 2020)
14th December 2020
Outline
• Introduction
• Goals
• Implementation
• Results
• Conclusion
• Future work
Anshul Jindal | Online Memory Leak Detection in the Cloud-based Infrastructures | AIOPS 2020
Introduction: What is a memory leak?
• A memory leak is the loss of available memory that occurs when a program fails to return memory that it has obtained for temporary use.
[Figure: memory utilization (Mem. Util., up to 100%) over time. In a potential memory leak case, memory utilization hits the ceiling.]
Introduction: Memory leak on VMs in Cloud
• Long-running software services are susceptible to memory leaks.
• Effects:
o Compromised performance and stability of the service.
o Catastrophic impact on critical infrastructure.
o VMs must be restarted.
• Challenges:
o Hard to trace during development (due to the microservices architecture).
o Applications are written in different programming languages.
o Hard to monitor thousands of simultaneously running VMs for memory leaks.
o Static code analysis requires source code modifications and adds performance overhead.
Introduction: Memory Leak Patterns
[Figure panels: Linear Increasing Pattern; Saw-Tooth Pattern]
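To make the two patterns concrete, here is a minimal sketch that generates toy memory utilization series of each shape. The function names, the starting level, slope, period, and drop size are all illustrative assumptions and are not taken from the slides or from the authors' synthetic dataset.

```python
import random


def linear_increasing(n_points, start=20.0, slope=0.05, noise_std=0.0, seed=0):
    """Utilization (%) that grows by `slope` per point, clipped to [0, 100]."""
    rng = random.Random(seed)
    series = []
    for t in range(n_points):
        value = start + slope * t + rng.gauss(0.0, noise_std)
        series.append(min(100.0, max(0.0, value)))
    return series


def sawtooth(n_points, start=20.0, slope=0.05, period=288, drop=10.0):
    """Utilization that climbs, partially drops every `period` points, then climbs again.

    Each drop releases less memory than was gained during the period,
    so the series still drifts upward overall.
    """
    series = []
    for t in range(n_points):
        released = (t // period) * drop  # total memory released so far
        value = start + slope * t - released
        series.append(min(100.0, max(0.0, value)))
    return series
```

With the assumed defaults, each sawtooth period gains 14.4 percentage points and releases 10, so the leak remains visible as a slow upward drift even though utilization falls at every period boundary.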
Goals
1. Detect memory leaks on VMs without any inside knowledge of the applications running on them.
2. Minimal overhead, with detection in a reasonable time.
3. A scalable approach.
Implementation: Methodology
• Fit a line to a window at the end of the metric time series. The line must approximate the data well (judged by R²).
• Fit lines for windows of different sizes and pick the best and longest one.
[Figure: metric over time with two example windows. Window 1: good fit; Window 2: bad fit.]
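The windowed line fitting can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names and the `step` parameter are hypothetical, and "best and longest" is read here as "the longest window whose R² clears the minimum", which is only one possible interpretation.

```python
def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept, r_squared). Needs at least two points.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(values))
    # A perfectly flat window has no variance to explain; treat it as "no trend".
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared


def longest_good_window(series, min_window=72, max_window=288,
                        min_r_squared=0.8, step=12):
    """Try windows ending at the last point, longest first.

    Returns a dict describing the first window whose fit is good enough,
    or None when no window qualifies.
    """
    upper = min(max_window, len(series))
    for size in range(upper, min_window - 1, -step):
        slope, intercept, r2 = fit_line(series[-size:])
        if r2 >= min_r_squared:
            return {"size": size, "slope": slope,
                    "intercept": intercept, "r_squared": r2}
    return None
```

Searching from the longest window downward means the first acceptable fit is also the longest one, so the loop can stop early.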
Implementation: Precog
[Figure: Precog workflow. Input: a VM's memory utilization time series data. Both training data and new data go through data pre-processing and trend line fitting to detect trends (parameters: minimum R² score, critical time C). From the training trends, all trends are saved along with the maximum slope and maximum duration. The trends in new data are analyzed against the saved values to produce the result.]
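The role of the saved values can be illustrated with a small sketch. The slide names only the two saved quantities (maximum slope and maximum duration); how the analysis step compares new trends against them is not spelled out, so the comparison rule below, including the use of "or" rather than "and", is an assumption, as are the function names and the dict layout of a trend.

```python
def summarize_training_trends(trends):
    """Reduce training trends to the two saved values.

    Each trend is a dict with hypothetical keys "slope" and "duration" (in points).
    """
    return {
        "max_slope": max((t["slope"] for t in trends), default=0.0),
        "max_duration": max((t["duration"] for t in trends), default=0),
    }


def exceeds_training(trend, saved):
    """True when a new trend is steeper or longer than anything seen in training.

    Assumed rule: trends that resemble the training data are treated as normal.
    """
    return (trend["slope"] > saved["max_slope"]
            or trend["duration"] > saved["max_duration"])
```

Under this reading, a linearly increasing series whose trend resembles those already present in the training data would not be reported.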
Implementation: Precog (Parameters)
Parameter | Description | Default
min_window | Minimum window size in points. Only trends present for at least this period are detected. | 72 (= 6h)
max_window | Maximum window size in points. | 288 (= 24h)
min_r_squared | Minimum R² required to consider a line a good fit to the data. Higher values require the time series to be very close to a line. | 0.8
max_threshold | Maximum threshold of interest for the metric (the ceiling). | 100
critical_time | If the time left until the metric hits the ceiling is less than this value (measured in points), an anomaly is detected. | 2016 (= 7d)
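The defaults imply one data point every 5 minutes (72 points = 6 h). The sketch below collects the parameters in a small config object and shows the critical-time check as a straight-line extrapolation to the ceiling. The class and function names are hypothetical, and the extrapolation is an assumed reading of the `critical_time` description rather than the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecogParams:
    min_window: int = 72          # points
    max_window: int = 288         # points
    min_r_squared: float = 0.8
    max_threshold: float = 100.0  # ceiling for the metric
    critical_time: int = 2016     # points


MINUTES_PER_POINT = 5  # inferred from the table: 72 points = 6 h


def points_to_hours(points):
    """Convert a duration in points to hours at the inferred sampling rate."""
    return points * MINUTES_PER_POINT / 60


def points_until_ceiling(current_value, slope, max_threshold):
    """Points left before a line with this slope reaches the threshold.

    Returns None when the line is flat or decreasing and never reaches it.
    """
    if slope <= 0:
        return None
    return max(0.0, (max_threshold - current_value) / slope)


def is_anomaly(current_value, slope, params=PrecogParams()):
    """Flag a trend whose projected time to the ceiling is below critical_time."""
    remaining = points_until_ceiling(current_value, slope, params.max_threshold)
    return remaining is not None and remaining < params.critical_time
```

For example, a VM at 60% utilization growing by 0.5 percentage points per point would reach the ceiling in 80 points, well inside the 2016-point critical time.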
Evaluation Settings: Experimental Configurations
Dataset | Memory Leaks | No Memory Leaks
Real Cloud VMs Dataset* | 20 | 40
Synthetic Data | 90 | 90
• Evaluations were executed on a machine with 4 physical cores (3.6 GHz Intel Core i7-4790 CPU), hyperthreading enabled, and 16 GB of RAM.
*Provided by Huawei Munich Research Center
Results: Memory Leak Detection
[Figure panels: (i) Linear increasing; (ii) Sawtooth; (iii) Linear increasing without trends in the training data; (iv) Linear increasing with trends similar to the training data, correctly not detected.]
Results: Memory Leak Detection (Accuracy)
Dataset | Sub-case | +ve Cases | -ve Cases | F1 Score | Recall | Precision
Synthetic | Linearly Increasing | 30 | 30 | 0.933 | 0.933 | 0.933
Synthetic | Linearly Increasing (with noise) | 30 | 30 | 0.895 | 1.0 | 0.810
Synthetic | Sawtooth | 30 | 30 | 0.83 | 0.73 | 0.956
Synthetic | Overall | 90 | 90 | 0.9 | 0.9 | 0.91
Real | Overall | 20 | 60 | 0.857 | 0.75 | 1.0
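As a quick consistency check, the F1 scores can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The values below are copied from the table; the function and variable names are just for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
REPORTED = {
    "Synthetic / Linearly Increasing": (0.933, 0.933, 0.933),
    "Synthetic / Linearly Increasing (with noise)": (0.810, 1.0, 0.895),
    "Synthetic / Sawtooth": (0.956, 0.73, 0.83),
    "Synthetic / Overall": (0.91, 0.9, 0.9),
    "Real / Overall": (1.0, 0.75, 0.857),
}

for name, (precision, recall, reported) in REPORTED.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Each recomputed value agrees with the reported F1 score to within rounding.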
Results: Memory Leak Detection (Training and Predict Time)
Prediction time within 1 second
Results: Memory Leak Detection (Parameter Sensitivity)
[Figure panels: Critical Time; Minimum R² Score]
Conclusion & Future Work
Developed the Precog algorithm, which is relevant for cloud-based infrastructures.
Precog is scalable to thousands of VMs.
Precog achieves an F1 score of 0.85 with a prediction time of less than half a second.
Although Precog depends on some parameters, it is insensitive to them beyond certain values.
Future Work
• Use of Precog for detecting memory leaks in serverless computing.
• Extension of Precog to other metrics such as disk consumption and cost budget.
• Use of other metrics such as CPU, network, and storage utilization to improve detection accuracy.
Edge-Cloud and IoT RG @ Chair of Computer Architecture and Parallel Systems
Technical University of Munich
• Research Areas:
o Self-adaptive Edge-Cloud Continuum (predictive autoscaling for VMs, offloading computation from edge to cloud).
o AI for smart Cloud operations (anomaly detection and failure predictions).
o Modelling of microservices applications and many more…
• Key Publications:
o [2020, WoSC@Middleware] Mohak Chadha, Anshul Jindal, and Michael Gerndt. "Towards Federated Learning Using FaaS Fabric".
o [2019, ICPE] Anshul Jindal, Vladimir Podolskiy, and Michael Gerndt. "Performance Modeling for Cloud Microservice Applications".
o [2018, CLOUD] Vladimir Podolskiy, Anshul Jindal, and Michael Gerndt. "IaaS Reactive Autoscaling Performance Challenges". The 2018 IEEE International Conference on Cloud Computing (CLOUD 2018).
We are…
Prof. Dr. Michael Gerndt
Ph.D. Student Vladimir Podolskiy
Ph.D. Student Anshul Jindal
Ph.D. Student Mohak Chadha
+ Other Students
Contact
anshul.jindal@tum.de
/ansjin
/Anshul_Jindal4
Thank you for your attention!
Questions?

Breaking the Ruby Performance Barrier with YJIT
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 

Online Memory Leak Detection in the Cloud-based Infrastructures

  • 1. Online Memory Leak Detection in the Cloud-based Infrastructures
       Anshul Jindal*, Paul Staab†, Jorge Cardoso†, Michael Gerndt* and Vladimir Podolskiy*
       *Technical University of Munich (TUM), Germany
       †Huawei Munich Research Center, Germany
       International Workshop on Artificial Intelligence for IT Operations (AIOps 2020), 18th edition of the International Conference on Service Oriented Computing (ICSOC 2020), 14th December 2020
  • 2. Outline: Introduction, Goals, Implementation, Results, Conclusion, Future work.
  • 3. Introduction: What is a memory leak?
       A memory leak is the loss of available memory that occurs when a program fails to return memory it has obtained for temporary use.
       [Figure: memory utilization (Mem. Util.) over time, rising to the 100% ceiling; annotations: "Memory utilization hits ceiling", "Potential memory leak case".]
  • 4. Introduction: Memory leaks on VMs in the cloud
       Long-running software services are susceptible to memory leaks.
       Effects:
         - They compromise the performance and stability of the service.
         - They can have a catastrophic impact on critical infrastructure.
         - They require the affected VMs to be restarted.
       Challenges:
         - Leaks are hard to trace during development (due to the microservices architecture).
         - Applications are written in different programming languages.
         - It is hard to monitor thousands of simultaneously running VMs for memory leaks.
         - Static code analysis approaches require source code modifications and add performance overhead.
  • 5. Introduction: Memory Leak Patterns
       [Figures: linear increasing pattern; saw-tooth pattern.]
  • 6. Goals
       1. Detection of memory leaks on VMs without any inside knowledge of the applications running on the VM.
       2. Minimal overhead, with detection in a reasonable time.
       3. A scalable approach.
  • 7. Implementation: Methodology
       Fit a line to a window taken from the end of the metric time series. The line must approximate the data well, as measured by R².
       Fit lines for windows of different sizes and pick the best and longest one.
       [Figure: metric over time with two candidate windows; Window 1: good fit, Window 2: bad fit.]
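The window-fitting idea on slide 7 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names `fit_line` and `longest_good_window` are hypothetical, and reading "best and longest" as "the longest window whose R² passes the threshold" is an assumption.

```python
from typing import List, Optional, Tuple


def fit_line(ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept, r_squared). Needs at least two points.
    """
    n = len(ys)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # A constant series has no variance to explain; treat it as "no trend".
    r_squared = 0.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def longest_good_window(
    series: List[float],
    min_window: int,
    max_window: int,
    min_r_squared: float,
) -> Optional[Tuple[int, float, float]]:
    """Try every window size that ends at the last point of the series.

    Returns (size, slope, r_squared) for the longest window whose fit
    reaches min_r_squared, or None if no window fits well enough.
    """
    best = None
    upper = min(max_window, len(series))
    for size in range(min_window, upper + 1):
        slope, _, r_squared = fit_line(series[-size:])
        if r_squared >= min_r_squared:
            best = (size, slope, r_squared)  # later sizes are longer
    return best


if __name__ == "__main__":
    # 20 points of zigzag noise followed by a steady climb of 2 per point.
    noisy = [50.0 if i % 2 == 0 else 52.0 for i in range(20)]
    climb = [52.0 + 2.0 * i for i in range(1, 11)]
    print(longest_good_window(noisy + climb, 5, 30, 0.95))
```

Scanning every window size is quadratic in `max_window`; a real deployment would likely step through sizes more coarsely or reuse running sums, but the brute-force form keeps the idea visible.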
  • 8. Implementation: Precog
       [Figure: Precog pipeline diagram. Input: a VM memory utilization time series. Stages: Data Pre-Processing, Trend Lines Fitting, Detect Trends (inputs: minimum R² score, critical time C). Training path: Training Data → Training Trends → Save All Trends (maximum slope, maximum duration). Analysis path: New Data → New Data Trends, compared with the saved values → Result.]
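The diagram on slide 8 suggests that trends found in training data are summarized by their maximum slope and maximum duration, and that trends in new data are then judged against those saved values. The sketch below is one possible reading of that flow, not the published algorithm: the `Trend` and `SavedValues` types, the function names, and the "steeper or longer than anything seen in training" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Trend:
    slope: float   # increase in the metric per data point
    duration: int  # length of the fitted window, in points


@dataclass(frozen=True)
class SavedValues:
    max_slope: float
    max_duration: int


def train(trends: Iterable[Trend]) -> SavedValues:
    """Summarize the trends observed in training data (assumed leak-free)."""
    trends = list(trends)
    return SavedValues(
        max_slope=max((t.slope for t in trends), default=0.0),
        max_duration=max((t.duration for t in trends), default=0),
    )


def exceeds_training(trend: Trend, saved: SavedValues) -> bool:
    """Assumed rule: a new trend is suspicious only if it is steeper or
    longer-lived than every trend seen during training."""
    return trend.slope > saved.max_slope or trend.duration > saved.max_duration


if __name__ == "__main__":
    saved = train([Trend(0.02, 80), Trend(0.05, 100)])
    print(exceeds_training(Trend(0.04, 90), saved))   # similar to training
    print(exceeds_training(Trend(0.04, 250), saved))  # lasts much longer
```

Under this reading, a VM whose normal workload already shows slow upward trends would not be flagged for a similar trend later, which matches case (iv) on the results slide.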
  • 9. Implementation: Precog (Parameters)
       Parameter       Default       Description
       min_window      72 (= 6h)     Minimal size of a window, in points. Only trends present for at least this period are detected.
       max_window      288 (= 24h)   Maximal size of a window, in points.
       min_r_squared   0.8           Minimum R² required to consider a line a good fit to the data. Higher values require the time series to be very close to a line.
       max_threshold   100           Maximum threshold we care about.
       critical_time   2016 (= 7d)   If the time left until the metric hits the ceiling is less than this time (measured in points), an anomaly is detected.
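To make the `critical_time` rule concrete, here is a small worked sketch. The 5-minute sampling interval is inferred from the defaults (72 points = 6h) rather than stated on the slide, and the function names and the linear extrapolation from the current value are assumptions for illustration.

```python
import math

# Inferred from the defaults: 72 points = 6 h, so one point every 5 minutes.
POINT_MINUTES = 5

# Default values as listed in the parameter table.
DEFAULTS = {
    "min_window": 72,
    "max_window": 288,
    "min_r_squared": 0.8,
    "max_threshold": 100.0,
    "critical_time": 2016,
}


def points_to_days(points: int) -> float:
    """Convert a number of data points into days at the assumed interval."""
    return points * POINT_MINUTES / 60 / 24


def points_until_ceiling(current: float, slope: float,
                         max_threshold: float = 100.0) -> float:
    """Points left until a line with the given slope (per point) reaches
    max_threshold; infinite if the metric is flat or falling."""
    if slope <= 0:
        return math.inf
    return max(0.0, (max_threshold - current) / slope)


def is_anomaly(current: float, slope: float,
               critical_time: int = 2016,
               max_threshold: float = 100.0) -> bool:
    """Flag the trend if the ceiling would be reached within critical_time."""
    return points_until_ceiling(current, slope, max_threshold) < critical_time


if __name__ == "__main__":
    print(points_to_days(DEFAULTS["critical_time"]))  # 7.0
    # 60% utilization growing 0.05 percentage points per sample:
    # about 800 points (under 3 days) until 100%, so it is flagged.
    print(is_anomaly(60.0, 0.05))
    # Growing five times slower: about 4000 points, beyond the 7-day horizon.
    print(is_anomaly(60.0, 0.01))
```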
  • 10. Evaluation Settings: Experimental Configurations
       Dataset                    Memory leaks   No memory leaks
       Real cloud VMs dataset*    20             40
       Synthetic data             90             90
       Evaluations were executed on a machine with 4 physical cores (3.6 GHz Intel Core i7-4790 CPU) with hyperthreading enabled and 16 GB of RAM.
       *Provided by Huawei Munich Research Center
  • 11. Results: Memory Leak Detection
       [Figure panels: (i) linear increasing; (ii) sawtooth; (iii) linear increasing without trends in training data; (iv) linear increasing with similar trends as training data, correctly not detected.]
  • 12. Results: Memory Leak Detection (Accuracy)
       Dataset     Sub-case                           +ve cases   -ve cases   F1 score   Recall   Precision
       Synthetic   Linearly increasing                30          30          0.933      0.933    0.933
       Synthetic   Linearly increasing (with noise)   30          30          0.895      1.0      0.810
       Synthetic   Sawtooth                           30          30          0.83       0.73     0.956
       Synthetic   Overall                            90          90          0.9        0.9      0.91
       Real        Overall                            20          60          0.857      0.75     1.0
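The F1 column in the table above is the harmonic mean of the precision and recall columns, F1 = 2PR / (P + R). The short check below recomputes it from the reported values; the `f1_score` helper is written here for illustration, and because the slide values are rounded the recomputed figures agree only to the reported precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported on the accuracy slide.
reported = {
    "synthetic / linearly increasing": (0.933, 0.933),
    "synthetic / linearly increasing (with noise)": (0.810, 1.0),
    "synthetic / sawtooth": (0.956, 0.73),
    "real / overall": (1.0, 0.75),
}

if __name__ == "__main__":
    for name, (precision, recall) in reported.items():
        print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
```

For the real dataset, for example, 2 × 1.0 × 0.75 / 1.75 ≈ 0.857: perfect precision with a recall of 0.75 means no false alarms were raised, while a quarter of the leaking VMs were missed.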
  • 13. Results: Memory Leak Detection (Training and Predict Time)
       [Figure: training and prediction time.] Prediction time is within 1 second.
  • 14. Results: Memory Leak Detection (Parameter Sensitivity)
       [Figure panels: Critical Time; Minimum R² Score.]
  • 15. Conclusion & Future Work
       - Developed the Precog algorithm, which is relevant for cloud-based infrastructures.
       - Precog is scalable to thousands of VMs.
       - Precog achieves an F1 score of 0.85 with less than half a second of prediction time.
       - Although Precog depends on some parameters, it is insensitive to them beyond certain values.
  • 16. Future Work
       - Use Precog to detect memory leaks in serverless computing.
       - Extend Precog to other metrics such as disk consumption and cost budget.
       - Use other metrics such as CPU, network and storage utilization to improve accuracy.
  • 17. We are… Edge-Cloud and IoT RG @ Chair of Computer Architecture and Parallel Systems, Technical University of Munich
       Research areas:
         - Self-adaptive Edge-Cloud Continuum (predictive autoscaling for VMs, offloading computation from edge to cloud).
         - AI for smart Cloud operations (anomaly detection and failure predictions).
         - Modelling of microservices applications, and many more…
       Key publications:
         - [2020, WoSC@Middleware] Mohak Chadha, Anshul Jindal and Michael Gerndt. "Towards Federated Learning Using FaaS Fabric".
         - [2019, ICPE] A. Jindal, V. Podolskiy and M. Gerndt. "Performance Modeling for Cloud Microservice Applications".
         - [2018, CLOUD] Vladimir Podolskiy, Anshul Jindal and Michael Gerndt. "IaaS Reactive Autoscaling Performance Challenges". The 2018 IEEE International Conference on Cloud Computing (CLOUD 2018).
       Members: Prof. Dr. Michael Gerndt; Ph.D. student Vladimir Podolskiy; Ph.D. student Anshul Jindal; Ph.D. student Mohak Chadha; and other students.