Autonomous Cloud Operations
Managing AWS Costs with Anomaly
Detection and Root Cause Analysis
Imran Moin
Chief Product Officer
imran@yotascale.com
AGENDA
• Company Overview
• Key Pain Points in managing AWS cloud infrastructure
• YotaScale Solution Overview
• Deep dive into Anomaly Detection
• Real world cost anomalies found by YotaScale
• Live Product Demonstration
• Q&A
COMPANY VISION
AUTONOMOUS CLOUD OPERATIONS
© YotaScale. CONFIDENTIAL.
Overview
Company with deep domain expertise in enterprise infrastructure, cloud computing and machine learning
Company
Founded in 2015
HQ: Menlo Park, CA
Raised $11.6M to date
Leadership
ASIM RAZZAQ, CEO, YotaScale
USMAN ABBASI, CTO, YotaScale
IMRAN MOIN, CPO, YotaScale
Cloud Operations Getting Significantly More Complex
40% Wastage due to OpEx Cost Model
On average, enterprises waste 40% of their cloud infrastructure
30% Time on Manual Cloud Ops
On average, enterprises spend 30% of their Cloud Ops time doing manual tasks that should be automated
25+ Tools Causing Fatigue
On average, enterprises have 25+ tools to manage cost, performance and availability SLAs for cloud workloads
>50% Cloud Ops Decentralizing
Over 50% of enterprises are looking to decentralize cloud operations
Too much data, not enough insights
How to optimize each workload to get the right cost, performance and availability?
Cloud Provider Data
● Cost Data
● Utilization Data
● Inventory Data
● Log Data
● Container Data
Third Party Data
● Perf Data
● Memory Data
● Config Data
Operational Metrics
● Logins/sec
● DAU
● Queries/sec
Cloud Ops Policies
● Tagging
● Resource Whitelist
The Problems that Customers Need to Solve
AUTOMATION
● 1-click RI purchase
● Auto-tag cloud resources
● Identify root cause and remediation for anomalous incident
● Automate fixes for policy violations
PLANNING
● Accurate forecasting of cloud spend per application, team or BU
● Setting budgets and ensuring compliance
● What-if scenario planning
OPTIMIZATION
● Identify cost savings opportunities
● Detect cost spikes (anomalies)
● RI purchase decisions
● Rightsizing workloads
VISIBILITY
● Detailed cost, usage and performance reports per workload
● Custom dashboards for each application team
● Cloud policy compliance
● Pulling business context from other tools like ServiceNow, etc.
How the YotaScale Platform Works
Cloud Provider Data
● Cost
● Utilization
● Inventory
● Logs
● Containers
OPTIMIZE
Suggestions to remediate issues
DIAGNOSE
Identify root cause
DETECT
Discover trends and identify incidents
PREDICT
Forecast the future
Third Party Data
● Performance
● Memory
● Configuration
APIs
AUTONOMOUS
Cloud Ops Policies
● Mandatory Tags
● RegEx Formats
● Resource Whitelist
● Purchase Preferences
Enterprise Integrations
MANUAL
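The Cloud Ops Policies named on this slide (mandatory tags, RegEx formats, resource whitelist) could be checked roughly as follows. This is a minimal sketch and not YotaScale's implementation: the `TagPolicy` class, the `find_violations` function, and every tag name and pattern are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TagPolicy:
    # Hypothetical policy shape; the deck does not describe the real one.
    mandatory_tags: set[str]
    regex_formats: dict[str, str] = field(default_factory=dict)
    resource_whitelist: set[str] = field(default_factory=set)


def find_violations(resource_type: str, tags: dict[str, str], policy: TagPolicy) -> list[str]:
    """Return human-readable policy violations for one cloud resource."""
    violations = []
    # Resource whitelist: an empty whitelist means every type is allowed.
    if policy.resource_whitelist and resource_type not in policy.resource_whitelist:
        violations.append(f"resource type not whitelisted: {resource_type}")
    # Mandatory tags: every required key must be present.
    for key in sorted(policy.mandatory_tags - set(tags)):
        violations.append(f"missing mandatory tag: {key}")
    # RegEx formats: a tag that is present must match its pattern in full.
    for key, pattern in sorted(policy.regex_formats.items()):
        value = tags.get(key)
        if value is not None and re.fullmatch(pattern, value) is None:
            violations.append(f"tag {key!r} has malformed value: {value!r}")
    return violations


# Example values are made up.
policy = TagPolicy(
    mandatory_tags={"team", "env"},
    regex_formats={"env": r"(prod|staging|dev)"},
    resource_whitelist={"ec2", "s3"},
)
print(find_violations("redshift", {"env": "production"}, policy))
```

A list of violations per resource is the kind of output that an auto-remediation or audit step could consume.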
YotaScale Platform Use Cases
Streamline Cloud Operations
• Real-time incident detection and alerting
• Workflows integrated with existing Cloud Ops tools - Slack, JIRA, PagerDuty, etc.
• Incident investigation through RCA
• 1-click implementation
Optimize Cloud Workloads
• Continuous optimization assessment across all cloud resources
• Identify opportunities to purchase reserved instances
• Real-time re-balancing of RI inventory
• Rightsize workloads
• Shut down orphan resources
• Increase performance of existing cloud resources
Governance with Cloud Policies
• Pre-defined policies for cloud operations best practices
• Auto-remediation of violations
• Audit trail to identify user responsible for violations
• Real-time inventory of all assets
• Identify workloads out of compliance with policies
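The reserved instance (RI) purchase opportunities mentioned on this slide come down to a break-even question. The sketch below is generic arithmetic and not YotaScale's logic. It assumes a no-upfront RI billed for every hour of the period, and the function names and prices are hypothetical.

```python
def ri_break_even_utilization(on_demand_hourly: float, ri_effective_hourly: float) -> float:
    """Fraction of hours an instance must run for the RI to cost less than on-demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    # The RI costs ri_effective_hourly for every hour of the period, while
    # on-demand costs on_demand_hourly only for the hours actually used.
    return ri_effective_hourly / on_demand_hourly


def should_reserve(hours_used: float, hours_in_period: float,
                   on_demand_hourly: float, ri_effective_hourly: float) -> bool:
    """True when observed utilization is above the break-even point."""
    utilization = hours_used / hours_in_period
    return utilization > ri_break_even_utilization(on_demand_hourly, ri_effective_hourly)


# Made-up prices: $0.10/h on-demand versus $0.06/h effective RI rate.
print(should_reserve(600, 730, 0.10, 0.06))
print(should_reserve(300, 730, 0.10, 0.06))
```

With these illustrative prices the RI pays off only for instances running more than about 60% of the time.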
Through the use of machine learning, YotaScale processes millions of data signals and provides contextually relevant anomaly detection and optimization recommendations that reduce your cloud spend
Anomaly Detection
Live Monitoring
PREVENT RUNAWAY COSTS
• Detect Anomalies
• Root Cause Analysis
• Intelligent Workflow
Continuous Optimization
Up to 40% Savings
OPTIMALLY EFFICIENT INFRASTRUCTURE
• Contextually aware corrective action
• Deep library of best practices
• EC2 & PaaS Support
Contextual Analytics
Org Benchmark
ACCOUNTABILITY & TRANSPARENCY
• Scorecard
• 100% tag hygiene
• Slice and dice analysis
• Accountability & transparency
YotaScale Anomaly Detection Overview
● YotaScale’s ML/AI-powered Anomaly Detection can detect cost anomalies across any dimension
● Customers are alerted in real time via Email, Slack, etc.
● Quick time to resolution thanks to YotaScale’s Root Cause Analysis (RCA)
DETECT / ALERT
Detect and alert on real-time cost anomalies
PROVIDE RCA
Provide the root cause of each anomaly
REMEDIATE
Suggest possible fixes to the customer
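The deck does not describe the models behind the DETECT step. As a stand-in, the sketch below flags days whose cost deviates strongly from a trailing window, using a median and median absolute deviation (MAD) score. The function name, window size, threshold and sample costs are all assumptions for illustration.

```python
from statistics import median


def detect_cost_anomalies(daily_costs, window=7, threshold=3.5):
    """Return indices of days whose cost deviates strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        # 1.4826 scales the MAD so it is comparable to a standard deviation
        # when the data is roughly normal.
        scale = 1.4826 * mad
        if scale == 0:
            # Flat history: fall back to 1% of the typical cost (an assumption).
            scale = max(abs(center), 1.0) * 0.01
        score = (daily_costs[i] - center) / scale
        # abs() flags sudden drops as well as spikes.
        if abs(score) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily costs in dollars with one obvious spike on day index 7.
costs = [100, 102, 98, 101, 99, 103, 100, 250, 101, 99]
print(detect_cost_anomalies(costs))  # → [7]
```

The median and MAD are used here instead of mean and standard deviation so that a single spike in the window does not mask the days that follow it.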
Key Features for Anomaly Detection
Identify and Customize Anomalies
● Sophisticated ML Models
● Customizable Dimensions
● Severity Per Anomaly
Provide Root Cause Analysis
● ML Models find correlations / causations for each anomaly
● Linked to business events (positive or negative)
Suggest Possible Fixes
● Identify Solutions
● Manual Scripts
● Approval based implementation
● Automation
Workflow Integration
● Single Sign-On (SSO)
● Slack Integration
● JIRA Integration
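A very simple form of the root cause step is to rank the values of one dimension by how much each contributed to the cost increase. The sketch below does only that; it shows contribution, not causation, and is not YotaScale's RCA model. The row shape (`{"service": ..., "cost": ...}`), the function name and the figures are hypothetical.

```python
from collections import defaultdict


def rank_cost_drivers(baseline_rows, anomaly_rows, dimension, top_n=3):
    """Rank values of `dimension` by cost increase between two periods."""

    def totals(rows):
        out = defaultdict(float)
        for row in rows:
            out[row[dimension]] += row["cost"]
        return out

    before, after = totals(baseline_rows), totals(anomaly_rows)
    deltas = {
        key: after.get(key, 0.0) - before.get(key, 0.0)
        for key in set(before) | set(after)
    }
    # Largest increase first; ties are broken by name for stable output.
    ranked = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up cost rows for a normal day and an anomalous day.
baseline = [{"service": "EC2", "cost": 100.0}, {"service": "S3", "cost": 20.0}]
anomaly = [
    {"service": "EC2", "cost": 400.0},
    {"service": "S3", "cost": 22.0},
    {"service": "Lambda", "cost": 5.0},
]
print(rank_cost_drivers(baseline, anomaly, "service"))
# → [('EC2', 300.0), ('Lambda', 5.0), ('S3', 2.0)]
```

Running the same ranking over several dimensions (service, account, tag, region) narrows down where to look first.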
Closed Feedback Loop on Anomaly Models
● Customer actions for each cost anomaly
○ Dismiss
○ Resolve
○ Snooze
● Anomaly ML models fine-tuned based on customer feedback
Actions for every anomaly
Dismiss Anomalies
Resolve Anomalies
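How the feedback is used to fine-tune the models is not described in the deck. One simple possibility is to nudge a detection threshold after each customer action. The sketch below treats Dismiss as a false positive and Resolve as a confirmed anomaly, which is an assumption, as are the function name, step size and bounds.

```python
def tune_threshold(threshold, feedback, step=0.25, lower=2.0, upper=6.0):
    """Adjust an anomaly-score threshold from a sequence of customer actions."""
    for action in feedback:
        if action == "dismiss":
            # Assumed false positive: raise the threshold (less sensitive).
            threshold += step
        elif action == "resolve":
            # Assumed true anomaly: lower the threshold (more sensitive).
            threshold -= step
        elif action != "snooze":
            # Snooze leaves the threshold unchanged; anything else is an error.
            raise ValueError(f"unknown action: {action}")
        # Keep the threshold inside the allowed range after every step.
        threshold = min(upper, max(lower, threshold))
    return threshold


print(tune_threshold(3.5, ["dismiss", "dismiss", "snooze", "resolve"]))  # → 3.75
```

Clamping to a fixed range keeps a long run of dismissals from silencing detection altogether.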
Remediation for Cost Anomalies (Future Roadmap)
● Out of Band (manual) instructions
● Out of Band (manual) script
● In Band (manual) script
● Approval Based Implementation
● Automation (Autonomous execution by YotaScale)
With real-time anomaly detection, root cause and remediation, YotaScale caught this anomaly in time and saved thousands of dollars.
“Our virus scanning engine died. We could not figure out the right host and in the process spun up hundreds of machines. YotaScale detected the issue in real-time.”
Jonathan Monette
Senior Architect
YotaScale’s Anomaly Detection discovers applications and services and alerts you to significant changes.
ANOMALY DETECTION ROOT CAUSE SAVES DAYS OF TROUBLESHOOTING
“Our API gateway team saw an unusual amount of requests resulting in a huge spike in resource provisioning. YotaScale pinpointed the exact issue, saving valuable cycles.”
John Smithan
Lead Site Reliability Engineer
YotaScale was able to pinpoint the exact issue and save days of investigative work on where to look.
Going beyond alerting, YotaScale can provide a detailed analysis of the resources that caused an anomaly.
Key Benefits of Anomaly Detection
1. Get real-time notifications about any unusual cost spikes across any business-critical dimension
2. Serves as an insurance policy against runaway cloud costs - can save up to 10-20% of yearly cloud spend
3. Helps troubleshoot the root cause of cost spikes and saves valuable time for CloudOps, Finance and Engineering teams
Product Demonstration

More Related Content

What's hot

Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleNoriaki Tatsumi
 
TigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial CrimesTigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial CrimesTigerGraph
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection MLMaatougSelim
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine LearningScaleway
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleImpetus Technologies
 
AWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAmazon Web Services
 
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaMachine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaEdureka!
 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...DianaGray10
 
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...Codiax
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningShao-Chuan Wang
 
Crime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveCrime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveBenjamin Ang
 
AWS ML Model Deployment
AWS ML Model DeploymentAWS ML Model Deployment
AWS ML Model DeploymentKnoldus Inc.
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksDatabricks
 
Improving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWSImproving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWSAmazon Web Services
 
Building a Winning Roadmap for Analytics
Building a Winning Roadmap for AnalyticsBuilding a Winning Roadmap for Analytics
Building a Winning Roadmap for AnalyticsIronside
 

What's hot (20)

Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scale
 
TigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial CrimesTigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial Crimes
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
 
Semantic search
Semantic searchSemantic search
Semantic search
 
AWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS Cloud
 
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaMachine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
 
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...
Emeli Dral (Evidently AI) – Analyze it: production monitoring for machine lea...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Crime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveCrime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspective
 
AWS ML Model Deployment
AWS ML Model DeploymentAWS ML Model Deployment
AWS ML Model Deployment
 
Web mining
Web mining Web mining
Web mining
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In Databricks
 
Improving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWSImproving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWS
 
Generative AI
Generative AIGenerative AI
Generative AI
 
Building a Winning Roadmap for Analytics
Building a Winning Roadmap for AnalyticsBuilding a Winning Roadmap for Analytics
Building a Winning Roadmap for Analytics
 

Similar to Managing AWS Costs with Anomaly Detection and Root Cause Analysis

The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APMJonah Kowall
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksAppDynamics
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialDevOps.com
 
Enterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu OracleEnterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu OracleMarketingArrowECS_CZ
 
Real-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine dataReal-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine datajKool
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extOracle Developers
 
Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!Flexera
 
Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?ManageEngine
 
How to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG EnterpriseHow to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG EnterpriseeG Innovations
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpseG Innovations
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityDevOps.com
 
SOUG Day - autonomous what is next
SOUG Day - autonomous what is nextSOUG Day - autonomous what is next
SOUG Day - autonomous what is nextThomas Teske
 
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...Amazon Web Services
 
Unified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The HourUnified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The HoureG Innovations
 
Securing the Cloud Native stack
Securing the Cloud Native stackSecuring the Cloud Native stack
Securing the Cloud Native stackHector Tapia
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera, Inc.
 

Similar to Managing AWS Costs with Anomaly Detection and Root Cause Analysis (20)

The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
 
Google App engine
Google App engineGoogle App engine
Google App engine
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now Essential
 
Enterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu OracleEnterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu Oracle
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Real-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine dataReal-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine data
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_ext
 
Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!
 
Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019
 
Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?
 
How to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG EnterpriseHow to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG Enterprise
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
 
SOUG Day - autonomous what is next
SOUG Day - autonomous what is nextSOUG Day - autonomous what is next
SOUG Day - autonomous what is next
 
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
 
Wavefront presentation-May-2019
Wavefront presentation-May-2019Wavefront presentation-May-2019
Wavefront presentation-May-2019
 
Unified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The HourUnified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The Hour
 
Securing the Cloud Native stack
Securing the Cloud Native stackSecuring the Cloud Native stack
Securing the Cloud Native stack
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart Cities
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Managing AWS Costs with Anomaly Detection and Root Cause Analysis

  • 1. Autonomous Cloud Operations Managing AWS Costs with Anomaly Detection and Root Cause Analysis Imran Moin Chief Product Officer imran@yotascale.com
  • 2. AGENDA • Company Overview • Key Pain Points in managing AWS cloud infrastructure • YotaScale Solution Overview • Deep dive into Anomaly Detection • Real world cost anomalies found by YotaScale • Live Product Demonstration • Q&A
  • 3. COMPANY VISION AUTONOMOUS CLOUD OPERATIONS 3© YotaScale. CONFIDENTIAL.
  • 4. Overview Company with deep domain expertise in enterprise infrastructure, cloud computing and machine learning 4 ASIM RAZZAQ CEO, YotaScale Leadership USMAN ABBASI CTO, YotaScale IMRAN MOIN CPO, YotaScale Investors © YotaScale. CONFIDENTIAL. Company Founded in 2015 HQ - Menlo Park, CA Raised $11.6M till date
  • 5. Cloud Operations Getting Significantly More Complex © YotaScale. CONFIDENTIAL. 5 40% Wastage due to OpEx Cost Model On average enterprises waste 40% cloud infrastructure 30% Time on Manual Cloud Ops On average, enterprises spend 30% of their Cloud Ops time doing manual tasks that should be automated Over 50% of enterprises are looking to decentralize cloud operations 25+ Tools Causing Fatigue On average, enterprises have 25+ tools to manage cost, performance and availability SLAs for cloud workloads >50% Cloud Ops Decentralizing
  • 6. Too much data, not enough insights: How to optimize each workload to get the right cost, performance and availability?
      ● Cloud Provider Data: Cost Data, Utilization Data, Inventory Data, Log Data, Container Data
      ● Third Party Data: Perf Data, Memory Data, Config Data
      ● Operational Metrics: Logins/sec, DAU, Queries/sec
      ● Cloud Ops Policies: Tagging, Resource Whitelist
  • 7. The Problems that Customers Need to Solve
      AUTOMATION
        ● 1-click RI purchase
        ● Auto-tag cloud resources
        ● Identify root cause and remediation for anomalous incidents
        ● Automate fixes for policy violations
      PLANNING
        ● Accurate forecasting of cloud spend per application, team or BU
        ● Setting budgets and ensuring compliance
        ● What-if scenario planning
      OPTIMIZATION
        ● Identify cost savings opportunities
        ● Detect cost spikes (anomalies)
        ● RI purchase decisions
        ● Rightsizing workloads
      VISIBILITY
        ● Detailed cost, usage and performance reports per workload
        ● Custom dashboards for each application team
        ● Cloud policy compliance
        ● Pulling business context from other tools like ServiceNow, etc.
  • 8. How the YotaScale Platform Works
      Inputs
        ● Cloud Provider Data: Cost, Utilization, Inventory, Logs, Containers
        ● Third Party Data: Performance, Memory, Configuration
        ● Cloud Ops Policies: Mandatory Tags, RegEx Formats, Resource Whitelist, Purchase Preferences
      Stages
        ● DETECT: Discover trends and identify incidents
        ● DIAGNOSE: Identify root cause
        ● OPTIMIZE: Suggestions to remediate issues
        ● PREDICT: Forecast the future
      Other diagram labels: APIs, Enterprise Integrations, AUTONOMOUS, MANUAL
  • 9. YotaScale Platform Use Cases
      Streamline Cloud Operations
        • Real-time incident detection and alerting
        • Workflows integrated with existing Cloud Ops tools: Slack, JIRA, PagerDuty, etc.
        • Incident investigation through RCA
        • 1-click implementation
      Optimize Cloud Workloads
        • Continuous optimization assessment across all cloud resources
        • Identify opportunities to purchase reserved instances
        • Real-time re-balancing of RI inventory
        • Rightsize workloads
        • Shut down orphan resources
        • Increase performance of existing cloud resources
      Governance with Cloud Policies
        • Pre-defined policies for cloud operations best practices
        • Auto-remediation of violations
        • Audit trail to identify user responsible for violations
        • Real-time inventory of all assets
        • Identify workloads out of compliance with policies
  • 10. Through the use of machine learning, YotaScale processes millions of data signals and provides contextually relevant anomaly detection and optimization recommendations that reduce your cloud spend
      Anomaly Detection (Live Monitoring): PREVENT RUNAWAY COSTS
        • Detect Anomalies
        • Root Cause Analysis
        • Intelligent Workflow
      Continuous Optimization (Up to 40% Savings): OPTIMALLY EFFICIENT INFRASTRUCTURE
        • Contextually aware corrective action
        • Deep library of best practices
        • EC2 & PaaS Support
      Contextual Analytics (Org Benchmark): ACCOUNTABILITY & TRANSPARENCY
        • Scorecard
        • 100% tag hygiene
        • Slice and dice analysis
        • Accountability & transparency
  • 11. YotaScale Anomaly Detection Overview
      ● YotaScale’s ML/AI-powered Anomaly Detection can detect cost anomalies across any possible dimension
      ● Customers are alerted in real time via Email, Slack, etc.
      ● Quick time to resolution due to YotaScale’s Root Cause Analysis (RCA)
      Steps
        ● DETECT / ALERT: Detect and alert on real-time cost anomalies
        ● PROVIDE RCA: Provide the root cause of the anomaly
        ● REMEDIATE: Suggest possible fixes to the customer
  • 12. Key Features for Anomaly Detection
      Identify and Customize Anomalies
        ● Sophisticated ML Models
        ● Customizable Dimensions
        ● Severity Per Anomaly
      Provide Root Cause Analysis
        ● ML Models find correlations / causations for each anomaly
        ● Linked to business events (positive or negative)
      Suggest Possible Fixes
        ● Identify Solutions
        ● Manual Scripts
        ● Approval based implementation
        ● Automation
      Workflow Integration
        ● Single Sign-On (SSO)
        ● Slack Integration
        ● JIRA Integration
  • 13. Closed Feedback Loop on Anomaly Models
      ● Customer actions for each cost anomaly
        ○ Dismiss
        ○ Resolve
        ○ Snooze
      ● Anomaly ML models fine-tuned based on customer feedback
      Screenshot captions: Actions for every anomaly; Dismiss Anomalies; Resolve Anomalies
  • 14. Remediation for Cost Anomalies (Future Roadmap)
      ● Out of Band (manual) instructions
      ● Out of Band (manual) script
      ● In Band (manual) Script
      ● Approval Based Implementation
      ● Automation (Autonomous execution by YotaScale)
  • 15. With real-time anomaly detection, root cause and remediation, YotaScale caught this anomaly in time and saved thousands of dollars.
      “Our virus scanning engine died. We could not figure out the right host and in the process spun up hundreds of machines. YotaScale detected the issue in real-time.” Jonathan Monette, Senior Architect
      YotaScale’s Anomaly Detection discovers applications and services and alerts you to significant changes.
  • 16. ANOMALY DETECTION ROOT CAUSE SAVES DAYS OF TROUBLESHOOTING
      “Our API gateway team saw an unusual amount of requests resulting in a huge spike in resource provisioning. YotaScale pinpointed the exact issue, saving valuable cycles.” John Smithan, Lead Site Reliability Engineer
      YotaScale was able to pinpoint the exact issue and save days of investigative work on where to go look.
      Going beyond alerting, YotaScale can provide a detailed analysis of the resources that caused an anomaly.
  • 17. Key Benefits of Anomaly Detection
      1. Get real-time notifications about any unusual cost spikes across any business-critical dimension
      2. Serves as an insurance policy against runaway cloud costs; can save up to 10-20% of yearly cloud spend
      3. Helps troubleshoot the root cause of cost spikes and saves valuable time for CloudOps, Finance and Engineering teams
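
Slides 11 and 16 describe a detect-then-diagnose flow: flag a cost spike, then point at the resources behind it. The deck does not say which models YotaScale uses, so the following is only a minimal sketch of that flow, with a trailing-window z-score and a per-resource cost delta swapped in as generic baselines. The function names, the `window` and `threshold` parameters, the resource names and the cost figures are all hypothetical and made up for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(daily_cost, window=7, threshold=3.0):
    """Return indices of days whose cost exceeds the trailing-window mean
    by more than `threshold` standard deviations (a simple z-score test).

    This is a generic statistical baseline, not YotaScale's model.
    """
    anomalies = []
    for i in range(window, len(daily_cost)):
        history = daily_cost[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: use a tiny sigma so any increase is flagged
            # without dividing by zero.
            sigma = 1e-9
        z = (daily_cost[i] - mu) / sigma
        if z > threshold:
            anomalies.append(i)
    return anomalies


def rank_contributors(cost_by_resource, day, window=7):
    """Rank resources by how far their cost on `day` sits above their own
    trailing-window mean. A crude stand-in for root cause analysis."""
    deltas = {}
    for resource, series in cost_by_resource.items():
        baseline = mean(series[day - window:day])
        deltas[resource] = series[day] - baseline
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical daily cost in dollars per resource group (illustrative only).
cost_by_resource = {
    "api-gateway":   [100, 102, 98, 101, 99, 100, 103, 100],
    "scanner-fleet": [50, 52, 49, 51, 50, 48, 50, 400],
    "database":      [200, 198, 202, 201, 199, 200, 200, 201],
}

# Total spend per day across all resource groups.
total = [sum(day_costs) for day_costs in zip(*cost_by_resource.values())]

flagged_days = detect_anomalies(total)
for day in flagged_days:
    top_resource, delta = rank_contributors(cost_by_resource, day)[0]
    print(f"day {day}: spike; top contributor {top_resource} (+${delta:.2f})")
```

On this made-up data only the last day is flagged, and `scanner-fleet` ranks first among the contributors. A trailing window is the simplest possible baseline: it ignores weekly seasonality, and an earlier spike inside the window inflates the standard deviation and can mask a later one, which is one reason a production system would need something more robust.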