Autonomous Cloud Operations
Managing AWS Costs with Anomaly
Detection and Root Cause Analysis
Imran Moin
Chief Product Officer
imran@yotascale.com
AGENDA
• Company Overview
• Key Pain Points in managing AWS cloud infrastructure
• YotaScale Solution Overview
• Deep dive into Anomaly Detection
• Real world cost anomalies found by YotaScale
• Live Product Demonstration
• Q&A
COMPANY VISION
AUTONOMOUS CLOUD OPERATIONS
© YotaScale. CONFIDENTIAL.
Overview
Company with deep domain expertise in enterprise infrastructure, cloud
computing and machine learning
ASIM RAZZAQ
CEO, YotaScale
Leadership
USMAN ABBASI
CTO, YotaScale
IMRAN MOIN
CPO, YotaScale
Investors
Company
Founded in 2015
HQ: Menlo Park, CA
Raised $11.6M to date
Cloud Operations Getting Significantly More Complex
40%
Wastage due to OpEx Cost Model
On average, enterprises waste 40% of their cloud infrastructure
30%
Time on Manual Cloud Ops
On average, enterprises spend 30% of their Cloud Ops time doing manual tasks that should be automated
25+
Tools Causing Fatigue
On average, enterprises have 25+ tools to manage cost, performance and availability SLAs for cloud workloads
>50%
Cloud Ops Decentralizing
Over 50% of enterprises are looking to decentralize cloud operations
Cost
Data
Utilization
Data
Inventory
Data
Log
Data
How to optimize each workload to
get the right cost, performance and
availability?
Perf
Data
Memory
Data
Operational Metrics
● Logins/sec
● DAU
● Queries/sec
Cloud Provider Data
Cloud Ops Policies
● Tagging
● Resource Whitelist
Config
Data
Container
Data
Third Party Data
Too much data, not enough insights
AUTOMATION
● 1-click RI purchase
● Auto-tag cloud resources
● Identify root cause and remediation for anomalous incident
● Automate fixes for policy violations
PLANNING
● Accurate forecasting of cloud spend per application, team or BU
● Setting budgets and ensuring compliance
● What-if scenario planning
OPTIMIZATION
● Identify cost savings opportunities
● Detect cost spikes (anomalies)
● RI purchase decisions
● Rightsizing workloads
VISIBILITY
● Detailed cost, usage and performance reports per workload
● Custom dashboards for each application team
● Cloud policy compliance
● Pulling business context from other tools like ServiceNow, etc.
The Problems that Customers Need to Solve
How the YotaScale Platform Works
Cloud Provider Data
● Cost
● Utilization
● Inventory
● Logs
● Containers
OPTIMIZE
Suggestions to remediate issues
DIAGNOSE
Identify root cause
DETECT
Discover trends and identify incidents
1
2
3
4
PREDICT
Forecast the future
Third Party Data
● Performance
● Memory
● Configuration
APIs
AUTONOMOUS
Cloud Ops Policies
● Mandatory Tags
● RegEx Formats
● Resource Whitelist
● Purchase Preferences
Enterprise Integrations
MANUAL
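The Cloud Ops Policies listed on this slide (mandatory tags, RegEx formats, resource whitelist) can be pictured as a simple per-resource check. The sketch below is only an illustration and not YotaScale's implementation: the `POLICY` layout, the tag names, the email pattern and the instance types are all hypothetical.

```python
import re

# Hypothetical policy definition: tag names, the regex and the allowed
# instance types are illustrative assumptions, not taken from the deck.
POLICY = {
    "mandatory_tags": ["owner", "application"],
    "tag_formats": {"owner": r"[a-z]+\.[a-z]+@example\.com"},
    "resource_whitelist": {"t3.micro", "t3.small", "m5.large"},
}


def find_violations(resource, policy=POLICY):
    """Return a list of human-readable policy violations for one resource."""
    violations = []
    tags = resource.get("tags", {})

    # Mandatory tags: every listed tag must be present.
    for name in policy["mandatory_tags"]:
        if name not in tags:
            violations.append(f"missing tag: {name}")

    # RegEx formats: a present tag must match its pattern in full.
    for name, pattern in policy["tag_formats"].items():
        if name in tags and not re.fullmatch(pattern, tags[name]):
            violations.append(f"bad format for tag: {name}")

    # Resource whitelist: only approved resource types are allowed.
    if resource.get("type") not in policy["resource_whitelist"]:
        violations.append(f"type not whitelisted: {resource.get('type')}")

    return violations


if __name__ == "__main__":
    resource = {"type": "p3.16xlarge", "tags": {"owner": "Bob"}}
    for violation in find_violations(resource):
        print(violation)
```

A list of violations like this could feed either the MANUAL or the AUTONOMOUS path shown on the slide; how the platform actually routes them is not stated in the deck.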
YotaScale Platform Use Cases
Streamline Cloud Operations
• Real-time incident detection and alerting
• Workflows integrated with existing Cloud Ops tools - Slack, JIRA, PagerDuty, etc.
• Incident investigation through RCA
• 1-click implementation
Optimize Cloud Workloads
• Continuous optimization assessment across all cloud resources
• Identify opportunities to purchase reserved instances
• Real-time re-balancing of RI inventory
• Rightsize workloads
• Shut down orphaned resources
• Increase performance of existing cloud resources
Governance with Cloud Policies
• Pre-defined policies for cloud operations best practices
• Auto-remediation of violations
• Audit trail to identify user responsible for violations
• Real-time inventory of all assets
• Identify workloads out of compliance with policies
• Detect Anomalies
• Root Cause Analysis
• Intelligent Workflow
Anomaly Detection
Live Monitoring
PREVENT RUNAWAY COSTS
• Contextually aware
corrective action
• Deep library of best
practices
• EC2 & PaaS Support
Continuous
Optimization
Up to 40% Savings
OPTIMALLY EFFICIENT
INFRASTRUCTURE
• Scorecard
• 100% tag hygiene
• Slice and dice analysis
• Accountability
& transparency
Contextual
Analytics
Org Benchmark
ACCOUNTABILITY
& TRANSPARENCY
Through the use of machine learning, YotaScale processes millions of data signals and provides contextually relevant anomaly detection and optimization recommendations that reduce your cloud spend.
YotaScale Anomaly Detection Overview
● YotaScale’s ML/AI-powered Anomaly Detection can detect cost anomalies across any possible dimension
● Customers are alerted in real time via email, Slack, etc.
● Quick time to resolution due to YotaScale’s Root Cause Analysis (RCA)
DETECT / ALERT
Detect and Alert on
real-time cost anomalies
PROVIDE RCA
Provide Root Cause on
what caused that anomaly
REMEDIATE
Suggest possible fixes to
the customer
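To make the DETECT / ALERT step concrete, here is a minimal sketch of cost-spike detection using a trailing-window z-score. This is a stand-in technique chosen for illustration; the deck does not describe YotaScale's actual ML models. The function name, the 7-day window and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev


def detect_cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indices of days whose cost spikes above the trailing window.

    A day is flagged when its cost is more than `threshold` standard
    deviations above the mean of the previous `window` days.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as a spike.
            if daily_costs[i] > mu:
                anomalies.append(i)
            continue
        z = (daily_costs[i] - mu) / sigma
        if z > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Illustrative daily spend in dollars; day 7 is a spike.
    costs = [100, 102, 98, 101, 99, 103, 100, 250, 101]
    print(detect_cost_anomalies(costs))
```

In practice the same check would be run per dimension (account, service, tag, team), which is where the "any possible dimension" claim on this slide comes in.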
Key Features for Anomaly Detection
Identify and Customize
Anomalies
● Sophisticated ML Models
● Customizable Dimensions
● Severity Per Anomaly
Provide Root Cause Analysis
● ML Models find correlations /
causations for each anomaly
● Linked to business events
(positive or negative)
Suggest Possible Fixes
● Identify Solutions
● Manual Scripts
● Approval based
implementation
● Automation
Workflow Integration
● Single Sign-On (SSO)
● Slack Integration
● JIRA Integration
Closed Feedback Loop on Anomaly Models
● Customer actions for each cost anomaly
○ Dismiss
○ Resolve
○ Snooze
● Anomaly ML models fine-tuned based on customer feedback
Actions for every
anomaly
Dismiss
Anomalies
Resolve
Anomalies
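One simple way to picture the closed feedback loop is a detection threshold that moves with customer actions. The deck does not say how the anomaly models are fine-tuned, so the sketch below is purely an assumption: the `tune_threshold` function, the step size and the bounds are hypothetical.

```python
def tune_threshold(threshold, feedback, step=0.1, lower=1.0, upper=6.0):
    """Adjust an anomaly threshold from a sequence of customer actions.

    Assumed semantics (not from the deck):
      - "dismiss": false positive, so become less sensitive
      - "resolve": confirmed anomaly, so become slightly more sensitive
      - "snooze":  no signal either way, threshold unchanged
    """
    for action in feedback:
        if action == "dismiss":
            threshold += step
        elif action == "resolve":
            threshold -= step
    # Keep the threshold inside a sane range.
    return min(max(threshold, lower), upper)


if __name__ == "__main__":
    actions = ["dismiss", "dismiss", "resolve", "snooze"]
    print(round(tune_threshold(3.0, actions), 2))
```

A real system would more likely retrain or reweight a model per dimension; a single scalar threshold is just the smallest version of the idea.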
Remediation for Cost Anomalies (Future Roadmap)
Out of Band (manual) instructions
Out of Band (manual) script
In Band (manual) script
Approval Based Implementation
Automation (autonomous execution by YotaScale)
With real-time anomaly detection, root cause analysis and remediation, YotaScale caught this anomaly in time and saved thousands of dollars.
“Our virus scanning
engine died. We could
not figure out the right
host and in the process
spun up hundreds of
machines. YotaScale
detected the issue in
real-time.”
Jonathan Monette
Senior Architect
YotaScale’s Anomaly
Detection discovers
applications and services
and alerts you to
significant changes.
ANOMALY DETECTION ROOT CAUSE SAVES DAYS OF TROUBLESHOOTING
“Our API gateway team
saw an unusual amount of
requests resulting in a
huge spike in resource
provisioning. YotaScale
pinpointed the exact issue
saving valuable cycles”
YotaScale was able to pinpoint the exact
issue and save days of investigative work
on where to go look.
John Smithan
Lead Site Reliability Engineer
Going beyond alerting,
YotaScale can provide a
detailed analysis of the
resources that caused an
anomaly.
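A rough sketch of what a "detailed analysis of the resources that caused an anomaly" could look like: compare the anomalous day's cost breakdown with a baseline and rank the largest increases. The deck describes ML-based correlation for RCA; this plain delta ranking is a simplified stand-in, and the function name and sample figures are hypothetical.

```python
def rank_contributors(baseline, anomaly_day, top_n=3):
    """Rank cost dimensions by how much they grew versus the baseline.

    Both arguments map a dimension (service, tag, resource ID) to a cost.
    Returns up to `top_n` (dimension, increase) pairs, largest first.
    """
    keys = set(baseline) | set(anomaly_day)
    deltas = {k: anomaly_day.get(k, 0.0) - baseline.get(k, 0.0) for k in keys}
    increases = [(k, d) for k, d in deltas.items() if d > 0]
    # Sort by increase descending, then by name for a stable order.
    increases.sort(key=lambda item: (-item[1], item[0]))
    return increases[:top_n]


if __name__ == "__main__":
    # Illustrative per-service daily cost in dollars.
    baseline = {"EC2": 400.0, "S3": 50.0, "RDS": 120.0}
    spike = {"EC2": 1900.0, "S3": 52.0, "RDS": 118.0, "Lambda": 10.0}
    for name, increase in rank_contributors(baseline, spike):
        print(f"{name}: +${increase:.2f}")
```

Running the same ranking one level down (for example per instance within the top service) narrows the search toward the individual resources behind the spike.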
Key Benefits of Anomaly Detection
1. Get real-time notifications about unusual cost spikes across any business-critical dimension
2. Serves as an insurance policy against runaway cloud costs - can save up to 10-20% of yearly cloud spend
3. Helps troubleshoot the root cause of cost spikes and saves valuable time for CloudOps, Finance and Engineering teams
Product Demonstration

More Related Content

What's hot

교육전산망 클라우드와 ITSM 발표자료
교육전산망 클라우드와 ITSM 발표자료교육전산망 클라우드와 ITSM 발표자료
교육전산망 클라우드와 ITSM 발표자료
에스티이지 (STEG)
 
PHP x AWS でスケーラブルなシステムをつくろう
PHP x AWS でスケーラブルなシステムをつくろうPHP x AWS でスケーラブルなシステムをつくろう
PHP x AWS でスケーラブルなシステムをつくろうTaiji INOUE
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
DataWorks Summit
 
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
Amazon Web Services Korea
 
Value stream management is essential for dev ops v4
Value stream management is essential for dev ops v4Value stream management is essential for dev ops v4
Value stream management is essential for dev ops v4
DevOps.com
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
Sujai Prakasam
 
AWS Application Discovery Service
AWS Application Discovery ServiceAWS Application Discovery Service
AWS Application Discovery Service
Amazon Web Services
 
Governance Strategies & Tools for Cloud Formation
Governance Strategies & Tools for Cloud Formation Governance Strategies & Tools for Cloud Formation
Governance Strategies & Tools for Cloud Formation
Amazon Web Services
 
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
Amazon Web Services
 
Microsoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiencyMicrosoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiency
Kushan Lahiru Perera
 
Scrum introduction in hsin chu-agilemeetup
Scrum introduction in hsin chu-agilemeetupScrum introduction in hsin chu-agilemeetup
Scrum introduction in hsin chu-agilemeetup
Jen-Chieh Ko
 
Improving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWSImproving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWS
Amazon Web Services
 
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
Amazon Web Services Korea
 
Cloud 101 - What is the Cloud?
Cloud 101 - What is the Cloud?Cloud 101 - What is the Cloud?
Cloud 101 - What is the Cloud?
RapidScale
 
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar SeriesIntroduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
Amazon Web Services
 
Fast analytics kudu to druid
Fast analytics  kudu to druidFast analytics  kudu to druid
Fast analytics kudu to druid
Worapol Alex Pongpech, PhD
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
Amazon Web Services
 
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
Amazon Web Services Korea
 
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
Amazon Web Services
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
SANG WON PARK
 

What's hot (20)

교육전산망 클라우드와 ITSM 발표자료
교육전산망 클라우드와 ITSM 발표자료교육전산망 클라우드와 ITSM 발표자료
교육전산망 클라우드와 ITSM 발표자료
 
PHP x AWS でスケーラブルなシステムをつくろう
PHP x AWS でスケーラブルなシステムをつくろうPHP x AWS でスケーラブルなシステムをつくろう
PHP x AWS でスケーラブルなシステムをつくろう
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
 
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
통신사의 차별화된 메시징 서비스 아키텍처를 소개합니다 - 정영준 AWS 솔루션즈 아키텍트 / 강성원, 나상화 소프트웨어 엔지니어 무선사업부...
 
Value stream management is essential for dev ops v4
Value stream management is essential for dev ops v4Value stream management is essential for dev ops v4
Value stream management is essential for dev ops v4
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
AWS Application Discovery Service
AWS Application Discovery ServiceAWS Application Discovery Service
AWS Application Discovery Service
 
Governance Strategies & Tools for Cloud Formation
Governance Strategies & Tools for Cloud Formation Governance Strategies & Tools for Cloud Formation
Governance Strategies & Tools for Cloud Formation
 
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
 
Microsoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiencyMicrosoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiency
 
Scrum introduction in hsin chu-agilemeetup
Scrum introduction in hsin chu-agilemeetupScrum introduction in hsin chu-agilemeetup
Scrum introduction in hsin chu-agilemeetup
 
Improving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWSImproving Infrastructure Governance on AWS
Improving Infrastructure Governance on AWS
 
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
AWS 스토리지 서비스 소개 및 실습 - 김용기, AWS 솔루션즈 아키텍트
 
Cloud 101 - What is the Cloud?
Cloud 101 - What is the Cloud?Cloud 101 - What is the Cloud?
Cloud 101 - What is the Cloud?
 
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar SeriesIntroduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
 
Fast analytics kudu to druid
Fast analytics  kudu to druidFast analytics  kudu to druid
Fast analytics kudu to druid
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
성공적인 AWS클라우드로의 여정 그리고 5가지 궁금한 점 :: 김재성 :: AWS Summit Seoul 2016
 
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 

Similar to Managing AWS Costs with Anomaly Detection and Root Cause Analysis

The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
Jonah Kowall
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
AppDynamics
 
Google App engine
Google App engineGoogle App engine
Google App engine
Indika Munaweera Kankanamge
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now Essential
DevOps.com
 
Enterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu OracleEnterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu Oracle
MarketingArrowECS_CZ
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
Cloudera, Inc.
 
Real-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine dataReal-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine data
jKool
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_ext
Oracle Developers
 
Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!
Flexera
 
Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019
Anil Gupta (AJ) - vExpert
 
Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?
ManageEngine
 
How to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG EnterpriseHow to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG Enterprise
eG Innovations
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
eG Innovations
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
DevOps.com
 
SOUG Day - autonomous what is next
SOUG Day - autonomous what is nextSOUG Day - autonomous what is next
SOUG Day - autonomous what is next
Thomas Teske
 
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Amazon Web Services
 
Wavefront presentation-May-2019
Wavefront presentation-May-2019Wavefront presentation-May-2019
Wavefront presentation-May-2019
Anil Gupta (AJ) - vExpert
 
Unified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The HourUnified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The Hour
eG Innovations
 
Securing the Cloud Native stack
Securing the Cloud Native stackSecuring the Cloud Native stack
Securing the Cloud Native stack
Hector Tapia
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart Cities
Cloudera, Inc.
 

Similar to Managing AWS Costs with Anomaly Detection and Root Cause Analysis (20)

The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
 
Google App engine
Google App engineGoogle App engine
Google App engine
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now Essential
 
Enterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu OracleEnterprise Cloud transformation z pohledu Oracle
Enterprise Cloud transformation z pohledu Oracle
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Real-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine dataReal-time Operational Intelligence for machine data
Real-time Operational Intelligence for machine data
 
Get ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_extGet ready for_an_autonomous_data_driven_future_ext
Get ready for_an_autonomous_data_driven_future_ext
 
Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!Having Trouble Managing All Your Cloud Services? We Know!
Having Trouble Managing All Your Cloud Services? We Know!
 
Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019Wavefront-by-VMware-April-2019
Wavefront-by-VMware-April-2019
 
Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?Are Your Mission Critical Applications Really Performing?
Are Your Mission Critical Applications Really Performing?
 
How to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG EnterpriseHow to Monitor Your Java & .NET Applications with eG Enterprise
How to Monitor Your Java & .NET Applications with eG Enterprise
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
 
SOUG Day - autonomous what is next
SOUG Day - autonomous what is nextSOUG Day - autonomous what is next
SOUG Day - autonomous what is next
 
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
Designing for Operability: Getting the Last Nines in Five-Nines Availability ...
 
Wavefront presentation-May-2019
Wavefront presentation-May-2019Wavefront presentation-May-2019
Wavefront presentation-May-2019
 
Unified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The HourUnified Cloud Performance Monitoring - The Need of The Hour
Unified Cloud Performance Monitoring - The Need of The Hour
 
Securing the Cloud Native stack
Securing the Cloud Native stackSecuring the Cloud Native stack
Securing the Cloud Native stack
 
Cloudera - IoT & Smart Cities
Cloudera - IoT & Smart CitiesCloudera - IoT & Smart Cities
Cloudera - IoT & Smart Cities
 

Recently uploaded

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 

Recently uploaded (20)

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...

Managing AWS Costs with Anomaly Detection and Root Cause Analysis

  • 1. Autonomous Cloud Operations: Managing AWS Costs with Anomaly Detection and Root Cause Analysis. Imran Moin, Chief Product Officer, imran@yotascale.com
  • 2. AGENDA: Company Overview; Key Pain Points in managing AWS cloud infrastructure; YotaScale Solution Overview; Deep dive into Anomaly Detection; Real world cost anomalies found by YotaScale; Live Product Demonstration; Q&A
  • 3. COMPANY VISION: AUTONOMOUS CLOUD OPERATIONS
  • 4. Overview: Company with deep domain expertise in enterprise infrastructure, cloud computing and machine learning. Leadership: ASIM RAZZAQ (CEO, YotaScale), USMAN ABBASI (CTO, YotaScale), IMRAN MOIN (CPO, YotaScale). Company: Founded in 2015; HQ in Menlo Park, CA; Raised $11.6M to date.
  • 5. Cloud Operations Getting Significantly More Complex. 40% Wastage due to OpEx Cost Model: on average, enterprises waste 40% of their cloud infrastructure. 30% Time on Manual Cloud Ops: on average, enterprises spend 30% of their Cloud Ops time doing manual tasks that should be automated. 25+ Tools Causing Fatigue: on average, enterprises have 25+ tools to manage cost, performance and availability SLAs for cloud workloads. >50% Cloud Ops Decentralizing: over 50% of enterprises are looking to decentralize cloud operations.
  • 6. Too much data, not enough insights. How to optimize each workload to get the right cost, performance and availability? Cloud Provider Data: Cost Data, Utilization Data, Inventory Data, Log Data, Container Data. Third Party Data: Perf Data, Memory Data, Config Data. Operational Metrics: Logins/sec, DAU, Queries/sec. Cloud Ops Policies: Tagging, Resource Whitelist.
  • 7. The Problems that Customers Need to Solve. VISIBILITY: detailed cost, usage and performance reports per workload; custom dashboards for each application team; cloud policy compliance; pulling business context from other tools like ServiceNow, etc. OPTIMIZATION: identify cost savings opportunities; detect cost spikes (anomalies); RI purchase decisions; rightsizing workloads. PLANNING: accurate forecasting of cloud spend per application, team or BU; setting budgets and ensuring compliance; what-if scenario planning. AUTOMATION: 1-click RI purchase; auto-tag cloud resources; identify root cause and remediation for anomalous incidents; automate fixes for policy violations.
  • 8. How the YotaScale Platform Works. Inputs: Cloud Provider Data (Cost, Utilization, Inventory, Logs, Containers); Third Party Data (Performance, Memory, Configuration); Cloud Ops Policies (Mandatory Tags, RegEx Formats, Resource Whitelist, Purchase Preferences). Four stages: DETECT (discover trends and identify incidents), DIAGNOSE (identify root cause), OPTIMIZE (suggestions to remediate issues), PREDICT (forecast the future). Other diagram labels: APIs, Enterprise Integrations, MANUAL, AUTONOMOUS.
  • 9. YotaScale Platform Use Cases. Streamline Cloud Operations: real-time incident detection and alerting; workflows integrated with existing Cloud Ops tools (Slack, JIRA, PagerDuty, etc.); incident investigation through RCA; 1-click implementation. Optimize Cloud Workloads: continuous optimization assessment across all cloud resources; identify opportunities to purchase reserved instances; real-time re-balancing of RI inventory; rightsize workloads; shut down orphan resources; increase performance of existing cloud resources. Governance with Cloud Policies: pre-defined policies for cloud operations best practices; auto-remediation of violations; audit trail to identify the user responsible for violations; real-time inventory of all assets; identify workloads out of compliance with policies.
  • 10. Anomaly Detection (Live Monitoring), PREVENT RUNAWAY COSTS: detect anomalies; root cause analysis; intelligent workflow. Continuous Optimization (Up to 40% Savings), OPTIMALLY EFFICIENT INFRASTRUCTURE: contextually aware corrective action; deep library of best practices; EC2 & PaaS support. Contextual Analytics (Org Benchmark), ACCOUNTABILITY & TRANSPARENCY: scorecard; 100% tag hygiene; slice and dice analysis; accountability & transparency. Through the use of machine learning, YotaScale processes millions of data signals and provides contextually relevant anomaly detection and optimization recommendations that reduce your cloud spend.
  • 11. YotaScale Anomaly Detection Overview. YotaScale’s ML/AI-powered Anomaly Detection can detect cost anomalies across any dimension. Customers are alerted in real time via Email, Slack, etc. Time to resolution is short thanks to YotaScale’s Root Cause Analysis (RCA). DETECT/ALERT: detect and alert on real-time cost anomalies. PROVIDE RCA: provide the root cause of each anomaly. REMEDIATE: suggest possible fixes to the customer.
  • 12. Key Features for Anomaly Detection. Identify and Customize Anomalies: sophisticated ML models; customizable dimensions; severity per anomaly. Provide Root Cause Analysis: ML models find correlations / causations for each anomaly; linked to business events (positive or negative). Suggest Possible Fixes: identify solutions; manual scripts; approval-based implementation; automation. Workflow Integration: Single Sign-On (SSO); Slack integration; JIRA integration.
  • 13. Closed Feedback Loop on Anomaly Models. Customer actions for each cost anomaly: Dismiss, Resolve, Snooze. Anomaly ML models are fine-tuned based on customer feedback.
  • 14. Remediation for Cost Anomalies (Future Roadmap): Out of Band (manual) instructions; Out of Band (manual) script; In Band (manual) script; Approval Based Implementation; Automation (autonomous execution by YotaScale).
  • 15. With real-time anomaly detection, root cause and remediation, YotaScale caught this anomaly in time and saved thousands of dollars. “Our virus scanning engine died. We could not figure out the right host and in the process spun up hundreds of machines. YotaScale detected the issue in real time.” Jonathan Monette, Senior Architect. YotaScale’s Anomaly Detection discovers applications and services and alerts you to significant changes.
  • 16. ANOMALY DETECTION ROOT CAUSE SAVES DAYS OF TROUBLESHOOTING. “Our API gateway team saw an unusual amount of requests resulting in a huge spike in resource provisioning. YotaScale pinpointed the exact issue saving valuable cycles.” John Smithan, Lead Site Reliability Engineer. YotaScale was able to pinpoint the exact issue and save days of investigative work on where to go look. Going beyond alerting, YotaScale can provide a detailed analysis of the resources that caused an anomaly.
  • 17. Key Benefits of Anomaly Detection. 1. Get real-time notifications about any unusual cost spikes across any business-critical dimension. 2. Serves as an insurance policy against runaway cloud costs; can save up to 10-20% of yearly cloud spend. 3. Helps troubleshoot the root cause of cost spikes and saves valuable time for CloudOps, Finance and Engineering teams.
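
The slides describe a detect-then-diagnose flow (flag a cost spike, then point at the resources behind it) without giving any algorithm. The sketch below is one minimal way to illustrate that flow, not YotaScale's method: it swaps the deck's ML models for a plain trailing z-score and ranks resources by cost increase. The function names, the 7-day window, the threshold of 3.0 and the `min_scale` floor are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def detect_anomalies(daily_costs, window=7, threshold=3.0, min_scale=1.0):
    """Return indices of days whose cost spikes above the trailing window.

    Hypothetical stand-in for an ML model: a simple z-score against the
    previous `window` days. `min_scale` (assumed, in dollars) keeps a flat
    history from producing a zero standard deviation.
    """
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        scale = max(pstdev(history), min_scale)
        z_score = (daily_costs[i] - baseline) / scale
        if z_score > threshold:  # only upward spikes count as cost anomalies
            flagged.append(i)
    return flagged


def rank_contributors(baseline_costs, anomalous_costs, top=3):
    """Rank resources by how much their cost rose on the anomalous day.

    Both arguments map a resource name to its cost for one day. Resources
    missing from one side are treated as costing 0 on that day.
    """
    names = set(baseline_costs) | set(anomalous_costs)
    deltas = {
        name: anomalous_costs.get(name, 0.0) - baseline_costs.get(name, 0.0)
        for name in names
    }
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    # Keep only resources whose cost actually increased.
    return [(name, delta) for name, delta in ranked[:top] if delta > 0]


if __name__ == "__main__":
    # Made-up daily totals for one dimension (for example, one team's spend).
    costs = [100, 102, 98, 101, 99, 103, 100, 400]
    print(detect_anomalies(costs))

    # Made-up per-resource costs for a normal day and the flagged day.
    normal_day = {"api-gateway": 10, "scanner-fleet": 20}
    spike_day = {"api-gateway": 10, "scanner-fleet": 120, "new-cluster": 50}
    print(rank_contributors(normal_day, spike_day))
```

Running the same check separately for each dimension (application, team, account, tag) is one way to approximate the "any dimension" claim on slide 11; a production system would also need to handle seasonality and trend, which a trailing z-score ignores.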