Automating Operations with Machine Intelligence
Rob Harrop
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/microservices-ml-ai
Presented at QCon San Francisco
www.qconsf.com
CEO @ Skipjaq
Co-founder @ SpringSource
Automated performance management
Why automate operations?
Why now?
What does automated operations look like?
How do we build for automation?
Solving a real problem…
Why automate operations?
More Complexity
Monolith -> Microservices
Strong -> Eventual Consistency
Assume reliability -> Assume failure
More Deployments
[Chart: "Very end of 2009" compared with "Today"; axis ticks at 10, 20, 30, 40]
Credit: Mike Brittain, Engineering Director @ Etsy
Less time to identify fixes
Rollbacks more likely
Tiny window for human intervention
Harder
Faster
Why now?
We have to
We can
Cloud
Containers
Observability
Microservices
ML/AI
Trends
Current trends provide the impetus and tools for automation by AI
Automated Operations
Move 37
Move 78 - God’s Touch
AI Human
Wholly performed by human
Wholly performed by AI
Co-operation between human and AI
Actionable insight
Types of Operation Actions
Data is not insight
Gathering metrics is not automating operations
But, metrics are critical to automating operations
On Metrics
Human ≠ Manual
Testing
Deployment
Provisioning
Actions by Human
Anomaly alerting
Rollback broken builds
Dependency upgrade
Cooperative Actions
Predictive auto scaling
Workload placement
Automatic rollback
Performance optimisation?
Security?
Actions by AI
Actions
and
Actionable Insights
Building for Automation
Visible metrics and logs
Ability to start/stop/restart/move workload
Ability to change configuration
Ability to modify dependencies
Ability to wire/rewire external services
Requirements for Operations
Self-contained package
Disposable processes
Externally-configurable
Externally-observable
Externalised dependencies
Externalised service wiring
12+1 Factor
Metrics as event streams
Standard metrics
- CPU usage, memory usage, …
Service-specific metrics
- Leads received, items sold, …
13th Factor - Observability
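One way to read "metrics as event streams" is that every measurement, standard or service-specific, leaves the service as a small self-describing event that an external consumer can read. The sketch below is only an illustration of that idea: the function names, the field names and the JSON-lines encoding are assumptions, not something the slides specify.

```python
import json
import time


def metric_event(name, value, kind, timestamp=None, **tags):
    """Build one metric observation as a self-describing event (hypothetical schema)."""
    return {
        "name": name,
        "value": value,
        "kind": kind,  # "standard" (CPU, memory) or "service" (leads received, items sold)
        "timestamp": time.time() if timestamp is None else timestamp,
        "tags": tags,
    }


def emit(event, sink):
    """Append the event to a stream as one JSON line; sink is any object with write()."""
    sink.write(json.dumps(event, sort_keys=True) + "\n")
```

A service might call `emit(metric_event("items_sold", 3, "service", shop="eu"), sys.stdout)` and leave collection to whatever reads its output, which keeps the process externally observable without coupling it to a particular metrics backend.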
Detecting Anomalous DB CPU
Case Study
Background
Consumer-facing web application running Rails against PostgreSQL on AWS RDS
Mix of transactional and batch workloads running against the same database
Question: when is the DB unusually overloaded?
Detecting Anomalies
Policy-based
Statistical model
Predictive model
Classification model
Policy Based
Fixed threshold alerting
How well does this work?
Not Very
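A small sketch of why a fixed threshold struggles with a mixed transactional and batch workload. The readings and the 80% threshold are made-up values for illustration only.

```python
def threshold_alerts(cpu_percent, threshold=80.0):
    """Return the indices of observations above a fixed threshold."""
    return [i for i, value in enumerate(cpu_percent) if value > threshold]


# Hypothetical DB CPU readings: the last three come from a scheduled batch job
# that legitimately drives CPU to about 90%, while the 70% reading at index 3
# happens during a normally quiet transactional period.
readings = [20, 22, 21, 70, 23, 90, 91, 89]
alerts = threshold_alerts(readings)
```

Here `alerts` contains only the three batch readings. The policy fires on load that is expected and stays silent on the reading that is actually unusual for its time of day, because a single number cannot encode "unusual for this point in the cycle".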
Statistical Model
Twitter AnomalyDetection package
- Seasonal Hybrid ESD
Is this point unexpected in our distribution?
- With seasonal and trend effects removed
Statistical Model
Metrics stream
Sliding window of observations (1 month, 1 year?)
For each new observation, run the model (S-H-ESD)
Is the new point an outlier?
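The sliding-window loop can be sketched as follows. This is not S-H-ESD and does not use the Twitter AnomalyDetection package; it is a much simpler stand-in that borrows two of the same ideas, removing the seasonal pattern first and then judging the remainder with robust statistics (median and median absolute deviation). The class name, the `critical` and `min_scale` defaults, and the assumption that observations arrive at regular intervals with no gaps are all illustrative choices.

```python
from collections import deque
from statistics import median


class SlidingWindowDetector:
    """Flag a new observation that is unexpected for its position in the seasonal cycle."""

    def __init__(self, period, window_size, critical=3.5, min_scale=1.0):
        self.period = period  # observations per seasonal cycle (e.g. 24 for hourly data)
        self.window = deque(maxlen=window_size)  # recent (phase, value) pairs
        self.critical = critical  # robust z-score above which a point is an outlier
        self.min_scale = min_scale  # floor on the spread, so very flat data is not over-flagged
        self.count = 0

    def _baselines(self):
        """Median value for each phase of the cycle, computed over the window."""
        by_phase = {}
        for phase, value in self.window:
            by_phase.setdefault(phase, []).append(value)
        return {phase: median(values) for phase, values in by_phase.items()}

    def observe(self, value):
        """Score the new point against the current window, then add it to the window."""
        phase = self.count % self.period
        self.count += 1
        outlier = False
        baselines = self._baselines()
        # Wait for two full cycles before judging anything.
        if len(self.window) >= 2 * self.period and phase in baselines:
            # Residual = value with the seasonal baseline for its phase removed.
            residuals = [v - baselines[p] for p, v in self.window]
            centre = median(residuals)
            mad = median([abs(r - centre) for r in residuals])
            scale = max(1.4826 * mad, self.min_scale)
            score = abs((value - baselines[phase]) - centre) / scale
            outlier = score > self.critical
        self.window.append((phase, value))
        return outlier
```

With hourly data one would construct something like `SlidingWindowDetector(period=24, window_size=24 * 30)` for a one-month window and call `observe()` on each new point. Because each point is compared with others at the same phase of the cycle, a nightly batch peak is treated as normal while the same CPU level in a quiet hour is flagged.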
Predictive Model
Train a model to predict values in the time series
Prediction error > critical value => outlier
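[Figure: feed-forward neural network with inputs x1, x2, x3 and bias units (+1), layers L1, L2 and L3, hidden activations a1(2), a2(2), a3(2), and output hW,b(x)]
[Figure: unrolled recurrent neural network with inputs x0 to x4, repeated cell A, and outputs h0 to h4]
From: http://colah.github.io/posts/2015-08-Understanding-LSTMs/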
Predictive Model
Metrics stream
Prediction
Training set (?? last month)
Model
Re-train (nightly, weekly?)
Is the prediction error an outlier?
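The train, predict and compare loop can be sketched with a deliberately tiny model. The slides show neural networks; the code below swaps in a one-lag linear predictor (x[t] is predicted from x[t-1]) purely to keep the example short and dependency-free. The function names, the `k` multiplier and the `min_std` floor are assumptions for illustration.

```python
from statistics import mean, pstdev


def train(history):
    """Fit x[t] ~ a * x[t-1] + b by least squares and record the spread of its errors."""
    xs, ys = history[:-1], history[1:]
    mx, my = mean(xs), mean(ys)
    variance = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / variance if variance else 0.0
    b = my - a * mx
    errors = [y - (a * x + b) for x, y in zip(xs, ys)]
    return {"a": a, "b": b, "error_std": pstdev(errors)}


def is_outlier(model, previous, actual, k=3.0, min_std=1.0):
    """Prediction error > critical value => outlier; critical value is k * error spread."""
    predicted = model["a"] * previous + model["b"]
    critical = k * max(model["error_std"], min_std)
    return abs(actual - predicted) > critical


# Re-training (nightly, weekly?) is just calling train() again on the latest window.
model = train([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30])
```

Each new observation is then checked with `is_outlier(model, previous, actual)`. Tying the critical value to the errors seen during training means the check adapts whenever the model is re-trained on more recent data.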
Handling Anomalies
Actionable alerts
- Confidence in predictions
No alerts for pointless things
Handling Anomalies
Taking action
- Rewiring services to read-replica?
- Kill long-running queries?
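A sketch of how "kill long-running queries" could be gated on confidence in the anomaly. `pg_stat_activity`, `pg_terminate_backend` and `pg_backend_pid` are standard PostgreSQL; the function name, the confidence score and both thresholds are hypothetical, and the function only builds the statement rather than running it.

```python
def plan_action(anomaly_confidence, min_confidence=0.9, max_query_seconds=300):
    """Return SQL that terminates long-running queries, or None to leave it to a human."""
    if anomaly_confidence < min_confidence:
        # Not confident enough to act automatically: raise an alert instead.
        return None
    return (
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        "WHERE state = 'active' AND pid <> pg_backend_pid() "
        f"AND now() - query_start > interval '{int(max_query_seconds)} seconds'"
    )
```

Returning the statement instead of executing it keeps the decision and the action separate, so the same plan can be shown to an operator for approval or executed directly once the model has earned enough trust.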
Handling Anomalies
Confidence in the model leads to confidence in automation
Summary
Increasing complexity and deployment speed make operational automation a must
We must build services that are ready for automation
Simple models can often beat complex ones
Cheap compute and storage make large-scale ML available to everyone
Thank You

Autonomous Operations: Microservices, ML and AI