Using Machine Learning to Optimize
DevOps Practices
Building Learning into Monitoring and Feedback
Peter Varhol
About me
• International speaker and writer
• Degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Agenda
• What is machine learning?
• How is machine learning applied to DevOps?
• Challenges in training these systems
• What constitutes an issue?
• Summary and conclusions
What is Machine Learning?
• Layered algorithms that adjust their parameters based on feedback
from known data
• Can be linear or nonlinear
• Algorithms can be fixed in production or adaptive
• Fixed – algorithms do not adjust once deployed
• Adaptive – algorithms continually adjust to new data
• Usually part of a larger system
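The fixed/adaptive distinction can be made concrete with a response-time anomaly detector written both ways. The sketch below is a minimal illustration under stated assumptions, not a production detector. `FixedDetector` learns a mean and standard deviation once from historical data and then freezes them. `AdaptiveDetector` keeps updating an exponentially weighted mean and variance with every new observation. The class names, the `alpha` smoothing factor, the 3-sigma threshold, and the sample data are all hypothetical.

```python
import math
import statistics


class FixedDetector:
    """Hypothetical detector: learns a baseline once, then never changes it."""

    def __init__(self, history, k=3.0):
        self.mean = statistics.fmean(history)
        self.std = statistics.pstdev(history)
        self.k = k  # how many standard deviations count as anomalous

    def is_anomaly(self, value):
        return abs(value - self.mean) > self.k * self.std


class AdaptiveDetector:
    """Hypothetical detector: keeps adjusting an exponentially weighted baseline."""

    def __init__(self, history, k=3.0, alpha=0.1):
        self.mean = statistics.fmean(history)
        self.var = statistics.pvariance(history)
        self.k = k
        self.alpha = alpha  # weight given to each new observation

    def is_anomaly(self, value):
        anomalous = abs(value - self.mean) > self.k * math.sqrt(self.var)
        # Update the baseline after scoring, so the detector tracks a moving target.
        diff = value - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous


# Hypothetical response times (ms) followed by a sustained shift to 150 ms.
history = [100, 102, 98, 101, 99]
stream = [150] * 20
fixed = FixedDetector(history)
adaptive = AdaptiveDetector(history)
print(sum(fixed.is_anomaly(v) for v in stream))     # 20
print(sum(adaptive.is_anomaly(v) for v in stream))  # 1
```

The fixed detector flags every sample after the shift. The adaptive detector flags only the first one and then treats 150 ms as the new normal. Which behavior is correct depends on whether the shift is a regression or a legitimate change in the problem domain.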
Adaptive Systems
• Airline pricing
• Ticket prices change three times a day based on demand
• It can cost less to go farther
• It can cost less later
• Ecommerce systems
• Recommendations try to discern what else you might want
• Can I incentivize you to fill up the plane?
Why Use Adaptive Systems?
• The “right” result will vary over time
• Trying to optimize a particular result
• Revenue
• The problem domain is not static
Confidential, Dynatrace LLC
How Are Fixed Systems Used?
• Transportation
• Self-driving cars
• Aircraft/Drones
• Ecommerce
• Recommendation engines
• Medical
• Diagnosis systems
Why Use Fixed Machine Learning Systems?
• The problem domain is static
• The expectations remain constant
• The right answer is known under most conditions
• The original algorithms remain valid over a long period of time
DevOps Practices Generate Data
• During development
• Agile metrics, JIRA issues, test case metrics
• During continuous integration
• System test metrics
• During continuous deployment
• Quality metrics for deployments
• After deployment and into production
• Application availability and performance
• Usage log files
Focus on Monitoring
• Ongoing data on availability and performance
• Real-user monitoring (RUM)
• Synthetic tests
• Application monitoring
• Monitoring tackles the back end of DevOps
• Identifies unhealthy trends
• Diagnoses failures and poor performance
• Recommends action
• Whether to use fixed or adaptive algorithms depends on your goals
Where Do Predictive Analytics Come In?
• Big data makes it possible to predict future events
• Are we going to fail?
• How will we perform with traffic surges?
• And to explain past events
• What went wrong, and how do we fix it?
• We can rely on past data
• Adaptive systems may not perform as well
• Clear goals are needed
What Technologies Are Involved?
• Neural networks
• Genetic algorithms
• Rules engines
Neural Networks
• Set of layered algorithms whose variables can be
adjusted via a learning process
• The learning process involves training with
known inputs and outputs
• The algorithms adjust coefficients to converge on
the correct answer (or not)
• You freeze the algorithms and coefficients, and
deploy
• Or you optimize on a particular set of characteristics
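The training loop described above can be shown at its smallest scale with a single artificial neuron (logistic regression). It adjusts its coefficients to fit known inputs to known outputs, and the coefficients are then frozen for deployment. This is a deliberately simplified sketch, not a layered network. Real neural networks stack many such units and train them with backpropagation. The features (CPU utilization and error rate as fractions), the labels, the learning rate, and the epoch count are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(samples, labels, epochs=2000, lr=0.5):
    """Adjust weights (coefficients) so known inputs map to known outputs."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of log-loss with respect to the neuron's input
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return tuple(weights), bias  # "freeze" the coefficients for deployment


def predict(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training data: [cpu_fraction, error_rate] -> 1 if the app failed.
samples = [[0.2, 0.01], [0.3, 0.02], [0.4, 0.01],
           [0.9, 0.20], [0.95, 0.30], [0.85, 0.25]]
labels = [0, 0, 0, 1, 1, 1]
model = train(samples, labels)
print(predict(model, [0.25, 0.01]) > 0.5)  # False: looks healthy
print(predict(model, [0.92, 0.28]) > 0.5)  # True: looks like an impending failure
```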
A Sample Neural Network
Genetic Algorithms
• Use the principle of natural selection
• Create a range of possible solutions
• Try out each of them
• Choose and combine two of the better
alternatives
• Rinse and repeat as necessary
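The steps above map directly onto a small genetic algorithm over bit-string candidates. The sketch is hedged: tournament selection, single-point crossover, bit-flip mutation, and keeping the two best candidates each generation are common textbook choices, not the only ones. The alert-rule scenario and its `RULE_VALUES` weights are hypothetical.

```python
import random


def evolve(fitness, genome_length, pop_size=30, generations=100,
           mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm over bit-string genomes (illustrative only)."""
    rng = random.Random(seed)
    # Create a range of possible solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Try out each of them.
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < pop_size:
            # Choose two of the better alternatives (tournament selection)...
            a = max(rng.sample(scored, 3), key=fitness)
            b = max(rng.sample(scored, 3), key=fitness)
            # ...and combine them (single-point crossover).
            cut = rng.randrange(1, genome_length)
            child = a[:cut] + b[cut:]
            # Occasional random mutation keeps the search from stalling.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        # Rinse and repeat.
        population = next_gen
    return max(population, key=fitness)


# Hypothetical usage: each bit enables one of ten candidate alert rules, and each
# weight is an assumed net value (useful signal minus noise) of that rule.
RULE_VALUES = [3, -2, 5, 1, -4, 2, -1, 4, -3, 2]


def rule_set_fitness(genome):
    return sum(v for v, enabled in zip(RULE_VALUES, genome) if enabled)


best = evolve(rule_set_fitness, genome_length=len(RULE_VALUES))
print(best, rule_set_fitness(best))
```

Because the random generator is seeded, runs are reproducible. This makes it easier to compare fitness functions or parameter settings.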
Bringing in DevOps
• DevOps has data that can be used to train neural networks
• Health of the application
• Trends in application traffic and responsiveness
• Application failure
Machine Learning Helps DevOps
• Decisions are complex
• Why is the CPU maxed?
• What is causing disk thrashing?
• Why did the network slow?
• Why did the application fail?
• Data is massive
• Potentially thousands of data points a day
How Good Are Decisions?
• Expert versus machine
• Given the same data
• In many domains they tie
• With additional data, the human can do better
• Machine learning will get better
• But it is only as good as its data
We Want to Do Two Things
• Identify trends that may indicate future problems
• Increasing response times
• More page errors
• Diagnose faults once they have happened
• Why did the application fail?
• How can we fix it as quickly as possible?
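For the first goal, spotting a trend such as increasing response times, a simple baseline is the least-squares slope of a metric over a recent window. That slope can be extrapolated to estimate when the metric would cross a limit. This is plain statistics rather than a learned model, and it ignores noise and daily seasonality. The hourly p95 series and the 300 ms limit are hypothetical.

```python
def slope(values):
    """Ordinary least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def samples_until_breach(values, limit):
    """Extrapolate the linear trend to estimate how many samples remain before the
    metric crosses `limit`. Returns None if the trend is flat or improving."""
    m = slope(values)
    if m <= 0:
        return None
    n = len(values)
    y_mean = sum(values) / n
    current_fit = y_mean + m * (n - 1) / 2  # fitted value at the latest sample
    return max(0.0, (limit - current_fit) / m)


# Hypothetical hourly p95 response times in milliseconds.
p95_ms = [200, 205, 212, 220, 226, 235]
print(f"trend: {slope(p95_ms):.1f} ms/hour")                 # trend: 7.0 ms/hour
print(f"{samples_until_breach(p95_ms, 300):.1f} hours left")  # 9.4 hours left
```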
Fixed Algorithms Work for Some Problems
• Immediate performance and failure identification
• Diagnosis of failures and performance issues
• These are readily identifiable from known data
Adaptive Systems Supplement These Tools
• Predictions of future events
• Performance
• Availability
• The target is moving
• So we need current data to adjust the algorithms
The Machine Helps the DevOps Expert
• The machine learning app provides:
• Early warning on possible performance issues and failures
• Immediate notification of failure or impending failure
• Trend analysis of data to predict unhealthy outcomes
• The machine learning system is an assistant
• It can’t fix anything
• It can’t necessarily identify the root cause
What is the Goal?
• We have many ways of monitoring
• Many of them are represented at this conference
• Each measures something a little different
• Latency, response time, availability, network, DNS . . .
• Too much data can be no better than no data at all
• Machine learning can correlate across measurements
• Focus on eliminating false positives
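One way to picture correlating across measurements to cut false positives is to raise an alert only when several independent monitors look anomalous at the same time. The sketch below uses a simple z-score vote. It is a hand-written heuristic standing in for what a trained model would learn, not a machine learning method in itself. The monitor names, data, threshold, and `min_agreeing` value are hypothetical.

```python
import statistics


def zscore(history, value):
    """How many standard deviations `value` sits from the history's mean."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return 0.0 if std == 0 else (value - mean) / std


def corroborated_alert(monitors, threshold=3.0, min_agreeing=2):
    """Alert only if at least `min_agreeing` monitors are anomalous at once.

    `monitors` maps a (hypothetical) monitor name to (history, latest_value)."""
    anomalous = sorted(
        name for name, (history, latest) in monitors.items()
        if abs(zscore(history, latest)) > threshold
    )
    return len(anomalous) >= min_agreeing, anomalous


# Hypothetical readings from three different kinds of monitoring.
monitors = {
    "rum_response_ms": ([210, 205, 215, 208, 212], 320),
    "synthetic_response_ms": ([180, 182, 179, 181, 178], 260),
    "apm_error_rate": ([0.010, 0.012, 0.011, 0.009, 0.010], 0.011),
}
print(corroborated_alert(monitors))
# (True, ['rum_response_ms', 'synthetic_response_ms'])
```

A spike seen by only one monitor is reported but does not fire the alert. That is the trade this design makes: fewer false positives in exchange for possibly missing a problem that only one vantage point can see.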
Intelligent Systems Are Sometimes Wrong
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” is good
• We don’t quite know why the software
responds as it does
• We can’t easily trace code paths
Testing Machine Learning Systems
• Have objective acceptance criteria
• Test with new data
• Don’t count on all results being accurate
• Understand the architecture of the network as a part of
the testing process
• Communicate the level of confidence you have in the
results to management and users
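Objective acceptance criteria can be written down as code before testing begins. The trained model is then scored on data it never saw during training. The sketch below computes accuracy, precision, and recall for a binary classifier and lists any criteria that were not met. Reporting each metric separately is also one way to communicate the level of confidence to management and users. The stand-in model (a score threshold of 0.5), the test data, and the minimum values are hypothetical.

```python
def evaluate(predict, test_inputs, test_labels):
    """Score a trained binary classifier on held-out data."""
    tp = fp = fn = tn = 0
    for x, actual in zip(test_inputs, test_labels):
        predicted = predict(x)
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def failed_criteria(metrics, criteria):
    """Return the names of criteria not met; an empty list means accepted."""
    return [name for name, minimum in criteria.items() if metrics[name] < minimum]


# Hypothetical held-out data: model scores and whether a real failure followed.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7, 0.4]
actual = [True, True, False, False, False, False, True, True]
metrics = evaluate(lambda s: s > 0.5, scores, actual)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
print(failed_criteria(metrics, {"accuracy": 0.70, "recall": 0.90}))  # ['recall']
```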
A Cautionary Tale
• All events are not created equal
• AI systems treat events equally
• To them, a system failure during the busy season is the same as any other
• DevOps pros know otherwise
• And can exert additional effort in response
• And actually fix the problem
• We can’t automate what we don’t understand
• You need the human in the loop
Conclusions
• DevOps is a natural environment for machine learning
systems
• Any activity that generates data and requires a decision is fair game
• Monitoring is low-hanging fruit
• Fixed systems for failure detection and diagnosis, adaptive systems for trend
analysis
Thank You
Peter Varhol
peter@petervarhol.com
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Using Machine Learning to Optimize DevOps Practices

  • 1. Using Machine Learning to Optimize DevOps Practices
    Building Learning into Monitoring and Feedback
    Peter Varhol
  • 2. About me
    • International speaker and writer
    • Degrees in Math, CS, Psychology
    • Technology communicator
    • Former university professor, tech journalist
    • Cat owner and distance runner
    • peter@petervarhol.com
  • 3. Agenda
    • What is machine learning?
    • How is machine learning applied to DevOps?
    • Challenges in training these systems
    • What constitutes an issue?
    • Summary and conclusions
  • 4. What is Machine Learning?
    • Layered algorithms that change parameters based on feedback from known data
    • Can be linear or nonlinear
    • Algorithms can be fixed in production or adaptive
      • Fixed: algorithms do not adjust once deployed
      • Adaptive: algorithms continually adjust to new data
    • Usually part of a larger system
  • 5. Adaptive Systems
    • Airline pricing
      • Ticket prices change three times a day based on demand
      • It can cost less to go farther
      • It can cost less later
    • Ecommerce systems
      • Recommendations try to discern what else you might want
    • Can I incentivize you to fill up the plane?
  • 6. Why Use Adaptive?
    • The “right” result will vary over time
    • Trying to optimize a particular result
      • Revenue
    • The problem domain is not static
  • 7. How Are Fixed Systems Used?
    • Transportation
      • Self-driving cars
      • Aircraft/drones
    • Ecommerce
      • Recommendation engines
    • Medical
      • Diagnosis systems
  • 8. Why Use Fixed Machine Learning Systems?
    • The problem domain is static
    • The expectations remain constant
    • The right answer is known under most conditions
    • The original algorithms remain valid over a long period of time
  • 9. DevOps Practices Generate Data
    • During development
      • Agile metrics, JIRA issues, test case metrics
    • During continuous integration
      • System test metrics
    • During continuous deployment
      • Quality metrics for deployments
    • After deployment and into production
      • Application availability and performance
      • Usage log files
  • 10. Focus on Monitoring
    • Ongoing data on availability and performance
      • Real user monitoring (RUM)
      • Synthetic tests
      • Application monitoring
    • Monitoring tackles the back end of DevOps
      • Identifying unhealthy trends
      • Diagnosing failures and poor performance
      • Recommending action
    • Whether to use fixed or adaptive algorithms depends on your goals
  • 11. Where Do Predictive Analytics Come In?
    • Big data makes it possible to predict future events
      • Are we going to fail?
      • How will we perform during traffic surges?
    • It can also explain past events
      • What went wrong, and how do we fix it?
      • We can rely on past data
    • Adaptive systems may not perform as well
    • Clear goals are needed
  • 12. What Technologies Are Involved?
    • Neural networks
    • Genetic algorithms
    • Rules engines
  • 13. Neural Networks
    • A set of layered algorithms whose variables can be adjusted via a learning process
    • The learning process involves training with known inputs and outputs
    • The algorithms adjust coefficients to converge on the correct answer (or not)
    • You freeze the algorithms and coefficients, then deploy
    • Or you optimize on a particular set of characteristics
  • 14. A Sample Neural Network
  • 15. Genetic Algorithms
    • Use the principle of natural selection
    • Create a range of possible solutions
    • Try out each of them
    • Choose and combine two of the better alternatives
    • Rinse and repeat as necessary
  • 16. Bringing in DevOps
    • DevOps has data that can be used to train neural networks
      • Health of the application
      • Trends in application traffic and responsiveness
      • Application failures
  • 17. Machine Learning Helps DevOps
    • Decisions are complex
      • Why is the CPU maxed out?
      • What is causing disk thrashing?
      • Why did the network slow down?
      • Why did the application fail?
    • Data volumes are massive
      • Potentially thousands of data points a day
  • 18. How Good Are Decisions?
    • Expert versus machine
      • Given the same data, in many domains they tie
      • With additional data, the human can do better
    • Machine learning will get better
      • But it is only as good as its data
  • 19. We Want to Do Two Things
    • Identify trends that may indicate future problems
      • Increasing response times
      • More page errors
    • Diagnose faults once they have happened
      • Why did the application fail?
      • How can we fix it as quickly as possible?
  • 20. Fixed Algorithms Work for Some Problems
    • Immediate performance and failure identification
    • Diagnosis of failures and performance issues
    • These are readily identifiable from known data
  • 21. Adaptive Systems Supplement These Tools
    • Predictions of future events
      • Performance
      • Availability
    • The target is moving
      • So we need current data to adjust the algorithms
  • 22. The Machine Helps the DevOps Expert
    • The machine learning app provides:
      • Early warning of possible performance issues and failures
      • Immediate notification of failure or impending failure
      • Trend analysis of data to predict unhealthy outcomes
    • Machine learning is an assistant
      • It can’t fix anything
      • It can’t necessarily identify the root cause
  • 23. What is the Goal?
    • We have many ways of monitoring
      • Many of them are represented at this conference
      • Each measures something a little different
      • Latency, response time, availability, network, DNS . . .
    • Too much data can be no better than no data at all
    • Machine learning can correlate across measurements
      • Focus on eliminating false positives
  • 24. Intelligent Systems Are Sometimes Wrong
    • The problem domain is ambiguous
      • There is no single “right” answer
      • “Close enough” is good
    • We don’t know quite why the software responds as it does
      • We can’t easily trace code paths
  • 25. Testing Machine Learning Systems
    • Have objective acceptance criteria
    • Test with new data
    • Don’t count on all results being accurate
    • Understand the architecture of the network as a part of the testing process
    • Communicate the level of confidence you have in the results to management and users
  • 26. A Cautionary Tale
    • Not all events are created equal
    • AI systems treat events equally
      • A failure of a system during busy season is the same as any other
    • DevOps pros know otherwise
      • And can exert additional effort in response
      • And actually fix the problem
    • We can’t automate what we don’t understand
      • You need the human in the loop
  • 27. Conclusions
    • DevOps is a natural environment for machine learning systems
      • Any activity that generates data and requires a decision is fair game
    • Monitoring is low-hanging fruit
    • Use fixed systems for failure detection and diagnosis, and adaptive systems for trend analysis
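The split in the conclusions (fixed systems for failure and diagnosis, adaptive systems for trends) can be illustrated with a small response-time monitor. This is a minimal Python sketch under stated assumptions, not the method used in the talk: the response times are made up, and the mean-plus-three-standard-deviations threshold and the exponentially weighted baseline (with its hypothetical `alpha` and `k` parameters) are common textbook choices. The fixed detector learns its threshold once and freezes it; the adaptive detector keeps adjusting its baseline to new data.

```python
from statistics import mean, stdev


def make_fixed_detector(training, k=3.0):
    """Fixed: learn a threshold once from historical data, then never change it."""
    threshold = mean(training) + k * stdev(training)
    return lambda x: x > threshold


class AdaptiveDetector:
    """Adaptive: an exponentially weighted mean and variance that keep adjusting."""

    def __init__(self, training, alpha=0.1, k=3.0):
        self.mu = mean(training)
        self.var = stdev(training) ** 2
        self.alpha = alpha  # hypothetical: how quickly the baseline follows new data
        self.k = k          # hypothetical: standard deviations that count as anomalous

    def check(self, x):
        """Flag x against the current baseline, then fold normal samples into it."""
        anomalous = x > self.mu + self.k * self.var ** 0.5
        if not anomalous:  # keep outliers from dragging the baseline along
            diff = x - self.mu
            self.mu += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous


# Made-up response times in ms: a quiet training window, then a slow upward
# drift (say, organic traffic growth) that ends in one sudden spike.
training = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
stream = [100 + 0.5 * i for i in range(1, 61)] + [200]

fixed = make_fixed_detector(training)
adaptive = AdaptiveDetector(training)
fixed_alerts = [i for i, x in enumerate(stream) if fixed(x)]
adaptive_alerts = [i for i, x in enumerate(stream) if adaptive.check(x)]
print(f"fixed: {len(fixed_alerts)} alerts, adaptive: alerts at {adaptive_alerts}")
```

With this made-up stream, the fixed detector raises 51 alerts (every sample once the drift passes its frozen threshold, plus the spike), while the adaptive detector follows the drift and flags only the spike at index 60. The flip side, in the spirit of the cautionary tale, is that an adaptive baseline will also quietly learn a genuine slow degradation, so the trend in the baseline itself (here `adaptive.mu`) still needs a human or a separate trend check.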

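The genetic algorithm steps on slide 15 (create a range of possible solutions, try out each, combine two of the better ones, repeat) can be sketched as follows. Everything about the problem is hypothetical: the two tuning knobs (`workers`, `cache_mb`), their ranges, and the `simulated_latency` model are invented for illustration, and in practice fitness would come from load tests or production measurements. Selection here simply keeps the better half of each generation (truncation selection), which is only one of several common schemes.

```python
import random


# Hypothetical tuning problem: the knobs, their ranges, and this latency model
# are invented for illustration. Real fitness would come from measurements.
def simulated_latency(workers, cache_mb):
    return (workers - 24) ** 2 + 0.01 * (cache_mb - 512) ** 2 + 50


def fitness(genome):
    return -simulated_latency(*genome)  # higher fitness means lower latency


def random_genome(rng):
    return (rng.randint(1, 64), rng.randint(64, 2048))


def crossover(a, b, rng):
    # Combine two parents by taking each gene from one or the other.
    return tuple(rng.choice(genes) for genes in zip(a, b))


def mutate(genome, rng, rate=0.2):
    # Occasionally nudge a gene, clamped to its allowed range.
    workers, cache_mb = genome
    if rng.random() < rate:
        workers = min(64, max(1, workers + rng.randint(-4, 4)))
    if rng.random() < rate:
        cache_mb = min(2048, max(64, cache_mb + rng.randint(-128, 128)))
    return (workers, cache_mb)


def evolve(generations=40, size=20, seed=1):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(size)]  # a range of possible solutions
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)  # try out each of them
        history.append(fitness(ranked[0]))
        parents = ranked[: size // 2]  # keep the better half (truncation selection)
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)  # combine two better ones
            for _ in range(size - len(parents))
        ]
        population = parents + children  # rinse and repeat
    return max(population, key=fitness), history


best, history = evolve()
print("best (workers, cache_mb):", best, "simulated latency:", simulated_latency(*best))
```

Because the better half always survives into the next generation, the best fitness recorded in `history` never gets worse from one generation to the next.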
Editor's Notes

  1. These types of software are becoming increasingly common, in areas such as ecommerce, public transportation, automotive, finance, and computer networks. They have the potential to make decisions given sufficiently well-defined inputs and goals. In some instances, they are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
  2. Most machine learning systems are based on neural networks. A neural network is a set of layered algorithms whose variables can be adjusted via a learning process. The learning process involves using known data inputs to create outputs that are then compared with known results. When the algorithms reflect the known results with the desired degree of accuracy, the algebraic coefficients are frozen and production code is generated. Today, this comprises much of what we understand as artificial intelligence.
  3. But there is software for which a single defined output no longer exists. Actually, there are two types. One is machine learning systems. The second is predictive analytics, or adaptive systems.
  4. Have objective acceptance criteria. Know the amount of error you and your users are willing to accept.
     Test with new data. Once you’ve trained the network and frozen the architecture and coefficients, use fresh inputs and outputs to verify its accuracy.
     Don’t count on all results being accurate. That’s just the nature of the beast. You may even have to recommend throwing out the entire network architecture and starting over.
     Understand the architecture of the network as a part of the testing process. Few if any testers will be able to follow a set of inputs through the network of algorithms, but understanding how the network is constructed will help them determine whether another architecture might produce better results.
     Communicate the level of confidence you have in the results to management and users. Machine learning systems offer you the unique opportunity to describe confidence in statistical terms, so take advantage of it.
     One important thing to note is that the training data itself could well contain inaccuracies. For example, measurement error could make a recorded wind speed and direction off or ambiguous, and a measurement of how a filament cools likely carries some error as well.
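Notes 2 and 4 describe a workflow: train on known inputs and outputs, freeze the coefficients, test with fresh data against an acceptance criterion, and report confidence in statistical terms. The sketch below walks through that workflow in Python under explicit assumptions. The "network" is a single artificial neuron (logistic regression), the smallest possible neural network, rather than a layered one; the two input metrics, the labelling rule, and the 0.90 `ACCEPTANCE` threshold are hypothetical; and the Wilson score interval is one reasonable way, among several, to attach a confidence interval to an accuracy figure.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical labelled samples: two standardized metrics -> incident (1) or not (0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        cpu, latency = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((cpu, latency), 1 if cpu + latency > 0 else 0))  # made-up rule
    return data


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train(data, lr=0.5, epochs=500):
    """Batch gradient descent on log loss: adjust coefficients toward the known results."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return (w1, w2, b)  # an immutable tuple: the frozen coefficients that get deployed


def predict(model, x):
    w1, w2, b = model
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wilson score)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


model = train(make_data(200, seed=1))   # train on known data, then freeze
fresh = make_data(500, seed=2)          # test with new data the model never saw
correct = sum(predict(model, x) == y for x, y in fresh)
low, high = wilson_interval(correct, len(fresh))
ACCEPTANCE = 0.90                       # hypothetical acceptance criterion
print(f"accuracy {correct / len(fresh):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
print("accept" if low >= ACCEPTANCE else "reject")
```

Because the labels here come from an exact rule, this data has none of the measurement error that note 4 warns about; with real monitoring data, expect lower accuracy and a wider interval. Basing the accept or reject decision on the interval's lower bound rather than on the point estimate is one way to put the "communicate your confidence" advice into practice.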