Machine Learning Vital Signs
Metrics and Monitoring of AI in Production
Donald Miner
Miner & Kasch
OSCON
July 16, 2019
We build a model
• Someone builds a model
• They test it
• Everyone is happy
• It works in prod
We build lots of models
Multiple people
build multiple models
to solve multiple problems
Who’s watching all these models?
Which ones are working?
What does working mean?
The world changes slowly
Over time the nature of the world changes
Our models will not work as well
Big things can happen and fundamentally change the world
This may render previous models less useful or worthless
The world changes abruptly
Seasonal and periodic changes happen
This can impact model effectiveness temporarily or permanently
The world changes periodically
Current events can change the world for a small period of time
This can change model effectiveness (usually for the worse) for a short period of time
Weird things happen then go away
They will happen
Can be troublesome to detect in machine learning pipelines
Bugs
Bad people exist
Could they exploit your model or training set to your detriment?
Adversaries
Proposed solution: Metrics & Monitoring
Instrument your models with “vital signs”
Catch your model in a timely manner when it is:
• Suddenly breaking
• Drifting into worthlessness
• Doing something strange
Machine Learning Vital Signs
• Some metric from a productionalized model that you can
monitor for change over time
• Have alerts in place that detect:
• An unacceptable amount of drift over time
• A surprising and unusual number of errors in one period
• What is the average of the vital?
• What is the standard deviation of the vital?
• What are acceptable bounds for the vital?
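
The three questions above can be sketched in a few lines. This is a minimal illustration, not a method from the talk: it assumes a vital is stored as a plain list of past values, and the function names, the `k` multiplier, and the `max_shift` threshold are all hypothetical choices.

```python
from statistics import mean, stdev

def vital_bounds(history, k=3.0):
    """Return (low, high) as mean +/- k standard deviations of past values."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def within_bounds(history, latest, k=3.0):
    """True if the latest reading sits inside the acceptable bounds."""
    low, high = vital_bounds(history, k)
    return low <= latest <= high

def drifted(history, recent, max_shift):
    """True if the mean of a recent window moved more than max_shift
    away from the mean of the baseline history (slow drift)."""
    return abs(mean(recent) - mean(history)) > max_shift

# Hypothetical daily accuracy readings for one model.
history = [0.90, 0.91, 0.89, 0.90, 0.92, 0.90]
print(within_bounds(history, 0.91))              # an ordinary day
print(within_bounds(history, 0.70))              # a sudden break
print(drifted(history, [0.85, 0.84, 0.86], 0.03))  # gradual decline
```

The single-reading check targets sudden breaks, while the window comparison targets slow drift; suitable values for `k` and `max_shift` depend on the model and are not given in the slides.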
Vital: Accuracy
How often the model is correct or not correct
• Will naturally decrease over time
• Big dips (or jumps) can be indicative of something wrong
• Can mimic how the data was initially labeled
• Automatically labeled as part of the data
• Manually labeled… uh oh
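
As a rough sketch of tracking this vital, accuracy can be computed per time period once true labels arrive. The record layout `(period, predicted, actual)` and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

def accuracy_by_period(records):
    """records: iterable of (period, predicted, actual) tuples.
    Returns {period: fraction of correct predictions}, sorted by period."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for period, predicted, actual in records:
        total[period] += 1
        if predicted == actual:
            correct[period] += 1
    return {p: correct[p] / total[p] for p in sorted(total)}

# Hypothetical labeled predictions from two days of production traffic.
records = [
    ("2019-07-01", "spam", "spam"),
    ("2019-07-01", "ham", "spam"),
    ("2019-07-02", "ham", "ham"),
]
print(accuracy_by_period(records))
```

The resulting series is what would be charted and checked for dips or jumps; how quickly `actual` becomes available depends on whether labels arrive automatically or need manual work.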
Vital: Accuracy Per Label
How often the model is correct or not correct, for each
potential output label
• More fine grained than Accuracy
• Can sometimes catch things that Accuracy misses when there is a
large class imbalance
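
A small sketch of why the per-label view helps under class imbalance. It assumes "accuracy per label" means the fraction of examples with a given true label that were predicted correctly; the slides do not pin down the exact definition, and the labels below are made up.

```python
from collections import Counter

def accuracy_per_label(pairs):
    """pairs: iterable of (predicted, actual).
    Returns {actual_label: fraction of that label predicted correctly}."""
    total = Counter()
    correct = Counter()
    for predicted, actual in pairs:
        total[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical imbalanced data: the model never predicts "fraud".
pairs = [("ok", "ok")] * 95 + [("ok", "fraud")] * 5
overall = sum(p == a for p, a in pairs) / len(pairs)
print(overall)                    # overall accuracy still looks healthy
print(accuracy_per_label(pairs))  # the rare label is always missed
```

Here overall accuracy is 0.95 even though every "fraud" example is wrong, which is the kind of failure the per-label vital is meant to surface.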
Vital: Model Agreement
How often the previous models, not in production, agree
with the new model
• Some disagreement is natural, but a large amount of
disagreement can be indicative of a bug or problem
• Can be an alternative to Accuracy if Accuracy is hard to
measure
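
One simple way to measure agreement, sketched under the assumption that both models are run on the same inputs and their predictions are collected in parallel lists. The function name is hypothetical.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of inputs on which two models gave the same prediction."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must be the same length")
    if not preds_a:
        raise ValueError("need at least one prediction")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)

# Hypothetical outputs of the production model and a previous model.
new_model = ["a", "b", "a", "c"]
old_model = ["a", "b", "b", "c"]
print(agreement_rate(new_model, old_model))
```

No true labels are needed here, which is why this can stand in for Accuracy when labels are slow or expensive to get.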
Vital: Output Distribution
How often each class is predicted or the distribution of
regression output values
• Can catch long-term trends, permanent changes, and seasonal
changes
• Can catch bugs and problems with large swings outside of a
few standard deviations
• Can be an alternative to Accuracy, but hard to tell the difference
between “weird” and “bad”
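
A minimal sketch for the classification case. It compares the share of each predicted class against a baseline using total variation distance; that distance is an illustrative choice made here, not something the talk prescribes, and the class names are invented.

```python
from collections import Counter

def class_shares(predictions):
    """Return {label: fraction of predictions with that label}."""
    counts = Counter(predictions)
    n = len(predictions)
    return {label: c / n for label, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two class-share dicts (0 to 1)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Hypothetical baseline week versus today.
baseline = class_shares(["cat"] * 50 + ["dog"] * 50)
today = class_shares(["cat"] * 80 + ["dog"] * 20)
print(total_variation(baseline, today))
```

A large distance says the outputs look different from before; it does not say whether the model or the world changed, which is the "weird" versus "bad" ambiguity noted above.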
[Figure: Vital: Output Distribution, with panels labeled GOOD and MAYBE BAD]
Vital: Canaries
Does a test input case predict what we expect?
• Can catch obvious issues if a test case the model should get
right returns a wrong output value
• Good at testing all-or-nothing problems, but struggles with trends
• Very simple to implement and brutally effective
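
A canary check can be as small as the sketch below. The `predict` callable, the toy spam rule, and the example inputs are all hypothetical stand-ins for a real model and its known-answer cases.

```python
def run_canaries(predict, canaries):
    """Run each (input, expected) pair through predict.
    Returns a list of (input, expected, got) for every failed canary."""
    failures = []
    for example, expected in canaries:
        got = predict(example)
        if got != expected:
            failures.append((example, expected, got))
    return failures

def toy_predict(text):
    """Stand-in for a production model: a trivial spam rule."""
    return "spam" if "free money" in text.lower() else "ham"

canaries = [
    ("FREE MONEY now!!!", "spam"),
    ("Meeting moved to 3pm", "ham"),
]
print(run_canaries(toy_predict, canaries))        # healthy model
print(run_canaries(lambda _: "ham", canaries))    # broken model
```

Running this on a schedule and alerting on any non-empty result catches all-or-nothing failures; it says nothing about gradual trends.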
Vital: Human Complaints
Do humans agree with what the model is doing?
• People love complaining about AI
• Harness that power to give you feedback
• Effective in large-scale applications that interact with humans
• Can double as a continuous data labeling exercise
Metrics and Monitoring Tips
• Figure out which vital signs can be done for each model
• Create log files
• Send the log files somewhere
• Make pretty charts
• Build a dashboard
• Watch it
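
For the "create log files" step, one possible shape is a JSON record per line, which most log shippers and charting tools can ingest. The field names, the file path, and the model name below are assumptions for illustration, not a format from the talk.

```python
import json
import time

def format_vital(model, vital, value, timestamp=None):
    """Serialize one vital-sign reading as a single JSON line."""
    record = {
        "ts": time.time() if timestamp is None else timestamp,
        "model": model,
        "vital": vital,
        "value": value,
    }
    return json.dumps(record, sort_keys=True)

def log_vital(path, model, vital, value, timestamp=None):
    """Append one reading to a log file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_vital(model, vital, value, timestamp) + "\n")

# Hypothetical reading for a hypothetical model.
print(format_vital("churn-v3", "accuracy", 0.91, timestamp=0))
```

Keeping every vital in the same record shape means one dashboard query can chart any model and any vital.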
Summary
Track what your models are doing
Watch what your models are doing



Editor's Notes

  1. Fashion trends Real estate market prices
  2. Brexit Change of power in parliament or presidential power
  3. Thanksgiving Security models
  4. Carwash: snow, rain, etc. “Fatigue” around these things
  5. PDF extraction tool changed behavior; deep learning models adhered to this bias; update made everything go to shit