SlideShare a Scribd company logo
1 of 40
Download to read offline
Machine Learning
The High Interest Credit Card of Technical Debt
The Market Intelligence Company of the Digital World
$65M
Funding
2007
Founded
6
Offices
300+
Employees
Market Intelligence Company
of the Digital World
The
Learned | Estimated
Machine learning: The high interest credit card of technical debt
(2014)
Hidden technical debt in machine learning systems (2015)
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips,
Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois
Crespo, Dan Dennison
A Few Words About The Papers
Systems engineering papers
About Machine Learning systems
Give a lot of names to a lot of things (which we know is hard)
We found them in 2015 and liked them a lot
A Few Words About The Papers
What is ML and what is Technical Debt?
Sources of Technical Debt in ML systems
Mitigation
Today
Machine Learning
Train Predict
Data
Algorithm
Data
Data
Hyperparameters
Why Machine Learning?
Allows us to convert data to software
We often already have data
Some problems are hard or impossible to
solve otherwise
http://xkcd.com/1425/
A metaphor for the long term costs of moving quickly
Lack of testing, bad modularity, non-redundant systems, etc.
Somewhat similar to fiscal debt
There are good reasons to take it, but it needs to be serviced
Hidden technical debt - a special, evil, variant
Technical Debt
Boundary Erosion
Components, interfaces, all that jazz
Think MVC, microservices
Implicitly assumed in “good” systems
Makes components easy to:
- Test
- Change
- Reason about
- Monitor
Boundaries in Systems Engineering
Entanglement
ML System “Inputs”
Learning
settings
Hyperparams
Data prep
settings
Real world
inputs
?
Other systems
outputs
Issues
Change in distribution of any input influences
all outputs
Adding/Removing a feature changes the
model and output distribution
Any configuration parameter is just as
coupled
Retraining not reproducible
Changing Anything Changes Everything (CACE)
Model parts
Correction Cascades
Output Output Output
We sometimes use output from an existing
model as a feature to get a small correction
Easier than training a new model
Easier than teaching an existing model new
tricks
A
B
C
Correction Cascades
Output Output Output
Improvement
Degradation
Model improvements cause degradation
down the line
Corrections might lead to an “improvement
deadlock”
A
B
C
Outputs of ML systems include:
- Predictions
- Weights and other state
Data is easy to consume
In turn makes it hard to improve model
May create hidden feedback loops
Undeclared Consumers
Data Dependencies
Data Dependencies
Regular system
ComponentInput
Component Output
ComponentInput Output
Data Dependencies
Regular system
ComponentInput
Component Output
ComponentInput Output
Data dependency
Data Dependencies
Regular system
ComponentInput
ML System
Component Output
ComponentInput Output
Input Logs
Weights
Output
ML
Component
Trainer
PredictInput Output
Data dependency
Data Dependencies
Regular system
ComponentInput
ML System
Component Output
ComponentInput Output
Input Logs
Weights
Output
ML
Component
Trainer
PredictInput Output
Data dependency
Features for training can be outputs of other models
IDF tables, Word2Vec embeddings..
Logs, intermediate results, monitoring feeds..
But if they change schema?
Stop being updated?
Disappear?
Unstable Dependencies
Legacy features - Nobody maintains / wants to maintain them
Bundled features - Not sure which ones we need
Correlated features - May mask features with actual causality
Epsilon features - Improve the result by very little
Underutilized Dependencies
Software Issues
ML as Software
Actual machine learning is a lot more than modeling
Configuration
Data
Collection
Feature
Extraction
Data
Verification
Process Management
Resource
Management
Analysis Tools Serving
Infrastructure
Monitoring
Model
Glue code
Software issues
Pipeline jungles
Dead experimental paths
Abstraction Debt
Multiple languages, systems, packages
Need to configure/test/deploy:
- Hyper-parameters
- Schema (including semantics)
- Data dependencies
Hard to understand or visualize what changed
Configuration Debt
Interactions
Experience has shown that the external world is rarely stable
- Word2Vec for “Pokemon”
- Population of Sudan
- Gregorian dates of holidays
Makes monitoring essential.
Makes testing very hard.
Changes in The External World
A model sometimes influences its future training data
This is common in:
- Recommendation systems
- Ad placement
- Systems that affect the physical world
Especially hard if change is gradual and model updates infrequently
Direct Feedback Loops
Often happen when two different systems learn from each other’s
outputs
Classic example is algo-trading
But two independent content generation systems running on the
same page also qualify
Undeclared consumers can be a cause
Hidden Feedback Loops
..But wait
There’s More!
Data Testing
Reproducibility
Process Management
Cultural Debt
More!
Mitigation
How easily can an entirely new algorithmic approach be tested at full
scale?
What is the transitive closure of all data dependencies?
How precisely can the impact of a new change to the system be
measured?
Be Aware of Debt
Does improving one model or signal degrade others?
How quickly can new members of the team be brought up to speed?
Be Aware of Debt
Merge mature models into a single, well defined, well tested system
Prune experimental code paths
Make each feature count
Monitor
Map consumers
Test data
Paying per model
Configuration system - versioned, comprehensive, testable
Data dependency system - versioned, comprehensive, testable
Consolidate mature systems
Reproducibility is awesome
Pay off cultural debt
Paying for Systems
Other Questions?
We Are Hiring!
similarweb.com/corp/jobs

More Related Content

Similar to Machine learning the high interest credit card of technical debt [PWL]

#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...Agile Testing Alliance
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - finalAndrew White
 
Commissioning And Procurement
Commissioning And ProcurementCommissioning And Procurement
Commissioning And Procurementalecfraher
 
Commissioning And Procurement
Commissioning And ProcurementCommissioning And Procurement
Commissioning And Procurementalecfraher
 
Solving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinarSolving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinarImpetus Technologies
 
From Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesFrom Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesObeo
 
System Development Life Cycle ( Sdlc )
System Development Life Cycle ( Sdlc )System Development Life Cycle ( Sdlc )
System Development Life Cycle ( Sdlc )Jennifer Wright
 
Introduction to simulation and modeling
Introduction to simulation and modelingIntroduction to simulation and modeling
Introduction to simulation and modelingantim19
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...Dell World
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
CIAB Febraban - Michael Wagner
CIAB Febraban - Michael Wagner CIAB Febraban - Michael Wagner
CIAB Febraban - Michael Wagner CNseg
 
2019 State of DevOps Report: Database Best Practices for Strong DevOps
2019 State of DevOps Report: Database Best Practices for Strong DevOps2019 State of DevOps Report: Database Best Practices for Strong DevOps
2019 State of DevOps Report: Database Best Practices for Strong DevOpsDevOps.com
 
Mit104 software engineering
Mit104  software engineeringMit104  software engineering
Mit104 software engineeringsmumbahelp
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEbutest
 
Be Agile Rather Than Do Agile
Be Agile Rather Than Do AgileBe Agile Rather Than Do Agile
Be Agile Rather Than Do AgileBrenda Bao
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 

Similar to Machine learning the high interest credit card of technical debt [PWL] (20)

#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - final
 
Commissioning And Procurement
Commissioning And ProcurementCommissioning And Procurement
Commissioning And Procurement
 
Commissioning And Procurement
Commissioning And ProcurementCommissioning And Procurement
Commissioning And Procurement
 
Solving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinarSolving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinar
 
From Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesFrom Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems Architectures
 
System Development Life Cycle ( Sdlc )
System Development Life Cycle ( Sdlc )System Development Life Cycle ( Sdlc )
System Development Life Cycle ( Sdlc )
 
Introduction to simulation and modeling
Introduction to simulation and modelingIntroduction to simulation and modeling
Introduction to simulation and modeling
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
CIAB Febraban - Michael Wagner
CIAB Febraban - Michael Wagner CIAB Febraban - Michael Wagner
CIAB Febraban - Michael Wagner
 
2019 State of DevOps Report: Database Best Practices for Strong DevOps
2019 State of DevOps Report: Database Best Practices for Strong DevOps2019 State of DevOps Report: Database Best Practices for Strong DevOps
2019 State of DevOps Report: Database Best Practices for Strong DevOps
 
Mit104 software engineering
Mit104  software engineeringMit104  software engineering
Mit104 software engineering
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
The ZDLC Brief
The ZDLC BriefThe ZDLC Brief
The ZDLC Brief
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AE
 
Be Agile Rather Than Do Agile
Be Agile Rather Than Do AgileBe Agile Rather Than Do Agile
Be Agile Rather Than Do Agile
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 

Recently uploaded

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 

Machine learning the high interest credit card of technical debt [PWL]