SlideShare a Scribd company logo
1 of 16
Download to read offline
Managing Machines
Peter Skomoroch - @peteskomoroch
July 18, 2019
Better, Faster Decisions at Scale
• Machine learning drove massive growth at
consumer internet companies over the last
decade
• A wave of AI startups and vertical machine
learning applications are emerging across
other industries
• For many problems, machine learning
makes better, faster, and more repeatable
decisions at scale
AI Products
Automated systems that collect & learn from data to make user facing decisions with ML
Open Source Foundations for AI
Beyond Code: Open Data, Papers
• The mission of Papers With
Code is to create a free and
open resource with machine
learning papers, code and
evaluation tables.
• https://PapersWithCode.com
• https://arXiv.org
• http://commoncrawl.org
• https://voice.mozilla.org
Source: @paperswithcode
Machine Learning is Still Difficult
• Hard to know which models will work
• ML needs lots of labelled training data
• ML is prone to fail in unexpected ways
• Development progress is often episodic
and unpredictable
• Many research & production systems still
held together by duct tape, notebooks
• Input data & dependencies change, so
past model runs are difficult to reproduce
Machine Learning Dev Process
1. Problem definition and framing
2. Model design
3. Data collection, labelling, and cleaning
4. Feature engineering
5. Model training and offline validation
6. Model deployment, monitoring &
online training
Problem
Definition
Model
Design
Data
Collection &
Cleaning
Feature
Engineering
Model
Training
Deployment
& Monitoring
Better Tools for AI Developers
Source: @mrogati
Data Quality & Standardization
• Guide user input when you can
• Use auto suggest fields vs. free text
• Validate user inputs, emails
• Collect user tags, votes, ratings
• Track impressions, queries, clicks
• Sessionize & de-identify logs
• Disambiguate and annotate entities
• GDPR, Compliance, retention
Source: @brookLYNevery1
Training & Model Management
Source: @OpenAI @weights_biases
Machine Learning Testing
• Real world data changes over time
• Model tests and benchmarks get stale
• Maintain a suite of model input test fixtures
• Test for silent failures & output shifts
• Decouple testing of your data & infrastructure
from testing of your ML code
• Train models with sample data and run
through entire ML pipeline
Source: Machine Learning Testing: Survey, Landscapes and Horizons
Monitoring: Data, Features, Models
• Diversity of ever-changing data sources
• Data issues cascade through the ML stack
• Features also change, drift, break over time
• Model accuracy can decay over time
• Need data monitoring to spot statistical
changes in data inputs, anomalies
• Unlike traditional applications, model
performance is non-deterministic
Source: @fiddlerlabs
Troubleshooting ML
Filling Out the AI Dev Stack
Data & Model Versioning
Data & Model Monitoring
Model Training
Model Evaluation
Model Deployment
Visualization & Exploration
IDEs & Collaboration
Data Labelling
Feature Management
Parameter Tuning
Testing
Debugging
Profiling
Explainability
Data Anomaly Detection
We need better tools for:
More Resources
• Redpoint’s ML Workflow Landscape
• Awesome production machine learning
• Berkeley’s Full Stack Deep Learning Bootcamp
• O’Reilly: AI adoption is being fueled by an improved tool ecosystem
• Machine Learning Testing: Survey, Landscapes and Horizons
• What’s your ML Test Score? A rubric for ML production systems
• AllenAI: Writing Code for NLP Research
• Models and microservices should be running on the same continuous delivery stack
• Google’s Rules of Machine Learning: Best Practices for ML Engineering
• AWS Managing Machine Learning Projects

More Related Content

What's hot

How to classify documents automatically using NLP
How to classify documents automatically using NLPHow to classify documents automatically using NLP
How to classify documents automatically using NLPSkyl.ai
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Sri Ambati
 
Watson Analytics - Специалист по обработке данных "в коробке"
Watson Analytics - Специалист по обработке данных "в коробке"Watson Analytics - Специалист по обработке данных "в коробке"
Watson Analytics - Специалист по обработке данных "в коробке"Irina Podlevskikh
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsApplause
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in dataDavid Rostcheck
 
Anatomy of a data science project
Anatomy of a data science projectAnatomy of a data science project
Anatomy of a data science projectAdam Sroka
 
Why Machine Learning Operations
Why Machine Learning OperationsWhy Machine Learning Operations
Why Machine Learning OperationsPrem Naraindas
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectEugene Mandel
 
Cloud Computing Services
Cloud Computing ServicesCloud Computing Services
Cloud Computing ServicesBigDataCloud
 
Product Management 101 for Data and Analytics
Product Management 101 for Data and Analytics Product Management 101 for Data and Analytics
Product Management 101 for Data and Analytics Ravi Padaki
 
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...Sri Ambati
 
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Jung
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang JungLightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Jung
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Junglucenerevolution
 
Are you ready for Data science? A 12 point test
Are you ready for Data science? A 12 point testAre you ready for Data science? A 12 point test
Are you ready for Data science? A 12 point testBertil Hatt
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Textkernel
 
Data-driven design (UX Antwerp 24/09/19)
Data-driven design (UX Antwerp 24/09/19)Data-driven design (UX Antwerp 24/09/19)
Data-driven design (UX Antwerp 24/09/19)Peter Vermaercke
 
Growing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature SolutionGrowing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature SolutionDavid J Pileggi Jr
 
Big Data, Machine Learning till Business Intelligence
Big Data, Machine Learning till Business IntelligenceBig Data, Machine Learning till Business Intelligence
Big Data, Machine Learning till Business IntelligenceDinesh Kumar
 
Devon_Granillo resume final
Devon_Granillo resume finalDevon_Granillo resume final
Devon_Granillo resume finalDevon Granillo
 
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...Sri Ambati
 

What's hot (20)

How to classify documents automatically using NLP
How to classify documents automatically using NLPHow to classify documents automatically using NLP
How to classify documents automatically using NLP
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
 
Watson Analytics - Специалист по обработке данных "в коробке"
Watson Analytics - Специалист по обработке данных "в коробке"Watson Analytics - Специалист по обработке данных "в коробке"
Watson Analytics - Специалист по обработке данных "в коробке"
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI Applications
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
 
Anatomy of a data science project
Anatomy of a data science projectAnatomy of a data science project
Anatomy of a data science project
 
Why Machine Learning Operations
Why Machine Learning OperationsWhy Machine Learning Operations
Why Machine Learning Operations
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science Project
 
Cloud Computing Services
Cloud Computing ServicesCloud Computing Services
Cloud Computing Services
 
Product Management 101 for Data and Analytics
Product Management 101 for Data and Analytics Product Management 101 for Data and Analytics
Product Management 101 for Data and Analytics
 
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...
Robert Coop, Stanley Black & Decker - Optimizing Manufacturing with Driverles...
 
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Jung
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang JungLightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Jung
Lightning talk :IBM Content Analytics with Enterprise Search - Wolfgang Jung
 
Are you ready for Data science? A 12 point test
Are you ready for Data science? A 12 point testAre you ready for Data science? A 12 point test
Are you ready for Data science? A 12 point test
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
 
Data-driven design (UX Antwerp 24/09/19)
Data-driven design (UX Antwerp 24/09/19)Data-driven design (UX Antwerp 24/09/19)
Data-driven design (UX Antwerp 24/09/19)
 
Growing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature SolutionGrowing Your SharePoint Toddler to a Mature Solution
Growing Your SharePoint Toddler to a Mature Solution
 
Big Data Careers
Big Data CareersBig Data Careers
Big Data Careers
 
Big Data, Machine Learning till Business Intelligence
Big Data, Machine Learning till Business IntelligenceBig Data, Machine Learning till Business Intelligence
Big Data, Machine Learning till Business Intelligence
 
Devon_Granillo resume final
Devon_Granillo resume finalDevon_Granillo resume final
Devon_Granillo resume final
 
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...
Get Behind the Wheel with H2O Driverless AI - Hands on Lab - H2O World San Fr...
 

Similar to Managing Machines: The New AI Dev Stack

Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsInductive Automation
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsInductive Automation
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Lviv Startup Club
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoSri Ambati
 
Cnvrg webinar continual learning
Cnvrg webinar   continual learningCnvrg webinar   continual learning
Cnvrg webinar continual learningMaya Perry
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 
Machine learning systems for engineers
Machine learning systems for engineersMachine learning systems for engineers
Machine learning systems for engineersCameron Joannidis
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleDatabricks
 
Artificial Intelligence As a Service
Artificial Intelligence As a ServiceArtificial Intelligence As a Service
Artificial Intelligence As a ServiceJohn Liu
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...Dell World
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Sri Ambati
 
Data engineering track module 2
Data engineering track module 2Data engineering track module 2
Data engineering track module 2Pranav Prakash
 
Enterprise Machine Learning Governance
Enterprise Machine Learning Governance Enterprise Machine Learning Governance
Enterprise Machine Learning Governance Terence Siganakis
 

Similar to Managing Machines: The New AI Dev Stack (20)

Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning Basics
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning Basics
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning Automático
 
How to use continual learning in your ML models
How to use continual learning in your ML modelsHow to use continual learning in your ML models
How to use continual learning in your ML models
 
Cnvrg webinar continual learning
Cnvrg webinar   continual learningCnvrg webinar   continual learning
Cnvrg webinar continual learning
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
Machine learning systems for engineers
Machine learning systems for engineersMachine learning systems for engineers
Machine learning systems for engineers
 
Machine learning
Machine learningMachine learning
Machine learning
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
Artificial Intelligence As a Service
Artificial Intelligence As a ServiceArtificial Intelligence As a Service
Artificial Intelligence As a Service
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
Data engineering track module 2
Data engineering track module 2Data engineering track module 2
Data engineering track module 2
 
Enterprise Machine Learning Governance
Enterprise Machine Learning Governance Enterprise Machine Learning Governance
Enterprise Machine Learning Governance
 

More from Peter Skomoroch

Building Competitive Moats With Data
Building Competitive Moats With DataBuilding Competitive Moats With Data
Building Competitive Moats With DataPeter Skomoroch
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustPeter Skomoroch
 
SF Data Science: Developing Data Products
SF Data Science: Developing Data ProductsSF Data Science: Developing Data Products
SF Data Science: Developing Data ProductsPeter Skomoroch
 
Skills, Reputation, and Search
Skills, Reputation, and SearchSkills, Reputation, and Search
Skills, Reputation, and SearchPeter Skomoroch
 
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingLinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingPeter Skomoroch
 
Developing Data Products
Developing Data ProductsDeveloping Data Products
Developing Data ProductsPeter Skomoroch
 
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPractical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPeter Skomoroch
 
Street Fighting Data Science
Street Fighting Data ScienceStreet Fighting Data Science
Street Fighting Data SciencePeter Skomoroch
 
Data Mashups -Data Science Summit
Data Mashups -Data Science SummitData Mashups -Data Science Summit
Data Mashups -Data Science SummitPeter Skomoroch
 
Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Peter Skomoroch
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPeter Skomoroch
 

More from Peter Skomoroch (13)

Building Competitive Moats With Data
Building Competitive Moats With DataBuilding Competitive Moats With Data
Building Competitive Moats With Data
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data Exhaust
 
SF Data Science: Developing Data Products
SF Data Science: Developing Data ProductsSF Data Science: Developing Data Products
SF Data Science: Developing Data Products
 
Skills, Reputation, and Search
Skills, Reputation, and SearchSkills, Reputation, and Search
Skills, Reputation, and Search
 
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingLinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
 
Developing Data Products
Developing Data ProductsDeveloping Data Products
Developing Data Products
 
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPractical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
 
Street Fighting Data Science
Street Fighting Data ScienceStreet Fighting Data Science
Street Fighting Data Science
 
Data Mashups -Data Science Summit
Data Mashups -Data Science SummitData Mashups -Data Science Summit
Data Mashups -Data Science Summit
 
Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Managing Machines: The New AI Dev Stack

  • 1. Managing Machines Peter Skomoroch - @peteskomoroch July 18, 2019
  • 2. Better, Faster Decisions at Scale • Machine learning drove massive growth at consumer internet companies over the last decade • A wave of AI startups and vertical machine learning applications are emerging across other industries • For many problems, machine learning makes better, faster, and more repeatable decisions at scale
  • 3. AI Products Automated systems that collect & learn from data to make user facing decisions with ML
  • 5. Beyond Code: Open Data, Papers • The mission of Papers With Code is to create a free and open resource with machine learning papers, code and evaluation tables. • https://PapersWithCode.com • https://arXiv.org • http://commoncrawl.org • https://voice.mozilla.org Source: @paperswithcode
  • 6. Machine Learning is Still Difficult • Hard to know which models will work • ML needs lots of labelled training data • ML is prone to fail in unexpected ways • Development progress is often episodic and unpredictable • Many research & production systems still held together by duct tape, notebooks • Input data & dependencies change, so past model runs are difficult to reproduce
  • 7. Machine Learning Dev Process 1. Problem definition and framing 2. Model design 3. Data collection, labelling, and cleaning 4. Feature engineering 5. Model training and offline validation 6. Model deployment, monitoring & online training Problem Definition Model Design Data Collection & Cleaning Feature Engineering Model Training Deployment & Monitoring
  • 8. Better Tools for AI Developers Source: @mrogati
  • 9. Data Quality & Standardization • Guide user input when you can • Use auto suggest fields vs. free text • Validate user inputs, emails • Collect user tags, votes, ratings • Track impressions, queries, clicks • Sessionize & de-identify logs • Disambiguate and annotate entities • GDPR, Compliance, retention Source: @brookLYNevery1
  • 10. Training & Model Management Source: @OpenAI @weights_biases
  • 11. Machine Learning Testing • Real world data changes over time • Model tests and benchmarks get stale • Maintain a suite of model input test fixtures • Test for silent failures & output shifts • Decouple testing of your data & infrastructure from testing of your ML code • Train models with sample data and run through entire ML pipeline Source: Machine Learning Testing: Survey, Landscapes and Horizons
  • 12. Monitoring: Data, Features, Models • Diversity of ever-changing data sources • Data issues cascade through the ML stack • Features also change, drift, break over time • Model accuracy can decay over time • Need data monitoring to spot statistical changes in data inputs, anomalies • Unlike traditional applications, model performance is non-deterministic Source: @fiddlerlabs
  • 14. Filling Out the AI Dev Stack Data & Model Versioning Data & Model Monitoring Model Training Model Evaluation Model Deployment Visualization & Exploration IDEs & Collaboration Data Labelling Feature Management Parameter Tuning Testing Debugging Profiling Explainability Data Anomaly Detection We need better tools for:
  • 15.
  • 16. More Resources • Redpoint’s ML Workflow Landscape • Awesome production machine learning • Berkeley’s Full Stack Deep Learning Bootcamp • O’Reilly: AI adoption is being fueled by an improved tool ecosystem • Machine Learning Testing: Survey, Landscapes and Horizons • What’s your ML Test Score? A rubric for ML production systems • AllenAI: Writing Code for NLP Research • Models and microservices should be running on the same continuous delivery stack • Google’s Rules of Machine Learning: Best Practices for ML Engineering • AWS Managing Machine Learning Projects