Machine Learning in Production
Manu Mukerji
What is this talk about?
 Agenda:
- Introduction to the business problem
- Normal ML Flow
- Training Data
- Test Data
- Model Creation
- Testing
- How it ties together: The production flow
- Team Setup
- ML Production Examples
- Questions?
- Live Demo!
Typical ML Example
 This is the “hello world” equivalent of what you find online:
Generalized ML Flow
1. Gather Data
2. Train Model
3. Test for accuracy (most examples end here)
4. Save model for external consumption
5. Use saved model for prediction
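The five steps above can be sketched end to end in a few lines. This is only a toy illustration, not the pipeline from the talk: the data, the word-count "model", and the helper names (train, predict, accuracy) are all made up here, and a real setup would more likely use a library such as Scikit-learn or TensorFlow.

```python
import pickle
from collections import Counter, defaultdict

# 1. Gather data: (product title, category) pairs. Toy data, invented for illustration.
DATA = [
    ("red cotton t-shirt", "apparel"),
    ("blue denim jeans", "apparel"),
    ("wool winter sweater", "apparel"),
    ("usb-c charging cable", "electronics"),
    ("wireless bluetooth headphones", "electronics"),
    ("4k hdmi cable", "electronics"),
]
TEST = [
    ("green cotton sweater", "apparel"),
    ("bluetooth usb adapter", "electronics"),
]


def train(examples):
    """2. Train: count how often each word appears under each category."""
    word_counts = defaultdict(Counter)
    for title, category in examples:
        word_counts[category].update(title.split())
    return dict(word_counts)


def predict(model, title):
    """Pick the category whose training words overlap the title the most."""
    words = title.split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    return max(scores, key=scores.get)


def accuracy(model, examples):
    """3. Test: fraction of examples predicted correctly (correct / total)."""
    correct = sum(predict(model, title) == label for title, label in examples)
    return correct / len(examples)


model = train(DATA)
test_accuracy = accuracy(model, TEST)

# 4. Save the model for external consumption (here: serialized to bytes).
saved_bytes = pickle.dumps(model)

# 5. Load the saved model elsewhere and use it for prediction.
loaded_model = pickle.loads(saved_bytes)
prediction = predict(loaded_model, "denim jeans")
```

Most online examples stop after step 3; steps 4 and 5 are where a separate serving process picks up the saved model.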
Our use case
 Categorization of 6B products into ~4800 categories across 30 countries, every day!
Training Data
Data is the asset!
Training data …lack of training data
Gathering Training Data
Annotation UI
HDFS, Elasticsearch
• Annotate at category level (fewer than 10K per country)
• Try to map each customer category to a Google category (GCAT)
• Expand the training data by inferring GCAT from the mapping (this brings it into the millions of products)
• What else we tried:
  • Mechanical Turk
  • External companies that provide data
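A minimal sketch of the expansion step described above: once a customer category has been mapped to a GCAT by hand, every product in that customer category inherits the label. The mapping, the product records, the category names, and the function name expand_training_data are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical hand-annotated mapping: (customer, customer category) -> GCAT.
# All names below are made up for illustration.
CATEGORY_MAP = {
    ("shop_a", "Mobile > Cases"): "gcat/phone-cases",
    ("shop_a", "Notebooks"): "gcat/laptops",
    ("shop_b", "Knitwear"): "gcat/sweaters",
}

PRODUCTS = [
    {"customer": "shop_a", "category": "Mobile > Cases", "title": "clear phone case"},
    {"customer": "shop_a", "category": "Notebooks", "title": "13 inch laptop"},
    {"customer": "shop_b", "category": "Knitwear", "title": "wool sweater"},
    {"customer": "shop_b", "category": "Misc", "title": "gift card"},
]


def expand_training_data(products, category_map):
    """Label each product with the GCAT inferred from its customer category.

    Returns (labeled, unmapped): labeled is a list of (title, gcat) training
    examples; unmapped holds products whose category has no annotation yet.
    """
    labeled, unmapped = [], []
    for product in products:
        key = (product["customer"], product["category"])
        gcat = category_map.get(key)
        if gcat is None:
            unmapped.append(product)
        else:
            labeled.append((product["title"], gcat))
    return labeled, unmapped


labeled, unmapped = expand_training_data(PRODUCTS, CATEGORY_MAP)
```

A few thousand category-level annotations can label millions of products this way, but any wrong mapping is copied to every product under it.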
More About Training Data
 Bad data, bad predictions
 Overfitting: if you have a hammer, everything looks like a nail
 When to retrain?
 When to add more data?
Test Set
 Test set selection
- Normal method: split 10-20% from training data
- Production method: custom test set based on business value
 Scoring of test set
- Normal method: correct/total
- Problem that can occur with test set scoring
- Advanced version: negative points for negative customer value
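The difference between the two scoring methods can be sketched as follows. The penalty table, the per-product value field, and the function names are assumptions made for this example; the talk does not give the actual scoring formula.

```python
def plain_accuracy(results):
    """Normal method: correct / total, every product counts the same."""
    correct = sum(r["predicted"] == r["expected"] for r in results)
    return correct / len(results)


def business_score(results, penalties, default_penalty=0.0):
    """Advanced version (hypothetical formula): a correct prediction earns the
    product's business value; a wrong one loses value scaled by how harmful
    that specific confusion is to the customer."""
    score = 0.0
    for r in results:
        if r["predicted"] == r["expected"]:
            score += r["value"]
        else:
            confusion = (r["expected"], r["predicted"])
            score -= penalties.get(confusion, default_penalty) * r["value"]
    return score


# Toy test set; categories and values are invented for illustration.
RESULTS = [
    {"expected": "phones/samsung", "predicted": "phones/samsung", "value": 5.0},
    {"expected": "phones/samsung", "predicted": "phones/apple", "value": 5.0},
    {"expected": "grocery/fruit", "predicted": "grocery/vegetables", "value": 1.0},
    {"expected": "grocery/fruit", "predicted": "grocery/fruit", "value": 1.0},
]

# Filing one brand's phone under a competitor is treated as far worse
# than a harmless mix-up between two grocery categories.
PENALTIES = {("phones/samsung", "phones/apple"): 2.0}

accuracy = plain_accuracy(RESULTS)
score = business_score(RESULTS, PENALTIES)
```

Here plain accuracy is 50%, yet the business score is negative, because the single costly confusion outweighs both correct predictions.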
Manual Overrides… Why?
 Prediction will never be 100% accurate
 When it’s wrong, it impacts the business
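One simple way to support manual overrides is to consult an override table before asking the model. This is a sketch under that assumption; the product IDs, the override table, and the stand-in model are hypothetical.

```python
def categorize(product_id, title, overrides, model_predict):
    """Return (category, source); a manual override always wins over the model."""
    if product_id in overrides:
        return overrides[product_id], "override"
    return model_predict(title), "model"


def toy_model(title):
    # Stand-in for a real model, used only to make the example runnable.
    return "grocery" if "soup" in title else "electronics"


# Hypothetical override entered by a human after a bad prediction was reported.
OVERRIDES = {"sku-42": "phone accessories"}

overridden = categorize("sku-42", "leather phone case", OVERRIDES, toy_model)
predicted = categorize("sku-7", "tomato soup", OVERRIDES, toy_model)
```

Returning the source alongside the category makes it easy to count how often overrides fire, which hints at where the model needs more training data.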
Testing your model
 Training takes time; this is why your test set is really important
 Automate the build pipeline to run evaluation and deploy your model only if it’s better than the existing one
 Canary test the whole pipeline
 Advanced resource:
- Chase Roberts: https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
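The deployment gate and the canary test can be sketched as two small checks. Everything here is illustrative: the scores, the fake pipeline, the canary products, and the function names are assumptions, not the production code from the talk.

```python
def should_deploy(candidate_score, current_score, min_gain=0.0):
    """Deploy only if the candidate beats the existing model on the test set."""
    return candidate_score > current_score + min_gain


def run_canary(pipeline, canary_cases):
    """Run known products through the whole pipeline and collect mismatches."""
    failures = []
    for product, expected in canary_cases:
        actual = pipeline(product)
        if actual != expected:
            failures.append((product, expected, actual))
    return failures


def fake_pipeline(title):
    # Stand-in for the full production flow: overrides first, then the model.
    overrides = {"canary gift card": "gift cards"}
    if title in overrides:
        return overrides[title]
    return "apparel" if "sweater" in title else "electronics"


# A made-up retailer with known products and one override.
CANARY_CASES = [
    ("canary wool sweater", "apparel"),
    ("canary gift card", "gift cards"),
    ("canary usb cable", "electronics"),
]

failures = run_canary(fake_pipeline, CANARY_CASES)
deploy = should_deploy(candidate_score=0.84, current_score=0.81) and not failures
```

The gate compares scores on the same fixed test set, so the test set must not change between the two evaluations.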
Scaling Out
 Now we need to do this in 30 countries
 Easier with Latin alphabet languages
 Deep learning to the rescue: this isn't just a cool thing that folks are talking about
4.7 英寸 Retina HD 显示器。一款采用 64 位桌面级架构的 A8 芯片。带焦点像素的 8MP iSight 相机。触摸 ID ...
4.7-inch Retina HD display. An A8 chip with 64-bit desktop-class architecture. 8MP iSight camera with focus pixels. Touch ID ...
Accuracy over time
 70% accuracy ….1 month
 80% accuracy ….3-6 months
 ~90% accuracy …. 1 year
 The last mile……. Rest of your life!
Team setup for ML
 Team setup
- UI/API Team
- ML/AI Engineers
- ML/AI Research
Knowing when to stop?
 This is hard!
 Get something out!
 Don’t work in a vacuum
 Get the circular data flow working
 Remember business value, don’t over engineer it
How it all ties in together
Examples of interesting AI in production
 Self driving cars
- Some Components:
- Lots of sensors, cameras
- Object detection, and distinguishing what can move vs not
- Lane detection
- Red light vs green
Self Driving, continued
 If I trained a self-driving car model with just 50 hours of data, would you trust it?
 Probably not…
 According to the DMV, in order to get a license: “Have completed 50 hours of practice with an adult 25 years of age or older.” [1]
 50 hours at 60 MPH is 3,000 miles…
 Would you trust 1,300,000,000 miles? [2]
[1] https://www.dmv.ca.gov/portal/dmv/detail/teenweb/permit_btn1/permit
[2] https://www.bloomberg.com/news/articles/2016-12-20/the-tesla-advantage-1-3-billion-miles-of-data
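The arithmetic behind the comparison, as a quick check. The constant 60 MPH speed is the slide's own simplification; the variable names are just for this example.

```python
PRACTICE_HOURS = 50           # DMV practice requirement quoted on the slide
SPEED_MPH = 60                # simplifying assumption from the slide
FLEET_MILES = 1_300_000_000   # fleet mileage figure cited on the slide

practice_miles = PRACTICE_HOURS * SPEED_MPH   # 3,000 miles
ratio = FLEET_MILES / practice_miles          # fleet data vs. one new driver
fleet_hours = FLEET_MILES / SPEED_MPH         # same data expressed in hours
```

At that speed, the fleet figure amounts to roughly 433,000 times the practice a new driver needs for a license.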
The future
Questions?
Live Demo: Help me with training data
 Remember how I said training data is hard to get?
Thank You
 TensorFlow
 Pandas, Scikit-learn
 Andrew Ng!


Machine Learning in Production: Manu Mukerji, Strata CA March 2018


Editor's Notes

  1. About me
  2. This is a typical ML tutorial you can find online. There is no problem with this per se, but this isn’t what happens in production. My goal in this talk is to bridge the gap between ML in production and ML on paper.
  3. Data is the asset! In the world of online advertising we have a saying: “If you are not buying a product online, you are the product being sold,” and nowhere is this more true than in machine learning. Alexa is a good example of this: Apple had a head start on the personal assistant with Siri, and Amazon was able to come in and break that monopoly. Now products say “Works with Alexa,” not “Works with Siri.” Now imagine you have a great idea for a text-to-speech engine using deep learning. Your algorithm may be way better than Amazon’s, but without that data your model will never perform as well.
  4. These are the kind of labels we need. This is hard to get right: for example, if you see a photo of a phone case, how do you know whether “case” is its own category or falls under phone accessories?
  5. When a user came in, they would see 50 random products from the category, and we would ask them to confirm that everything looked fine. We would do this 3 times per category. If something was not correct, they could remove a product or change its category.
  6. Bad data, bad predictions: wrong categories from customers would mess up category mappings. Overfitting: Fry’s example (hard drive vs. laptop). When to retrain: seasonality, for example sweaters. When to add more data: category coverage is more important than total volume; if we could label all 6B products we wouldn’t need ML. If you realize that you are not getting good results in a particular category you can try adding more data, but this doesn’t always work and it’s expensive.
  7. Test set selection: the normal method is a random sample; what we did was base it on revenue. You still want a good mix, but with more products in that group. Scoring of test set: example with an earthquake warning app… 4/5 points. Advanced version: apple the fruit vs. Apple products, Apple won’t care… Samsung/Apple: but if you categorize a Galaxy S8 as an iPhone, they will care.
  8. Click to example: buying soup
  9. Training takes a long time; if you are using DL it can take days. The code doesn’t break or complain, you just see poor results. Automate the pipeline. Canary test: in our case we made a fake retailer with known products and some overrides, and we make sure that when we run those we get the expected results. Advanced resource: I won’t get into it here, but Chase talks about some very good ways of adding unit tests.
  10. I don’t really have data points for these numbers… Walking is an example. Driving is an example.
  11. UI/API team: front-end work, annotation UI, Elasticsearch, getting data in and out. ML engineers: working on the model pipeline, scale issues, data engineering. Research: feature engineering, trying new weights, new types of models, new papers… image learning is an example. Pizza box team… with a side of salad. The important distinction is that their work doesn’t fit into a sprint.
  12. Get something out. Don’t work in a vacuum: researchers working on the same dataset for years… Iterate after. Get the circular data flow working.
  13. Self-driving is getting a lot of press… it’s overhyped on one end, and on the other end people are scared of it. This is my attempt to generalize it.
  14. Child Walking Experience is training your neural net with data Matrix helicopter example