Production ML (Production Machine Learning)
@osamakhn
who am I?
Osama Khan
Big Data Engineer @ACLServices
Grad Student @GTComputing
AWS Big Data Specialist+
Vancouver, BC
Languages: Java, C# (via J#), Python, Golang, NodeJS, Scala
Previously: Robot Soccer, Credit Rating, AML, O&G Portfolio, NLP/Governance, Doctor Triage, Energy Monitoring, Consulting, Private Equity
Recently: Data/ML Pipeline, Tools & Platforms
what are we going to talk about?
The goal of this talk is to provide a high-level overview of the data landscape, introduce AWS, and run through an exercise of containerizing an ML service.
1) Data Landscape (v2018): Changing ecosystem and new roles
2) Just Enough AWS: AWS Intro, EC2, et al.
3) Workshop: RESTful ML Service
4) Demos: Athena, SageMaker, QuickSight, ModelDB, Heroku ML API, Docker ML API
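The RESTful ML Service workshop centres on putting a model behind an HTTP endpoint so that it can then be containerized. Below is a minimal sketch of such an endpoint. It uses only the Python standard library and assumes a hypothetical hand-set linear model (`WEIGHTS`, `BIAS`) in place of a trained one. The `/predict` route and the JSON request shape are also illustrative choices, not the workshop's actual code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model: hand-set linear weights standing in for a trained model.
WEIGHTS = [0.5, -1.25, 2.0]
BIAS = 0.1


def predict(features):
    """Score one feature vector with the hypothetical linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body such as {"features": [1, 2, 3]}."""

    def do_POST(self):
        if self.path != "/predict":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            # An empty body falls back to {} and then fails with KeyError -> 400.
            body = json.loads(self.rfile.read(length) or b"{}")
            score = predict(body["features"])
        except (ValueError, KeyError, TypeError) as exc:
            self._send(400, {"error": str(exc)})
            return
        self._send(200, {"prediction": score})

    def _send(self, status, payload):
        """Serialize `payload` as JSON and write a complete HTTP response."""
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def serve(port=8080):
    # Bind to all interfaces so the service is reachable from outside a container.
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

Binding to `0.0.0.0` rather than `localhost` matters once the service runs inside a container, because requests then arrive from outside the container's loopback interface. With the server running, `curl -X POST localhost:8080/predict -d '{"features": [1, 2, 3]}'` should return `{"prediction": 4.1}`.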
classic model
data science silo
Data Source → Data & Feature Engineering → Model Building → Deploy → Monitor
(Adaptation of slide by Ben Lorica)
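In the silo view, work is handed from one stage to the next. Below is a minimal sketch of those five stages as plain Python functions. It assumes a hypothetical in-memory dataset (square metres vs. price) and uses a one-feature ordinary-least-squares regression as the model. In the silo picture, each function would be owned by a different team.

```python
from statistics import mean


def data_source():
    """Stage 1 (hypothetical data): raw (square_metres, price) records."""
    return [(50, 150.0), (70, 210.0), (90, 270.0), (110, 330.0)]


def engineer_features(rows):
    """Stage 2: split raw records into a feature list and a target list."""
    xs = [float(x) for x, _ in rows]
    ys = [y for _, y in rows]
    return xs, ys


def build_model(xs, ys):
    """Stage 3: fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


def deploy(model):
    """Stage 4: wrap the fitted parameters in a callable scoring function."""
    return lambda x: model["intercept"] + model["slope"] * x


def monitor(predict, xs, ys):
    """Stage 5: mean absolute error of the deployed model on labelled data."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs, ys = engineer_features(data_source())
    model = build_model(xs, ys)
    scorer = deploy(model)
    print(model, monitor(scorer, xs, ys))
```

Keeping each stage behind a small function boundary makes the hand-offs explicit. It also makes visible the step the silo diagram ends on: monitoring needs labelled data flowing back from production.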
maturity spectrum
what’s changing(-ed)?
1. Cloud (FaaS, serverless data pipelines, ML-as-a-service)
2. Consumer demand for ML features/products/applications
3. Targeted Models (e.g., we may need to manage 20MM models for 10MM users)
4. Localization (ASEAN facial recognition)
5. Security (adversarial ML, side-channel attacks)
6. Transparency (Bias is a BUG)
7. There are many sophisticated toy solutions, but conventional, simpler techniques (e.g., regression)
still deliver more business value!
8. Monitoring to ensure deployed models are making high quality predictions
9. Need practices to maintain (update or rebuild) models over time
10. and ….
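Point 3 implies serving many narrowly targeted models rather than one global model. Below is a minimal sketch of per-user model lookup with a fallback to a global model. The `ModelRegistry` class, its in-memory dictionaries, and the integer versioning are hypothetical. At the 20MM-model scale the dictionaries would presumably be replaced by a persistent store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A "model" here is any function from one numeric input to a prediction.
Model = Callable[[float], float]


@dataclass
class ModelRegistry:
    """Hypothetical registry: one model per user, plus a global fallback."""

    global_model: Model
    per_user: Dict[str, Model] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)

    def register(self, user_id: str, model: Model) -> int:
        """Store (or replace) a user's model and return its new version number."""
        self.per_user[user_id] = model
        self.versions[user_id] = self.versions.get(user_id, 0) + 1
        return self.versions[user_id]

    def predict(self, user_id: str, x: float) -> float:
        """Use the user's own model when one exists, else the global model."""
        return self.per_user.get(user_id, self.global_model)(x)
```

The fallback keeps new users (the cold-start case) served by the global model until enough data exists to train a targeted one. The version counter is the hook that practices for updating or rebuilding models over time (point 9) would build on.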
feature engineering, wat?
By @MLpuppy
data science (v2017+)
model: monitoring & maintenance
- What models are being deployed? [Model Inventory]
- Are we seeing deviations from expected performance? [Model Output monitoring]
- Reasons for performance degradation? [Data monitoring]
- Take action on out-of-the-ordinary situations
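The four questions above map onto a model inventory, an output check, and a data check. Below is a minimal sketch under several assumptions. The `INVENTORY` entries are hypothetical. Output monitoring is a relative-tolerance check on the mean prediction. Data monitoring uses the Population Stability Index (PSI). PSI and its 0.2 alert threshold are a common convention swapped in here, not something the slide specifies. "Take action" is reduced to returning a list of alerts.

```python
import bisect
import math
from statistics import mean

# Model inventory: which models are deployed (hypothetical entries).
INVENTORY = {
    "price-model": {"version": 3, "expected_mean_prediction": 240.0},
}


def output_deviation(model_name, predictions, tolerance=0.1):
    """True if the mean prediction is more than `tolerance` (relative) off expectation."""
    expected = INVENTORY[model_name]["expected_mean_prediction"]
    return abs(mean(predictions) - expected) / abs(expected) > tolerance


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    `edges` are sorted bin boundaries (len(edges) + 1 bins). With e_i and a_i
    the fraction of each sample in bin i, PSI = sum((a_i - e_i) * ln(a_i / e_i)).
    Empty bins are floored at `eps` to avoid log(0).
    """
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def check(model_name, predictions, train_values, live_values, edges, psi_threshold=0.2):
    """Return alerts for one model; an empty list means nothing out of the ordinary."""
    if model_name not in INVENTORY:
        return [f"{model_name}: not in model inventory"]
    alerts = []
    if output_deviation(model_name, predictions):
        alerts.append(f"{model_name}: mean prediction outside tolerance")
    score = psi(train_values, live_values, edges)
    if score > psi_threshold:
        alerts.append(f"{model_name}: input drift, PSI={score:.2f}")
    return alerts
```

Comparing live inputs against the training distribution (data monitoring) helps explain *why* the output check fires. The output check alone only says *that* performance has moved.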
rise of machine learning engineers
intro to AWS
https://acloud.guru
relevant technologies & references
https://acloud.guru
Storage, Compute, ETL, Viz
CS349D: Cloud Computing Technology
DEMO: SageMaker, Athena, QuickSight, RESTful ML
www.productionml.org
