www.globalbigdataconference.com
Twitter : @bigdataconf
Global Data Science Conference 2017
Fast, Scalable, Reusable:
A New Perspective on
Production ML/AI Systems
by
Ekrem AKSOY, CTO
Agenda
• AI/ML in the wild business
• Houston! We have a problem (again)
• Top 5 Ideas to Steal
• Summary
AI/ML in the wild business
Seriously ???
*Reused with permission of Vladimir Iglovikov
Houston! We have a problem (again)
• Quotation #1:
“We once had a great model; we used it X times, then its
performance dropped. We do not use it anymore…”
The problem: models degrade with time and… data,
i.e. AI/ML has no Data Immunity!
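One way to notice the degradation described in Quotation #1 is to track accuracy over a sliding window of recent labelled predictions and flag the model once it falls below a baseline. The sketch below is only an illustration under assumptions, not something shown in the talk: the class name `DriftMonitor`, the window size, and the tolerance are all hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Tracks accuracy over the most recent labelled predictions.

    Hypothetical helper for illustration; window and tolerance are assumptions.
    """

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, predicted, actual):
        # Store whether the latest prediction matched the observed label.
        self.outcomes.append(1 if predicted == actual else 0)

    def accuracy(self):
        # Accuracy over the current window, or None if nothing is recorded.
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only decide once the window is full, to avoid noisy early alarms.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance
```

In use, the serving code would call `record()` whenever a true label arrives and check `needs_retraining()` periodically; what to do when it returns `True` is a separate decision.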
By contrast, a usual software system:
Stays intact after deployment
(i.e. its behaviour does not change with the data it processes)
• Quotation #2:
“We have a great team, they’ve built great models, but we need
to process X million rows per (sec, min, …)”
The problem: the technology/infrastructure in use is not sufficient.
Data infrastructures are getting complicated:
- Too many components
- Diverse characteristics of data
- Configuration management
- Re-engineering of models
- …
• Quotation #3:
“We invested in XYZ tool, but we can’t use it effectively,
because we need to export manually each time we need it”
The problem is integrability.
- Data Inconsistency (format, etc.)
- Non-standard APIs
- Incompatible APIs
- …
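A common response to format inconsistencies like these is a thin adapter that maps every incoming record onto one canonical schema before it reaches a model. The sketch below is a minimal illustration under assumptions: the field names, aliases, and date formats are hypothetical and stand in for whatever the real upstream tools emit.

```python
from datetime import datetime

# Hypothetical aliases seen across upstream tools; adjust to real sources.
FIELD_ALIASES = {
    "customer_id": ("customer_id", "customerId", "cust_id"),
    "amount": ("amount", "amt", "total"),
    "timestamp": ("timestamp", "ts", "created_at"),
}
# Hypothetical timestamp layouts, tried in order.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def parse_timestamp(value):
    """Try each known layout and return the first successful parse."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp format: {value!r}")


def normalize(record):
    """Return a record with canonical field names and consistent types."""
    out = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in record:
                out[canonical] = record[alias]
                break
        else:
            # No alias matched: fail loudly instead of guessing a value.
            raise KeyError(f"missing field: {canonical}")
    out["customer_id"] = str(out["customer_id"])
    out["amount"] = float(out["amount"])
    out["timestamp"] = parse_timestamp(out["timestamp"])
    return out
```

Keeping the mapping in data (the two tables at the top) rather than in code means a new source format is a configuration change, not a model change.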
• Quotation #4:
“Each time it takes X hrs. to produce results, because we
do it manually/it does not scale”
The problem is scalability.
- Bootstrap/Cold start issues
- Data hose coupling/de-coupling
- …
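If "data hose" is read as the incoming data stream, one way to de-couple it from the model is a bounded queue between the source and the scoring worker, so a fast source is slowed down instead of overrunning the scorer. This is a sketch under that assumption; `score_stream`, `score_fn`, and `max_pending` are hypothetical names, not from the talk.

```python
import queue
import threading


def score_stream(records, score_fn, max_pending=100):
    """Score records via a bounded queue that applies back-pressure."""
    pending = queue.Queue(maxsize=max_pending)
    results = []
    done = object()  # sentinel marking the end of the stream

    def worker():
        # Single consumer, so results keep the input order.
        while True:
            item = pending.get()
            if item is done:
                break
            results.append(score_fn(item))

    thread = threading.Thread(target=worker)
    thread.start()
    for record in records:
        pending.put(record)  # blocks while the queue is full
    pending.put(done)
    thread.join()
    return results
```

The queue size is the only coupling point: the source and the scorer can each be replaced or scaled without the other knowing.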
Top 5 Ideas to Steal
• Idea #1: Use basic DevOps cycle, but be careful!
• Idea #2: De-couple API/Model
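A minimal sketch of what de-coupling the API from the model can look like: callers talk to one stable service object, while the model behind it is registered by version and swapped without changing the call site. Everything here is illustrative and assumed, including the names `PredictionService`, `MeanModel`, and `MaxModel`; the talk does not prescribe an implementation.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """Anything with a predict() method can sit behind the API."""

    def predict(self, features: Sequence[float]) -> float:
        ...


class PredictionService:
    """Stable API surface; the active model can be swapped at runtime."""

    def __init__(self):
        self._models = {}
        self._active = None

    def register(self, version, model):
        self._models[version] = model

    def activate(self, version):
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        self._active = version

    def predict(self, features):
        if self._active is None:
            raise RuntimeError("no active model")
        model = self._models[self._active]
        # The response shape stays the same whichever model answers.
        return {"version": self._active, "score": model.predict(features)}


class MeanModel:
    """Toy stand-in for a first model version."""

    def predict(self, features):
        return sum(features) / len(features)


class MaxModel:
    """Toy stand-in for a replacement model version."""

    def predict(self, features):
        return max(features)
```

Registering `MeanModel()` as "v1" and `MaxModel()` as "v2" and calling `activate()` switches the behaviour while `predict()` keeps the same signature and response shape for every caller.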
• Idea #3: Use schedulers & containers
• Idea #4: Consider rewriting (i.e. don’t get stuck on a single framework)
• Idea #5: Automate!
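As one possible reading of this idea, the evaluate/retrain/deploy loop can be a single function that a scheduler runs unattended. The sketch below is an assumption-laden illustration: `run_cycle` and its `evaluate`, `retrain`, and `deploy` hooks are hypothetical callables supplied by the caller, and the `min_score` threshold is arbitrary.

```python
def run_cycle(evaluate, retrain, deploy, current_model, min_score=0.8):
    """One automated pass: evaluate, retrain if needed, deploy only if better.

    evaluate(model) -> float, retrain() -> model, deploy(model) -> None.
    All three are hypothetical hooks provided by the caller.
    """
    score = evaluate(current_model)
    if score >= min_score:
        # The current model is still good enough; nothing to do.
        return current_model, "kept"
    candidate = retrain()
    if evaluate(candidate) > score:
        # Only replace the model when the candidate actually scores higher.
        deploy(candidate)
        return candidate, "replaced"
    return current_model, "retrain_failed"
```

Returning a status string alongside the model gives the scheduler something to log or alert on without inspecting the model itself.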
Summary
1. Production/business use of AI/ML differs from an academic or competition focus.
2. Focus on how to keep models up and running.
3. Remember the Data Immunity problem.
4. API/model decoupling is important.
5. Adopt best practices that are already established (e.g. DevOps).
6. Automate.
Shameless Self Promotion
If you want to try it out, let me know.
ekrem@hiddenslate.com
