Productionizing Data Science at Experience
A Startup's (Ongoing) Journey Building Machine Learning Products
Agenda
• Intro
• Setting the Scene
• Proposed Pipeline
• A Look Back
Who am I?
• Matt Mills
• Born and raised in Atlanta
• BS in Industrial and Systems Engineering 2014, MS in Analytics 2015
• @statmills or www.statmills.com
What is Experience?
Experience's mobile commerce, ticketing, and data solutions empower sports and entertainment leaders to generate new revenue streams, sell more tickets, and make smarter decisions.
www.expapp.com/solutions
Setting the Scene
Data Science at the end of 2016
• ~13 Engineers and 1 (me!) Data Scientist
• What happens to my work?
  • Manager / Management
  • Other Departments
  • Partners
Goal for 2017
• Make an Impact on our Customers (Fans)
[Diagram: a Predictive Model to Influence Fan Behavior, added alongside Manager / Management, Other Departments, and Partners]
Goal for 2017: Continued
• Create a process to deploy models into production and use predictions in real time
• Some considerations
  • Minimal use of limited Engineering Resources
  • Scalable (speed and processing power)
  • Cheap, like, super cheap (read: Free)
  • Had to handle data cleansing
Some Potential Solutions
• Build own R/Python Server
• Learn Scala/Spark
• Pay for ML Service
Scaling Experience Data Science with h2o
• ML Algorithms written in pure Java
• APIs written for R, Python, Scala, Spark
• Built for scale
  • Parallel and distributed out of the box
• Open Source
• Models exportable as Java objects (POJOs) to embed in other apps
• Can embed Python pre-processing scripts within the POJO
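As a rough illustration of the export step, the sketch below trains a model through the h2o Python API and writes it out as a POJO. It is not the talk's actual code: the choice of a gradient boosting model, the function names `predictor_columns` and `train_and_export`, and the arguments `csv_path`, `response`, and `out_dir` are all assumptions made for the example. Running `train_and_export` requires the `h2o` package and a Java runtime.

```python
def predictor_columns(columns, response):
    """Hypothetical helper: every column except the response is a predictor."""
    return [name for name in columns if name != response]


def train_and_export(csv_path, response, out_dir):
    """Train a GBM in H2O and export it as a POJO (illustrative sketch only)."""
    # Imported lazily so the helper above works without H2O installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # starts or connects to a local H2O cluster (needs Java)
    frame = h2o.import_file(csv_path)
    frame[response] = frame[response].asfactor()  # treat the target as categorical

    # Assumed model choice and settings; any H2O estimator with POJO support works.
    model = H2OGradientBoostingEstimator(ntrees=50, seed=1)
    model.train(
        x=predictor_columns(frame.columns, response),
        y=response,
        training_frame=frame,
    )
    # Writes the generated Java source for the model into out_dir.
    model.download_pojo(path=out_dir)
    return model
```

The exported Java class can then be compiled into another application, which is what removes the need for a running R or Python server at prediction time.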
Proposed Pipeline
h2o Architecture
https://github.com/h2oai/h2o-meetups/blob/master/2017_09_12_Dublin/2017_09_12_H2O_Intro_and_AutoML.pdf
h2o Algorithm List
h2o vs scikit-learn Syntax and Process
https://github.com/h2oai/h2o-meetups/blob/master/2015_05_14_H2O_Overview/H2O_Overview.pdf
Experience Production Pipeline
Experience Predictive Modeling Pipeline
1. App Sends Data
2. Data Cleaning in Python
3. Predictions done in h2o
4. App Gets Prediction
[Diagram labels: input(), sys.stdout.flush(), {JSON}]
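The Python side of this hand-off can be sketched as a small loop that reads one JSON record per line from standard input, cleans it, and flushes the result to standard output so the calling process sees it immediately. This is a minimal sketch under stated assumptions, not the production script: the cleaning rules in `clean_record` and the function name `serve` are hypothetical, and iterating over the input stream stands in for repeated `input()` calls.

```python
import json
import sys


def clean_record(record):
    """Hypothetical cleaning rules: normalize keys, trim strings, blank -> None."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[key.strip().lower()] = value
    return cleaned


def serve(in_stream=sys.stdin, out_stream=sys.stdout):
    """Read one JSON object per line and write one cleaned JSON object per line."""
    for line in in_stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            result = clean_record(json.loads(line))
        except (ValueError, AttributeError) as exc:
            # Malformed JSON or a non-object payload: report instead of crashing.
            result = {"error": str(exc)}
        out_stream.write(json.dumps(result) + "\n")
        out_stream.flush()  # push the reply to the caller right away


# In a deployed script, the last line would be a call to serve().
```

Flushing after every record matters because the caller is waiting on a pipe; without it the output could sit in a buffer and the prediction step would stall.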
Benefits of Using Open Source Software
Experience Predictive Modeling Setup
Model Deployment
• Code: pulled via GitHub
• Terraform: to create infrastructure and manage state
• Served: via ECS
• Dockerize: via Dockerfile and stored in ECR
• Discovery: via Consul
A Look Back
Pros and Cons of Current Set-Up
Pros
• Automated process to deploy models into production
• Can iterate models with no/limited effort from engineering
Cons
• Can only use algorithms available in h2o (e.g. no multilevel models, GAMs, Bayesian models)
• h2o drives Python, why not the other way around?
Conclusion and Questions
1. Lack of skills and/or support doesn’t have to stop you from putting models into production
2. What’s best for your Data Scientists might not be best for your Engineers and vice versa
www.statmills.com
http://docs.h2o.ai/
https://www.expapp.com/about/#careers

Productionizing Data Science at Experience