Data Science in Production: Technologies That Drive Adoption of Data Science Solutions at JW Player

Data Science in Production
How Docker, K8s, and Airflow Drive Adoption of Data Science Solutions at JW Player
Nir Yungster

Disclaimer!
For this talk, we’ll cover technical approaches that can help drive adoption of data science solutions...
But technology is not a remedy for everything (people, process, …)

Applied Data Science Is Driving Innovation
Across Industries

Data Science Requires Three Pieces to Succeed
1. Access to data
2. Effectiveness in research and development of solutions
3. Ability to deliver solutions when and where they’re needed

Agenda
Part I: The Challenge of Data Science in Production
Part II: The Data Science Platform at JW Player
Part III: Data Science in Production at JW Player

Part I
The Challenge of Data Science in Production

What Does Production Data Science Mean?
● Model Performance
○ E.g. accuracy, precision, etc.
● Production-Level Code
○ Portability: ease of deploying across environments
○ Maintainability: testing, monitoring, documentation
○ Scalability: ability to handle high traffic volume
○ Reliability: service up-time

Solution: Scientists and Engineers Collaborating

Scientists and Engineers Collaborating
Scientist: I want model performance! I want accuracy, interpretability, & validation!!
Engineer: I want model performance! I want efficiency, reliability, & SLEEP!!

Collaboration: The Good, the Bad, and The Ugly
● The Good
○ Positive collaboration
○ Both sides’ primary goals achieved
● The Bad
○ Models in Limbo
○ Mutant models
● The Ugly
○ Misunderstanding, distrust
○ Barriers between teams

There are tools that can help!
● To make production data science more feasible
● To make Data Science teams more self-sufficient
● To enable better collaboration across teams

Part II
The Data Science Platform at JW Player

About JW Player
● Video player + platform
● Headquarters in NYC
● SaaS business
● 15k subscribers, 2M free
● 5% of video plays across the web

Video Publisher Data Products
● Video Recommendation Engine
● Automated Thumbnail Selection
● Shot/Scene Detection

Data Science Within JW Player
● Provide R&D for data products
● Centralized team (6 members)
○ Including 2 software developers
● Work with a variety of product and engineering teams across the company

Key Elements of JW Data Science Infrastructure
● Container Service: Portability
● Workflow Orchestration: Maintainability
● Application Orchestration: Scalability, Reliability

Docker is a Container Service
What’s a container?
● A standard wrapper for
tasks & applications so
that they run consistently
across environments

Container Portability Reduces Integration Pain
● Applications / tasks can run in any environment
● Removes friction arising from development and deployment in different environments
○ Across teams, within teams
dockerize all the things!

Airflow Orchestrates Workflows
● A workflow consists of a series of tasks
○ E.g. data processing, model training
○ Workflows run on a schedule
● Airflow helps with Maintainability
○ Monitoring & alerting
○ Web interface for investigating logs,
rerunning tasks / entire workflows
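To make "a workflow is a series of tasks" concrete, here is a toy, pure-Python sketch of what a workflow orchestrator does conceptually: run each task only after its upstream tasks succeed, retry failures, and raise an alert. It is not Airflow's API (in Airflow, workflows are defined as Python DAGs and scheduled by Airflow itself); `run_workflow`, the task names, and the print-based alert are hypothetical stand-ins.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    max_retries: int = 2,
) -> Dict[str, str]:
    """Run each task after its upstream tasks; return a status per task.

    `deps` maps a task name to the names of the tasks it depends on;
    every name used in `deps` must also be a key of `tasks`.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *deps.get(name, ()))

    status: Dict[str, str] = {}
    for name in sorter.static_order():
        # Don't run a task whose upstream work did not succeed.
        if any(status.get(up) != "success" for up in deps.get(name, ())):
            status[name] = "upstream_failed"
            continue
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception as exc:
                status[name] = "failed"
                # Hypothetical stand-in for real alerting (email, chat, pager).
                print(f"[alert] {name} failed on attempt {attempt}: {exc}")
    return status


# Usage: a hypothetical three-step pipeline.
ran = []
status = run_workflow(
    tasks={
        "extract": lambda: ran.append("extract"),
        "train": lambda: ran.append("train"),
        "report": lambda: ran.append("report"),
    },
    deps={"train": {"extract"}, "report": {"train"}},
)
print(ran)  # ['extract', 'train', 'report']
```

Marking downstream tasks as `upstream_failed` instead of running them mirrors the maintainability point above: a failure stays visible and localized, so the fix is "rerun from the failed task" rather than "debug the whole pipeline".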

Kubernetes Orchestrates Applications
● Deploy & manage dockerized applications that run continuously (e.g. an API service)
● Built-in Scaling, Reliability, Monitoring
● JW Player maintains an internal deployment service powered by Kubernetes

Kubernetes Basics
[Diagram: a Kubernetes master node and four worker nodes running replicas of Pod-1 and Pod-2]
● Application scaling made easy
○ Choose number of replicas
○ Scaling up is a configuration change
App.yaml
  Pod-1:
    Replicas: 2
  Pod-2:
    Replicas: 3
● Reliability
○ Master node monitors system
○ Ensures correct number of replicas
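The "ensures correct number of replicas" behavior is a reconciliation loop: compare the desired state (as declared in App.yaml) with what is actually running, and act to close the gap. The toy sketch below shows only that comparison step; it is not Kubernetes code, and `reconcile` plus the `(action, app, count)` tuples are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (action, app, count) steps that move `running` toward `desired`."""
    actions: List[Tuple[str, str, int]] = []
    for app in sorted(set(desired) | set(running)):
        want = desired.get(app, 0)
        have = running.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    return actions


# Desired state mirrors the App.yaml above.
desired = {"Pod-1": 2, "Pod-2": 3}
# Hypothetical observation: a worker node died, taking two Pod-2 replicas with it.
running = {"Pod-1": 2, "Pod-2": 1}
print(reconcile(desired, running))  # [('start', 'Pod-2', 2)]

# Scaling up is just a configuration change to the desired state.
desired["Pod-1"] = 4
print(reconcile(desired, {"Pod-1": 2, "Pod-2": 3}))  # [('start', 'Pod-1', 2)]
```

Because the loop only ever compares declared and observed state, the same mechanism handles both failures (reliability) and config edits (scaling).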

Part III
Data Science in Production at JW Player

Three flavors of production data science
● Backend Microservices
○ Server-side API running in Kubernetes
● Plugins (aka Frontend microservices)
○ Client-side plugin running alongside the Player
● “Integrations” with engineering
○ Data Science conducts R&D, develops a model
○ Works with Engineering to productionize

Backend Microservice
● What is involved?
○ Deploy model as application on Kubernetes
○ Backend service with API
● When is this approach common?
○ Easiest for a new model
● Benefits
○ Data Science in full control of model, updates
○ Decoupled architecture
○ Clear ownership, boundaries
[Diagram: Backend, Frontend, Microservice]
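A backend microservice in this sense is a model behind an HTTP API. Here is a minimal sketch using Python's standard-library WSGI interface; the `/predict` route, the JSON payload shape, and the linear `score` function standing in for a trained model are all hypothetical. In the setup described above, an app like this would be packaged in a Docker image and deployed on Kubernetes behind a production WSGI server.

```python
import json


def score(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = {"watch_time": 0.7, "recency": 0.3}
    return sum(weights.get(k, 0.0) * float(v) for k, v in features.items())


def app(environ, start_response):
    """WSGI app exposing POST /predict, returning {"score": ...} as JSON."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = {"score": score(payload["features"])}
        status = "200 OK"
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        # Malformed JSON or a missing/invalid "features" object.
        body = {"error": str(exc)}
        status = "400 Bad Request"
    start_response(status, headers)
    return [json.dumps(body).encode("utf-8")]
```

For local testing, `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` serves it on port 8080; a request body such as `{"features": {"watch_time": 1, "recency": 1}}` returns a score of about 1.0.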

Client-side Plugin
● What is involved?
○ Effectively a client-side microservice
○ Written in JavaScript
● When is this approach common?
○ Easiest for a new model
○ If the model is lightweight
● Benefits
○ Decoupled architecture
○ Reduced network traffic, low latency
[Diagram: Backend, Frontend, Plugin]

Integration with Engineering
● What is involved?
○ Translating / integrating model
○ Requires very close coordination
○ Often involves rewriting model code
● When is this approach common?
○ Often the case when you’re improving upon an existing product
● Possible Pitfalls
○ Tangled web
○ Unclear path to update/iterate
[Diagram: Backend, Frontend, and Model, with an unclear ("??") path between them]
Some Takeaways
● Owning models means more maintenance responsibility
○ Can take away from the core DS mission
● Microservices don’t remove the need to collaborate with other teams on models
○ To ensure feature fidelity
○ To ensure proper usage
○ SLAs
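One lightweight guard for feature fidelity, especially when model code is rewritten by another team, is a parity test: run the research implementation and the production port on the same inputs and compare outputs within a tolerance. A sketch; both feature functions and the `parity_mismatches` helper are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]


def reference_features(watch_seconds: Sequence[float]) -> Features:
    """Hypothetical research-side feature code."""
    n = len(watch_seconds)
    return {"n_plays": float(n), "mean_watch": sum(watch_seconds) / n if n else 0.0}


def production_features(watch_seconds: Sequence[float]) -> Features:
    """Hypothetical engineering port of the same features (single pass)."""
    total, n = 0.0, 0
    for s in watch_seconds:
        total += s
        n += 1
    return {"n_plays": float(n), "mean_watch": total / n if n else 0.0}


def parity_mismatches(
    ref: Callable[[Sequence[float]], Features],
    prod: Callable[[Sequence[float]], Features],
    samples: List[Sequence[float]],
    rel_tol: float = 1e-9,
) -> List[str]:
    """Describe every sample/feature where the two implementations disagree."""
    problems: List[str] = []
    for i, sample in enumerate(samples):
        a, b = ref(sample), prod(sample)
        if a.keys() != b.keys():
            problems.append(f"sample {i}: feature names differ {sorted(a)} vs {sorted(b)}")
            continue
        for name in a:
            if not math.isclose(a[name], b[name], rel_tol=rel_tol, abs_tol=1e-12):
                problems.append(f"sample {i}: {name} {a[name]} != {b[name]}")
    return problems


samples = [[], [30.0], [12.5, 40.0, 3.0]]
print(parity_mismatches(reference_features, production_features, samples))  # []
```

Running a check like this in CI on both teams' code turns "feature fidelity" from a verbal agreement into something that fails loudly when either side changes.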

Some Tips
● Think about production from the beginning of R&D
● Build intelligent fallbacks to ease reliability concerns
○ When one element of a service fails, allow for a slightly degraded state (e.g. serving a stale model)
● Build a microservice that you jointly maintain with engineers
● Consider whether your next hire should be a software engineer
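The "serve a stale model" fallback can be sketched as a holder that keeps the last successfully loaded model and keeps serving it when a refresh fails, flagging the degraded state for monitoring instead of taking the service down. `ModelHolder` and the `load_latest` loader (e.g. fetching a newly trained model from storage) are hypothetical.

```python
from typing import Any, Callable, Optional


class ModelHolder:
    """Serve the newest model that loaded successfully; fall back to the last good one."""

    def __init__(self, load_latest: Callable[[], Any]):
        self._load_latest = load_latest  # hypothetical loader for the newest model
        self._model: Optional[Any] = None
        self.degraded = False

    def refresh(self) -> None:
        try:
            model = self._load_latest()
        except Exception as exc:
            # Keep the previous model: degraded, but still up.
            self.degraded = True
            print(f"[warn] refresh failed, serving stale model: {exc}")
            return
        self._model = model
        self.degraded = False

    def get(self) -> Any:
        if self._model is None:
            raise RuntimeError("no model has ever loaded successfully")
        return self._model


# Usage: the first refresh succeeds, the second fails.
versions = iter(["model-v1", RuntimeError("storage unavailable")])


def load_latest():
    item = next(versions)
    if isinstance(item, Exception):
        raise item
    return item


holder = ModelHolder(load_latest)
holder.refresh()  # loads model-v1
holder.refresh()  # fails; model-v1 is kept
print(holder.get(), holder.degraded)  # model-v1 True
```

Exposing `degraded` (e.g. as a metric) lets engineers alert on the stale state without paging anyone for an outage that users never see.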

Acknowledgement
Graham Edge
Nil Timor
Olga Minkina
Rik Heijdens
Rob van Ejik
