Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
Real-Time Deployment
Cloud Infrastructure
Online Updates
Testing and CI/CD
ML Data Management
Introduction
Recommender Systems
[diagram] The user asks “What should I watch?”. The model needs the user tastes (Users DB) and all the items (Items DB), computes Predictions, keeps the Top K, and answers “Star Wars!”.
Recommender Systems – Previous Meetups
(1) We started with the dataset.
(2) Then we trained models.
(3) We selected the best one.
(4) And now, we want to deploy the model to production!
Recommender Systems Datasets – Previous Meetups
                   Explicit feedback                 Implicit feedback
                   (users’ ratings)                  (users’ clicks)
Example Domains    Movies, TV-Shows, Music           Marketplaces, Businesses
Example Data type  Like/Dislike, Stars               Clicks, Play-time, Purchases
Complexity         Clean, Costly, Easy to interpret  Dirty, Cheap, Difficult to interpret
Recommender Systems Datasets – This Meetup
Abstract User/Item Interactions
user-id item-id value
1234 4321 5.0
1234 654 2.0
456 2432 3.5
456 654 1.0
456 432 4.5
987 12 5.0
Recommender Systems Models – Previous Meetups
users embeddings
items embeddings
model weights
Example of model we want to deploy
Recommender Systems Models – This Meetup
[diagram] User Embeddings, Item Embeddings, and Model Weights combine to predict e.g. that Alice rates Star Wars 4.5/5
Software vs ML
Traditional Software        Machine Learning Software
Stateless                   Stateful
Explicit Specifications     No Specifications
Rule-based Logic from Code  Model-based Logic from Data
Cloud Infrastructure
Key Objectives
Highly scalable, highly available
Large number of users and items
No downtime
Continuous user updates
New user/item interactions (e.g. ratings, clicks, watch)
Frequent item updates and new items
New items are added continuously
Non-trivial model
Can’t be built in e.g. SQL or ElasticSearch
Assumptions
Not Too Big
Can be trained on big enough ML server
No need for distributed ML
Less than 1TB of data, so less than 1B items
Model Updates Can Be Delayed
You do not need to update the entire model several times per hour in live
(Adding new items in real time may be supported though)
Recommendations Are Not Stored
You do not need to save recommendations
Cloud Infrastructure – Plan
Users Embeddings Storage
Items Embeddings Storage
Model Weights Storage
Users Embeddings Storage
User Embeddings
Binary blob <1KB
Keep changing
Need only one per user request
Key-Value Store
Fetch the user embeddings from the cloud at each request
An atomic key-value store like Redis handles concurrency for free
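A minimal sketch of this read/write path (the key scheme and embedding size are illustrative assumptions; a dict-backed stub stands in for a real Redis client, which exposes the same get/set calls):

```python
import numpy as np

EMB_DIM = 64  # hypothetical embedding size: 64 float32 values = 256 bytes, well under 1KB

def save_user_embedding(kv, user_id, emb):
    # Store the embedding as a raw binary blob under a per-user key.
    kv.set(f"user_emb:{user_id}", emb.astype(np.float32).tobytes())

def load_user_embedding(kv, user_id):
    # Fetch and decode the blob; None if the user is unknown.
    blob = kv.get(f"user_emb:{user_id}")
    return None if blob is None else np.frombuffer(blob, dtype=np.float32)

class DictKV(dict):
    """Stand-in for redis.Redis(): same get/set calls, no server needed."""
    def set(self, key, value):
        self[key] = value
    def get(self, key):
        return dict.get(self, key)

kv = DictKV()
save_user_embedding(kv, 1234, np.zeros(EMB_DIM))
emb = load_user_embedding(kv, 1234)
```

In production, `kv` would be a `redis.Redis()` connection; one GET per request is all the model needs.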
Items Embeddings – The Big Issue
Network Problem
If the items data is stored in a database like SQL,
and the model is too complex to be expressed in SQL:
then you need to fetch 100% of the items data from the DB to your compute instance
...at each request!
Rule of Thumb
1M items * 1KB each → 1GB total data
(1B items * 1KB each → 1TB total data)
Network Issue – Solution
[diagram] Solution: insert a Candidates Generation step between the Items DB and the Model, so only the pre-selected candidates flow through Model → Predictions → Top K (the Users DB is queried as before): “What should I watch?” → “Star Wars!”
Items Candidate Generation
Goal: pre-select thousand(s) of items for your model, without needing to see all the embeddings
Model-Free
E.g. kNN item-item with pre-computed kNN tables
Easy with an ML-ready DB like Spark
Do-able in ElasticSearch or even SQL
Model-Based
E.g. linear matrix factorization, then a smart implementation of Top-K
The model has to be monotonic (w.r.t. the items dimensions), otherwise you can’t rely on a pre-computed index
Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike (WIP without efficient Top-K [issue#42326])
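A minimal model-free sketch of the idea above, assuming a precomputed item→neighbours table (toy data, hypothetical names):

```python
def generate_candidates(knn_table, user_items, per_item=50):
    """Pre-select candidates as the union of the precomputed nearest
    neighbours of the items the user interacted with; no need to touch
    the full item-embeddings matrix at request time."""
    candidates = set()
    for item in user_items:
        candidates.update(knn_table.get(item, [])[:per_item])
    # do not re-recommend items the user has already seen
    return candidates - set(user_items)

# toy precomputed kNN table: item id -> nearest item ids, best first
knn_table = {1: [2, 3, 4], 2: [1, 5], 9: [7]}
cands = generate_candidates(knn_table, user_items=[1, 2])
```

The kNN tables themselves are recomputed offline (e.g. in Spark), so the request path never scans all items.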
Items Embeddings – The Bigger Issue
Throughput Problem and Solution
Throughput Problem
You can’t fetch thousands of arbitrary item embeddings at each request
The physical limit is DB CPU, not network speed
Hosting a local redis co-located on the same physical machine as your process does not help
Throughput Solution (DevOps Nightmares!)
🎃🧛🕷 You need to keep items embeddings in memory of your processes 🕷🧛🎃
Items Embeddings Storage
Cloud Storage
Read once when spawning your processes
Fully updated every time you deploy a new model
<1M items → static file storage works well (e.g. AWS S3, Google Storage)
>1B items → big data storage (AWS RedShift, Google BigQuery, Hadoop/HDFS), updating will be “interesting”
In-Memory Replica
Loading pre-fork enables copy-on-write → read-only data is shared with many processes for free
Otherwise use shared memory, but with concurrency issues
>1B items → cannot load everything at init; requires cache-like mechanisms
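A sketch of the pre-fork / copy-on-write pattern (POSIX-only since it relies on fork; the random matrix simulates an S3 download followed by np.load):

```python
import os
import numpy as np

# Load the item embeddings ONCE, before forking any workers. In production
# this would be a download from S3 followed by np.load; here we simulate it.
ITEM_EMBS = np.random.default_rng(0).standard_normal((100_000, 32)).astype(np.float32)

def score_items(user_emb):
    # Runs in any worker: reads ITEM_EMBS without copying it. As long as the
    # array is never written to, copy-on-write keeps the pages shared.
    return ITEM_EMBS @ user_emb

pid = os.fork()
if pid == 0:
    # child worker: the big matrix is shared with the parent for free
    child_scores = score_items(np.ones(32, dtype=np.float32))
    os._exit(0 if child_scores.shape == (100_000,) else 1)
_, status = os.waitpid(pid, 0)
parent_scores = score_items(np.ones(32, dtype=np.float32))
```

This is what pre-fork servers (gunicorn, uwsgi) give you when the data is loaded at module import time, before workers are spawned.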
Model Weights Storage
Cloud Storage
Model weights are typically <1MB
Model weights are not updated often
Static file storage works well
In-Memory Replica
Small enough, any strategy will work (duplication, copy-on-write, shared-memory, …)
Infrastructure
[diagram] Components: Items Embs in S3, Users Embs in Redis, Model Weights in S3, Items kNN DB; per-worker replicas: Items Embs in RAM, Model Weights in RAM.
Request flow:
(1) “What should I watch?”
(2) Fetch the user embeddings from Redis
(3) kNN Candidates Generation against the Items kNN DB
(4) Look up the candidates in the in-RAM items embeddings
(5) Model Predictions with the in-RAM model weights
(6) Top K
(7) “Star Wars!”
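The Top-K step over the scored candidates is worth doing without a full sort; a small sketch (toy ids and scores):

```python
import numpy as np

def top_k(item_ids, scores, k):
    """Return the k item ids with the highest scores, best first.
    np.argpartition is O(n), vs O(n log n) for sorting all candidates."""
    k = min(k, len(scores))
    part = np.argpartition(scores, -k)[-k:]        # indices of the top-k, unordered
    order = part[np.argsort(scores[part])[::-1]]   # sort only those k
    return [item_ids[i] for i in order]

best = top_k([10, 20, 30, 40], np.array([0.1, 0.9, 0.4, 0.7]), k=2)
```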
Online Updates
Online Updates – Plan
Users Embeddings Update
Items Embeddings Update
Model Weights Update
User Updates – The Problem
Goal
Update user embeddings on new user/item interaction
Critical for “session-based” recommendations (e.g. anonymous browsing on retail website)
Technical Issue
To update the user embeddings you usually need all users interactions
Some user may have interacted with thousands of items (e.g. listening history on Spotify)
User Updates – The Solution
Have two kinds of user updates, a quick one and a slow one:
Quick “Single-Step” Update
Do one step of online update given only the data for the new interaction
e.g. one step of stochastic gradient descent
Slow “Full” Update
Periodically schedule user updates from scratch, happening in the background
Requires a scheduler or a technology for background tasks
(hello “Discover Weekly” 👋)
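For a plain dot-product matrix-factorization model, the quick single-step update can be one regularized SGD step on the user embedding alone (learning rate and regularization values are illustrative):

```python
import numpy as np

def sgd_user_step(user_emb, item_emb, rating, lr=0.05, reg=0.01):
    """One online SGD step on the user embedding for a single new
    interaction, descending on (rating - u.v)^2 + reg*|u|^2.
    Only this one item's embedding is needed, not the whole history."""
    err = rating - user_emb @ item_emb
    return user_emb + lr * (err * item_emb - reg * user_emb)

u = np.zeros(8)               # current user embedding
v = np.ones(8) * 0.5          # embedding of the item just rated
u_new = sgd_user_step(u, v, rating=4.0)
```

The slow “full” update periodically re-fits the user embedding on all interactions in the background, correcting the drift these quick steps accumulate.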
User Updates – Background Tasks
Scheduling / Task Queue Technologies
● Cron
● Celery, RabbitMQ, Redis Pub/Sub
Sharing Memory
Executing the user update requires loading all item embeddings in memory again
● Either you do not share memory, and do not update too often
● Or you do the update in your main workers
● Or your background workers are forks of your main workers (e.g. uwsgi spooling)
Item Updates
New Items (Cold Start)
New items don’t have any user interactions
Collaborative-Filtering models do not support adding new items
Content-Based models do support adding new items
Update Items from New Interactions
SVD-based Matrix Factorization supports restricted forms of online updates
Otherwise heuristics might work, such as a single gradient step
The benefits are typically not worth the DevOps trouble
Item Updates
Update Cloud Storage
Typically replace the entire file, or use rsync for smarter in-place updates
Update In-Memory Replica
Not possible with “copy-on-write sharing” → re-deploy all your workers
Do-able with shared memory, but the small benefits may not be worth the DevOps trouble
Model Updates
Goal
Update the model weights based on all new user/item interactions
Deploy new model weights without downtime
Real-Time?
As with the items embeddings, everything is much easier if the model weights are read-only
Real-time updates of the model weights bring little benefit
You probably want to retrain your model from scratch once in a while and re-deploy
(hello again “Discover Weekly” 👋)
Testing and CI/CD
Testing and CI/CD – Plan
Unit-Testing and Advanced Tests
Versioning
Orchestration
Unit-Tests for ML
Difference with Traditional Software
No way to programmatically express the specifications of a ML software
C.f. “Software 2.0” by Andrej Karpathy http://bit.ly/2Ni1apj
Small Specific Unit-Tests
Make unit-tests absurdly easy to pass. If they fail, you must be sure you have a bug
Test your code, not the generalization ability of your model:
● well known algorithm? → there is maths and proofs
● heuristic? → expected to fail sometimes; this shouldn’t make CI fail
E.g. test that your model can successfully overfit train data when you remove all regularization
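A sketch of such an overfitting test, using unregularized ALS on a tiny noiseless rank-2 dataset as a stand-in for whatever trainer the real code uses: if this fails, the factorization code has a bug, not the maths.

```python
import numpy as np

def als_fit(R, dim, iters=10, seed=0):
    """Unregularized alternating least squares on a dense ratings matrix."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((R.shape[0], dim))
    V = rng.standard_normal((R.shape[1], dim))
    for _ in range(iters):
        U = R @ V @ np.linalg.inv(V.T @ V)    # exact least-squares update of U
        V = R.T @ U @ np.linalg.inv(U.T @ U)  # exact least-squares update of V
    return U, V

def test_can_overfit_train_data():
    rng = np.random.default_rng(1)
    # noiseless rank-2 "ratings": an unregularized rank-2 model must fit them exactly
    R = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
    U, V = als_fit(R, dim=2)
    assert np.abs(R - U @ V.T).max() < 1e-6

test_can_overfit_train_data()
```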
Advanced Tests for ML
Goal
Test generalization ability of your models
Synthetic Datasets > Real Datasets
Full control of the assumptions
You can make it easy enough so if the test fails, something is wrong
(disclaimer: this talk is for ML Engineers&Ops 👷‍♀not ML Researchers 👩‍🎓)
Compare Against Baselines
Hard to find good performance thresholds for the tests to pass
Easy to test that your model performs better than simple baselines
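A sketch of a synthetic interaction sampler under a planted low-rank assumption (all sizes and the noise level are illustrative; the point is full control over how hard the dataset is):

```python
import numpy as np

def sample_synthetic_dataset(n_users=200, n_items=100, dim=4, n_obs=5000,
                             noise=0.1, seed=0):
    """(user-id, item-id, value) triples generated from a planted low-rank
    model, so any reasonable MF model should beat a constant baseline."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_users, dim))   # ground-truth user factors
    V = rng.standard_normal((n_items, dim))   # ground-truth item factors
    users = rng.integers(0, n_users, n_obs)
    items = rng.integers(0, n_items, n_obs)
    values = (U[users] * V[items]).sum(axis=1) + noise * rng.standard_normal(n_obs)
    return users, items, values

users, items, values = sample_synthetic_dataset()
```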
Example: Model Test
def test_model(self):
    """Test that the model performs better than a constant baseline."""
    # generate sample data
    dataset = self._sample_synthetic_dataset()
    # train/valid split
    train_data, valid_data = self._trainvalid_split(dataset)
    # fit model
    model = self._get_model()
    model.fit(train_data)
    model_valid_cost = self._get_valid_scores(model, valid_data)
    # compare to dummy baseline model
    cst_model = ConstantByUserModel()
    cst_model.fit(train_data)
    cst_model_valid_cost = self._get_valid_scores(cst_model, valid_data)
    # test our model is better
    assert model_valid_cost < cst_model_valid_cost
Versioning – Databases vs ML Data Objects
Database                                          ML Data Objects
Persistent                                        Ephemeral
Slow incremental updates                          Frequent drastic changes from scratch
Foreign keys and locks to prevent corrupted data  Object dependencies not expressed programmatically
Versioning
ML Objects Identifiers
ML objects keep changing and depend on each other (e.g. datasets, models weights, items weights)
Store unique identifiers for all ML objects (data hash or unique id)
Keep track of identifiers of dependencies to prevent corrupted data
Versioning
Version ML objects to keep track of architecture changes
Match code version against data version
e.g. weights for a 3-layer neural net must never be used by the code for a 2-layer architecture
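A sketch of content-addressed identifiers with dependency tracking (the scheme and field names are hypothetical):

```python
import hashlib
import json

def ml_object_id(payload_bytes, arch_version, dep_ids=()):
    """Identifier = hash of the object's bytes, its architecture version,
    and the ids of the objects it was computed from. Any change upstream
    (dataset, item weights, architecture) yields a new id, so mismatched
    code/data combinations can be detected before serving them."""
    h = hashlib.sha256()
    h.update(json.dumps({"arch": arch_version, "deps": sorted(dep_ids)}).encode())
    h.update(payload_bytes)
    return h.hexdigest()[:16]

dataset_id = ml_object_id(b"raw interactions dump", arch_version="v1")
weights_id = ml_object_id(b"serialized weights", arch_version="3-layer",
                          dep_ids=[dataset_id])
```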
Orchestration
Manual Workflow
At first, execute your workflow manually or with CI/CD triggers
E.g. bash scripts, python scripts, Gitlab triggers, cron jobs
Dedicated Workflow Orchestrator and Jobs DAGs
When you have dozens of ML jobs that depend on each other, you will end up making mistakes:
● breaking the pipeline
● or worse: deploying corrupted data
Workflow Management System: Airflow, Luigi, Oozie, ...
WMS tutorial: http://bit.ly/2MUM0r9
Microservices and Kubernetes
Microservices
● All microservices must be safe to replicate/delete for scaling up/down to adjust for variable load
→ source of truth for data cannot be a microservice
→ has to be managed in the cloud
● Fine-tune liveness probes (and readiness probes) to allow slow init of containers
[new in Kubernetes 1.16: startup probes]
● Memory is expensive
→ split microservices in a way you can still share memory between workers and avoid duplication
● Microservices for RecSys require lots of memory and compute
→ favor a few big nodes over many small ones
Microservices and Kubernetes
Model Split-Brain Problem
● No-downtime updates require blue/green or canary deployment
● Your architecture has to serve two models at once
● And you should not expect to update both the code and the data at the same time
Microservices and Kubernetes
[diagram sequence] Model update walkthrough (blue/green):
Initial state: the Model v1.0 service runs against items v1.0 and model v1.0 in S3, and users v1.0 in Redis.
Update step 1/6: add items v2.0 and model v2.0 to S3, next to v1.0.
Update step 2/6: deploy the Model v2.0 service alongside v1.0.
Update step 3/6: add users v2.0 to Redis.
Update step 4/6: switch traffic from Model v1.0 to Model v2.0 (the stores are unchanged at this step).
Update step 5/6: tear down the Model v1.0 service.
Update step 6/6: delete items v1.0, model v1.0 and users v1.0; only v2.0 remains.
ML Data Management
ML Data Management – Plan
ML Data File Format
Cloud Storage
Tools
ML Data File Format
Cross-language
HDF5 has great integration in python/pandas
Protobuf has great integration in Go, but is very slow in python (as of today)
In Python: pickle+gzip
Can pickle numpy arrays efficiently
Only uses the standard library
Faster to read/write than HDF5
Alternative: a high-level library like sklearn’s joblib
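A minimal sketch of the pickle+gzip approach (standard library only; file name and payload are illustrative):

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

def save_ml_object(obj, path):
    # gzip container + highest pickle protocol: numpy arrays are pickled
    # as raw buffers, so this is compact and fast, with no extra dependency
    with gzip.open(path, "wb", compresslevel=5) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_ml_object(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "item_embs.pkl.gz")
save_ml_object({"version": "v1", "embs": np.arange(6.0).reshape(2, 3)}, path)
obj = load_ml_object(path)
```

Only load pickles you wrote yourself: pickle executes code on load, so it is a storage format, not an exchange format.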
Cloud Storage – Experiments Results
NoSQL ML Experiments Management Comparison
Comparison with code samples: http://bit.ly/2JtZg3M
             Storage     Language         Scaling  UI
MLflow       file-based  Python, R, Java  Small    internal
DVC          file-based  Shell            Small    no
Sacred       MongoDB     Python           Large    external
TensorBoard  file-based  TF, PyTorch      Medium   great
Cloud Storage – Large ML Files
Static Storage for Saved Models and Datasets
File-based static storage
● either cloud (AWS S3, Google Storage, etc.)
● or NAS virtual file system
Limited metadata support (e.g. tags)
Requires strict file naming conventions
Tools
Not much 🤷‍♂
In Python
● Use numpy structured arrays http://bit.ly/2BP1hU3
● With Intel CPU, use intel-numpy-mkl https://dockr.ly/321u2Yp
● Use scipy sparse matrices http://bit.ly/36bJwMt
● Pandas
(but may be unnecessarily slow vs pure numpy)
● Stay tuned! github.com/Crossing-Minds
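A sketch of the structured-array bullet, on the toy interactions shown earlier in the deck:

```python
import numpy as np

# One contiguous, typed block instead of a list of Python tuples:
# 12 bytes per interaction, zero-copy column access.
interactions = np.array(
    [(1234, 4321, 5.0), (1234, 654, 2.0), (456, 2432, 3.5),
     (456, 654, 1.0), (456, 432, 4.5), (987, 12, 5.0)],
    dtype=[("user_id", "u4"), ("item_id", "u4"), ("value", "f4")],
)
user_456 = interactions[interactions["user_id"] == 456]  # boolean-mask filter
mean_rating = float(interactions["value"].mean())
```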
Real-Time Deployment – Conclusion
● Replicate items embeddings and model weights in memory
● Re-deploy to update (read-only for easy memory sharing)
● User embeddings can be managed in a cloud key-value store
● Proper identification, versioning, and naming convention for ML data files
● Manual workflow, then managed by DAGs of jobs
Real-Time Deployment – Conclusion
Thanks!

Recommender Systems from A to Z – Real-Time Deployment

  • 2.
    Recommender Systems fromA to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 3.
    Recommender Systems fromA to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 4.
    Real-Time Deployment Cloud Infrastructure OnlineUpdates Testing and CI/CD ML Data Management
  • 5.
  • 6.
    Recommender Systems Items DB UsersDB What should I watch? I need this user tastes I need all the items
  • 7.
    Recommender Systems Items DB Model Predictions UsersDB What should I watch? I need this user tastes I need all the items
  • 8.
    Recommender Systems Items DB Model Predictions UsersDB Top K What should I watch? Star Wars! I need this user tastes I need all the items
  • 9.
    Recommender Systems –Previous Meetups (1) With started with the dataset. (2) Then we trained models. (3) We selected the best one. (4) And now, we want to deploy the model to production!
  • 10.
    Recommender Systems Datasets– Previous Meetups Explicit feedback (users’ ratings) Implicit feedback (users’ clicks) Explicit feedback Implicit feedback Example Domains Movies, Tv-Shows, Music Marketplaces, Businesses Example Data type Like/Dislike, Stars Clicks, Play-time, Purchases Complexity Clean, Costly, Easy to interpret Dirty, Cheap, Difficult to interpret
  • 11.
    Recommender Systems Datasets– This Meetup Abstract User/Item Interactions user-id item-id value 1234 4321 5.0 1234 654 2.0 456 2432 3.5 456 654 1.0 456 432 4.5 987 12 5.0
  • 12.
    Recommender Systems Models– Previous Meetups Example of model we want to deploy
  • 13.
    Recommender Systems Models– Previous Meetups users embeddings items embeddings model weights Example of model we want to deploy
  • 14.
    Recommender Systems Models– This Meetup User Embeddings Item Embeddings Model Weights Alice Star Wars 4.5/5
  • 15.
    Software vs ML TraditionalSoftware Machine Learning Software Stateless Stateful Explicit Specifications No Specifications Rule-based Logic from Code Model-based Logic from Data
  • 16.
  • 17.
    Key Objectives Highly scalable,highly available Large number of users and items No downtime
  • 18.
    Key Objectives Highly scalable,highly available Large number of users and items No downtime Continuous user updates New user/item interactions (e.g. ratings, clicks, watch)
  • 19.
    Key Objectives Highly scalable,highly available Large number of users and items No downtime Continuous user updates New user/item interactions (e.g. ratings, clicks, watch) Frequent item updates and new items New items are added continuously
  • 20.
    Key Objectives Highly scalable,highly available Large number of users and items No downtime Continuous user updates New user/item interactions (e.g. ratings, clicks, watch) Frequent item updates and new items New items are added continuously Non-trivial model Can’t be built in e.g. SQL or ElasticSearch
  • 21.
    Assumptions Not Too Big Canbe trained on big enough ML server No need for distributed ML Less than 1TB of data, so less than 1B items
  • 22.
    Assumptions Not Too Big Canbe trained on big enough ML server No need for distributed ML Less than 1TB of data, so less than 1B items vs vs vs
  • 23.
    Assumptions Not Too Big Canbe trained on big enough ML server No need for distributed ML Less than 1TB of data, so less than 1B items Model Updates Can Be Delayed You do not need to update the entire model several times per hour in live vs vs vs
  • 24.
    Assumptions Not Too Big Canbe trained on big enough ML server No need for distributed ML Less than 1TB of data, so less than 1B items Model Updates Can Be Delayed You do not need to update the entire model several times per hour in live (Adding new items in real time may be supported though) vs vs vs
  • 25.
    Assumptions Not Too Big Canbe trained on big enough ML server No need for distributed ML Less than 1TB of data, so less than 1B items Model Updates Can Be Delayed You do not need to update the entire model several times per hour in live (Adding new items in real time may be supported though) Recommendations Are Not Stored You do not need to save recommendations vs vs vs
  • 26.
    Cloud Infrastructure –Plan Users Embeddings Storage Items Embeddings Storage Model Weights Storage
  • 27.
    Users Embeddings Storage UserEmbeddings Binary blob <1KB Keep changing Need only one per user request Key-Value Store Fetch the user embeddings from the cloud at each request Atomic key-value store like redis will handle concurrency for free
  • 28.
    Items Embeddings –The Big Issue Network Problem If the items data is stored in a database like SQL, and the model is too complex to be expressed in SQL: then you need to fetch 100% of the items data from the DB to your compute instance ...at each request! Rule of Thumb 1M items * 1KB each → 1GB total data
  • 29.
    Items Embeddings –The Big Issue Network Problem If the items data is stored in a database like SQL, and the model is too complex to be expressed in SQL: then you need to fetch 100% of the items data from the DB to your compute instance ...at each request! Rule of Thumb 1M items * 1KB each → 1GB total data (1B items * 1KB each → 1TB total data)
  • 30.
    Network Issue –Solution Items DB Model Predictions Users DB Top K Candidates Generation What should I watch? Star Wars! Solution!
  • 31.
    Items Candidate Generation Goal:pre-select thousand(s) of items for your model, without need to see all embeddings Model-Free E.g. kNN item-item with pre-computed kNN tables Easy with ML-ready DB like Spark Do-able in ElasticSearch or even SQL Model-Based E.g. linear matrix factorization, then smart implementation of Top-K Model has to be monotonic (w.r.t. items dimensions) otherwise you can’t rely on pre-computed index Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike
  • 32.
    Items Candidate Generation Goal:pre-select thousand(s) of items for your model, without need to see all embeddings Model-Free E.g. kNN item-item with pre-computed kNN tables Easy with ML-ready DB like Spark Do-able in ElasticSearch or even SQL Model-Based E.g. linear matrix factorization, then smart implementation of Top-K Model has to be monotonic (w.r.t. items dimensions) otherwise you can’t rely on pre-computed index Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike (WIP without efficient Top-K [issue#42326])
  • 33.
    Items Embeddings –The Bigger Issue
  • 34.
    Items Embeddings –The Bigger Issue
  • 35.
    Throughput Problem andSolution Throughput Problem You can’t get thousands of arbitrary items embeddings at each request The physical limit is DB CPU, not network speed Hosting a local redis co-located in the same physical machine as your process does not help Throughput Solution
  • 36.
    Throughput Problem andSolution Throughput Problem You can’t get thousands of arbitrary items embeddings at each request The physical limit is DB CPU, not network speed Hosting a local redis co-located in the same physical machine as your process does not help Throughput Solution (DevOps Nightmares!) 🎃🧛🕷 You need to keep items embeddings in memory of your processes 🕷🧛🎃
  • 37.
    Items Embeddings Storage CloudStorage Read once when spawning your processes Fully updated every time you deploy a new model <1M items → static file storage works well (e.g. AWS S3, Google Storage) In-Memory Replica Loading pre-fork enabled copy-on-write → can share read-only data with many processes for free Otherwise shared-memory but with concurrency issues
  • 38.
    Items Embeddings Storage CloudStorage Read once when spawning your processes Fully updated every time you deploy a new model <1M items → static file storage works well (e.g. AWS S3, Google Storage) >1B items → big data storage (AWS RedShift, Google BigQuery, Hadoop/HDFS), updating will be “interesting” In-Memory Replica Loading pre-fork enabled copy-on-write → can share read-only data with many processes for free Otherwise shared-memory but with concurrency issues >1B items → cannot load everything at init, and require cache-like mechanisms
  • 39.
    Model Weights Storage CloudStorage Models weights are typically <1MB Models weights are not updated often Static file storage works well In-Memory Replica Small enough, any strategy will work (duplication, copy-on-write, shared-memory, …)
  • 40.
    Infrastructure Items Embs in S3 UsersEmbs in Redis Model Weights in S3 Items kNN DB
  • 41.
    Infrastructure Items Embs in S3 UsersEmbs in Redis Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM
  • 42.
    Infrastructure Items Embs in S3 UsersEmbs in Redis 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM
  • 43.
    Infrastructure Items Embs in S3 UsersEmbs in Redis 2 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM
  • 44.
    Infrastructure Items Embs in S3 UsersEmbs in Redis 2 kNN Candidates Generation 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM3
  • 45.
    Infrastructure Items Embs in S3 UsersEmbs in Redis 2 kNN Candidates Generation 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM 4 3
  • 46.
    Infrastructure Items Embs in S3 UsersEmbs in Redis Model Predictions 2 kNN Candidates Generation 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM 4 5 3
  • 47.
    Infrastructure Items Embs in S3 UsersEmbs in Redis Model Predictions 6 2 Top K kNN Candidates Generation 1 What should I watch? Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM 4 5 3
  • 48.
    Infrastructure Items Embs in S3 UsersEmbs in Redis Model Predictions 6 2 Top K kNN Candidates Generation 1 What should I watch? 7 Star Wars! Items Embs in RAM Model Weights in S3 Items kNN DB Model Weights in RAM 4 5 3
  • 49.
  • 50.
    Online Updates –Plan Users Embeddings Update Items Embeddings Update Model Weights Update
  • 51.
    User Updates –The Problem Goal Update user embeddings on new user/item interaction Critical for “session-based” recommendations (e.g. anonymous browsing on retail website)
  • 52.
    User Updates –The Problem Goal Update user embeddings on new user/item interaction Critical for “session-based” recommendations (e.g. anonymous browsing on retail website) Technical Issue To update the user embeddings you usually need all users interactions Some user may have interacted with thousands of items (e.g. listening history on Spotify)
  • 53.
User Updates – The Solution

Have two kinds of user updates, a quick one and a slow one:

Quick "Single-Step" Update
Do one step of online update given only the data for the new interaction
e.g. one step of stochastic gradient descent

Slow "Full" Update
Periodically schedule user updates from scratch, running in the background
Requires a scheduler or a background-task technology
(hello "Discover Weekly" 👋)
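The quick "single-step" update can be sketched as one SGD step on the user embedding alone, with item embeddings held fixed. This is a hedged sketch for a plain dot-product model with squared loss; the learning rate and regularization values are illustrative assumptions.

```python
import numpy as np

def single_step_user_update(user_emb, item_emb, rating, lr=0.05, reg=0.01):
    """One SGD step on the user embedding for a single new interaction.

    Item embeddings are treated as read-only; only the user vector moves.
    """
    err = user_emb @ item_emb - rating       # squared-loss error on this interaction
    grad = err * item_emb + reg * user_emb   # gradient of 0.5*err^2 + 0.5*reg*||user||^2
    return user_emb - lr * grad

user = np.zeros(8)
item = np.ones(8)
for _ in range(200):  # repeated single steps converge toward the observed rating
    user = single_step_user_update(user, item, rating=4.0)
print(float(user @ item))  # close to 4.0
```

Each step only touches one user vector and one item vector, so it is cheap enough to run in the request path; the slow "full" update then corrects any drift.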
User Updates – Background Tasks

Scheduling / Task Queue Technologies
● Cron
● Celery, RabbitMQ, Redis Pub/Sub

Sharing Memory
Executing the user update requires loading all item embeddings in memory again:
● either do not share memory, and do not update too often
● or run the update in your main workers
● or make your background workers forks of your main workers (e.g. uWSGI spooling)
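The quick/slow split can be mimicked in plain Python with a background executor: the request path does a cheap single-step update, while the full recomputation runs in a worker thread. This is a toy sketch only — the state, the update rules, and the synchronous `future.result()` are illustrative assumptions; in production the slow path would be a cron job or a Celery task, fired and forgotten.

```python
from concurrent.futures import ThreadPoolExecutor

# in-memory user state, updated two ways: quick single steps, slow full rebuilds
interactions_log = {"alice": [5.0, 3.0, 4.0]}
user_state = {"alice": 4.0}

def quick_update(user, value):
    """Cheap online update using only the new interaction (request path)."""
    interactions_log[user].append(value)
    user_state[user] = 0.9 * user_state[user] + 0.1 * value

def full_update(user):
    """Slow rebuild from the full interaction history (background task)."""
    history = interactions_log[user]
    user_state[user] = sum(history) / len(history)

executor = ThreadPoolExecutor(max_workers=1)
quick_update("alice", 2.0)                      # handled inline in the request path
future = executor.submit(full_update, "alice")  # scheduled in the background
future.result()                                 # only awaited here so the demo is deterministic
print(user_state["alice"])
```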
Item Updates

New Items (Cold Start)
New items don't have any user interactions
Collaborative-filtering models do not support adding new items
Content-based models do support adding new items

Update Items from New Interactions
SVD-based matrix factorization supports restricted forms of online update
Otherwise heuristics might work, such as a single step of gradient descent
The benefits are typically not worth the DevOps trouble
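One common heuristic for an item that just received its first interactions is to "fold it in" by least squares: with user embeddings held fixed, fit the item vector that best explains the observed ratings (the classic ALS half-step). This is a hedged NumPy sketch under a plain dot-product model, not part of the talk's codebase; the sizes and regularization value are illustrative.

```python
import numpy as np

def fold_in_item(user_embs, ratings, reg=0.1):
    """Least-squares item embedding given fixed user embeddings.

    Solves (U^T U + reg*I) v = U^T r, a cheap online update for a cold-start item.
    """
    dim = user_embs.shape[1]
    gram = user_embs.T @ user_embs + reg * np.eye(dim)
    return np.linalg.solve(gram, user_embs.T @ ratings)

rng = np.random.default_rng(1)
users = rng.normal(size=(50, 8))    # embeddings of the users who rated the new item
true_item = rng.normal(size=8)
ratings = users @ true_item         # observed ratings for the new item
item = fold_in_item(users, ratings)
print(np.allclose(users @ item, ratings, atol=0.5))  # → True
```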
Item Updates

Update Cloud Storage
Typically replace the entire file, or use rsync for smarter in-place updates

Update In-Memory Replicas
Not possible with "copy-on-write" sharing → re-deploy all your workers
Doable with shared memory, but the small benefits might not be worth the DevOps trouble
Model Updates

Goal
Update the model weights based on all new user/item interactions
Deploy new model weights without downtime

Real-Time?
As for items embeddings, everything is much easier if model weights are read-only
Real-time updates of the model weights bring little benefit
You probably want to retrain your model from scratch once in a while and re-deploy
(hello again "Discover Weekly" 👋)
Testing and CI/CD – Plan
● Unit-Testing and Advanced Tests
● Versioning
● Orchestration
Unit-Tests for ML

Difference with Traditional Software
There is no way to programmatically express the specifications of ML software
C.f. "Software 2.0" by Andrej Karpathy http://bit.ly/2Ni1apj

Small Specific Unit-Tests
Make unit-tests absurdly easy to pass: if they fail, you must be sure you have a bug
Test your code, not the generalization ability of your model:
● well-known algorithm? → there are maths and proofs
● heuristic? → expected to fail sometimes; this shouldn't make CI fail
E.g. test that your model can successfully overfit the train data when you remove all regularization
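The "can my model overfit?" check can be written as a small self-contained test: train a tiny matrix factorization on a handful of interactions with zero regularization and assert that the train loss collapses. This is a hedged sketch, not the speaker's actual test code; the dataset, rank, learning rate, and thresholds are illustrative.

```python
import numpy as np

def test_model_can_overfit_train_data():
    """With all regularization removed, training must drive the train loss near zero.

    This tests the training code, not the generalization ability of the model:
    a failure means a bug, not a bad model.
    """
    rng = np.random.default_rng(0)
    rows = np.array([0, 0, 1, 2, 2])           # tiny synthetic train set
    cols = np.array([0, 1, 1, 0, 2])
    vals = np.array([5.0, 1.0, 3.0, 2.0, 4.0])
    U = rng.normal(scale=0.1, size=(3, 4))     # user factors
    V = rng.normal(scale=0.1, size=(3, 4))     # item factors
    lr = 0.1
    for _ in range(3000):                      # plain gradient descent, zero regularization
        err = (U[rows] * V[cols]).sum(axis=1) - vals
        gU = err[:, None] * V[cols]
        gV = err[:, None] * U[rows]
        np.add.at(U, rows, -lr * gU)           # add.at accumulates over repeated user ids
        np.add.at(V, cols, -lr * gV)
    err = (U[rows] * V[cols]).sum(axis=1) - vals
    train_mse = float((err ** 2).mean())
    assert train_mse < 1e-2, train_mse         # absurdly easy to pass if the code is right
    return train_mse

print(test_model_can_overfit_train_data())
```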
Advanced Tests for ML

Goal
Test the generalization ability of your models
(disclaimer: this talk is for ML Engineers & Ops 👷‍♀ not ML Researchers 👩‍🎓)

Synthetic Datasets > Real Datasets
Full control of the assumptions
You can make the task easy enough that if the test fails, something is wrong

Compare Against Baselines
It is hard to find good performance thresholds for tests to pass
It is easy to test that your model performs better than simple baselines
Example: Model Test

def test_model(self):
    """ test that model performs better than constant baseline """
    # generate sample data
    dataset = self._sample_synthetic_dataset()
    # train/valid split
    train_data, valid_data = self._trainvalid_split(dataset)
    # fit model
    model = self._get_model()
    model.fit(train_data)
    model_valid_cost = self._get_valid_scores(model, valid_data)
    # compare to dummy baseline model
    cst_model = ConstantByUserModel()
    cst_model.fit(train_data)
    cst_model_valid_cost = self._get_valid_scores(cst_model, valid_data)
    # test our model is better
    assert model_valid_cost < cst_model_valid_cost
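The `ConstantByUserModel` baseline used above is not shown in the slides. A minimal version could simply predict each user's mean train rating, with the global mean as fallback for unseen users — a hedged sketch in which the `fit`/`predict` interface and the dict-of-arrays train format are assumptions.

```python
import numpy as np

class ConstantByUserModel:
    """Dummy baseline: predict each user's mean train rating (global mean as fallback)."""

    def fit(self, train_data):
        users, ratings = train_data["user_id"], train_data["value"]
        self.global_mean_ = float(np.mean(ratings))
        self.user_means_ = {int(u): float(np.mean(ratings[users == u]))
                            for u in np.unique(users)}
        return self

    def predict(self, user_ids):
        return np.array([self.user_means_.get(u, self.global_mean_) for u in user_ids])

train = {"user_id": np.array([1234, 1234, 456]), "value": np.array([5.0, 2.0, 4.0])}
model = ConstantByUserModel().fit(train)
print(model.predict([1234, 456, 987]))  # known users get their mean, 987 gets the global mean
```

A model that cannot beat this on the validation set is not learning anything user-specific, which is exactly what the test above checks.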
Versioning – Databases vs ML Data Objects

Database                                        | ML Data Objects
Persistent                                      | Ephemeral
Slow incremental updates                        | Frequent drastic changes from scratch
Foreign keys and locks prevent corrupted data   | Object dependencies not expressed programmatically
Versioning ML Objects

Identifiers
ML objects keep changing and depend on each other (e.g. datasets, model weights, items weights)
Store unique identifiers for all ML objects (data hash or unique id)
Keep track of the identifiers of dependencies to prevent corrupted data

Versioning
Version ML objects to keep track of architecture changes
Match code version against data version
e.g. weights for a 3-layer neural net must never be used in the code for a 2-layer architecture
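A minimal way to get unique identifiers and dependency tracking is to content-hash each object's bytes and record the hashes of everything it was built from. This pure-Python sketch illustrates the idea; the metadata layout is an assumption, not a standard.

```python
import hashlib
import json

def object_id(payload: bytes) -> str:
    """Unique, deterministic identifier derived from the object's bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]

def make_metadata(name, version, payload, dependencies):
    """Record the object's own id plus the ids of its dependencies."""
    return {
        "name": name,
        "version": version,          # bump on architecture changes
        "id": object_id(payload),
        "depends_on": dependencies,  # e.g. {"dataset": "ab12...", "item_embs": "cd34..."}
    }

dataset_bytes = b"user,item,value\n1234,4321,5.0\n"
dataset_meta = make_metadata("dataset", "1.0", dataset_bytes, {})
weights_bytes = b"\x00\x01\x02\x03"
weights_meta = make_metadata("model_weights", "3-layers", weights_bytes,
                             {"dataset": dataset_meta["id"]})

# loading code can now refuse weights whose recorded dataset id does not match
assert weights_meta["depends_on"]["dataset"] == dataset_meta["id"]
print(json.dumps(weights_meta, indent=2))
```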
Orchestration

Manual Workflow
At first, execute your workflow manually or with CI/CD triggers
E.g. bash scripts, Python scripts, GitLab triggers, cron jobs

Dedicated Workflow Orchestrator and Job DAGs
When you have dozens of ML jobs that depend on each other, you will end up making mistakes:
● making the pipeline fail
● or worse: deploying corrupted data
Workflow Management Systems: Airflow, Luigi, Oozie, ...
WMS tutorial: http://bit.ly/2MUM0r9
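Before adopting a full workflow manager, the core idea — run jobs in dependency order and halt downstream work on failure — fits in a few lines of the standard library. This toy topological-order runner (Python 3.9+ `graphlib`) is an illustration of the concept, not an Airflow API; the job names are made up.

```python
from graphlib import TopologicalSorter

# each ML job names the jobs it depends on (a DAG, as in Airflow/Luigi)
dag = {
    "build_dataset": [],
    "train_model": ["build_dataset"],
    "compute_item_embs": ["train_model"],
    "deploy": ["train_model", "compute_item_embs"],
}

jobs_run = []

def run_job(name):
    jobs_run.append(name)  # placeholder: shell out to a script, call an API, etc.

for job in TopologicalSorter(dag).static_order():
    run_job(job)           # a failure here should halt all downstream jobs

print(jobs_run)
```

A real WMS adds exactly what this lacks: retries, scheduling, parallelism of independent jobs, and visibility into failures.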
Microservices and Kubernetes

Microservices
● All microservices must be safe to replicate/delete for scaling up/down to adjust to variable load
→ the source of truth for data cannot be a microservice; it has to be managed in the cloud
● Fine-tune liveness probes (and readiness probes) to allow slow initialization of containers
[new in Kubernetes 1.16: startup probes]
● Memory is expensive → split microservices in a way that still lets workers share memory and avoid duplication
● Microservices for RecSys require lots of memory and compute → favor a few big nodes over many small ones
Microservices and Kubernetes

Model Split-Brain Problem
● No-downtime updates require blue/green or canary deployment
● Your architecture has to serve two models at once
● And you should not expect to update both the code and the data at the same time
Microservices and Kubernetes

[Diagram sequence: a no-downtime update in 6 steps. Initially, the Model v1.0 service runs with items v1.0 and model weights v1.0 in S3, and users v1.0 in Redis.]
Step 1/6: upload items v2.0 and model weights v2.0 to S3, alongside v1.0
Step 2/6: deploy the Model v2.0 service next to Model v1.0
Step 3/6: write users v2.0 embeddings to Redis, alongside users v1.0
Step 4/6: switch traffic over to the Model v2.0 service
Step 5/6: tear down the Model v1.0 service
Step 6/6: delete items v1.0, model weights v1.0, and users v1.0 — only v2.0 remains
ML Data Management – Plan
● ML Data File Format
● Cloud Storage
● Tools
ML Data File Format

Cross-Language
HDF5 has great integration in Python/pandas
Protobuf has great integration in Go, but is very slow in Python (as of today)

In Python: pickle + gzip
Can pickle numpy arrays efficiently
Only uses the standard library
Faster to read/write than HDF5
Alternative: a higher-level library like sklearn's joblib
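The pickle+gzip option really is just the standard library (plus NumPy for the array itself). A minimal save/load pair, with the file path and payload layout as illustrative choices:

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

def save_ml_object(obj, path):
    """Pickle any object (numpy arrays included) with gzip compression."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_ml_object(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), "item_embs.pkl.gz")
item_embs = np.random.default_rng(0).normal(size=(1000, 32))
save_ml_object({"item_embs": item_embs, "version": "2.0"}, path)
loaded = load_ml_object(path)
print(np.array_equal(loaded["item_embs"], item_embs))  # → True
```

The usual caveat applies: pickle is Python-only and not safe to load from untrusted sources, which is why it pairs with a cross-language format when other stacks need the data.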
Cloud Storage – Experiments Results

NoSQL ML Experiments Management Comparison
Comparison with code samples: http://bit.ly/2JtZg3M

            | Storage    | Language        | Scaling | UI
MLflow      | file-based | Python, R, Java | Small   | internal
DVC         | file-based | Shell           | Small   | no
Sacred      | MongoDB    | Python          | Large   | external
TensorBoard | file-based | TF, PyTorch     | Medium  | great
Cloud Storage – Large ML Files

Static Storage for Saved Models and Datasets
File-based static storage:
● either cloud (AWS S3, Google Storage, etc.)
● or a NAS virtual file system
Limited metadata support (e.g. tags)
Requires strict file naming conventions
Tools

Not much 🤷‍♂ In Python:
● Use numpy structured arrays http://bit.ly/2BP1hU3
● With Intel CPUs, use intel-numpy-mkl https://dockr.ly/321u2Yp
● Use scipy sparse matrices http://bit.ly/36bJwMt
● Pandas (but it may be unnecessarily slow vs pure numpy)
● Stay tuned! github.com/Crossing-Minds
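The structured-array tip stores the whole interactions table in one typed, contiguous NumPy array — compact and fast without pandas overhead. A minimal sketch using the abstract user/item/value schema from the dataset part (the sample rows are from that earlier table; the dtypes are illustrative choices):

```python
import numpy as np

# one typed record per interaction: compact, mmap-able, fast to sort/filter
interactions = np.array(
    [(1234, 4321, 5.0), (1234, 654, 2.0), (456, 2432, 3.5), (456, 654, 1.0)],
    dtype=[("user_id", np.int64), ("item_id", np.int64), ("value", np.float32)],
)

# columnar access and pure-numpy filtering, no DataFrame needed
alice = interactions[interactions["user_id"] == 1234]
print(alice["item_id"].tolist())           # → [4321, 654]
print(float(interactions["value"].mean()))  # → 2.875
```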
Real-Time Deployment – Conclusion
● Replicate items embeddings and model weights in memory
● Re-deploy to update (read-only data makes memory sharing easy)
● User embeddings can be managed in a cloud key-value store
● Proper identification, versioning, and naming conventions for ML data files
● Start with a manual workflow, then manage it with DAGs of jobs