Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
Real-Time Deployment
Cloud Infrastructure
Online Updates
Testing and CI/CD
ML Data Management
Introduction
Recommender Systems
[Diagram: "What should I watch?" → Model → Predictions → Top K → "Star Wars!"]
Model to Users DB: "I need this user's tastes"
Model to Items DB: "I need all the items"
Recommender Systems – Previous Meetups
(1) We started with the dataset.
(2) Then we trained models.
(3) We selected the best one.
(4) And now, we want to deploy the model to production!
Recommender Systems Datasets – Previous Meetups
Feedback type | Explicit feedback (users’ ratings) | Implicit feedback (users’ clicks)
Example domains | Movies, TV shows, music | Marketplaces, businesses
Example data types | Like/dislike, stars | Clicks, play time, purchases
Complexity | Clean, costly, easy to interpret | Dirty, cheap, difficult to interpret
Recommender Systems Datasets – This Meetup
Abstract User/Item Interactions
user-id item-id value
1234 4321 5.0
1234 654 2.0
456 2432 3.5
456 654 1.0
456 432 4.5
987 12 5.0
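A minimal sketch of how the triplets above could be held in memory, assuming a plain dictionary keyed by user id. The `Interaction` type and the `group_by_user` helper are illustrative names, not from the talk.

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    user_id: int
    item_id: int
    value: float


# The six rows shown on the slide.
INTERACTIONS = [
    Interaction(1234, 4321, 5.0),
    Interaction(1234, 654, 2.0),
    Interaction(456, 2432, 3.5),
    Interaction(456, 654, 1.0),
    Interaction(456, 432, 4.5),
    Interaction(987, 12, 5.0),
]


def group_by_user(interactions):
    """Map each user id to a {item_id: value} dictionary."""
    by_user = defaultdict(dict)
    for row in interactions:
        by_user[row.user_id][row.item_id] = row.value
    return dict(by_user)


by_user = group_by_user(INTERACTIONS)
```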
Recommender Systems Models – Previous Meetups
users embeddings
items embeddings
model weights
Example of model we want to deploy
Recommender Systems Models – This Meetup
User Embeddings Item Embeddings
Model Weights
Alice Star Wars
4.5/5
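As a rough illustration of how the three objects combine into one prediction, here is a sketch that assumes a linear matrix-factorization model whose "model weights" reduce to a single global bias. The talk does not specify the architecture, and the embedding values below are made up.

```python
def predict_rating(user_emb, item_emb, global_bias=0.0):
    """Predicted rating = dot(user, item) + bias (assumed linear model)."""
    if len(user_emb) != len(item_emb):
        raise ValueError("embeddings must have the same dimension")
    return sum(u * i for u, i in zip(user_emb, item_emb)) + global_bias


# Hypothetical 3-dimensional embeddings, for illustration only.
alice = [1.0, 0.5, -0.5]
star_wars = [2.0, 1.0, 1.0]
rating = predict_rating(alice, star_wars, global_bias=2.5)
```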
Software vs ML
Traditional Software | Machine Learning Software
Stateless | Stateful
Explicit specifications | No specifications
Rule-based: logic from code | Model-based: logic from data
Cloud Infrastructure
Key Objectives
Highly scalable, highly available
Large number of users and items
No downtime
Continuous user updates
New user/item interactions (e.g. ratings, clicks, watch)
Frequent item updates and new items
New items are added continuously
Non-trivial model
Can’t be built in e.g. SQL or ElasticSearch
Assumptions
Not Too Big
Can be trained on big enough ML server
No need for distributed ML
Less than 1TB of data, so less than 1B items
Model Updates Can Be Delayed
You do not need to update the entire model several times per hour in production
(Adding new items in real time may be supported though)
Recommendations Are Not Stored
You do not need to save recommendations
Cloud Infrastructure – Plan
Users Embeddings Storage
Items Embeddings Storage
Model Weights Storage
Users Embeddings Storage
User Embeddings
Binary blob <1KB
Keeps changing
Only one needed per user request
Key-Value Store
Fetch the user embeddings from the cloud at each request
An atomic key-value store like Redis handles concurrency for free
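A small sketch of the "binary blob" idea, assuming embeddings are stored as packed float32 values. A dictionary stands in for the key-value store here; with a real store the two dictionary accesses would become its SET and GET commands. The `user:<id>` key scheme is an assumption.

```python
from array import array


def encode_embedding(values):
    """Pack floats into a compact float32 binary blob."""
    return array("f", values).tobytes()


def decode_embedding(blob):
    """Inverse of encode_embedding."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


# Stand-in for the key-value store (hypothetical key naming).
store = {}
store["user:1234"] = encode_embedding([0.25, -1.5, 3.0])

embedding = decode_embedding(store["user:1234"])
```

With 4 bytes per dimension, a 128-dimensional embedding takes 512 bytes, which stays under the 1KB figure above.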
Items Embeddings – The Big Issue
Network Problem
If the items data is stored in a database like SQL,
and the model is too complex to be expressed in SQL:
then you need to fetch 100% of the items data from the DB to your compute instance
...at each request!
Rule of Thumb
1M items * 1KB each → 1GB total data
(1B items * 1KB each → 1TB total data)
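The rule of thumb can be checked with a few lines of arithmetic. The 10 Gbit/s link speed is an assumption added here to show what "at each request" would cost; it is not a figure from the talk.

```python
KB = 1000  # decimal units, matching the slide's round numbers


def total_bytes(n_items, bytes_per_item=1 * KB):
    """Size of the full items table."""
    return n_items * bytes_per_item


def transfer_seconds(n_bytes, gigabits_per_second):
    """Time to move n_bytes over a link of the given speed."""
    return n_bytes * 8 / (gigabits_per_second * 1e9)


one_million_items = total_bytes(1_000_000)      # 1e9 bytes = 1 GB
one_billion_items = total_bytes(1_000_000_000)  # 1e12 bytes = 1 TB

# Assumed 10 Gbit/s link: moving 1 GB takes close to a second per request.
seconds_per_request = transfer_seconds(one_million_items, 10)
```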
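Network Issue – Solution
[Diagram: "What should I watch?" → Candidates Generation (from Items DB) → Model Predictions (with Users DB) → Top K → "Star Wars!"]
Solution: add a Candidates Generation step between the Items DB and the model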
Items Candidate Generation
Goal: pre-select thousand(s) of items for your model, without needing to see all embeddings
Model-Free
E.g. kNN item-item with pre-computed kNN tables
Easy with ML-ready DB like Spark
Do-able in ElasticSearch or even SQL
Model-Based
E.g. linear matrix factorization, then smart implementation of Top-K
Model has to be monotonic (w.r.t. items dimensions), otherwise you can’t rely on a pre-computed index
Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike (WIP without efficient Top-K [issue#42326])
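A minimal sketch of the model-free option, assuming the kNN table is already computed offline and maps each item to its nearest neighbours. The table contents and the `generate_candidates` name are hypothetical.

```python
def generate_candidates(recent_items, knn_table, max_candidates=1000):
    """Union of the pre-computed neighbours of the user's recent items.

    knn_table maps item_id -> list of neighbour item_ids, best first.
    Items the user already interacted with are excluded.
    """
    seen = set(recent_items)
    candidates = []
    for item_id in recent_items:
        for neighbour in knn_table.get(item_id, []):
            if neighbour not in seen:
                seen.add(neighbour)
                candidates.append(neighbour)
                if len(candidates) == max_candidates:
                    return candidates
    return candidates


# Hypothetical pre-computed table, for illustration only.
KNN_TABLE = {
    654: [432, 12, 4321],
    4321: [12, 2432],
}

candidates = generate_candidates([4321, 654], KNN_TABLE)
```

Only the candidates' embeddings then need to be looked up, instead of the whole items table.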
Items Embeddings – The Bigger Issue
Throughput Problem and Solution
Throughput Problem
You can’t get thousands of arbitrary items embeddings at each request
The physical limit is DB CPU, not network speed
Hosting a local Redis on the same physical machine as your process does not help
Throughput Solution (DevOps Nightmares!)
🎃🧛🕷 You need to keep items embeddings in memory of your processes 🕷🧛🎃
Items Embeddings Storage
Cloud Storage
Read once when spawning your processes
Fully updated every time you deploy a new model
<1M items → static file storage works well (e.g. AWS S3, Google Storage)
>1B items → big data storage (AWS RedShift, Google BigQuery, Hadoop/HDFS), updating will be “interesting”
In-Memory Replica
Loading pre-fork enables copy-on-write → can share read-only data with many processes for free
Otherwise shared-memory but with concurrency issues
>1B items → cannot load everything at init, and requires cache-like mechanisms
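A Unix-only sketch of the pre-fork idea: the data is loaded once in the parent, and a forked child reads it without loading it again. The tiny fake table and the `worker_lookup` helper are illustrative. In CPython, reference counting writes to object headers, so pages holding many small Python objects can still end up copied; large numeric buffers benefit most from copy-on-write.

```python
import os

# Loaded once in the parent, before forking (here: a tiny fake table).
ITEM_EMBEDDINGS = {item_id: [float(item_id)] * 4 for item_id in range(1000)}


def worker_lookup(item_id):
    """Fork a child that reads the parent's data without reloading it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_fd)
            value = ITEM_EMBEDDINGS[item_id][0]
            os.write(write_fd, repr(value).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        result = float(reader.read())
    os.waitpid(pid, 0)
    return result
```

Servers with a pre-fork worker model follow the same pattern: load in the master process, then fork the workers.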
Model Weights Storage
Cloud Storage
Model weights are typically <1MB
Model weights are not updated often
Static file storage works well
In-Memory Replica
Small enough, any strategy will work (duplication, copy-on-write, shared-memory, …)
Infrastructure
[Diagram: request flow in 7 steps, from (1) "What should I watch?" to (7) "Star Wars!"]
Cloud storage: Items Embs in S3, Users Embs in Redis, Model Weights in S3, Items kNN DB
In-memory replicas: Items Embs in RAM, Model Weights in RAM
Flow: "What should I watch?" → kNN Candidates Generation → Model Predictions → Top K → "Star Wars!"
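The request flow of this infrastructure can be sketched end to end as follows. Plain dictionaries stand in for Redis, the kNN DB and the in-RAM item embeddings, the scoring is an assumed dot product, and all names and values are hypothetical.

```python
import heapq


def recommend(user_id, user_store, knn_table, item_embs, recent_items, k=3):
    """Fetch user embedding, generate candidates, score, keep the top K."""
    user_emb = user_store[user_id]  # key-value lookup
    candidates = {n for i in recent_items for n in knn_table.get(i, [])}
    candidates -= set(recent_items)
    scored = (
        (sum(u * v for u, v in zip(user_emb, item_embs[c])), c)
        for c in candidates
        if c in item_embs
    )
    return [item for _, item in heapq.nlargest(k, scored)]


# Hypothetical data, for illustration only.
user_store = {"alice": [1.0, 0.0]}
item_embs = {
    "star_wars": [0.9, 0.1],
    "alien": [0.5, 0.5],
    "titanic": [-0.2, 0.9],
    "blade_runner": [0.7, 0.2],
}
knn_table = {"blade_runner": ["star_wars", "alien", "titanic"]}

top = recommend("alice", user_store, knn_table, item_embs,
                recent_items=["blade_runner"], k=2)
```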
Online Updates
Online Updates – Plan
Users Embeddings Update
Items Embeddings Update
Model Weights Update
User Updates – The Problem
Goal
Update user embeddings on new user/item interaction
Critical for “session-based” recommendations (e.g. anonymous browsing on retail website)
Technical Issue
To update the user embeddings you usually need all of the user's interactions
Some users may have interacted with thousands of items (e.g. listening history on Spotify)
User Updates – The Solution
Have two kinds of user updates, a quick one and a slow one:
Quick “Single-Step” Update
Do one step of online update given only the data for the new interaction
e.g. one step of stochastic gradient descent
Slow “Full” Update
Periodically schedule user updates from scratch, happening in the background
Requires a scheduler or a technology for background tasks
(hello “Discover Weekly” 👋)
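A sketch of the quick single-step update, assuming a matrix-factorization model with squared-error loss where only the user embedding is updated and the item embedding stays fixed. The function name and the hyper-parameters are illustrative.

```python
def sgd_user_step(user_emb, item_emb, rating, learning_rate=0.05, reg=0.0):
    """One SGD step on (rating - dot(user, item))**2 / 2 + reg * |user|**2 / 2."""
    prediction = sum(u * i for u, i in zip(user_emb, item_emb))
    error = rating - prediction
    return [
        u + learning_rate * (error * i - reg * u)
        for u, i in zip(user_emb, item_emb)
    ]


# A new interaction arrives: one cheap update using only this data point.
old_user = [0.0, 0.0]
item = [1.0, 2.0]
new_user = sgd_user_step(old_user, item, rating=5.0, learning_rate=0.1)
```

The slow full update would instead recompute the user embedding from the complete interaction history in a background task.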
User Updates – Background Tasks
Scheduling / Task Queue Technologies
● Cron
● Celery, RabbitMQ, Redis Pub/Sub
Sharing Memory
Executing the user update requires loading all item embeddings in memory again
● Either you do not share memory, and do not update too often
● Or you do the update in your main workers
● Or your background workers are forks of your main workers (e.g. uwsgi spooling)
Item Updates
New Items (Cold Start)
New items don’t have any user interactions
Collaborative-Filtering models do not support adding new items
Content-Based models do support adding new items
Update Items from New Interactions
SVD-based Matrix Factorization supports restricted forms of online-update
Otherwise heuristics might work, such as single-step of gradient
Benefits are typically not worth the DevOps troubles
Item Updates
Update Cloud Storage
Typically replace the entire file, or use rsync for smarter in-place updates
Update In-Memory Replica
Not possible with “copy-on-write sharing” → re-deploy all your workers
Do-able with shared memory, but the small benefits might not be worth the DevOps troubles
Model Updates
Goal
Update the model weights based on all new user/item interactions
Deploy new model weights without downtime
Real-Time?
Like for items embeddings, everything is much easier if model weights are read-only
Small benefits of real-time update of the model weights
You probably want to retrain your model from scratch once in a while and re-deploy
(hello again “Discover Weekly” 👋)
Testing and CI/CD
Testing and CI/CD – Plan
Unit-Testing and Advanced Tests
Versioning
Orchestration
Unit-Tests for ML
Difference with Traditional Software
No way to programmatically express the specifications of ML software
C.f. “Software 2.0” by Andrej Karpathy http://bit.ly/2Ni1apj
Small Specific Unit-Tests
Make unit-tests absurdly easy to pass. If they fail, you must be sure you have a bug
Test your code, not the generalization ability of your model:
● well known algorithm? → there is maths and proofs
● heuristic? → expected to fail, this shouldn’t make CI fail
E.g. test that your model can successfully overfit train data when you remove all regularization
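The overfitting check can be sketched as a self-contained test, assuming a small SGD matrix-factorization model written in plain Python. `train_mf`, `train_mse` and the hyper-parameters are hypothetical, and the ratings reuse the sample triplets from earlier in the talk.

```python
import random


def train_mf(ratings, n_factors=4, learning_rate=0.05, reg=0.0,
             n_epochs=1000, seed=0):
    """Plain SGD matrix factorization; returns (user_embs, item_embs)."""
    rng = random.Random(seed)
    users = {u: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)]
             for u, _, _ in ratings}
    items = {i: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)]
             for _, i, _ in ratings}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            p, q = users[u], items[i]
            err = r - sum(a * b for a, b in zip(p, q))
            users[u] = [a + learning_rate * (err * b - reg * a)
                        for a, b in zip(p, q)]
            items[i] = [b + learning_rate * (err * a - reg * b)
                        for a, b in zip(p, q)]
    return users, items


def train_mse(ratings, users, items):
    """Mean squared error on the training triplets."""
    errors = [r - sum(a * b for a, b in zip(users[u], items[i]))
              for u, i, r in ratings]
    return sum(e * e for e in errors) / len(errors)


def test_can_overfit_without_regularization():
    ratings = [(1234, 4321, 5.0), (1234, 654, 2.0), (456, 2432, 3.5),
               (456, 654, 1.0), (456, 432, 4.5), (987, 12, 5.0)]
    users, items = train_mf(ratings, reg=0.0)
    # With more parameters than data points, train error should vanish.
    assert train_mse(ratings, users, items) < 1e-2
```

The threshold is deliberately loose: the test is meant to catch bugs in the training code, not to measure model quality.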
Advanced Tests for ML
Goal
Test generalization ability of your models
Synthetic Datasets > Real Datasets
Full control of the assumptions
You can make it easy enough so if the test fails, something is wrong
(disclaimer: this talk is for ML Engineers & Ops 👷‍♀, not ML Researchers 👩‍🎓)
Compare Against Baselines
Hard to find good performance thresholds for the tests to pass
Easy to test that your model performs better than simple baselines
Example: Model Test
def test_model(self):
    """ test that model performs better than constant baseline """
    # generate sample data
    dataset = self._sample_synthetic_dataset()
    # train/valid split
    train_data, valid_data = self._trainvalid_split(dataset)
    # fit model
    model = self._get_model()
    model.fit(train_data)
    model_valid_cost = self._get_valid_scores(model, valid_data)
    # compare to dummy baseline model
    cst_model = ConstantByUserModel()
    cst_model.fit(train_data)
    cst_model_valid_cost = self._get_valid_scores(cst_model, valid_data)
    # test our model is better
    assert model_valid_cost < cst_model_valid_cost
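`ConstantByUserModel` is not defined on the slides. A plausible minimal version, assuming it predicts each user's mean training value and that `train_data` is an iterable of (user-id, item-id, value) triplets, could look like this:

```python
from collections import defaultdict


class ConstantByUserModel:
    """Baseline: predict each user's mean training value (assumed behaviour)."""

    def fit(self, train_data):
        sums, counts = defaultdict(float), defaultdict(int)
        for user_id, _item_id, value in train_data:
            sums[user_id] += value
            counts[user_id] += 1
        self.user_means_ = {u: sums[u] / counts[u] for u in sums}
        total = sum(counts.values())
        # Fallback for users never seen in training.
        self.global_mean_ = sum(sums.values()) / total if total else 0.0
        return self

    def predict(self, user_id, item_id=None):
        return self.user_means_.get(user_id, self.global_mean_)


baseline = ConstantByUserModel().fit([
    (1234, 4321, 5.0), (1234, 654, 2.0), (456, 2432, 3.5),
    (456, 654, 1.0), (456, 432, 4.5), (987, 12, 5.0),
])
```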
Versioning – Databases vs ML Data Objects
Database | ML Data Objects
Persistent | Ephemeral
Slow incremental updates | Frequent drastic changes from scratch
Foreign keys and locks to prevent corrupted data | Object dependencies not expressed programmatically
Versioning
ML Objects Identifiers
ML objects keep changing and depend on each other (e.g. datasets, models weights, items weights)
Store unique identifiers for all ML objects (data hash or unique id)
Keep track of identifiers of dependencies to prevent corrupted data
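One way to sketch identifiers and dependency tracking, assuming a content hash as the unique id and a small manifest dictionary stored next to each object. The manifest layout and function names are hypothetical.

```python
import hashlib


def object_id(payload: bytes) -> str:
    """Content hash used as a unique identifier for an ML object."""
    return hashlib.sha256(payload).hexdigest()[:16]


def make_manifest(name, payload, depends_on=()):
    """Record an object's id together with the ids of its dependencies."""
    return {
        "name": name,
        "id": object_id(payload),
        "depends_on": sorted(depends_on),
    }


def missing_dependencies(manifest, available_ids):
    """Return the dependency ids that are missing from storage."""
    return [d for d in manifest["depends_on"] if d not in available_ids]


dataset = make_manifest("dataset", b"user,item,value\n1234,4321,5.0\n")
weights = make_manifest("model_weights", b"\x00\x01",
                        depends_on=[dataset["id"]])
```

Refusing to deploy when `missing_dependencies` is non-empty is a cheap guard against mixing objects from different training runs.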
Versioning
Version ML objects to keep track of architecture changes
Match code version against data version
e.g. weights for a 3-layer neural net must never be used in the code for a 2-layer architecture
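A minimal sketch of matching code version against data version, assuming the saved weights carry an `architecture_version` tag. The tag name and the error class are hypothetical.

```python
CODE_ARCHITECTURE_VERSION = "2-layers"  # hypothetical tag baked into the code


class VersionMismatchError(RuntimeError):
    """Raised when saved weights do not match the running code."""


def load_weights(saved, expected_version=CODE_ARCHITECTURE_VERSION):
    """Refuse weights saved for another architecture version."""
    found = saved.get("architecture_version")
    if found != expected_version:
        raise VersionMismatchError(
            f"weights are for {found!r}, code expects {expected_version!r}"
        )
    return saved["weights"]


good = load_weights({"architecture_version": "2-layers", "weights": [0.1, 0.2]})
```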
Orchestration
Manual Workflow
At first, execute your workflow manually or with CI/CD triggers
E.g. bash scripts, python scripts, Gitlab triggers, cron jobs
Dedicated Workflow Orchestrator and Jobs DAGs
When you have dozens of ML jobs that depend on each other, you will end up making mistakes:
● making the pipeline fail
● or worse: deploying corrupted data
Workflow Management System: Airflow, Luigi, Oozie, ...
WMS tutorial: http://bit.ly/2MUM0r9
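To illustrate what a jobs DAG buys you, here is a sketch using only Python's standard `graphlib` module (Python 3.9+) to derive a valid execution order. A real workflow manager adds scheduling, retries and monitoring on top; the job names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each job maps to the set of jobs it depends on (hypothetical job names).
JOBS = {
    "build_dataset": set(),
    "train_model": {"build_dataset"},
    "evaluate_model": {"train_model"},
    "export_item_embeddings": {"train_model"},
    "deploy": {"evaluate_model", "export_item_embeddings"},
}


def execution_order(jobs):
    """Return one valid order; raises graphlib.CycleError on circular deps."""
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(JOBS)
```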
Microservices and Kubernetes
Microservices
● All microservices must be safe to replicate/delete for scaling up/down to adjust for variable load
→ source of truth for data cannot be a microservice
→ has to be managed in the cloud
● Fine-tune liveness probes (and readiness probes) to allow slow init of containers
[new in Kubernetes 1.16: startup probes]
● Memory is expensive
→ split microservices in a way you can still share memory between workers and avoid duplication
● Microservices for RecSys require lots of memory and compute
→ favor a few big nodes more than many small ones
Micro-Services and Kubernetes
Model Split-Brain Problem
● No-downtime updates require blue/green or canary deployment
● Your architecture has to serve two models at once
● And you should not expect to update both the code and the data at the same time
Micro-Services and Kubernetes
[Diagram: no-downtime update in 6 steps; users stored in redis, items and model stored in S3]
Initial state: items v1.0, model v1.0, users v1.0; Model v1.0 serving
Update step 1/6: items v2.0 and model v2.0 are added to storage next to v1.0
Update step 2/6: a Model v2.0 service is started next to Model v1.0
Update step 3/6: users v2.0 is added next to users v1.0
Update step 4/6: Model v1.0 and Model v2.0 both run, with every data object present in both versions
Update step 5/6: the Model v1.0 service is removed
Update step 6/6: items v1.0, model v1.0 and users v1.0 are removed; only v2.0 remains
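One way to let two model versions coexist in the same key-value store is to put the version in the key. The sketch below assumes that scheme; the `users:<version>:<id>` format and class names are hypothetical, and a dictionary stands in for the store.

```python
def user_key(version, user_id):
    """Namespaced key so v1.0 and v2.0 embeddings never collide."""
    return f"users:{version}:{user_id}"


class VersionedUserStore:
    """Wraps a key-value mapping; each model instance reads its own version."""

    def __init__(self, backend, version):
        self.backend = backend
        self.version = version

    def get(self, user_id):
        return self.backend.get(user_key(self.version, user_id))

    def put(self, user_id, embedding):
        self.backend[user_key(self.version, user_id)] = embedding


def drop_version(backend, version):
    """Final step of the update: delete every key of a retired version."""
    prefix = f"users:{version}:"
    for key in [k for k in backend if k.startswith(prefix)]:
        del backend[key]


backend = {}
v1 = VersionedUserStore(backend, "v1.0")
v2 = VersionedUserStore(backend, "v2.0")
v1.put(1234, [1.0])
v2.put(1234, [2.0])
```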
ML Data Management
ML Data Management – Plan
ML Data File Format
Cloud Storage
Tools
ML Data File Format
Cross-language
HDF5 has great integration in python/pandas
Protobuf has great integration in Go, but is very slow in python (as of today)
In Python: pickle+gzip
Can pickle numpy arrays efficiently
Only uses the standard library
Faster to read/write than HDF5
Alternative: high level library like sklearn’s joblib
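The pickle+gzip option needs only a few lines of standard library code. The helper names and the file name below are illustrative; as with any pickle-based format, only load files from sources you trust.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize with pickle and compress with gzip."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Inverse of save_object. Only unpickle files you trust."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "item_embeddings.pkl.gz")
    save_object({"654": [0.1, 0.2]}, path)
    restored = load_object(path)
```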
Cloud Storage – Experiments Results
NoSQL ML Experiments Management Comparison
Comparison with code samples: http://bit.ly/2JtZg3M
Tool | Storage | Language | Scaling | UI
MLflow | file-based | Python, R, Java | Small | internal
DVC | file-based | Shell | Small | no
Sacred | MongoDB | Python | Large | external
TensorBoard | file-based | TF, PyTorch | Medium | great
Cloud Storage – Large ML Files
Static Storage for Saved Models and Datasets
File-based static storage
● either cloud (AWS S3, Google Storage, etc.)
● or NAS virtual file system
Limited metadata support (e.g. tags)
Requires strict file naming conventions
Tools
Not much 🤷‍♂
In Python
● Use numpy structured arrays http://bit.ly/2BP1hU3
● With Intel CPU, use intel-numpy-mkl https://dockr.ly/321u2Yp
● Use scipy sparse matrices http://bit.ly/36bJwMt
● Pandas
(but may be unnecessarily slow vs pure numpy)
● Stay tuned! github.com/Crossing-Minds
Real-Time Deployment – Conclusion
● Replicate items embeddings and model weights in memory
● Re-deploy to update (read-only for easy memory sharing)
● User embeddings can be managed in a cloud key-value store
● Proper identification, versioning, and naming convention for ML data files
● Manual workflow, then managed by DAGs of jobs
Thanks!

More Related Content

What's hot

Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation enginehkbhadraa
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative FilteringTayfun Sen
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Varad Meru
 
Content - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationContent - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationAlessandro Liparoti
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerceAlexander Konduforov
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filteringD Yogendra Rao
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation projectAbhishek Jaisingh
 
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...Iván Palomares Carrascosa
 
Recommender Engines
Recommender EnginesRecommender Engines
Recommender EnginesThomas Hess
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation SystemsSalil Navgire
 
Movie lens recommender systems
Movie lens recommender systemsMovie lens recommender systems
Movie lens recommender systemsKapil Garg
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakDeepak Agarwal
 

What's hot (20)

Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation engine
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative Filtering
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
kdd2015
kdd2015kdd2015
kdd2015
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
Large-scale Parallel Collaborative Filtering and Clustering using MapReduce f...
 
Project presentation
Project presentationProject presentation
Project presentation
 
Content - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationContent - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative Information
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerce
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation project
 
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
ACM SIGIR 2020 Tutorial - Reciprocal Recommendation: matching users with the ...
 
Recommender Engines
Recommender EnginesRecommender Engines
Recommender Engines
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Movie lens recommender systems
Movie lens recommender systemsMovie lens recommender systems
Movie lens recommender systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and Deepak
 

Similar to Recommender Systems from A to Z – Real-Time Deployment

2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQLYu Ishikawa
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
스타트업과 개발자를 위한 AWS 클라우드 태권 세미나
스타트업과 개발자를 위한 AWS 클라우드 태권 세미나스타트업과 개발자를 위한 AWS 클라우드 태권 세미나
스타트업과 개발자를 위한 AWS 클라우드 태권 세미나Amazon Web Services Korea
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesUlf Wendel
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
 
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew Morgan
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew MorganCh-ch-ch-ch-changes....Stitch Triggers - Andrew Morgan
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew MorganMongoDB
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Codemotion
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudIke Ellis
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraJeff Smoley
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
AWS Customer Presentation - Conde Nast
AWS Customer Presentation - Conde NastAWS Customer Presentation - Conde Nast
AWS Customer Presentation - Conde NastAmazon Web Services
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 
Performant Django - Ara Anjargolian
Performant Django - Ara AnjargolianPerformant Django - Ara Anjargolian
Performant Django - Ara AnjargolianHakka Labs
 


Recommender Systems from A to Z – Real-Time Deployment

  • 2. Recommender Systems from A to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 4. Real-Time Deployment Cloud Infrastructure Online Updates Testing and CI/CD ML Data Management
  • 8. Recommender Systems (diagram). The user asks “What should I watch?”. The model needs this user’s tastes from the Users DB and all the items from the Items DB, computes predictions, and returns the top K: “Star Wars!”
  • 9. Recommender Systems – Previous Meetups (1) We started with the dataset. (2) Then we trained models. (3) We selected the best one. (4) And now, we want to deploy the model to production!
  • 10. Recommender Systems Datasets – Previous Meetups
                          Explicit feedback (users’ ratings)   Implicit feedback (users’ clicks)
      Example domains     Movies, TV shows, music              Marketplaces, businesses
      Example data type   Like/dislike, stars                  Clicks, play time, purchases
      Complexity          Clean, costly, easy to interpret     Dirty, cheap, difficult to interpret
  • 11. Recommender Systems Datasets – This Meetup: Abstract User/Item Interactions
      user-id   item-id   value
      1234      4321      5.0
      1234      654       2.0
      456       2432      3.5
      456       654       1.0
      456       432       4.5
      987       12        5.0
  • 13. Recommender Systems Models – Previous Meetups. Example of the model we want to deploy (diagram), made of user embeddings, item embeddings and model weights.
  • 14. Recommender Systems Models – This Meetup (diagram). User embeddings (Alice) and item embeddings (Star Wars) are combined through the model weights to produce a score: 4.5/5
  • 15. Software vs ML
      Traditional Software          Machine Learning Software
      Stateless                     Stateful
      Explicit specifications       No specifications
      Rule-based logic from code    Model-based logic from data
  • 20. Key Objectives
      ● Highly scalable, highly available: large number of users and items, no downtime
      ● Continuous user updates: new user/item interactions (e.g. ratings, clicks, watches)
      ● Frequent item updates and new items: new items are added continuously
      ● Non-trivial model: can’t be built in e.g. SQL or ElasticSearch
  • 25. Assumptions
      ● Not too big: the model can be trained on a big enough ML server, with no need for distributed ML. Less than 1TB of data, so fewer than 1B items
      ● Model updates can be delayed: you do not need to update the entire model several times per hour in production (adding new items in real time may be supported, though)
      ● Recommendations are not stored: you do not need to save recommendations
  • 26. Cloud Infrastructure – Plan Users Embeddings Storage Items Embeddings Storage Model Weights Storage
  • 27. Users Embeddings Storage
      ● User embeddings: binary blob <1KB per user; they keep changing; only one is needed per user request
      ● Key-value store: fetch the user embeddings from the cloud at each request. An atomic key-value store like Redis handles concurrency for free
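The "one small binary blob per user" idea can be sketched as below. This is only an illustration under assumptions: `InMemoryKV` is a hypothetical stand-in for a key-value store exposing `get`/`set`, and the key naming scheme and the embedding size (64 float32 values, 256 bytes) are made up for the example.

```python
import numpy as np

EMBEDDING_DIM = 64   # assumed size: 64 float32 values = 256 bytes, under 1KB
DTYPE = np.float32


class InMemoryKV:
    """Hypothetical stand-in for a key-value store with get/set semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def user_key(user_id):
    # Hypothetical key naming scheme.
    return f"user_emb:{user_id}"


def save_user_embedding(store, user_id, embedding):
    # Serialize the vector to raw bytes: one small blob per user.
    blob = np.asarray(embedding, dtype=DTYPE).tobytes()
    store.set(user_key(user_id), blob)


def load_user_embedding(store, user_id):
    blob = store.get(user_key(user_id))
    if blob is None:
        return None  # unknown user: the caller decides on a fallback
    return np.frombuffer(blob, dtype=DTYPE)


store = InMemoryKV()
save_user_embedding(store, 1234, np.arange(EMBEDDING_DIM))
loaded = load_user_embedding(store, 1234)
```

Each request then costs a single key lookup, whatever the total number of users.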
  • 29. Items Embeddings – The Big Issue
      ● Network problem: if the items data is stored in a database like SQL, and the model is too complex to be expressed in SQL, then you need to fetch 100% of the items data from the DB to your compute instance... at each request!
      ● Rule of thumb: 1M items * 1KB each → 1GB total data (1B items * 1KB each → 1TB total data)
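The rule of thumb is plain arithmetic, spelled out below. The 10 Gbit/s link speed is an assumption added for illustration, not a figure from the slides; it only shows why moving every item at each request is not viable.

```python
KB = 10**3
GB = 10**9
TB = 10**12


def total_bytes(n_items, bytes_per_item=1 * KB):
    # Size of all item embeddings at roughly 1KB per item.
    return n_items * bytes_per_item


def transfer_seconds(n_bytes, link_gbit_per_s):
    # Time to move n_bytes over a link of the given speed (assumed value).
    return n_bytes * 8 / (link_gbit_per_s * 10**9)


one_million_items = total_bytes(10**6)   # 1GB
one_billion_items = total_bytes(10**9)   # 1TB
seconds_per_request = transfer_seconds(one_million_items, link_gbit_per_s=10)
```

Under that assumed link speed, fetching 1GB takes close to a second per request before any computation starts.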
  • 30. Network Issue – Solution (diagram). The same flow as slide 8 (Users DB, Items DB, model, predictions, top K), with a Candidates Generation step added as the solution.
  • 32. Items Candidate Generation
      Goal: pre-select thousand(s) of items for your model, without needing to see all embeddings
      ● Model-free: e.g. item-item kNN with pre-computed kNN tables. Easy with an ML-ready DB like Spark; doable in ElasticSearch or even SQL
      ● Model-based: e.g. linear matrix factorization, then a smart implementation of Top-K. The model has to be monotonic (w.r.t. items dimensions), otherwise you can’t rely on a pre-computed index. Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike (WIP without efficient Top-K [issue#42326])
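A minimal sketch of the model-free option, assuming cosine similarity between item embeddings and a dense similarity matrix (fine for a toy example, not for millions of items). The function names and the seed-item idea (items the user recently interacted with) are assumptions for illustration.

```python
import numpy as np


def build_knn_table(item_embeddings, k):
    """Offline step: for each item, the ids of its k most similar items."""
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    unit = item_embeddings / np.maximum(norms, 1e-12)
    sims = unit @ unit.T                 # cosine similarities
    np.fill_diagonal(sims, -np.inf)      # an item is not its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]


def generate_candidates(knn_table, seed_items, exclude=()):
    """Online step: union of the pre-computed neighbours of the seed items."""
    candidates = set()
    for item in seed_items:
        candidates.update(int(j) for j in knn_table[item])
    return sorted(candidates - set(exclude))


# Two obvious clusters of items: {0, 1, 2} and {3, 4, 5}.
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                  [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
knn_table = build_knn_table(items, k=2)
candidates = generate_candidates(knn_table, seed_items=[0], exclude=[0])
```

Only the small kNN table is read at request time, so the model never has to see the embeddings of every item.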
  • 33. Items Embeddings – The Bigger Issue
  • 36. Throughput Problem and Solution
      ● Throughput problem: you can’t get thousands of arbitrary items embeddings at each request. The physical limit is DB CPU, not network speed. Hosting a local Redis co-located on the same physical machine as your process does not help
      ● Throughput solution (DevOps nightmares! 🎃🧛🕷): you need to keep items embeddings in the memory of your processes
  • 38. Items Embeddings Storage
      ● Cloud storage: read once when spawning your processes; fully updated every time you deploy a new model
          <1M items → static file storage works well (e.g. AWS S3, Google Storage)
          >1B items → big data storage (AWS Redshift, Google BigQuery, Hadoop/HDFS); updating will be “interesting”
      ● In-memory replica: loading before forking enables copy-on-write → read-only data can be shared with many processes for free. Otherwise use shared memory, but with concurrency issues
          >1B items → you cannot load everything at init, and cache-like mechanisms are required
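The pre-fork loading pattern can be sketched as below on a Unix system. It is an illustration only: `load_item_embeddings` is a hypothetical stand-in for reading the file from static storage, and a real server would let its process manager do the forking.

```python
import os

import numpy as np


def load_item_embeddings(n_items=1000, dim=16):
    # Hypothetical stand-in for reading the embeddings file from static storage.
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_items, dim)).astype(np.float32)


def read_in_forked_child(embeddings, item_id):
    """Fork a child that reads one row of the parent's array, return its sum."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the array is visible without reloading or copying it
        os.close(read_fd)
        value = float(embeddings[item_id].sum())
        os.write(write_fd, repr(value).encode())
        os.close(write_fd)
        os._exit(0)
    os.close(write_fd)  # parent: wait for the child's answer
    with os.fdopen(read_fd) as pipe:
        payload = pipe.read()
    os.waitpid(pid, 0)
    return float(payload)


# Load once in the parent, BEFORE forking any worker.
embeddings = load_item_embeddings()
# Read-only guard: writes would force the OS to copy the touched pages.
embeddings.setflags(write=False)
child_value = read_in_forked_child(embeddings, 7)
```

As long as workers only read the array, its data pages stay shared between processes, which is what makes the sharing "free".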
  • 39. Model Weights Storage
      ● Cloud storage: model weights are typically <1MB and are not updated often, so static file storage works well
      ● In-memory replica: small enough that any strategy will work (duplication, copy-on-write, shared memory, …)
  • 48. Infrastructure (diagram)
      Storage: items embeddings in S3 (replicated in RAM), model weights in S3 (replicated in RAM), users embeddings in Redis, items kNN DB
      Request flow (numbered 1 to 7 on the slide): “What should I watch?” → users embeddings from Redis → kNN candidates generation → model predictions using the items embeddings and model weights held in RAM → Top K → “Star Wars!”
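The last steps of the request flow (score the candidates, keep the top K) can be sketched as follows. The dot product is only a stand-in for the real model, which the talk assumes is non-trivial; the function name and the toy data are assumptions.

```python
import numpy as np


def recommend(user_embedding, item_embeddings, candidate_ids, k):
    """Score candidate items for one user and return the ids of the top k."""
    candidate_ids = np.asarray(candidate_ids)
    # Stand-in model: dot product between user and item embeddings.
    scores = item_embeddings[candidate_ids] @ user_embedding
    k = min(k, len(candidate_ids))
    # argpartition finds the k best in linear time; only those k are sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return candidate_ids[top].tolist()


item_embeddings = np.eye(4)                    # item embeddings held in RAM
user_embedding = np.array([0.1, 0.9, 0.5, 0.3])
top_two = recommend(user_embedding, item_embeddings, [0, 1, 2, 3], k=2)
```

Only the rows of the candidate items are read, so the cost per request depends on the number of candidates rather than on the catalogue size.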
  • 50. Online Updates – Plan Users Embeddings Update Items Embeddings Update Model Weights Update
  • 52. User Updates – The Problem
      ● Goal: update user embeddings on each new user/item interaction. Critical for “session-based” recommendations (e.g. anonymous browsing on a retail website)
      ● Technical issue: to update the user embeddings you usually need all of the user’s interactions, and some users may have interacted with thousands of items (e.g. listening history on Spotify)
  • 54. User Updates – The Solution
      Have two kinds of user updates, a quick one and a slow one:
      ● Quick “single-step” update: do one step of online update given only the data for the new interaction, e.g. one step of stochastic gradient descent
      ● Slow “full” update: periodically schedule user updates from scratch, happening in the background. Requires a scheduler or a technology for background tasks (hello “Discover Weekly” 👋)
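A minimal sketch of the quick single-step update, assuming a matrix-factorization model with squared error and L2 regularization on the user embedding. The loss, learning rate and regularization strength are assumptions; the slides only say "one step of stochastic gradient descent".

```python
import numpy as np


def single_step_user_update(user_emb, item_emb, rating, lr=0.05, reg=0.01):
    """One SGD step on 0.5 * (rating - u.v)^2 + 0.5 * reg * |u|^2 w.r.t. u."""
    error = rating - float(user_emb @ item_emb)
    gradient = -error * item_emb + reg * user_emb
    return user_emb - lr * gradient


user_before = np.array([0.5, 0.5])
item = np.array([1.0, 0.0])      # item embedding, kept fixed
rating = 4.5                     # the new interaction
user_after = single_step_user_update(user_before, item, rating)
```

Only the new interaction and one item embedding are needed, which is why this step is cheap enough to run inside the request.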
  • 56. User Updates – Background Tasks
      Scheduling / task queue technologies:
      ● Cron
      ● Celery, RabbitMQ, Redis Pub/Sub
      Sharing memory: executing the user update requires loading all item embeddings in memory again:
      ● Either you do not share memory, and do not update too often
      ● Or you do the update in your main workers
      ● Or your background workers are forks of your main workers (e.g. uwsgi spooling)
  • 57. Item Updates
      ● New items (cold start): new items don’t have any user interactions. Collaborative-filtering models do not support adding new items; content-based models do
      ● Update items from new interactions: SVD-based matrix factorization supports restricted forms of online update. Otherwise heuristics might work, such as a single gradient step. The benefits are typically not worth the DevOps trouble
  • 58. Item Updates
      ● Update cloud storage: typically replace the entire file, or use rsync for smarter in-place updates
      ● Update in-memory replica: not possible with “copy-on-write sharing” → re-deploy all your workers. Doable with shared memory, but the small benefits might not be worth the DevOps trouble
  • 61. Model Updates
      ● Goal: update the model weights based on all new user/item interactions, and deploy new model weights without downtime
      ● Real-time? As with items embeddings, everything is much easier if model weights are read-only, and real-time updates of the model weights bring small benefits. You probably want to retrain your model from scratch once in a while and re-deploy (hello again “Discover Weekly” 👋)
  • 63. Testing and CI/CD – Plan Unit-Testing and Advanced Tests Versioning Orchestration
  • 65. Unit-Tests for ML
      ● Difference with traditional software: there is no way to programmatically express the specifications of ML software. Cf. “Software 2.0” by Andrej Karpathy http://bit.ly/2Ni1apj
      ● Small specific unit-tests: make unit-tests absurdly easy to pass, so that if they fail, you can be sure you have a bug. Test your code, not the generalization ability of your model:
          well-known algorithm? → there are maths and proofs
          heuristic? → expected to fail, this shouldn’t make CI fail
        E.g. test that your model can successfully overfit the train data when you remove all regularization
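The "can it overfit without regularization" check can be sketched with a toy model. `TinyRidgeModel` is a hypothetical stand-in for a real recommender model: a linear model with more parameters than training examples, so an exact fit must be reachable once regularization is switched off.

```python
import numpy as np


class TinyRidgeModel:
    """Toy linear model with L2 regularization (stand-in for a real model)."""

    def __init__(self, reg):
        self.reg = reg
        self.weights = None

    def fit(self, features, targets):
        n_features = features.shape[1]
        # Ridge regression written as an augmented least-squares problem.
        aug_features = np.vstack(
            [features, np.sqrt(self.reg) * np.eye(n_features)])
        aug_targets = np.concatenate([targets, np.zeros(n_features)])
        self.weights, *_ = np.linalg.lstsq(aug_features, aug_targets, rcond=None)
        return self

    def predict(self, features):
        return features @ self.weights


def train_mse(model, features, targets):
    return float(np.mean((model.predict(features) - targets) ** 2))


def test_model_can_overfit_without_regularization():
    rng = np.random.default_rng(0)
    features = rng.standard_normal((5, 10))   # more parameters than examples
    targets = rng.standard_normal(5)
    model = TinyRidgeModel(reg=0.0).fit(features, targets)
    # Absurdly easy to pass: a failure here means a bug in fit/predict.
    assert train_mse(model, features, targets) < 1e-10


test_model_can_overfit_without_regularization()
```

The test says nothing about generalization; it only checks that the training code is able to drive the training error to zero.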
  • 69. Advanced Tests for ML
      ● Goal: test the generalization ability of your models
      ● Synthetic datasets > real datasets: full control of the assumptions. You can make the task easy enough that if the test fails, something is wrong (disclaimer: this talk is for ML Engineers & Ops 👷‍♀ not ML Researchers 👩‍🎓)
      ● Compare against baselines: it is hard to find good performance thresholds for the tests to pass, but easy to test that your model performs better than simple baselines
  • 70. Example: Model Test

      def test_model(self):
          """ test that model performs better than constant baseline """
          # generate sample data
          dataset = self._sample_synthetic_dataset()
          # train/valid split
          train_data, valid_data = self._trainvalid_split(dataset)
          # fit model
          model = self._get_model()
          model.fit(train_data)
          model_valid_cost = self._get_valid_scores(model, valid_data)
          # compare to dummy baseline model
          cst_model = ConstantByUserModel()
          cst_model.fit(train_data)
          cst_model_valid_cost = self._get_valid_scores(cst_model, valid_data)
          # test our model is better
          assert model_valid_cost < cst_model_valid_cost
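The slide's test relies on helpers that are not shown. A self-contained variant might look like the sketch below; everything in it is an assumption made for illustration: the synthetic data (user bias + item bias + small noise), the `BiasModel` under test, and this particular implementation of `ConstantByUserModel` (predict each user's mean training value).

```python
import numpy as np

N_USERS, N_ITEMS = 30, 20


def sample_synthetic_dataset(noise=0.1, seed=0):
    # Easy by construction: value = user bias + item bias + small noise.
    rng = np.random.default_rng(seed)
    user_bias = rng.standard_normal(N_USERS)
    item_bias = rng.standard_normal(N_ITEMS)
    users, items = np.meshgrid(np.arange(N_USERS), np.arange(N_ITEMS), indexing="ij")
    users, items = users.ravel(), items.ravel()
    values = user_bias[users] + item_bias[items] + noise * rng.standard_normal(users.size)
    return users, items, values


def train_valid_split(dataset, valid_fraction=0.2, seed=1):
    users, items, values = dataset
    is_valid = np.random.default_rng(seed).random(users.size) < valid_fraction
    train = (users[~is_valid], items[~is_valid], values[~is_valid])
    valid = (users[is_valid], items[is_valid], values[is_valid])
    return train, valid


class ConstantByUserModel:
    """Baseline: predict each user's mean training value."""

    def fit(self, train):
        users, _, values = train
        self.global_mean = float(values.mean())
        self.user_mean = {int(u): float(values[users == u].mean()) for u in np.unique(users)}
        return self

    def predict(self, users, items):
        return np.array([self.user_mean.get(int(u), self.global_mean) for u in users])


class BiasModel:
    """Model under test: one bias per user plus one per item, least squares."""

    def _design(self, users, items):
        design = np.zeros((users.size, N_USERS + N_ITEMS))
        rows = np.arange(users.size)
        design[rows, users] = 1.0
        design[rows, N_USERS + items] = 1.0
        return design

    def fit(self, train):
        users, items, values = train
        self.weights, *_ = np.linalg.lstsq(self._design(users, items), values, rcond=None)
        return self

    def predict(self, users, items):
        return self._design(users, items) @ self.weights


def valid_cost(model, valid):
    users, items, values = valid
    return float(np.mean((model.predict(users, items) - values) ** 2))


def test_model_beats_constant_baseline():
    train, valid = train_valid_split(sample_synthetic_dataset())
    model_cost = valid_cost(BiasModel().fit(train), valid)
    baseline_cost = valid_cost(ConstantByUserModel().fit(train), valid)
    assert model_cost < baseline_cost
    return model_cost, baseline_cost


model_cost, baseline_cost = test_model_beats_constant_baseline()
```

Because the data is generated from exactly the structure the model can express, a failure points to a bug rather than to a hard dataset.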
  • 71. Versioning – Databases vs ML Data Objects
      Database                                            ML Data Objects
      Persistent                                          Ephemeral
      Slow incremental updates                            Frequent drastic changes from scratch
      Foreign keys and locks to prevent corrupted data    Object dependencies not expressed programmatically
  • 72. Versioning ML Objects
      ● Identifiers: ML objects keep changing and depend on each other (e.g. datasets, model weights, item weights). Store unique identifiers for all ML objects (data hash or unique id). Keep track of the identifiers of dependencies to prevent corrupted data
      ● Versioning: version ML objects to keep track of architecture changes. Match code version against data version, e.g. weights for a 3-layer neural net must never be used in the code for a 2-layer architecture
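One possible way to implement identifiers and dependency checks is sketched below. The record layout (`id`, `arch_version`, `depends_on`) and the function names are assumptions for illustration, not a format prescribed by the talk.

```python
import hashlib

import numpy as np


def data_hash(array):
    """Content-based identifier: changes whenever the data changes."""
    array = np.ascontiguousarray(array)
    digest = hashlib.sha256()
    digest.update(str(array.dtype).encode())
    digest.update(str(array.shape).encode())
    digest.update(array.tobytes())
    return digest.hexdigest()


def make_record(name, array, arch_version, depends_on=()):
    # Hypothetical metadata stored next to each ML object.
    return {
        "name": name,
        "id": data_hash(array),
        "arch_version": arch_version,
        "depends_on": {dep["name"]: dep["id"] for dep in depends_on},
    }


def check_compatible(record, code_arch_version, available):
    """Refuse to load an object built for another architecture or other data."""
    if record["arch_version"] != code_arch_version:
        raise ValueError("object built for architecture " + record["arch_version"])
    for dep_name, dep_id in record["depends_on"].items():
        if available.get(dep_name, {}).get("id") != dep_id:
            raise ValueError("dependency mismatch for " + dep_name)


items = np.arange(6, dtype=np.float32).reshape(3, 2)
items_record = make_record("items", items, arch_version="2-layer")
weights_record = make_record("weights", np.ones(4), "2-layer", depends_on=[items_record])
check_compatible(weights_record, "2-layer", {"items": items_record})  # passes
```

Running the check at load time turns a silent mix of incompatible objects into an explicit error.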
  • 74. Orchestration
      ● Manual workflow: at first, execute your workflow manually or with CI/CD triggers, e.g. bash scripts, Python scripts, GitLab triggers, cron jobs
      ● Dedicated workflow orchestrator and job DAGs: when you have dozens of ML jobs that depend on each other, you will end up making mistakes: making the pipeline fail, or worse, deploying corrupted data. Workflow management systems (WMS): Airflow, Luigi, Oozie, ... WMS tutorial: http://bit.ly/2MUM0r9
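The core idea behind a job DAG can be illustrated with the standard library alone (`graphlib`, Python 3.9+). The job names are hypothetical, and a real workflow manager adds scheduling, retries and monitoring on top of this ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical ML jobs: each job maps to the set of jobs it depends on.
JOBS = {
    "build_dataset": set(),
    "train_model": {"build_dataset"},
    "compute_item_embeddings": {"train_model"},
    "build_knn_table": {"compute_item_embeddings"},
    "deploy": {"train_model", "build_knn_table"},
}


def execution_order(jobs):
    """Return an order in which every job runs after all its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(JOBS)
```

Declaring dependencies explicitly means that a job such as `deploy` cannot run before the data it relies on has been rebuilt.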
  • 78. Microservices and Kubernetes
      ● All microservices must be safe to replicate/delete for scaling up/down to adjust for variable load → the source of truth for data cannot be a microservice → it has to be managed in the cloud
      ● Fine-tune liveness probes (and readiness probes) to allow slow init of containers [new in Kubernetes 1.16: startup probes]
      ● Memory is expensive → split microservices in a way that still lets you share memory between workers and avoid duplication
      ● Microservices for RecSys require lots of memory and compute → favor a few big nodes over many small ones
  • 79. Micro-Services and Kubernetes Model Split-Brain Problem ● No-downtime updates require blue/green or canary deployment ● Your architecture has to serve two models at once ● And you should not expect to update both the code and the data at the same time
  • 80. Micro-Services and Kubernetes Initial state: S3: items v1.0, model v1.0; redis: users v1.0; service: Model v1.0
  • 81. Micro-Services and Kubernetes Update step 1/6: S3: items v1.0, model v1.0, items v2.0, model v2.0; redis: users v1.0; service: Model v1.0
  • 82. Micro-Services and Kubernetes Update step 2/6: S3: items v1.0, model v1.0, items v2.0, model v2.0; redis: users v1.0; services: Model v1.0, Model v2.0
  • 83. Micro-Services and Kubernetes Update step 3/6: S3: items v1.0, model v1.0, items v2.0, model v2.0; redis: users v1.0, users v2.0; services: Model v1.0, Model v2.0
  • 84. Micro-Services and Kubernetes Update step 4/6: S3: items v1.0, model v1.0, items v2.0, model v2.0; redis: users v1.0, users v2.0; services: Model v1.0, Model v2.0
  • 85. Micro-Services and Kubernetes Update step 5/6: S3: items v1.0, model v1.0, items v2.0, model v2.0; redis: users v1.0, users v2.0; service: Model v2.0
  • 86. Micro-Services and Kubernetes Update step 6/6: S3: items v2.0, model v2.0; redis: users v2.0; service: Model v2.0
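The update sequence above only works if both model versions can read their own data side by side. A minimal sketch of that idea, assuming version-prefixed keys in the key-value store: `VersionedStore` is a dict-backed stand-in for redis, and the key scheme, class names and dot-product scoring are all hypothetical, chosen only for illustration.

```python
class VersionedStore:
    """In-memory stand-in for a cloud key-value store (e.g. redis)."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(kind, version, item_id):
        # Hypothetical key scheme: "<kind>:<version>:<id>"
        return f"{kind}:{version}:{item_id}"

    def put(self, kind, version, item_id, value):
        self._data[self._key(kind, version, item_id)] = value

    def get(self, kind, version, item_id):
        return self._data[self._key(kind, version, item_id)]

    def drop_version(self, kind, version):
        """Delete every key of one version (last step of an update)."""
        prefix = f"{kind}:{version}:"
        for key in [k for k in self._data if k.startswith(prefix)]:
            del self._data[key]


class ModelService:
    """One running model replica, pinned to a single data version."""

    def __init__(self, version, store, item_embeddings):
        self.version = version
        self.store = store
        self.item_embeddings = item_embeddings  # read-only, loaded at start-up

    def score(self, user_id, item_id):
        """Dot product of user and item embeddings of the same version."""
        user = self.store.get("users", self.version, user_id)
        item = self.item_embeddings[item_id]
        return sum(u * i for u, i in zip(user, item))
```

Because each service only ever reads keys of its own version, a v1.0 replica never mixes its weights with v2.0 user embeddings while both are live, and the old version can be dropped once no replica uses it.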
  • 88. ML Data Management – Plan ● ML Data File Format ● Cloud Storage ● Tools
  • 89. ML Data File Format Cross-language: HDF5 has great integration with Python/pandas; Protobuf has great integration with Go, but is very slow in Python (as of today). In Python: pickle + gzip ● pickles numpy arrays efficiently ● only uses the standard library ● faster to read/write than HDF5. Alternative: a high-level library such as joblib (used by sklearn)
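A minimal sketch of the pickle + gzip approach, using only the standard library. The helper names `save_pickle_gz` and `load_pickle_gz` are hypothetical; the example stores plain Python lists, and numpy arrays can be passed to the same helpers unchanged. Note that `pickle.load` can execute arbitrary code, so only load files you produced yourself.

```python
import gzip
import os
import pickle
import tempfile


def save_pickle_gz(obj, path):
    """Serialize `obj` with pickle and compress it with gzip."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Load an object written by save_pickle_gz (trusted files only)."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users.pkl.gz")
        save_pickle_gz({"alice": [0.5, 1.0]}, path)
        print(load_pickle_gz(path))
```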
  • 90. Cloud Storage – Experiments Results NoSQL ML Experiments Management Comparison. Comparison with code samples: http://bit.ly/2JtZg3M
    Tool | Storage | Language | Scaling | UI
    MLflow | file-based | Python, R, Java | Small | internal
    DVC | file-based | Shell | Small | no
    Sacred | MongoDB | Python | Large | external
    TensorBoard | file-based | TF, PyTorch | Medium | great
  • 91. Cloud Storage – Large ML Files Static Storage for Saved Models and Datasets: file-based static storage ● either cloud (AWS S3, Google Storage, etc.) ● or a NAS virtual file system. Limited metadata support (e.g. tags). Requires strict file naming conventions
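Since static storage offers little metadata, the file name itself has to carry the identification and version. The sketch below shows one possible convention, enforced in code; the `<kind>_<model>_v<major.minor>_<YYYYMMDD>.pkl.gz` pattern and all names in it are assumptions made for illustration, not the convention used by the speakers.

```python
import re
from typing import NamedTuple


class MLFileName(NamedTuple):
    kind: str     # e.g. "items", "users", "model"
    model: str    # hypothetical model identifier
    version: str  # "major.minor"
    date: str     # "YYYYMMDD"


# Hypothetical convention: <kind>_<model>_v<major.minor>_<YYYYMMDD>.pkl.gz
_PATTERN = re.compile(
    r"(?P<kind>[a-z]+)_(?P<model>[a-z0-9-]+)"
    r"_v(?P<version>\d+\.\d+)_(?P<date>\d{8})\.pkl\.gz"
)


def parse_name(name):
    """Parse a file name, rejecting anything outside the convention."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return MLFileName(**match.groupdict())


def build_name(kind, model, version, date):
    """Build a file name and validate it against the same pattern."""
    name = f"{kind}_{model}_v{version}_{date}.pkl.gz"
    parse_name(name)  # raises ValueError on invalid components
    return name
```

Building and parsing names through one pair of functions keeps uploads and downloads in agreement, which is the point of a strict convention.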
  • 92. Tools Not much 🤷‍♂ In Python ● Use numpy structured arrays http://bit.ly/2BP1hU3 ● With an Intel CPU, use intel-numpy-mkl https://dockr.ly/321u2Yp ● Use scipy sparse matrices http://bit.ly/36bJwMt ● Pandas (but it may be unnecessarily slow vs pure numpy) ● Stay tuned! github.com/Crossing-Minds
  • 93. Real-Time Deployment – Conclusion ● Replicate item embeddings and model weights in memory ● Re-deploy to update (read-only for easy memory sharing) ● User embeddings can be managed in a cloud key-value store ● Use proper identification, versioning, and naming conventions for ML data files ● Start with a manual workflow, then move to DAGs of jobs