This document discusses the key considerations and challenges in productionizing recommender systems. It outlines the full lifecycle, from scoping a recommender system project through deployment and continuous monitoring. The main points covered include: defining requirements and key metrics; preparing data through feature engineering, cleaning and transformation; selecting and evaluating recommendation models both offline and through A/B testing; making deployments robust and scalable while keeping technical debt in check; and continuously monitoring systems in production for data or algorithmic drift.
6. Classical recommendation model
Three types of entities: Users, Items and Contexts
1. Background knowledge:
• A set of ratings – preferences
• r: Users × Items × Contexts → {1, 2, 3, 4, 5}
• A set of “features” of the Users, Items and Contexts
2. A method for predicting the function r where it is unknown (sketched below):
• r*(u, i, c) = the average of the known ratings r(u’, i, c’) over users u’ similar to u and contexts c’ similar to c
3. A method for selecting the items to recommend (choice):
• In context c, recommend to u the item i* with the largest predicted rating r*(u, i, c)
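As a concrete illustration, here is a minimal sketch of steps 2 and 3 in Python. The toy ratings, the cosine similarity over co-rated items, and the choice of k are illustrative assumptions, and context is omitted for brevity:

import numpy as np

# Toy ratings: ratings[user][item] = value on the 1-5 scale.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i2": 1, "i3": 4},
}

def similarity(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    a = np.array([ratings[u][i] for i in common], dtype=float)
    b = np.array([ratings[v][i] for i in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(u, i, k=2):
    """r*(u, i): average the ratings of the k users most similar to u."""
    neighbours = sorted(
        (v for v in ratings if v != u and i in ratings[v]),
        key=lambda v: similarity(u, v),
        reverse=True,
    )[:k]
    if not neighbours:
        return None
    return sum(ratings[v][i] for v in neighbours) / len(neighbours)

def recommend(u, candidates):
    """Choice step: pick the item with the largest predicted rating."""
    scored = [(i, predict(u, i)) for i in candidates]
    return max((x for x in scored if x[1] is not None), key=lambda x: x[1])

print(recommend("u1", ["i3"]))  # ('i3', 4.5)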
7. The goal is to find items that the user will happily choose
10. Requirements surrounding RecSys
Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in neural information processing systems 28 (2015).
12. Scoping
1. Decide the recommendation paradigm and the degree of personalization
• Resources and product
• The type of user we are catering to
2. KPI metrics:
• Online: CVR (conversion rate), CTR (click-through rate), ad-hoc metrics
• Offline: recall, nDCG
Ricci, Francesco, Lior Rokach, and Bracha Shapira. "Recommender systems: introduction and challenges." Recommender systems handbook (2015): 1-34.
13. The life cycle of a recommender
Scoping
• Define the project
• Define KPI metrics
Data
• Features
• Type of feedback
• Cleaning
• Transformation
Modelling
• Select model
• Offline test model
• Check fairness/explainability
• UAT
Deployment
• Deploy in production
• A/B testing
• Continuous monitoring
15. Data preparation
1. Features:
• Reusable
• Transformable
• Interpretable
• Reliable
2. Type of feedback
• Implicit data is (usually):
• Denser, and available for all users
• More representative of user behavior than of user reflection
• More closely related to the final objective function
• Better correlated with A/B test results
• E.g. watching (implicit) vs. rating (explicit)
3. Cleaning (a pandas sketch follows below)
• Dropping bots
• Missing-data strategy
• Standardisation
• Bucketing
Beliakov, Gleb, Tomasa Calvo, and Simon James. "Aggregation of preferences in recommender systems." Recommender systems handbook (2011): 705-734.
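As referenced above, a sketch of the cleaning steps with pandas. The column names, the bot threshold and the age bands are invented for illustration:

import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "bot7", "u3"],
    "age": [23, 23, None, 41, 35],
    "events_per_min": [2, 2, 1, 500, 3],
    "watch_time_s": [120, 95, 30, 10000, 400],
})

# Dropping bots: filter out users with implausibly high activity.
df = df[df["events_per_min"] < 100]

# Missing-data strategy: here, impute age with the median.
df["age"] = df["age"].fillna(df["age"].median())

# Standardisation: zero mean, unit variance for watch time.
df["watch_time_z"] = (
    (df["watch_time_s"] - df["watch_time_s"].mean()) / df["watch_time_s"].std()
)

# Bucketing: discretise age into coarse bands.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 18, 30, 50, 120],
                          labels=["<18", "18-30", "30-50", "50+"])
print(df)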
16. Modelling
Latency is key!
18. Collaborative Filtering
Coba, L., Rook, L., Zanker, M., & Symeonidis, P. (2019, March). Decision making strategies differ in the presence of collaborative explanations: two conjoint studies. IUI 2019.
19. Content-based
Dominguez, V., Messina, P., Donoso-Guzmán, I., & Parra, D. (2019, March). The effect of explanations and algorithmic accuracy on visual recommender systems of artistic images. IUI 2019.
21. Selecting the model
• There is no single winner
• Usually, models are ensembled
• Multi-stage architectures are common
• Live predictions or in batches
• DNNs are not always deployable (cost-benefit tradeoff and latency)
22. A non-personalized recommender
• We can use a hybrid recommender: collaborative filtering + content-based
• Collaborative filtering suffers from cold start, but it performs better
• If the catalogue doesn’t change often, we can pre-compute interactions (sketched below)
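A minimal sketch of the pre-computation idea from the last bullet: build an item-item cosine-similarity matrix offline once, so that serving “more like this” becomes a cheap lookup. The toy interaction matrix is illustrative:

import numpy as np

# Rows = users, columns = items; implicit interactions (1 = watched).
interactions = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])

# Offline: cosine similarity between item columns, computed once.
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = (interactions.T @ interactions) / (norms.T @ norms)
np.fill_diagonal(item_sim, 0.0)

# Online: "more like item 0" is now just a sorted lookup.
print(np.argsort(item_sim[0])[::-1][:2])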
24. Non-personalized recommender
Request flow: edge device → API call → ANN search → recommendations
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point https://github.com/spotify/annoy
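A short example of how Annoy could back the ANN search step above; the dimensionality and the random vectors are placeholders for real item embeddings:

from annoy import AnnoyIndex
import random

f = 32  # embedding dimensionality
index = AnnoyIndex(f, "angular")

# Offline: add one vector per catalogue item and build the trees.
for item_id in range(1000):
    index.add_item(item_id, [random.gauss(0, 1) for _ in range(f)])
index.build(10)  # 10 trees
index.save("items.ann")

# Online: the API call maps to a nearest-neighbour query.
index2 = AnnoyIndex(f, "angular")
index2.load("items.ann")  # memory-mapped, so loading is fast
print(index2.get_nns_by_item(0, 10))  # 10 items most similar to item 0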
25. Scalable personalized recommender
Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM conference on recommender systems. 2016.
26. Scalable personalized recommender
Amatriain, Xavier. “Blueprints for recommender system architectures: 10th anniversary edition.” https://amatriain.net/blog/RecsysArchitectures
27. Modelling
• Latency is key!
• Choose the right architecture based on the situation
• Pick the right metrics (reference implementations below):
• recall for retrieval
• nDCG@K or MRR@K for ranking, or precision if ranking order is irrelevant
• Go beyond traditional metrics: measure diversity and novelty, and check whether the models are biased
• UAT
• Define data contracts
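Minimal reference implementations of two of the metrics above, assuming binary relevance; the item IDs and the cut-off are illustrative:

import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the relevant items that appear in the top k (assumes
    at least one relevant item)."""
    hits = len(set(ranked_items[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance nDCG: DCG of this ranking over the ideal DCG."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / np.log2(rank + 2) for rank, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(rank + 2)
                for rank in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["i3", "i7", "i1", "i9"]
relevant = {"i1", "i3"}
print(recall_at_k(ranked, relevant, k=3))  # 1.0 — both hits in the top 3
print(ndcg_at_k(ranked, relevant, k=3))    # ~0.92 — a miss at rank 2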
33. Tech debt sources
• Glue code
• Pipeline Jungles
• Dead Experimental Codepaths
• Plain-Old-Data Type Smell
• Multiple-Language Smell
• Prototype Smell
• Reproducibility Debt
• Cultural Debt
37. Deploying
1. Test – high code coverage
2. Test – smart deployment strategy
3. Test – A/B testing, interleaving or MAB (a bandit sketch follows below)
4. Keep monitoring (a drift-check sketch follows below)
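For step 3, a sketch of the simplest multi-armed bandit, epsilon-greedy, splitting traffic between two model variants. The click-through rates below are synthetic stand-ins for live reward:

import random

true_ctr = {"model_a": 0.030, "model_b": 0.036}  # hidden from the bandit
clicks = {m: 0 for m in true_ctr}
shows = {m: 0 for m in true_ctr}
epsilon = 0.1

for _ in range(100_000):
    if random.random() < epsilon:  # explore: pick a variant at random
        m = random.choice(list(true_ctr))
    else:                          # exploit: pick the best observed CTR
        m = max(true_ctr,
                key=lambda v: clicks[v] / shows[v] if shows[v] else 0.0)
    shows[m] += 1
    clicks[m] += random.random() < true_ctr[m]  # simulated click

for m in true_ctr:
    print(m, shows[m], clicks[m] / shows[m])  # traffic shifts to model_b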
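For step 4, a sketch of an automated drift check: compare a live feature distribution against its training baseline with a two-sample Kolmogorov–Smirnov test. The feature, the distributions and the alert threshold are all illustrative:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_watch_time = rng.gamma(shape=2.0, scale=60.0, size=10_000)  # baseline
live_watch_time = rng.gamma(shape=2.0, scale=75.0, size=2_000)    # drifted

stat, p_value = ks_2samp(train_watch_time, live_watch_time)
if p_value < 0.01:  # the alert threshold is a judgment call
    print(f"possible data drift: KS={stat:.3f}, p={p_value:.2e}")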