Recommendation systems

Collect requirements / Design / Analyze

1
Intro
This presentation does not contain sensitive implementation details.
The primary focus is on general system design.
No specific knowledge is required.
Duration is approximately 1 hour + questions.
Please write your questions in the chat and include the slide number (if
applicable).
Architecture diagrams are more specific to media services.

2
About
https://www.linkedin.com/in/anton-ermak
Rayman.k26@gmail.com
Training / Consulting
Anton Ermak
9 YOE in Software Development

System design starts here
Clarify the expected system behavior.
Refine the data volume for today and the future.
Don't hesitate to ask questions.
Consider non-technical requirements.
Link references and check assumptions.
Explain the general approach.
It's a vast discussion topic, so be mindful of your time.
3

4
General definitions
A service with users and items.
The goal is to provide the most relevant items for each user.
Examples include music/video services and e-commerce platforms.
Data collection includes implicit (behavioral) and/or explicit (rating) feedback.

5
Functional requirements
Provide a set of the most relevant items for a user.
Handle the cold start and sparsity problems.
Support multiple platforms.
Control recommendation diversity.
Ensure real-time updates of recommendation results.
Present the approach with a step-by-step process.
API:
- getRelevantItems(userProfile, count) -> [ItemId]
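A minimal sketch of how this API might look in Python. The `UserProfile` fields, the `Recommender` protocol, and the `TopChartRecommender` stand-in are assumptions for illustration; the slides only define the call signature.

```python
from dataclasses import dataclass, field
from typing import Protocol

ItemId = str


@dataclass
class UserProfile:
    # Hypothetical fields; the slides do not define the profile's contents.
    user_id: str
    platform: str = "web"
    recent_item_ids: list[ItemId] = field(default_factory=list)


class Recommender(Protocol):
    def get_relevant_items(self, user_profile: UserProfile, count: int) -> list[ItemId]:
        ...


class TopChartRecommender:
    """Simplest possible implementation: the same ranked list for everyone."""

    def __init__(self, ranked_items: list[ItemId]) -> None:
        self._ranked_items = ranked_items

    def get_relevant_items(self, user_profile: UserProfile, count: int) -> list[ItemId]:
        # Skip items the user interacted with recently, then truncate.
        seen = set(user_profile.recent_item_ids)
        fresh = [item for item in self._ranked_items if item not in seen]
        return fresh[:count]
```

Keeping the interface this narrow lets later steps (collaborative filtering, content-based, hybrid) replace the implementation without changing callers.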

6
Non-functional requirements
Real-time responses under concurrent recommendation requests.
Scalable to millions of users and items.
Monitoring and alerting.
Privacy and security.
Robustness.
Adaptability and customization.
A/B testing and experimentation.

7
Initial assumptions
Long-tail distribution of item popularity.
User categories: engaged and cold users.
Often the number of items >> the number of users.
[Figure: long-tail chart. Axes: popularity vs. # of items. The head of the curve is labeled "Popular items".]

8
A step-by-step plan
Top chart approach: show the same most popular items to all users (if applicable).
Personalize recommendations by incorporating popular items.
Apply content-based recommendations to include rare items.
Address the cold start problem.
Utilize a hybrid approach by combining multiple algorithms.

9
Side note: feature flags
Reduce risks
Improve flexibility
Enable experimentation
Essential building block for A/B tests
Self testing, colleagues, beta testers
Updated at runtime
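A minimal sketch of a feature flag with a percentage rollout, assuming an in-memory store and hash-based user bucketing. The `FeatureFlags` class and its method names are hypothetical; a production setup would typically read flags from a configuration service.

```python
import hashlib


class FeatureFlags:
    """In-memory flag store (assumption: real flags live in a config service)."""

    def __init__(self) -> None:
        self._rollout_percent: dict[str, int] = {}

    def set_rollout(self, flag: str, percent: int) -> None:
        # Can be called at runtime to widen or roll back an experiment.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_percent[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout_percent.get(flag, 0)
        # Hash flag + user so a user stays in the same bucket across calls,
        # while different flags split the user base independently.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent
```

Stable bucketing is what makes the flag usable for A/B tests: the same user always lands in the same group for a given flag.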

[Diagram: Initial service architecture. Components: API Gateway, Service layer / Uploader, CDN, items meta db, items blob storage, users db.]

10
Feedback collection
It's essential to understand how users interact with items.
Feedback events usually arrive at a high rate.
Data includes implicit feedback (e.g., items listened to or viewed).
Data also includes explicit feedback (e.g., movie ratings or e-commerce item reviews).
Data is a primary source for most recommendation algorithms.
Consider implementing batching for data processing.
API: recordFeedback([(userId, itemId, feedback, ts)]) -> void
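A minimal sketch of batching behind `recordFeedback`, assuming events are buffered in memory and handed to a downstream sink (for example, a streaming producer). The `FeedbackEvent` and `FeedbackBatcher` names, the `feedback` encoding, and the batch size are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FeedbackEvent:
    user_id: str
    item_id: str
    feedback: str   # e.g. "play", "skip", "rating:4" (hypothetical encoding)
    ts: float       # event time, seconds since the epoch


class FeedbackBatcher:
    """Buffers events and hands them to `sink` in batches."""

    def __init__(self, sink: Callable[[list[FeedbackEvent]], None], max_batch: int = 100) -> None:
        self._sink = sink
        self._max_batch = max_batch
        self._buffer: list[FeedbackEvent] = []

    def record_feedback(self, events: list[FeedbackEvent]) -> None:
        for event in events:
            self._buffer.append(event)
            if len(self._buffer) >= self._max_batch:
                self.flush()

    def flush(self) -> None:
        # Call on a timer and at shutdown too, so a partial batch is not lost.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._sink(batch)
```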

[Diagram: Step 1. Service with feedback collection pipeline. Components: API Gateway, Service layer / Uploader, CDN, Feedback service, streaming, feedback db, storage, users db.]

11
Show most popular items
Create a background scheduled job to aggregate feedback for items.
Introduce a Recommender service and implement the API call.
This is a good first step for addressing the cold start problem.
The system might be further improved by employing user features (e.g., demographic
data).
Deploy the system and measure its performance through A/B testing.
Avoid a feedback loop: items shown in the top chart collect more feedback simply because they are shown.
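A minimal sketch of the aggregation job, assuming feedback arrives as the `(userId, itemId, feedback, ts)` tuples from the feedback API. The time window and the choice to count distinct users per item are assumptions, not part of the original design.

```python
from collections import Counter


def top_chart(feedback, count, now, window_seconds=7 * 24 * 3600):
    """Return the `count` most interacted-with item ids in the recent window.

    `feedback` is an iterable of (user_id, item_id, feedback, ts) tuples.
    """
    user_item_pairs = set()
    for user_id, item_id, _feedback, ts in feedback:
        if now - ts <= window_seconds:
            # Count each (user, item) pair once so one heavy user cannot
            # push an item up the chart alone.
            user_item_pairs.add((user_id, item_id))
    counts = Counter(item_id for _user, item_id in user_item_pairs)
    # Sort by count descending, then by item id for a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item_id for item_id, _ in ranked[:count]]
```

A scheduled job could run this over the feedback db and write the result to the recs db for the Recommender service to serve.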

[Diagram: Step 2. Implement the first ETL job and set up a Recommender service. Components: API Gateway, Service layer / Uploader, CDN, Feedback service, streaming, feedback db, ETL jobs, recs db, Recommender service, storage, users db.]

12
Personalized recommendations
Collaborative filtering:
- Easy to implement
- Low runtime overhead
- Well-known
Deep learning methods:
- Allows side features
- Provides flexibility
- Might be hard to get right
- Runtime overhead

13
Collaborative filtering example
[Figure: example user-item rating matrix with ratings from 1 to 5; missing ratings are shown as –.]
Embeddings:
U (n x k) - user matrix
V (m x k) - items matrix
U(i) * V' = relevant items for a user

14
Implement collaborative filtering
With feedback collection already in place, we can analyze past (user, item) interactions.
Given these pairs, we want to compute embeddings for users and items.
This can be done using matrix factorization.
It mostly works only for popular items and engaged users (remember the long tail chart).
Deploy and measure (A/B test).
Embeddings ("latent" vectors of the same length):
user_n = [0.12, 0.234, 0.34, ..., 0.893]
item_n = [0.843, 0.553, 0.123, ..., 0.23]
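A minimal sketch of matrix factorization trained with stochastic gradient descent on explicit ratings, using NumPy. This is one of several possible algorithms (the slides do not name a specific one), and the hyperparameters below are illustrative assumptions.

```python
import numpy as np


def factorize(ratings, n_users, n_items, k=8, epochs=200, lr=0.02, reg=0.05, seed=0):
    """Learn U (n_users x k) and V (n_items x k) so that U[u] @ V[i] ~ rating.

    `ratings` is a list of (user_index, item_index, rating) triples.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - U[u] @ V[i]
            u_old = U[u].copy()
            # Gradient step on squared error with L2 regularization.
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V
```

Only observed (user, item) pairs contribute to the loss, which is why missing ratings can later be predicted from the learned vectors.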

15
Collaborative filtering features
Using the embeddings, it's possible to rank items for a given user.
A similarity measure (e.g., cosine) can be used to find similar items.
Often the dataset is small enough to fit in memory.
Nightly updates work fine.
There are many things to fine-tune (algorithm, vector length), but they can be tested offline.
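A minimal sketch of ranking items for one user from in-memory embeddings, following the U(i) * V' idea. The function name and the `seen` filter are assumptions for illustration.

```python
import numpy as np


def rank_items_for_user(U, V, user_index, count, seen=()):
    """Score every item as U[user] . V[item] and return the top `count` indices."""
    scores = V @ U[user_index]
    # Mask items the user already consumed so they are never re-recommended.
    seen_idx = np.asarray(list(seen), dtype=int)
    scores[seen_idx] = -np.inf
    order = np.argsort(-scores)
    return [int(i) for i in order[:count] if np.isfinite(scores[i])]
```

With both matrices loaded in memory, a request costs one matrix-vector product plus a sort, which is where the low runtime overhead comes from.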

[Diagram: Step 3. Add a collaborative filtering job and serve results at runtime. Components: API Gateway, Service layer / Uploader, CDN, Feedback service, streaming, feedback db, ETL jobs, recs db, Recommender service, storage, users db. Annotations: "Load and serve recommendations from memory", "Add collaborative filtering job".]

16
Tackling the long tail problem
Content-based algorithms are popular in this field.
It's possible to learn content features from known collaborative filtering data.
For example, if a track sounds close to another track, their embeddings should be
similar.
We can produce embeddings based on content features even without users'
interaction.
Use a vector database to find a set of similar items for a given item.
Deploy and measure (A/B test).
Vector DB API: getSimilarItems(itemId, count) -> [ItemId]
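A minimal sketch of the `getSimilarItems` call, using brute-force cosine similarity over an in-memory array as a stand-in for a real vector database. The `InMemoryVectorIndex` class is hypothetical; a production vector db would use an approximate nearest-neighbor index instead.

```python
import numpy as np


class InMemoryVectorIndex:
    """Brute-force cosine search; a stand-in for a real vector database."""

    def __init__(self, dim):
        self._dim = dim
        self._ids = []
        self._vectors = np.empty((0, dim))

    def add(self, item_id, embedding):
        # Store unit-length vectors so a dot product equals cosine similarity.
        vec = np.asarray(embedding, dtype=float).reshape(1, self._dim)
        vec = vec / (np.linalg.norm(vec) + 1e-12)
        self._ids.append(item_id)
        self._vectors = np.vstack([self._vectors, vec])

    def get_similar_items(self, item_id, count):
        query = self._vectors[self._ids.index(item_id)]
        sims = self._vectors @ query
        order = np.argsort(-sims)
        # Drop the query item itself, then keep the closest `count` items.
        return [self._ids[i] for i in order if self._ids[i] != item_id][:count]
```

Because `add` needs only a content embedding, a new item becomes searchable before any user has interacted with it.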

Content-based idea (music example)
[Figure: tracks (Track 1, Track 2) are mapped to embeddings (Embedding 1).]
17

Content-based approach
Add a vector db of embeddings.
Add a new pipeline to index new items.
At runtime, find similar items based on the user's latest feedback.
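A minimal sketch of the runtime step, assuming a `get_similar_items(item_id, count)` callable with the same shape as the vector DB API. The round-robin merge and the `per_seed` parameter are assumptions for illustration.

```python
def recommend_from_recent(recent_item_ids, get_similar_items, count, per_seed=5):
    """Interleave neighbours of the user's most recent items, newest first."""
    seen = set(recent_item_ids)
    neighbour_lists = [get_similar_items(item_id, per_seed) for item_id in recent_item_ids]
    result = []
    # Round-robin so that no single seed item dominates the result.
    for rank in range(per_seed):
        for neighbours in neighbour_lists:
            if len(result) >= count:
                return result
            if rank < len(neighbours) and neighbours[rank] not in seen:
                seen.add(neighbours[rank])
                result.append(neighbours[rank])
    return result
```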
18

Hybrid recommender system
Depending on the situation, it's possible to mix different algorithms at runtime.
Almost every item from the long tail might be considered. This might be risky, so filters are
required.
Fallback strategy: temporarily cut out some systems.
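A minimal sketch of mixing several recommenders at runtime with a filter and a fallback. The weighted reciprocal-rank scoring is an assumption chosen for simplicity, not the method from the slides, and the `hybrid_recommend` and `is_allowed` names are hypothetical.

```python
def hybrid_recommend(sources, count, is_allowed=lambda item_id: True):
    """Blend ranked lists from several recommenders.

    `sources` is a list of (weight, fetch) pairs, where `fetch()` returns a
    ranked list of item ids. A source that raises is skipped (fallback).
    """
    scores = {}
    for weight, fetch in sources:
        try:
            ranked = fetch()
        except Exception:
            continue  # cut the failing system out for this request
        for position, item_id in enumerate(ranked):
            if is_allowed(item_id):
                # Reciprocal-rank score: earlier positions contribute more.
                scores[item_id] = scores.get(item_id, 0.0) + weight / (position + 1)
    ranked_items = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return ranked_items[:count]
```

The weights could be driven by feature flags, so a source can be turned down or off without a deployment.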
19

[Diagram: Step 4. Add content-based recommendations. Components: API Gateway, Service layer / Uploader, CDN, Feedback service, streaming, data lake, ETL jobs, recs db, Recommender service, uploading stream, Item content embedder, items embeddings, storage, users db.]

Challenges
Many moving parts
Atomic updates
Data filtering
Business performance monitoring
ML-heavy problem
20

Wrap up
The approach scales well.
It can be built iteratively.
It's crucial to continuously watch over the system.
Embeddings open up many customization options for the recommendation engine.
21
