© Tubi, proprietary and confidential
Powering Personalized Binge-Watching Recommendations: A Journey of Realtime Multi-Interest Based Retrieval
Jaya Kawale
Vice President of Engineering (Machine Learning), Tubi
What to expect?
● Part I: How ML can help streaming services like Tubi
● Part II: Case Study of Retrieval
Introduction
Free streaming service: watch free movies, TV, news & sports!
Tubi
● More than 64 million monthly active users.
● Available across several countries, including the US, Canada, Mexico & LatAm.
● Most-watched Free Ad-Supported Television (FAST) service.
How can Machine Learning help?
Part I
Three Pillars
Recommendation
Personalized Recommendations: content ranking, container ranking, image ranking, search, notifications, container generation, cold-starting titles
● 70+ models help organize the homepage!
● Rank content and containers based on users’ features and past interactions
● Ranking based on GBDTs and deep neural networks
● Retrieval based on many embeddings (e.g. two-tower, NLP embeddings, etc.)
● Distilled models for new users
● Exploration strategies for new titles
Why is it challenging?
01 Offline vs Online
02 Feedback loops
03 Changing tastes and catalog
Beyond Accuracy at the Top!
01 Offline vs Online
Typical Metrics
Typical offline metrics: ranking metrics (NDCG, NMRR, Precision@K)
Typical online metrics: streaming, retention
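To make the offline side concrete, here is a minimal sketch of binary-relevance Precision@K and NDCG@K, plus plain reciprocal rank (the building block of MRR-style metrics; NMRR itself is not shown). Treating "watched" as the relevance label and the title IDs are illustrative assumptions.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended titles that the user actually watched."""
    return sum(item in relevant for item in ranked[:k]) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / (position of the first relevant title), or 0 if none is relevant."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["A", "B", "C", "D", "E"]  # hypothetical ranking shown to a user
watched = {"B", "E"}                # hypothetical titles the user went on to watch
print(precision_at_k(ranked, watched, 3), reciprocal_rank(ranked, watched), ndcg_at_k(ranked, watched, 5))
```

All three only reward putting watched titles near the top of a logged list, which is exactly why they can disagree with online streaming or retention metrics.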
Correlation vs Causation
Offline evaluation
● Uses historical data
● Cheap, fast, risk-free
● Correlation-based
● Counterfactual rewards are missing: it cannot capture what would have happened if something else had been recommended
Online evaluation
● Randomized experiments (A/B tests)
● Must wait days to compute the reward
● Reliable but expensive
Dynamic Environment
● Recommender dynamics can affect performance in ways not captured by offline metrics, e.g. impression caps.
● Recommendations can influence user preferences in ways not captured by offline metrics, e.g. did you watch a title because it was recommended?
● User dynamics and confounding factors can influence watch behavior in ways not captured by offline metrics, e.g. watching a title because a friend recommended it.
Counterfactual evaluation
● Estimate the potential outcome of a policy offline using logged data.
● Inverse Propensity Scoring (IPS): importance weighting to account for the mismatch between the distribution of the logged data and the policy being evaluated.
● Several variants: CIPS, SNIPS, etc.
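A minimal sketch of the three estimators, assuming each logged impression records the reward, the logging policy's probability of the shown title, and the target policy's probability of that same title. It also assumes "CIPS" means clipped (capped-weight) IPS; the clip value of 10 is an arbitrary illustration.

```python
import numpy as np


def ips(rewards, target_probs, logging_probs):
    """IPS: reweight each logged reward by pi(a|x) / pi_0(a|x), then average over n."""
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))


def clipped_ips(rewards, target_probs, logging_probs, max_weight=10.0):
    """Clipped IPS: cap large weights, trading a little bias for lower variance."""
    weights = np.minimum(target_probs / logging_probs, max_weight)
    return float(np.mean(weights * rewards))


def snips(rewards, target_probs, logging_probs):
    """Self-normalized IPS: divide by the sum of the weights instead of by n."""
    weights = target_probs / logging_probs
    return float(np.sum(weights * rewards) / np.sum(weights))


# Hypothetical log of four impressions: did the user play the shown title?
rewards = np.array([1.0, 0.0, 1.0, 0.0])
logging_probs = np.array([0.5, 0.5, 0.25, 0.25])  # pi_0(shown title | user)
target_probs = np.array([1.0, 0.0, 0.5, 0.5])     # pi(shown title | user) under the new policy
print(ips(rewards, target_probs, logging_probs),
      clipped_ips(rewards, target_probs, logging_probs, max_weight=1.5),
      snips(rewards, target_probs, logging_probs))
```

These estimates are only unbiased (for plain IPS) when the logged propensities are accurate and non-zero wherever the new policy puts mass.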
02 Feedback loops
03 Changing User tastes & Catalog
Feedback loops
● Different algorithms on the homepage influence one another
● Underlying data influences the algorithms
● Recommendations influence watch behavior
● Watch behavior influences the data
Observational Data
Feedback loops
● Typical offline training data: clicks, watches, plays
● Implicit feedback has inherent biases
● Position and recommendations influence data collection
Changing User tastes & Catalog
● Users adapt and change their preferences over time.
● New titles enter the catalog while others leave the service.
● Uncertainty around new users and titles.
● External trends influence watch behavior.
Exploration and bandits
● Tradeoff: explore unknown choices to gather information vs. exploit known preferences.
● Exploration helps break feedback loops and reduces uncertainty around new items/users.
● Caveat: designing exploration that works well in practice is hard due to non-stationary data and large, dynamic action spaces. Rewards are also myopic.
● RL: optimize for the long term.
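The slide names the explore/exploit tradeoff but no specific algorithm. As one standard illustration, not necessarily what Tubi runs, here is a minimal Beta-Bernoulli Thompson sampling sketch over a fixed set of titles with binary play/no-play rewards; the true play rates are made up.

```python
import numpy as np


class BetaBernoulliThompson:
    """Thompson sampling over a fixed set of titles with binary (play / no-play) rewards."""

    def __init__(self, n_arms: int, seed: int = 0):
        self.successes = np.ones(n_arms)  # Beta(1, 1) uniform prior per title
        self.failures = np.ones(n_arms)
        self.rng = np.random.default_rng(seed)

    def select(self) -> int:
        # Sample a plausible play rate for each title from its posterior and show the best.
        # Rarely shown (uncertain) titles sometimes win the draw: that is the exploration.
        samples = self.rng.beta(self.successes, self.failures)
        return int(np.argmax(samples))

    def update(self, arm: int, reward: int) -> None:
        self.successes[arm] += reward
        self.failures[arm] += 1 - reward


true_play_rates = np.array([0.02, 0.05, 0.10])  # hypothetical, unknown to the bandit
bandit = BetaBernoulliThompson(n_arms=3, seed=42)
env_rng = np.random.default_rng(7)
pulls = np.zeros(3, dtype=int)
for _ in range(5000):
    arm = bandit.select()
    reward = int(env_rng.random() < true_play_rates[arm])
    bandit.update(arm, reward)
    pulls[arm] += 1
print(pulls)  # how often each title was shown
```

This toy setting is stationary with three arms; the caveat above (non-stationary data, large dynamic action spaces) is precisely what it leaves out.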
Content Understanding
ML for Content
● Content understanding helps make sense of rich metadata: plot synopsis, cast, genre, box office, ratings, posters/images, language, video trailers.
Helps us improve:
● Recommendations
● Content acquisition decisions
● Cold starting of titles
● Container Genesis
● Image Ranking
● …
Content Understanding, from easy to hard:
keyword search ➔ review/sentiment classification ➔ topic extraction ➔ embedding generation ➔ natural language understanding ➔ video understanding ➔ multi-modal data understanding (e.g. text + images)
Spock Platform
● Platform for data ingestion, preprocessing and cleaning.
● Generates a variety of embeddings powering different use cases across the product.
● Helps assess embedding quality via surrogate tasks.
[Figure: Spock Platform. Inputs: 1st & 3rd party data, viewer-oriented data, title-oriented data, and the universe of content + metadata. Models produce embeddings (CTXT, MD, MMD, Genre, Demos, Actor, et al.) that feed products and use cases: beam from Universe to Tubiverse; cold ➔ warm ➔ hot starting; content value assessment; tiering inventory in Tubiverse; augmented search; seeding growth; coordinated pursuit of new audience; portfolio analysis / simulation; audience assessment.]
ML for Ads
Overview of ML for AdTech
● Audience segments: leverage data to generate audience segments for targeting.
● Ad break finder: detect where to place an ad break in a video using computer vision.
● Time series forecasting: forecast ad opportunities.
● Ad understanding: understand what an ad is about.
The Journey of Retrieval
Part II
Retrieval
● Retrieval reduces the candidate space to a much smaller set.
● Typically uses lightweight methods to prune candidates.
● Fewer candidates leave latency room for a more complex ranker.
[Figure: Retrieval (reduces the candidate space to hundreds) ➔ Ranker (ranks hundreds of titles) ➔ HomePage]
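The retrieve-then-rank funnel can be sketched as below, assuming a dot-product retrieval score over hypothetical random embeddings. The "heavy" ranker is only a placeholder function standing in for a GBDT or DNN; the point is that it runs on 300 candidates instead of the whole catalog.

```python
import numpy as np

rng = np.random.default_rng(0)
n_titles, dim = 20_000, 32
title_emb = rng.normal(size=(n_titles, dim)).astype(np.float32)  # hypothetical catalog embeddings
user_emb = rng.normal(size=dim).astype(np.float32)                # hypothetical user embedding


def retrieve(user_vec, item_vecs, k=300):
    """Stage 1 (retrieval): cheap dot-product scores over the whole catalog, keep the top k."""
    scores = item_vecs @ user_vec
    return np.argpartition(-scores, k)[:k]  # O(n) top-k selection, not sorted


def heavy_ranker(user_vec, item_vecs, candidate_ids):
    """Stage 2 (ranking): placeholder for a costly model that only sees the candidates."""
    feats = item_vecs[candidate_ids]
    scores = np.tanh(feats @ user_vec) + 0.1 * np.linalg.norm(feats, axis=1)
    return candidate_ids[np.argsort(-scores)]


candidates = retrieve(user_emb, title_emb, k=300)
homepage = heavy_ranker(user_emb, title_emb, candidates)[:40]
print(len(candidates), homepage[:5])
```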
How it started?
● The catalog was small, and so was DAU.
● Ranking the entire catalog for all users was possible.
● Offline batch jobs: publish rankings daily for all users. No real-time inference support needed.
● Issues: daily ingest jobs; compute & storage cost.
As time goes by…
● Tubi becomes more popular and the catalog grows.
● Ranking a large catalog for all users daily became compute-intensive.
● Limit the number of candidates ranked per device (say, 200) to save daily ingestion costs.
● Ingestion cost drops, but the page is no longer fully personalized.
Fast-forward: Ranking…
● We moved rankers to real-time inference.
● Got rid of daily per-user ingest jobs: huge savings in compute and storage.
● Also gives us room to personalize the entire homepage.
Retrieval Gen 1.0
● Reduce the candidates for ranking and storing.
● Start with popularity-based measures; no need to rank the larger catalog.
● Simple measures: popular in country, language, genre, externally, etc.
● Issues: unpersonalized recall; reinforces popularity bias.
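A minimal sketch of popularity-based candidate generation: top-N most-watched titles per country, language and genre, merged without duplicates. The event schema and data are hypothetical, and the "popular externally" signal is not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical watch-log row: (title_id, country, language, genre)
WatchEvent = Tuple[str, str, str, str]


def popular_by(events: Iterable[WatchEvent], key_index: int, top_n: int) -> Dict[str, List[str]]:
    """Top-N most-watched titles within each value of one attribute (e.g. each country)."""
    counts: Dict[str, Counter] = {}
    for event in events:
        counts.setdefault(event[key_index], Counter())[event[0]] += 1
    return {key: [title for title, _ in c.most_common(top_n)] for key, c in counts.items()}


def popularity_candidates(events, country, language, genres, top_n=2):
    """Union of popular-in-country, popular-in-language and popular-in-genre lists."""
    events = list(events)
    pools = [popular_by(events, 1, top_n).get(country, []),
             popular_by(events, 2, top_n).get(language, [])]
    by_genre = popular_by(events, 3, top_n)
    pools += [by_genre.get(genre, []) for genre in genres]
    seen, merged = set(), []
    for pool in pools:
        for title in pool:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


events = [
    ("t1", "US", "en", "horror"), ("t1", "US", "en", "horror"),
    ("t2", "US", "en", "comedy"), ("t2", "US", "en", "comedy"), ("t2", "CA", "en", "comedy"),
    ("t3", "MX", "es", "comedy"), ("t3", "MX", "es", "comedy"),
    ("t4", "MX", "es", "horror"), ("t5", "US", "en", "comedy"),
]
print(popularity_candidates(events, country="US", language="en", genres=["comedy"]))
```

Every user in the same country, language and genre bucket gets the same list, which is the unpersonalized recall and popularity bias the slide calls out.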
Personalization is the key
● Idea: start with collaborative filtering and use the “wisdom of the crowd”.
● Matrix Factorization: factorize the user-item interaction matrix into low-rank matrices.
● Use the MF score for first-level pruning.
● Issues: cold-start users/items.
Example User × Movie interaction matrix (rows: users, columns: movies; 1 = interaction, x = unobserved):
1 x x 1
1 x 1 x
x x 1 1
1 x x 1
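A minimal matrix factorization sketch on the toy matrix from the slide, using plain regularized alternating least squares. Treating every unobserved "x" as 0 is a simplifying assumption; implicit-feedback variants (e.g. weighted ALS with confidence weights) handle unobserved entries more carefully. Rank, regularization and iteration count are arbitrary.

```python
import numpy as np

# Toy User x Movie matrix from the slide: 1 = interaction, x (unobserved) treated as 0 here.
R = np.array([
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)


def als(R, rank=2, reg=0.1, iters=20, seed=0):
    """Regularized ALS: minimize ||R - U V^T||^2 + reg * (||U||^2 + ||V||^2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    eye = reg * np.eye(rank)
    for _ in range(iters):
        U = np.linalg.solve(V.T @ V + eye, (R @ V).T).T    # best users given fixed items
        V = np.linalg.solve(U.T @ U + eye, (R.T @ U).T).T  # best items given fixed users
    return U, V


def prune(U, V, R, user, k=2):
    """First-level pruning: keep the top-k unwatched titles by MF score."""
    scores = U[user] @ V.T
    scores[R[user] > 0] = -np.inf  # do not re-recommend watched titles
    return np.argsort(-scores)[:k]


U, V = als(R)
print(np.round(U @ V.T, 2))
print(prune(U, V, R, user=0))
```

The second column (a movie nobody interacted with) gets an all-zero item vector here, a small-scale version of the item cold-start issue listed on the slide.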
Item Embeddings
● Problem: subsampling users for training results in poor user vector representations. The set of user vectors is also very large.
● Idea: can we use only the item vectors?
● Approximate the user representation by their watch history: take the nearest neighbors in the item space with respect to the watch history.
● Retrieval candidates are the nearest neighbors of the watch history.
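The item-only idea needs just one primitive: the nearest neighbors of a watched title in item-embedding space. A minimal sketch using exact cosine similarity over hypothetical 2-D embeddings (real embeddings would come from MF or another model):

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def neighbors_of_history(item_emb, history, k=3):
    """For each watched item, return its k most cosine-similar titles (excluding itself)."""
    emb = normalize(item_emb)
    result = {}
    for item in history:
        sims = emb @ emb[item]
        sims[item] = -np.inf  # a title is not its own neighbor
        result[item] = np.argsort(-sims)[:k].tolist()
    return result


# Hypothetical item embeddings for six titles.
item_emb = np.array([
    [1.0, 0.0], [0.95, 0.1], [0.9, 0.2],
    [0.0, 1.0], [0.1, 0.95], [-1.0, 0.0],
])
print(neighbors_of_history(item_emb, history=[0, 3], k=2))
```

No user vector is stored: the watch history itself is the user representation.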
Moar Embeddings!
● A lot of additional metadata is associated with each title.
● Abundance of natural-language text.
● Use deep learning/NLP to generate more content embeddings.
[Figure: Additional metadata associated with a title]
Why is NLP hard?
Ambiguity in representation and learning!
Example search: “Kids Horror”. The user is not looking for titles that make kids horrified.
Word2Vec
● Use the similarity of word vectors to calculate the probability of the outside (context) words given the centre word (or vice versa).
● Keep adjusting the word vectors to maximize this probability.
*Richard Socher, Stanford NLP course
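A minimal sketch of the skip-gram objective described above: the softmax probability P(o | c) = exp(u_o · v_c) / Σ_w exp(u_w · v_c), and gradient-ascent steps on log P(o | c) with respect to the centre vector. The vocabulary size, dimensions and word indices are arbitrary; a full Word2Vec also updates the context vectors and usually replaces the full softmax with negative sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 8
center_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))   # v_w: word used as centre word
context_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))  # u_w: word used as context word


def prob_context_given_center(o, c):
    """Skip-gram softmax: P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    logits = context_vecs @ center_vecs[c]
    logits -= logits.max()  # numerical stability; does not change the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[o], probs


def sgd_step(o, c, lr=0.5):
    """One gradient-ascent step on log P(o | c) with respect to the centre vector v_c."""
    _, probs = prob_context_given_center(o, c)
    grad = context_vecs[o] - probs @ context_vecs  # u_o - E_{w ~ P(.|c)}[u_w]
    center_vecs[c] += lr * grad


before, _ = prob_context_given_center(o=3, c=5)
for _ in range(20):
    sgd_step(o=3, c=5)
after, _ = prob_context_given_center(o=3, c=5)
print(f"P(o=3 | c=5): {before:.3f} -> {after:.3f}")
```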
Doc2Vec
● Create a numeric representation of a document instead of a word.
● Add a paragraph ID to the context of each word.
Embeddings fun
Transformers, Language Models and all
● Transformers are ruling the world.
● BERT is widely used. LLMs are on everyone's mind.
● Pre-training vs. fine-tuning.
● And the latest: prompt engineering…
Pre-trained similarities are not enough; fine-tune for a specific task.
Gen 2.0 Interaction Based Model: Two Tower
● Two-tower model: a user tower and a content tower.
● User features: e.g. watch history, tenure, etc.
● Content features: e.g. genre, tags, etc.
● The final score measures the user’s affinity for a title and is used to prune candidates.
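A minimal forward-pass sketch of a two-tower model, assuming small ReLU MLP towers, L2-normalized outputs, a dot-product affinity score, and an in-batch softmax loss (a common training choice, not confirmed as Tubi's). Feature sizes, weights and the temperature are hypothetical and the weights are untrained; in practice the loss would be minimized with a framework such as JAX or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, w1, w2):
    """A tiny tower: one ReLU hidden layer followed by a linear projection."""
    return np.maximum(x @ w1, 0.0) @ w2


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Hypothetical sizes; real towers would consume watch history, tenure, genre, tags, ...
user_dim, item_dim, hidden, out = 16, 12, 32, 8
Wu1, Wu2 = rng.normal(size=(user_dim, hidden)) * 0.1, rng.normal(size=(hidden, out)) * 0.1
Wi1, Wi2 = rng.normal(size=(item_dim, hidden)) * 0.1, rng.normal(size=(hidden, out)) * 0.1

batch = 4
user_feats = rng.normal(size=(batch, user_dim))
item_feats = rng.normal(size=(batch, item_dim))  # row i: a title that user i watched

user_vecs = l2_normalize(mlp(user_feats, Wu1, Wu2))
item_vecs = l2_normalize(mlp(item_feats, Wi1, Wi2))

# Affinity for every (user, title) pair in the batch is a single dot product.
scores = user_vecs @ item_vecs.T

# In-batch softmax loss: each user's watched title is the positive,
# the other titles in the batch serve as negatives.
temperature = 0.1
logits = scores / temperature
logits -= logits.max(axis=1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(scores.shape, round(float(loss), 3))
```

Because the towers never see each other's inputs, item vectors can be precomputed and indexed for nearest-neighbor search, which is what makes this architecture suitable for retrieval.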
Recap: How to Generate Retrieval Candidates?
● A variety of content embeddings: interaction-based, language-based, etc.
● For each embedding, generate a user representation.
● Get the nearest neighbors for the user.
● Key: what could help build a user representation? Watch history!
Embeddings Based Retrieval
Design Choice 1: Average the embedding vectors of the watch history, then compute the nearest neighbors of that average?
Pros: a single representation per user.
Cons: averaging loses information, e.g. a horror title and a comedy title get averaged together.
[Figure: Watch history (A, B, C, D) ➔ average embedding ➔ user representation]
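A small illustration of the averaging problem, using hypothetical 2-D embeddings where one axis loosely means "horror" and the other "comedy". A user who watched one horror and one comedy title gets an average vector whose nearest neighbor is the in-between title, not the strong match for either interest.

```python
import numpy as np

# Hypothetical 2-D embeddings: axis 0 ~ "horror-ness", axis 1 ~ "comedy-ness".
catalog = {
    "slasher":       np.array([1.0, 0.0]),
    "haunted_house": np.array([0.9, 0.1]),
    "sitcom_movie":  np.array([0.0, 1.0]),
    "romcom":        np.array([0.1, 0.9]),
    "horror_comedy": np.array([0.7, 0.7]),
}


def nearest(query, catalog, exclude=()):
    """Cosine nearest neighbor of a query vector among catalog titles."""
    best, best_sim = None, -np.inf
    for title, vec in catalog.items():
        if title in exclude:
            continue
        sim = query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = title, sim
    return best


history = ["slasher", "sitcom_movie"]  # one horror title, one comedy title
user_vec = np.mean([catalog[t] for t in history], axis=0)  # Design Choice 1: average
print(nearest(user_vec, catalog, exclude=history))
```

"haunted_house" and "romcom" each match one of the user's interests far better, yet both lose to the midpoint.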
Embeddings Based Retrieval
Design Choice 2: Generate nearest neighbors for each item in the watch history?
Pros: horror and comedy titles are not averaged together.
Cons: many nearest neighbors to compute; daily ingest jobs took tremendous compute and storage.
[Figure: Watch history A, B, C, D with neighbor lists (E, F, G), (E, P, Q), (R, A, P), (X, Y, B), merged into candidates E, F, G, P, Q, R, X, Y]
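The merge step in the figure's example can be sketched as follows, assuming the four neighbor lists belong to A, B, C and D in order, and that already-watched titles (A and B appear in other lists) and duplicates (E, P) are dropped. Under those assumptions it reproduces the merged list E, F, G, P, Q, R, X, Y.

```python
from typing import Dict, List, Sequence


def merge_neighbor_lists(history: Sequence[str], neighbors: Dict[str, List[str]]) -> List[str]:
    """Union of per-item neighbor lists in order, dropping duplicates and watched titles."""
    watched = set(history)
    seen, merged = set(), []
    for item in history:
        for candidate in neighbors.get(item, []):
            if candidate in watched or candidate in seen:
                continue
            seen.add(candidate)
            merged.append(candidate)
    return merged


history = ["A", "B", "C", "D"]
neighbors = {
    "A": ["E", "F", "G"],
    "B": ["E", "P", "Q"],
    "C": ["R", "A", "P"],
    "D": ["X", "Y", "B"],
}
print(merge_neighbor_lists(history, neighbors))
```

The cost problem is visible in the inputs: one neighbor list per watched title, for every user, recomputed and stored daily.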
Embeddings Based Retrieval
● A user’s watch behavior shows clustering patterns.
● Depending on the context, particular titles should be shown, e.g. news in the morning, horror in the evening.
● Key idea: the user embedding should capture multi-modal interests.
[Figure: watch-history clusters. Cluster 1: Romance; Cluster 2: Horror; Cluster 3: Action]
Embeddings Based Retrieval
● Design Choice 3: Medoid-based representation of the user. [Pal et al, KDD 2020, PinnerSage: Multi-Modal User Embedding Framework]
● Medoids represent cluster centres. Reason: cheaper! Just IDs, as opposed to embedding vectors.
[Figure: Cluster 1, Cluster 2, Cluster 3]
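A minimal medoid sketch: the medoid is the cluster member with the smallest total distance to the other members. Because it is an actual title, storing its ID is enough, whereas a mean vector is generally not any title's embedding. The 1-D-like toy embeddings are hypothetical and include one outlier to show that the medoid is also less sensitive to outliers than the mean.

```python
import numpy as np


def medoid(cluster_ids, item_emb):
    """Return the cluster member with the smallest total distance to all other members."""
    vecs = item_emb[cluster_ids]
    # Pairwise Euclidean distances between cluster members.
    dists = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=-1)
    return cluster_ids[int(np.argmin(dists.sum(axis=1)))]


# Hypothetical embeddings for five watched titles; title 4 is an outlier.
item_emb = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [2.0, 0.0],
    [3.0, 0.0],
    [20.0, 0.0],
])
cluster = [0, 1, 2, 3, 4]
print(medoid(cluster, item_emb))         # an ID of a real title
print(item_emb[cluster].mean(axis=0))   # the mean: not a title, dragged toward the outlier
```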
Embeddings Based Retrieval
● Hierarchical clustering of the user’s watch history. This matters compared to a fixed number k of clusters, since the number of interests varies from user to user.
● Huge reduction in daily ingestion jobs: only store medoids & the NNs for medoids.
[Figure: Hierarchical Clustering]
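A minimal sketch of threshold-based hierarchical clustering of one user's watch history, followed by one medoid per cluster. It uses naive average-linkage agglomerative clustering in NumPy (PinnerSage describes Ward clustering; this swaps in average linkage for brevity), and the embeddings and distance threshold are hypothetical. Merging stops once the closest clusters are farther apart than the threshold, so the number of clusters is not fixed in advance.

```python
import numpy as np


def agglomerative_clusters(vecs, threshold):
    """Average-linkage agglomerative clustering; stops merging once the closest pair of
    clusters is farther apart than `threshold`, so k is not chosen up front."""
    clusters = [[i] for i in range(len(vecs))]
    dist = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=-1)
    while len(clusters) > 1:
        best_pair, best_d = None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if d < best_d:
                    best_pair, best_d = (i, j), d
        if best_d > threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def medoid(ids, vecs):
    """Cluster member with the smallest total distance to the other members."""
    sub = vecs[ids]
    d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    return ids[int(np.argmin(d.sum(axis=1)))]


# Hypothetical watch-history embeddings: two romance titles, three horror titles, one action title.
history_vecs = np.array([
    [0.0, 0.0], [0.3, 0.1],
    [5.0, 5.0], [5.2, 4.9], [5.1, 5.2],
    [10.0, 0.0],
])
clusters = agglomerative_clusters(history_vecs, threshold=1.0)
print(clusters, [medoid(c, history_vecs) for c in clusters])  # only medoid IDs need storing
```

The naive loop is cubic in history length, which is acceptable for a single user's watch history but not for a catalog.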
Embeddings Based Retrieval
● Design Choice 4: Real time!
● Can we move the NN computation online and approximate it?
● FAISS: ANN-based real-time inference. Get the medoids for each user and compute the ANN online.
● Only medoids need to be stored offline! More savings in compute & storage.
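A minimal sketch of request-time retrieval from stored medoids. If the `faiss` package is installed it builds an HNSW approximate index (`faiss.IndexHNSWFlat`); otherwise it falls back to exact brute-force L2 search so the sketch still runs. The catalog embeddings, the medoid IDs and the HNSW parameter (32 graph neighbors per node) are hypothetical choices, not Tubi's configuration.

```python
import numpy as np

try:
    import faiss  # optional, e.g. `pip install faiss-cpu`
except ImportError:
    faiss = None


def build_index(item_vecs):
    """HNSW approximate index if FAISS is available, otherwise exact brute-force L2."""
    item_vecs = np.ascontiguousarray(item_vecs, dtype=np.float32)
    if faiss is not None:
        index = faiss.IndexHNSWFlat(item_vecs.shape[1], 32)  # 32 graph neighbors per node
        index.add(item_vecs)
        # index.search returns (distances, ids); keep the ids.
        return lambda q, k: index.search(np.ascontiguousarray(q, dtype=np.float32), k)[1]

    def brute_force(q, k):
        q = np.asarray(q, dtype=np.float32)
        d = ((q[:, None, :] - item_vecs[None, :, :]) ** 2).sum(axis=-1)  # squared L2
        return np.argsort(d, axis=1)[:, :k]

    return brute_force


def realtime_candidates(search, item_vecs, medoid_ids, k=10):
    """At request time: fetch the user's stored medoid IDs, then run (A)NN per medoid."""
    neighbor_ids = search(item_vecs[medoid_ids], k + 1)  # +1: a medoid is usually its own top hit
    seen, merged = set(medoid_ids), []
    for row in neighbor_ids:
        for item in row:
            item = int(item)
            if item >= 0 and item not in seen:  # FAISS pads missing results with -1
                seen.add(item)
                merged.append(item)
    return merged


rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(2000, 32)).astype(np.float32)  # hypothetical catalog embeddings
search = build_index(item_vecs)
user_medoids = [17, 404, 1312]  # hypothetical medoid IDs stored offline for one user
print(realtime_candidates(search, item_vecs, user_medoids, k=10)[:10])
```

Only the handful of medoid IDs per user lives in offline storage; the neighbor lists are computed on demand.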
Embeddings Based Retrieval
● Design Choice 5: Context-based exploration and sampling.
● Cluster importance: assign importance based on cluster size, recency of the watched content, time of watch, etc.
● Sample based on importance.
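A minimal sketch of importance-based sampling. It assumes a time-decayed importance score, similar to the one described in the PinnerSage paper: each watched title in a cluster contributes exp(-decay × age), so larger and more recently watched clusters score higher. Retrieval slots are then sampled across clusters in proportion to importance. The decay rate, timestamps and slot budget are hypothetical, and time-of-day context is not modeled.

```python
import numpy as np


def cluster_importance(watch_times, now, decay=0.1):
    """Sum of exp(-decay * age_in_days) over the cluster's watched titles:
    grows with cluster size and with how recently its titles were watched."""
    ages = now - np.asarray(watch_times, dtype=float)
    return float(np.exp(-decay * ages).sum())


def allocate_slots(clusters, now, budget, decay=0.1, seed=0):
    """Sample how many retrieval slots each cluster's medoid gets, proportional to importance."""
    weights = np.array([cluster_importance(t, now, decay) for t in clusters.values()])
    probs = weights / weights.sum()
    counts = np.random.default_rng(seed).multinomial(budget, probs)
    return dict(zip(clusters.keys(), counts.tolist()))


# Hypothetical watch timestamps (in days) per cluster; "now" is day 30.
clusters = {
    "romance": [1, 2, 3, 4, 5],  # large but old cluster
    "horror": [28, 29, 30],      # small but very recent cluster
    "action": [15],
}
print(allocate_slots(clusters, now=30, budget=100))
```

Sampling, rather than always giving the top cluster every slot, keeps less important interests visible, which doubles as light exploration.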
Embeddings Based Retrieval
● Design Choice 6: Bring it on!
● Additional signals, adaptive clusters, real-time clustering, better handling of multiple embeddings, incremental updates.
● Sequence prediction: use transformers to learn what to pay attention to.
Conclusions
● Retrieval is an important area that helps surface relevant content to users.
● User interests are multi-modal.
● The road ahead is very promising and exciting.
Thank You!
We are hiring!
Email: jkawale@tubi.tv
Twitter: @jayakawale

 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Jaya WWW talk 2023.pdf

  • 1. © Tubi, proprietary and confidential. Powering Personalized Binge-Watching Recommendations: A Journey of Realtime Multi-Interest Based Retrieval. Jaya Kawale, Vice President of Engineering (Machine Learning), Tubi
  • 2. What to expect? ● Part I: How can ML help streaming services like Tubi? ● Part II: Case study of retrieval
  • 3. Introduction
  • 4. Free streaming service: watch free movies, TV, news & sports!
  • 5. Tubi ● More than 64 million monthly active users. ● Available in several countries, including the US, Canada, Mexico & LatAm. ● Most-watched Free Ad-Supported Streaming Television (FAST) service.
  • 6. Part I: How can Machine Learning help?
  • 8. Recommendation
  • 9. Personalized Recommendations: Content Ranking, Container Ranking, Image Ranking, Search, Notifications, Container Generation, Cold-starting titles
  • 10. Personalized Recommendations: Content Ranking, Container Ranking, Image Ranking, Search, Notifications, Container Generation, Cold-starting titles ● 70+ models help organize the homepage! ● Rank content and containers based on users' features and past interactions ● Ranking based on GBDTs and deep neural networks ● Retrieval based on many embeddings (e.g. two-tower, NLP embeddings) ● Distilled models for new users ● Exploration strategies for new titles
  • 11. Why is it challenging? ● 01 Offline vs Online ● 02 Feedback loops ● 03 Changing tastes and catalog
  • 12. Why is it challenging? ● 01 Offline vs Online ● 02 Feedback loops ● 03 Changing tastes and catalog ● Beyond Accuracy at the Top!
  • 13. 01 Offline vs Online
  • 14. Typical Metrics ● Typical offline metrics: ranking metrics (NDCG, NMRR, Precision@K) ● Typical online metrics: streaming, retention
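To make the offline column concrete, here is a minimal sketch of two of these ranking metrics: Precision@K and binary-relevance NDCG@K. NMRR is left out. The function names are illustrative. Treating a title as relevant if the user later watched it is an assumption, not necessarily how Tubi labels its data.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked titles that are relevant (here: later watched)."""
    return sum(1 for title in ranked[:k] if title in relevant) / k


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: discounted gain of the ranking divided by that of an ideal ranking."""
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), and so on.
    dcg = sum(1.0 / math.log2(pos + 2) for pos, title in enumerate(ranked[:k]) if title in relevant)
    # The ideal ranking puts every relevant title (up to k of them) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Take ranked = [a, b, c, d] and relevant = {b, d}. Precision@2 is 0.5, and NDCG@4 falls below 1 because b and d are not at the top. Neither metric measures streaming or retention, which is exactly the offline/online gap on this slide.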
  • 15. Correlation vs Causation ● Offline evaluation: uses historical data; cheap, fast and risk-free; correlation-based; cannot capture counterfactual rewards, i.e. what would have happened under a different recommendation policy. ● Online evaluation: randomized experiments (A/B tests); must wait days to compute the reward; reliable but expensive.
  • 16. Dynamic Environment ● Recommender dynamics can affect performance in ways not captured by offline metrics, e.g. impression caps. ● Recommendations can influence user preferences in ways not captured by offline metrics, e.g. did you watch a title because it was recommended? ● User dynamics and confounding factors can influence watch behavior in ways not captured by offline metrics, e.g. watching a title because a friend recommended it.
  • 17. Counterfactual evaluation ● Estimate the potential outcome of a policy offline using logged data. ● Inverse Propensity Scoring (IPS): importance weighting to correct for the mismatch between the distribution of the logged data and that of the policy being evaluated. ● Several variants exist, e.g. CIPS, SNIPS.
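Below is a minimal sketch of the IPS idea above. It also covers the self-normalized variant (SNIPS) and an optional weight cap in the spirit of clipped IPS (CIPS). The event layout and the `target_prob` callback are hypothetical. The estimate is only as good as the logged propensities. Plain IPS is unbiased only if the logging policy gave nonzero probability to every action the target policy might take.

```python
from typing import Callable, Sequence, Tuple

# One logged impression: (context, action shown, observed reward, logging-policy propensity).
LoggedEvent = Tuple[str, str, float, float]
TargetProb = Callable[[str, str], float]  # pi_target(action | context)


def ips_estimate(logs: Sequence[LoggedEvent], target_prob: TargetProb,
                 max_weight: float = float("inf")) -> float:
    """IPS: mean of w * reward with w = pi_target / pi_logging; a finite max_weight clips w."""
    total = 0.0
    for context, action, reward, propensity in logs:
        weight = min(target_prob(context, action) / propensity, max_weight)
        total += weight * reward
    return total / len(logs)


def snips_estimate(logs: Sequence[LoggedEvent], target_prob: TargetProb) -> float:
    """Self-normalized IPS: divide by the sum of importance weights instead of the event count."""
    weighted_reward, weight_sum = 0.0, 0.0
    for context, action, reward, propensity in logs:
        weight = target_prob(context, action) / propensity
        weighted_reward += weight * reward
        weight_sum += weight
    return weighted_reward / weight_sum if weight_sum > 0 else 0.0
```

Suppose a uniform logger (propensity 0.5) showed horror twice with reward 1 and comedy once with reward 0. An always-horror policy then gets an IPS estimate of 4/3, above the maximum possible reward of 1, while SNIPS returns 1.0. Capping or self-normalizing the weights trades a little bias for much lower variance.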
  • 18. 02 Feedback loops ● 03 Changing User Tastes & Catalog
  • 19. Feedback loops ● Different algorithms on the homepage influence one another. ● The underlying data influences the algorithms. ● Recommendations influence watch behavior. ● Watch behavior influences the data. [Figure: Observational Data]
  • 20. Feedback loops ● Typical offline training uses clicks, watches and plays. ● Implicit feedback has inherent biases. ● Position and recommendations influence data collection.
  • 21. Changing User Tastes & Catalog ● Users adapt and change their preferences over time. ● New titles enter the catalog while others leave the service. ● There is uncertainty around new users and titles. ● External trends influence watch behavior.
  • 22. Exploration and bandits ● Tradeoff: explore unknown choices to gather information vs exploit known preferences. ● Exploration helps break feedback loops and reduces uncertainty around new items/users. ● Caveat: designing exploration that works in practice is hard due to the non-stationarity of the data and large, dynamic action spaces. ● Reward is myopic; RL: optimize for the long term.
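One of the simplest ways to trade off exploration against exploitation is an epsilon-greedy bandit, sketched below. Treating candidate titles as arms and a play as reward 1 are assumptions for illustration. This is not presented as Tubi's exploration policy.

```python
import random
from typing import Hashable, Iterable, Optional


class EpsilonGreedy:
    """Explore a uniformly random arm with probability epsilon, otherwise exploit the best mean."""

    def __init__(self, arms: Iterable[Hashable], epsilon: float = 0.1, seed: Optional[int] = None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self) -> Hashable:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm: Hashable, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

The incremental mean weighs all history equally. Given the non-stationarity noted above, a constant step size (an exponential moving average) is a common alternative.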
  • 23. Content Understanding
  • 24. ML for Content ● Content understanding helps make sense of rich metadata: plot synopsis, cast, genre, box office, ratings, posters/images, language, video trailers. ● It helps us improve: recommendations, content acquisition decisions, cold-starting of titles, Container Genesis, image ranking, …
  • 25. Content Understanding, from easy to hard: keyword search → review/sentiment classification → topic extraction → embedding generation → natural language understanding → video understanding → multi-modal data understanding (e.g. text + images)
  • 26. Spock Platform ● Platform for data ingestion, preprocessing and cleaning. ● Generates a variety of embeddings powering different use cases across the product. ● Helps assess embedding quality via surrogate tasks.
  • 27. [Diagram: Spock Platform. Data: 1st & 3rd party data, viewer-oriented data, title-oriented data, universe of content + metadata. Models and embeddings: CTXT, MD, MMD, Genre, Demos, Actor, et al. Products and use cases: audience assessment, beam from Universe to Tubiverse, cold ➔ warm ➔ hot starting, content value assessment, tiering inventory in Tubiverse, augmented search, seeding growth, coordinated pursuit of new audience, portfolio analysis/simulation.]
  • 28. ML for Ads
  • 29. Overview of ML for AdTech ● Audience segments: leverage data to generate audience segments for targeting. ● Ad break finder: detect where to place an ad break in a video using computer vision. ● Time series forecasting: forecast ad opportunities. ● Ad understanding: understand what an ad is about.
  • 30. Part II: The Journey of Retrieval
  • 31. Retrieval ● Retrieval reduces the candidate space to a much smaller set. ● It typically uses lightweight methods to prune candidates. ● Fewer candidates leave latency headroom for a more complex ranker. [Figure: Retrieval (reduces the candidate space to hundreds) → Ranker (ranks hundreds of titles) → Homepage]
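The two-stage funnel in the figure can be sketched as follows. The scoring callbacks are placeholders for whatever retrieval and ranking models are in use. The default sizes are illustrative only.

```python
import heapq
from typing import Callable, List


def recommend(catalog: List[str],
              retrieval_score: Callable[[str], float],
              ranker_score: Callable[[str], float],
              n_retrieve: int = 200,
              n_show: int = 20) -> List[str]:
    # Stage 1 (retrieval): a cheap score prunes the catalog to the top n_retrieve candidates.
    candidates = heapq.nlargest(n_retrieve, catalog, key=retrieval_score)
    # Stage 2 (ranking): the expensive ranker scores only the retrieved candidates.
    ranked = sorted(candidates, key=ranker_score, reverse=True)
    return ranked[:n_show]
```

The call count shows the point of the split: the expensive `ranker_score` runs only `n_retrieve` times, not once per catalog title.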
  • 32. How it started ● The catalog was small, and so was DAU. ● Ranking the entire catalog for all users was feasible. ● Offline batch jobs published rankings daily for all users; no real-time inference support was needed. ● Issues: daily ingest jobs; compute & storage cost.
  • 33. As time goes by... ● Tubi becomes more popular and the catalog grows. ● Ranking the large catalog daily for all users becomes compute-intensive. ● To save on daily ingestion costs, limit the number of candidates ranked per device (say, 200). ● Ingestion cost drops, but the page is no longer fully personalized.
  • 34. Fast-forward: Ranking … ● We moved rankers to real-time inference. ● This got rid of daily per-user ingest jobs, with huge savings in compute and storage. ● It also gives us room to personalize the entire homepage.
  • 35. Retrieval Gen 1.0 ● Reduce the candidates for ranking and storing. ● Start with popularity-based measures, so the larger catalog need not be ranked. ● Simple measures: popular in country, language, genre, externally, etc. ● Issues: unpersonalized recall; reinforces popularity bias.
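Here is a minimal sketch of Gen 1.0 popularity retrieval: the top titles by watch count within the user's country and within their language, merged without duplicates. The event schema and the choice of segments are assumptions. Genre or external popularity would be added as further sources in the same way.

```python
from collections import Counter
from typing import Dict, List


def popular_in_segment(watch_events: List[Dict[str, str]], segment_key: str,
                       segment_value: str, k: int) -> List[str]:
    """Top-k titles by watch count among events whose segment (country, language, ...) matches."""
    counts = Counter(e["title"] for e in watch_events if e.get(segment_key) == segment_value)
    return [title for title, _ in counts.most_common(k)]


def gen1_candidates(watch_events: List[Dict[str, str]], user: Dict[str, str],
                    k_per_source: int = 2) -> List[str]:
    """Union of unpersonalized popularity lists, in source order, without duplicates."""
    sources = [
        popular_in_segment(watch_events, "country", user["country"], k_per_source),
        popular_in_segment(watch_events, "language", user["language"], k_per_source),
    ]
    seen, merged = set(), []
    for source in sources:
        for title in source:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged
```

Every user in the same country and language gets the same list. That is the unpersonalized recall and popularity bias noted above.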
  • 36. Personalization is the key ● Idea: start with collaborative filtering and use the “wisdom of the crowd”. ● Matrix factorization: factorize the user-item interaction matrix into low-rank matrices. ● Use the MF score as a first level of pruning. ● Issues: cold-start users/items. [Figure: sparse User × Movie interaction matrix; 1 = interaction, x = no observed interaction]
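Below is a toy matrix factorization trained with stochastic gradient descent, followed by first-level pruning on the MF score. As a simplifying assumption, unobserved cells (the x entries) are treated as 0. Implicit-feedback systems more often use weighting schemes such as weighted ALS, or sampled negatives. All hyperparameters here are illustrative.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def factorize(interactions: Sequence[Tuple[int, int, float]], n_users: int, n_items: int,
              rank: int = 2, lr: float = 0.05, reg: float = 0.01, epochs: int = 500,
              seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r for each (u, i, r)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in interactions:
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mf_score(P: Matrix, Q: Matrix, u: int, i: int) -> float:
    return sum(a * b for a, b in zip(P[u], Q[i]))


def mf_candidates(P: Matrix, Q: Matrix, u: int, k: int, watched: Iterable[int] = ()) -> List[int]:
    """First-level pruning: the k unwatched items with the highest MF score for user u."""
    watched = set(watched)
    scores = {i: mf_score(P, Q, u, i) for i in range(len(Q)) if i not in watched}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Consider the block matrix R = [[1,1,0,0],[1,1,0,0],[0,0,1,1],[0,0,1,1]]. User 0 watched item 0, so item 1 comes out as the top candidate. A user or item with no interactions keeps its random initial vector, which is the cold-start issue on this slide.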
  • 37. Item Embeddings ● Problem: subsampling users for training results in poor user vector representations, and the user vectors are also very large. ● Idea: can we use only the item vectors? ● Approximate the user representation by the watch history: take the nearest neighbors in the item space with respect to the watch history. ● Retrieval candidates are the nearest neighbors of the watch history.
  • 38. Moar Embeddings! ● A lot of additional metadata is associated with each title. ● Natural language text is abundant. ● Use deep learning/NLP to generate more content embeddings. [Figure: Additional metadata associated with a title]
  • 39. Why is NLP hard? Ambiguity in representation and learning! Example search: “Kids Horror”. The user wants horror suitable for kids, not titles that will make kids horrified.
  • 40. Word2Vec ● Use the similarity of word vectors to calculate the probability of the outside (context) words given the centre word (or vice versa). ● Keep adjusting the word vectors to maximize this probability. *Source: Richard Socher, Stanford NLP course
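For reference, here is the skip-gram form of this objective as it is usually written in the Stanford CS224n notes. $v_c$ is the centre-word vector, $u_o$ the outside-word vector, $V$ the vocabulary, $T$ the corpus length and $m$ the window size. The probability is a softmax over dot products, and training minimizes the average negative log-likelihood of the context words:

```latex
P(o \mid c) = \frac{\exp\left(u_o^\top v_c\right)}{\sum_{w \in V} \exp\left(u_w^\top v_c\right)}

J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P\left(w_{t+j} \mid w_t\right)
```

The denominator sums over the whole vocabulary, so practical implementations approximate it, e.g. with negative sampling.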
  • 41. Doc2Vec ● Create a numeric representation of a document instead of a word. ● Add a paragraph ID to the context of each word.
  • 42-45. Embeddings fun
  • 46. Transformers, Language Models 'n' All ● Transformers are ruling the world. ● BERT is widely used; LLMs are on everyone's mind. ● Pre-training vs fine-tuning. ● And the latest: prompt engineering… ● Pre-trained similarities are not enough; fine-tune for the specific task.
  • 47. Gen 2.0: Interaction-Based Model (Two Tower) ● Two-tower model: a user tower and a content tower. ● User features: e.g. watch history, tenure. ● Content features: e.g. genre, tags. ● The final score determines the user's affinity for a title; use it to prune candidates.
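Here is a toy sketch of the two-tower structure. Each tower maps features to an embedding, and the score is their dot product. Single-layer towers with hand-set weights stand in for trained networks, and the feature layout is hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def tower(features: Sequence[float], weights: Matrix) -> List[float]:
    """Toy tower: one linear layer + ReLU (a real tower would be a trained multi-layer network)."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_item_index(catalog: Dict[str, Sequence[float]], item_w: Matrix) -> Dict[str, List[float]]:
    """Content tower runs offline: one embedding per title, independent of any user."""
    return {title: tower(feats, item_w) for title, feats in catalog.items()}


def prune(user_feats: Sequence[float], user_w: Matrix,
          item_index: Dict[str, List[float]], k: int) -> List[str]:
    """User tower runs once per request; each title's score is a dot product with its embedding."""
    u = tower(user_feats, user_w)
    scores = {title: dot(u, emb) for title, emb in item_index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The content tower never sees the user, so title embeddings can be computed offline and indexed. Only the user tower runs per request, and this is what later makes nearest-neighbor retrieval possible.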
  • 48. Recap: How to generate retrieval candidates? ● A variety of content embeddings: interaction-based, language-based, etc. ● For each embedding, generate a user representation. ● Get the nearest neighbors for the user. ● Key: what can help build a user representation? Watch history!
  • 49. Embeddings-Based Retrieval ● Design Choice 1: generate an average embedding vector from the watch history, then compute its nearest neighbors? ● Pros: a single representation per user. ● Cons: averaging loses information, e.g. a horror and a comedy title averaged together. [Figure: watch history A, B, C, D → average embedding → user representation]
  • 50. Embeddings-Based Retrieval ● Design Choice 2: generate nearest neighbors per watch-history title? ● Pros: horror and comedy titles are not averaged together. ● Cons: many nearest neighbors to compute; the daily ingest jobs took tremendous compute and storage. [Figure: watch history A, B, C, D; per-title neighbors A: E, F, G; B: E, P, Q; C: R, A, P; D: X, Y, B; merged candidates: E, F, G, P, Q, R, X, Y]
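The difference between Design Choices 1 and 2 shows up even with 2-D toy embeddings. In the sketch below, H is a horror title, C a comedy and HC a horror-comedy. The vectors, the names and the exact cosine search are illustrative assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Vec = Tuple[float, ...]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(item_vecs: Dict[str, Vec], query: Vec, k: int, exclude: Set[str]) -> List[str]:
    """Exact k nearest neighbors by cosine similarity, skipping excluded (already watched) titles."""
    scored = [(cosine(query, v), t) for t, v in item_vecs.items() if t not in exclude]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]


def average_retrieval(item_vecs: Dict[str, Vec], history: List[str], k: int) -> List[str]:
    """Design Choice 1: average the watched-title vectors, then run one neighbor query."""
    dim = len(next(iter(item_vecs.values())))
    avg = tuple(sum(item_vecs[h][d] for h in history) / len(history) for d in range(dim))
    return knn(item_vecs, avg, k, set(history))


def per_title_retrieval(item_vecs: Dict[str, Vec], history: List[str], k_per_title: int) -> List[str]:
    """Design Choice 2: one neighbor query per watched title, merged without duplicates."""
    merged: List[str] = []
    for h in history:
        for t in knn(item_vecs, item_vecs[h], k_per_title, set(history)):
            if t not in merged:
                merged.append(t)
    return merged
```

For a history of [H, C], averaging retrieves HC, a title that sits between both interests. Per-title search instead returns H2 and C2, one neighbor for each interest. The cost is one query per watched title, which is what made the batch version of Choice 2 expensive.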
  • 51. Embeddings-Based Retrieval ● A user's watch behavior shows clustering patterns. ● Depending on the context, particular titles should be shown, e.g. news in the morning, horror in the evening. ● Key idea: the user embedding should capture multi-modal interests. [Figure: Cluster 1: Romance; Cluster 2: Horror; Cluster 3: Action]
  • 52. Embeddings-Based Retrieval ● Design Choice 3: medoid-based representation of the user [Pal et al., KDD 2020, PinnerSage: Multi-Modal User Embedding Framework]. ● Use medoids to represent cluster centres. Reason: cheaper! Just IDs, as opposed to embedding vectors. [Figure: Cluster 1, Cluster 2, Cluster 3]
  • 53. Embeddings-Based Retrieval ● Hierarchical clustering of the user's watch history, preferred over a fixed number of clusters k (the number of clusters can differ per user). ● Huge reduction in daily ingestion jobs: only store the medoids and the nearest neighbors of the medoids. [Figure: Hierarchical Clustering]
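Below is a minimal sketch of the per-user pipeline on these slides. It runs agglomerative clustering of the watch history with a distance threshold instead of a fixed k, then picks one medoid per cluster. Average linkage and cosine distance are illustrative choices, not necessarily the ones used in the talk or in PinnerSage. The roughly cubic loop is fine only for short histories.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def cosine_distance(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / norm if norm else 0.0)


def agglomerative(history: List[str], vecs: Dict[str, Vec], threshold: float) -> List[List[str]]:
    """Average-linkage clustering; merging stops once the closest pair is farther than `threshold`,
    so the number of clusters adapts to each user instead of being a fixed k."""
    clusters = [[t] for t in history]

    def linkage(c1: List[str], c2: List[str]) -> float:
        return sum(cosine_distance(vecs[a], vecs[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def medoid(cluster: List[str], vecs: Dict[str, Vec]) -> str:
    """The member with the smallest total distance to the rest: a real title ID, cheap to store."""
    return min(cluster, key=lambda a: sum(cosine_distance(vecs[a], vecs[b]) for b in cluster))
```

Take a history with two romance titles, two horror titles and one action title. At a threshold of 0.3 it falls into three clusters, so only three title IDs need to be stored for that user.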
  • 54. Embeddings-Based Retrieval ● Design Choice 4: Real time! ● Can we move the nearest-neighbor computation online, and approximate it? ● FAISS: approximate nearest neighbor (ANN) search for real-time inference. Fetch the medoids for each user and compute the ANN online. ● Only the medoids need to be stored offline! More savings in compute & storage.
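Here is a sketch of the real-time variant. Only medoid IDs are stored per user, and neighbors are computed at request time. Exact inner-product search is used as a stand-in. At catalog scale, a FAISS index (for example an HNSW index) would answer these queries approximately. The class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vec = Tuple[float, ...]


class OnlineMedoidRetriever:
    """Offline: keep only each user's medoid title IDs. Online: look up the medoid embeddings
    and compute nearest neighbors at request time."""

    def __init__(self, item_vecs: Dict[str, Vec], user_medoids: Dict[str, List[str]]):
        self.item_vecs = item_vecs        # shared item embedding table, one copy for all users
        self.user_medoids = user_medoids  # the only per-user state stored offline

    def _search(self, query: Vec, k: int, exclude: Set[str]) -> List[str]:
        # Exact inner-product scan over the catalog; stand-in for a FAISS ANN index.
        scored = sorted(
            ((sum(q * x for q, x in zip(query, v)), t)
             for t, v in self.item_vecs.items() if t not in exclude),
            reverse=True,
        )
        return [t for _, t in scored[:k]]

    def retrieve(self, user_id: str, k_per_medoid: int) -> List[str]:
        medoids = self.user_medoids.get(user_id, [])
        candidates: List[str] = []
        for m in medoids:
            for t in self._search(self.item_vecs[m], k_per_medoid, set(medoids)):
                if t not in candidates:
                    candidates.append(t)
        return candidates
```

Per-user storage is now a handful of IDs, and neighbor lists no longer go stale between daily jobs. The trade-off is that search latency moves onto the request path, which is why an ANN index matters here.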
  • 55. Embeddings-Based Retrieval ● Design Choice 5: Context-based exploration and sampling ● Cluster importance: assign importance based on cluster size, recency of the watched content, time of watch, etc. ● Sample based on importance.
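Below is a minimal sketch of importance-based sampling over a user's clusters. The importance formula is a hypothetical example: cluster size times an exponential recency decay with a 7-day half-life. Time-of-watch context could be folded in as another multiplicative factor.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    medoid: str
    size: int                     # number of watched titles in the cluster
    days_since_last_watch: float


def importance(c: Cluster, half_life_days: float = 7.0) -> float:
    """Hypothetical importance: cluster size times an exponential recency decay."""
    return c.size * 0.5 ** (c.days_since_last_watch / half_life_days)


def sample_clusters(clusters: List[Cluster], n: int, rng: random.Random) -> List[str]:
    """Draw up to n distinct medoids; each draw is proportional to importance (without replacement)."""
    pool = list(clusters)
    chosen: List[str] = []
    while pool and len(chosen) < n:
        weights = [importance(c) for c in pool]
        if sum(weights) <= 0:
            break  # nothing left with positive importance
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick).medoid)
    return chosen
```

Sampling, rather than always taking the top clusters, keeps smaller interests occasionally visible. This doubles as light exploration.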
  • 56. Embeddings-Based Retrieval ● Design Choice 6: Bring it on! ● Additional signals, adaptive clusters, real-time clustering, better handling of multiple embeddings, incremental updates. ● Sequence prediction: use transformers to learn what to pay attention to.
  • 57. Conclusions ● Retrieval is an important area that helps surface relevant content to users. ● User interests are multi-modal. ● The road ahead is very promising and exciting.
  • 58. Thank You! We are hiring! Email: jkawale@tubi.tv Twitter: @jayakawale