© Tubi, proprietary and confidential
Machine Learning at Tubi: Powering Free Movies, TV and News for All
Jaya Kawale
Vice President of Engineering (Machine Learning), Tubi
01 Introduction
Tubi
● Advertising-based Video On Demand (AVOD) service
● Watch free movies, TV, news & sports
● More than 51 million monthly active users
● Available across several countries, including the US, Canada, Mexico & LatAm
02 Content Understanding
ML for Content
● Content understanding helps make sense of the rich metadata: plot synopsis, cast, genre, box office, ratings, posters/images, language, video trailers
Helps us improve
● Recommendations
● Content acquisition decisions
● Cold starting of titles
● Image ranking
● Content categorization
● …
Content Understanding
Tasks, ordered from easy to hard:
● Keyword Search
● Review/Sentiment Classification
● Topic Extraction
● Embedding Generation
● Natural Language Understanding
● Video Understanding
● Multi-modal Data Understanding (e.g. Text + Images)
Lessons Learned
● Pre-processing and cleanup are critical.
● Different tasks require different texts, e.g. sentiment analysis vs. summarization.
● Not all text is the same, e.g. reviews vs. subtitles vs. synopses.
Lessons Learned
● Be careful with the choice of algorithms. E.g. BERT is more suitable for next-sentence prediction. There is "no free lunch" in terms of algorithm & representation.
● Averaging is widely used but can lose information, e.g. when multiple reviews for a title are averaged together to generate a title embedding. An alternative is Transformer-based fusion.
● Evaluation is hard but critical, e.g. embedding quality assessment on surrogate tasks.
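To illustrate the averaging point, here is a minimal sketch with made-up two-dimensional review embeddings. The vectors and the `mean_std_pool` helper are illustrative assumptions; this is not Tubi's method and not the Transformer-based fusion named on the slide. It shows two very different sets of reviews collapsing to the same mean, and one simple pooling variant that keeps the spread.

```python
import math

def mean_pool(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_std_pool(vectors):
    """Concatenate the per-dimension mean and (population) standard deviation.

    Hypothetical alternative that retains how much the reviews disagree.
    """
    n = len(vectors)
    means = mean_pool(vectors)
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(zip(*vectors), means)
    ]
    return means + stds

# Hypothetical review embeddings for two titles.
polarized = [[1.0, -1.0], [-1.0, 1.0]]  # reviewers strongly disagree
lukewarm = [[0.0, 0.0], [0.0, 0.0]]     # reviewers are uniformly neutral

# Plain averaging cannot tell the two titles apart.
print(mean_pool(polarized), mean_pool(lukewarm))          # [0.0, 0.0] [0.0, 0.0]
# Keeping the spread separates them.
print(mean_std_pool(polarized), mean_std_pool(lukewarm))  # [0.0, 0.0, 1.0, 1.0] [0.0, 0.0, 0.0, 0.0]
```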
Spock Platform
● Platform for data ingestion, preprocessing and cleaning.
● Generates a variety of embeddings powering the different use cases across the product.
● Helps assess embedding quality via surrogate tasks.
[Figure: Spock Platform diagram. Labels: 1st & 3rd Party Data; Viewer-oriented data; Title-oriented data; Universe of Content + Metadata; Models; Embeddings (CTXT, MD, MMD, Genre, Demos, Actor, et al); Products; Audience Assessment. Use Cases: Beam from Universe to Tubiverse; Cold ➔ Warm ➔ Hot Starting; Content Value Assessment; Tiering Inventory in Tubiverse; Augmented Search; Seeding Growth; Coordinated Pursuit of New Audience; Portfolio Analysis / Simulation.]
03 Recommendation
Personalized Recommendations
Areas: Content Ranking, Container Ranking, Image Ranking, Search, Notifications, Container Generation, Cold-starting titles
● 70+ models helping organize the homepage!
● Rank content and containers based on users' features and past interactions
● Ranking based on GBDT and deep neural networks
● Retrieval based on many embeddings (e.g. two-tower, BERT, etc.)
● Distilled models for new users
● Exploration strategies for new titles
Why is it challenging?
01 Offline vs Online
02 Feedback loops
03 Changing tastes and catalog
04 Algorithmic fairness and biases
Beyond Accuracy at the Top!
01 Offline vs Online
Typical Metrics
● Typical offline metrics: ranking metrics such as NDCG, NMRR, Precision@K
● Typical online metrics: streaming, retention
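As a rough illustration of the offline ranking metrics, the sketch below computes Precision@K, reciprocal rank (the per-user quantity that MRR-style metrics average), and NDCG@K for a single ranked list. The item ids, relevance gains, and function names are hypothetical; production definitions may differ in details such as the gain function or tie handling.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """DCG of the top-k divided by the DCG of the ideal ordering."""
    def dcg(items):
        return sum(
            gains.get(item, 0.0) / math.log2(rank + 1)
            for rank, item in enumerate(items, start=1)
        )
    ideal = sorted(gains, key=gains.get, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: the user watched titles "b" and "d".
ranked = ["a", "b", "c", "d"]
gains = {"b": 1.0, "d": 1.0}

print(precision_at_k(ranked, set(gains), k=2))  # 0.5
print(reciprocal_rank(ranked, set(gains)))      # 0.5
print(round(ndcg_at_k(ranked, gains, k=4), 3))
```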
Correlation vs Causation
Offline evaluation
● Uses historical data
● Cheap, fast, risk-free
● Correlation based
● Counterfactuality of rewards: does not capture what would have happened if ...?
Online evaluation
● Randomized experiment (A/B tests)
● Wait for days to compute the reward
● Reliable but expensive
Dynamic Environment
● Recommender dynamics can affect the performance in ways not captured by the offline metrics. E.g. impression caps.
● Recommendations can influence user preferences in ways not captured by offline metrics. E.g. did you watch a title because it was recommended?
● User dynamics and confounding factors can influence the watch behavior in ways not captured by offline metrics. E.g. watching a title because it was recommended by a friend.
Counterfactual evaluation
● Estimate the potential outcome of a policy offline using logged data.
● Inverse Propensity Scoring (IPS): Importance weighting to account for the
mismatch in the distribution of logged data and the policy to evaluate.
● Several variants - CIPS, SNIPS, etc.
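The estimators named above can be sketched as follows. This is a minimal illustration that assumes each logged interaction is a tuple `(logging_prob, target_prob, reward)`, where `logging_prob` is the probability the logging policy assigned to the shown item and `target_prob` is the probability the policy under evaluation would assign to it. The log format, function names, and numbers are assumptions made for the example.

```python
def ips(logs):
    """Inverse Propensity Scoring: mean of reward * importance weight."""
    return sum(r * (pt / pl) for pl, pt, r in logs) / len(logs)

def clipped_ips(logs, cap):
    """Clipped IPS (CIPS): cap each importance weight to limit variance."""
    return sum(r * min(pt / pl, cap) for pl, pt, r in logs) / len(logs)

def snips(logs):
    """Self-normalized IPS (SNIPS): divide by the sum of the weights."""
    weights = [pt / pl for pl, pt, _ in logs]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * r for w, (_, _, r) in zip(weights, logs)) / total

# Hypothetical logged data: (logging_prob, target_prob, reward).
logs = [
    (0.5, 0.25, 1.0),   # weight 0.5
    (0.25, 0.5, 0.0),   # weight 2.0
    (0.125, 0.5, 1.0),  # weight 4.0
]

print(ips(logs))               # 1.5
print(clipped_ips(logs, 2.0))  # (0.5 + 0 + 2.0) / 3
print(snips(logs))             # 4.5 / 6.5
```

Plain IPS is unbiased when the logged propensities are correct but can have high variance when weights are large; clipping and self-normalization trade a little bias for lower variance.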
02 Feedback loops
Feedback loops
● Different algorithms on the homepage influence one another.
● Underlying data influences the algorithms.
● Recommendations influence the watch behavior.
● Watch behavior influences the data.
[Figure label: Observational Data]
Feedback loops
● Typical offline training uses clicks, watches, and plays.
● Implicit feedback has inherent biases.
● Position and recommendations influence the data collection.
03 Changing User tastes & Catalog
Changing User tastes & Catalog
● Users adapt and change their preferences over time.
● New titles come into the system while others leave the service.
● There is uncertainty around new users and titles.
● Outside trends influence watch behavior.
Exploration and bandits
● Tradeoff: explore unknown choices to gather information vs. exploit known preferences.
● Exploration helps break feedback loops and helps with uncertainty around new items/users.
● Caveat: designing good exploration that works in practice is hard due to non-stationarity of the data and large dynamic action spaces.
● Reward is myopic. RL: optimize for the long term.
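The explore/exploit tradeoff can be illustrated with one common bandit strategy, Thompson sampling on Bernoulli rewards. The slides do not say which strategy Tubi uses, so this is only a sketch under simplifying assumptions: a small fixed set of arms (titles), stationary play rates, and a made-up `true_rates` list standing in for real user feedback.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm."""
    rng = random.Random(seed)
    n = len(true_rates)
    wins = [0] * n    # observed successes per arm
    losses = [0] * n  # observed failures per arm
    pulls = [0] * n
    for _ in range(rounds):
        # Sample a plausible success rate for each arm from its posterior,
        # Beta(wins + 1, losses + 1), starting from a uniform prior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        arm = max(range(n), key=samples.__getitem__)
        # Simulated feedback (stands in for a real play / no-play signal).
        reward = 1 if rng.random() < true_rates[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical play rates for three new titles.
pulls = thompson_sampling([0.1, 0.3, 0.8], rounds=2000)
print(pulls)  # most pulls should go to the last arm
```

Uncertain arms get sampled often early on (exploration), and traffic concentrates on the best arm as evidence accumulates (exploitation). The non-stationarity and large action spaces mentioned above are exactly what this toy setup leaves out.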
04 Algorithmic fairness and biases
Ranking systems
● Ranking function: rank items for a given context → Learning to Rank
● Slate recommendations: a slate of recommended titles
● Are traditional methods fair?
Fairness
● Many aspects of discrimination relate to correlation vs. causation.
● Gender and race may be correlated with factors that shift the distributions.
● Sources of bias: skewed examples, tainted examples, limited features, sample size disparity.
Fairness
● Pre-processing: Distributions of specific sensitive or protected variables are biased. Train a model on "repaired" data.
● In-processing: Modeling techniques often become biased by dominant features and other distributional effects. Incorporate fairness metrics into the model optimization.
● Post-processing: Apply transformations to improve prediction fairness.
Source: Caton & Haas, Fairness in ML survey, 2020
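As one hedged illustration of the post-processing family, the sketch below re-ranks an already scored slate so that every prefix of length k contains at least floor(min_share * k) items from a chosen group, and measures each group's share of position-discounted exposure before and after. The group labels, scores, discount function, and the floor rule itself are assumptions chosen for the example; they are not taken from the talk or from the cited survey.

```python
import math

def exposure_share(slate, group_of):
    """Share of position-discounted exposure (1 / log2(rank + 1)) per group."""
    totals = {}
    for rank, item in enumerate(slate, start=1):
        group = group_of[item]
        totals[group] = totals.get(group, 0.0) + 1.0 / math.log2(rank + 1)
    norm = sum(totals.values())
    return {group: value / norm for group, value in totals.items()}

def rerank_with_floor(scores, group_of, protected, min_share):
    """Greedy re-rank by score, forcing a minimum count of protected items per prefix."""
    by_score = sorted(scores, key=scores.get, reverse=True)
    prot = [i for i in by_score if group_of[i] == protected]
    rest = [i for i in by_score if group_of[i] != protected]
    out, n_prot = [], 0
    while prot or rest:
        need = math.floor(min_share * (len(out) + 1))
        take_prot = bool(prot) and (
            not rest or n_prot < need or scores[prot[0]] >= scores[rest[0]]
        )
        if take_prot:
            out.append(prot.pop(0))
            n_prot += 1
        else:
            out.append(rest.pop(0))
    return out

# Hypothetical slate: model scores and a made-up catalog grouping.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.2, "f": 0.1}
group_of = {"a": "major", "b": "major", "c": "major", "d": "major",
            "e": "indie", "f": "indie"}

original = sorted(scores, key=scores.get, reverse=True)
reranked = rerank_with_floor(scores, group_of, protected="indie", min_share=0.5)
print(reranked)  # ['a', 'e', 'b', 'f', 'c', 'd']
print(exposure_share(original, group_of)["indie"],
      exposure_share(reranked, group_of)["indie"])
```

The re-rank gives up some score order in exchange for exposure, which is the usual tradeoff with post-processing: the model is untouched, but the fairness constraint is only as good as the grouping and the floor chosen.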
04 Conclusion
Conclusion & Acknowledgements
● Lots of challenging problems in content understanding, users and recommendations.
● The road ahead is very exciting and we are hiring!
● All the work is done by my awesome Machine Learning team @Tubi.
Thank you.
Jaya Kawale
Email: jkawale@tubi.tv
Twitter: jayakawale

Introduction to Multiple Access Protocol.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

Machine Learning At Tubi

  • 1. © Tubi, proprietary and confidential. Machine Learning at Tubi: Powering Free Movies, TV and News for All. Jaya Kawale, Vice President of Engineering (Machine Learning), Tubi
  • 2. 01 Introduction
  • 3. Tubi ● Advertiser Based Video On Demand (AVOD) service ● Watch free movies, TV, news & sports ● More than 51 million monthly active users ● Available across several countries including the US, Canada, Mexico & LatAm.
  • 5. 02 Content Understanding
  • 6. ML for Content ● Content understanding helps us make sense of the rich metadata. Helps us improve ● Recommendations ● Content acquisition decisions ● Cold starting of titles ● Image ranking ● Content categorization ● … Metadata: Plot Synopsis; Cast; Genre; Box office; Ratings; Posters/images; Language; Video trailers
  • 7. Content Understanding, from easy to hard: Keyword Search; Review/Sentiment Classification; Topic Extraction; Embedding Generation; Natural Language Understanding; Video Understanding; Multi-modal data Understanding (e.g. Text + Images)
  • 8. Lessons Learned ● Pre-processing and cleaning up are critical. ● Different tasks require different texts. E.g. sentiment analysis vs summarization. ● Not all text is the same. E.g. Reviews vs Subtitles vs Synopsis.
  • 9. Lessons Learned ● Be careful with the choice of algorithms. E.g. BERT is more suitable for next sentence prediction. “No free lunch” in terms of algorithm & representation. ● Averaging is widely used but can lose information. E.g. multiple reviews for a title averaged together to generate a title embedding. Alternative: Transformer-based fusion. ● Evaluation is hard but critical. E.g. embedding quality assessment on surrogate tasks.
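Slide 9's point that averaging can lose information can be seen in a toy case. The sketch below is not from the talk: the two-dimensional "review embeddings" are invented values, and `mean_pool` is a hypothetical helper name. It only shows that two titles with very different review distributions can collapse to the same averaged title embedding.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Invented 2-d review embeddings (illustrative values only).
polarizing_title = [[1.0, 0.0], [-1.0, 0.0]]  # reviews that strongly disagree
neutral_title = [[0.0, 0.0], [0.0, 0.0]]      # reviews that are all neutral

pooled_polarizing = mean_pool(polarizing_title)
pooled_neutral = mean_pool(neutral_title)

# Both titles end up with the same pooled embedding, so the disagreement
# among the first title's reviews is no longer visible after averaging.
print(pooled_polarizing, pooled_neutral)
```

A fusion model that looks at the individual review vectors, as the slide suggests, would have access to the spread that the mean discards.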
  • 10. Spock Platform ● Platform for data ingestion, preprocessing and cleaning. ● Generates a variety of embeddings powering the different use cases across the product. ● Helps assess embedding quality via surrogate tasks.
  • 11. Spock Platform (diagram labels): 1st & 3rd Party Data; Audience Assessment; Viewer-oriented data; Title-oriented data; Products; Models; Embeddings (CTXT, MD, MMD, Genre, Demos, Actor, et al.); Universe of Content + Metadata; Use Cases; Beam from Universe to Tubiverse; Cold ➔ Warm ➔ Hot Starting; Content Value Assessment; Tiering Inventory in Tubiverse; Augmented Search; Seeding Growth; Coordinated Pursuit of New Audience; Portfolio Analysis / Simulation
  • 12. 03 Recommendation
  • 13. Personalized Recommendations: Content Ranking; Container Ranking; Image Ranking; Search; Notifications; Container Generation; Cold starting titles
  • 14. Personalized Recommendations: Content Ranking; Container Ranking; Image Ranking; Search; Notifications; Container Generation; Cold starting titles ● 70+ models helping organize the homepage! ● Rank content and containers based on users’ features and past interactions ● Ranking based on GBDT, Deep Neural Network ● Retrieval based upon a lot of embeddings (e.g. two tower, BERT, etc.) ● Distilled models for new users ● Exploration strategies for new titles
  • 15. Why is it challenging? 01 Offline vs Online; 02 Feedback loops; 03 Changing tastes and catalog; 04 Algorithmic fairness and biases
  • 16. Why is it challenging? 01 Offline vs Online; 02 Feedback loops; 03 Changing tastes and catalog; 04 Algorithmic fairness and biases. Beyond Accuracy at the Top!
  • 17. 01 Offline vs Online
  • 18. Typical Metrics. Typical Offline Metrics: ranking metrics (NDCG, NMRR, Precision@K). Typical Online Metrics: Streaming, Retention.
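Two of the offline ranking metrics named on slide 18 can be written in a few lines. This is a minimal sketch, not Tubi's evaluation code: the function names are my own, the relevance list is invented, and the DCG variant shown uses linear gain with a log2 position discount (other gain functions, such as 2^rel - 1, are also common).

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k positions holding a relevant item (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def dcg_at_k(relevance, k):
    """Discounted cumulative gain with linear gain and log2(position + 1) discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k):
    """DCG of the given ranking divided by the DCG of the ideal reordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

# Invented example: relevance of the items in ranked order (1 = watched).
ranked_relevance = [0, 1, 1, 0, 0]
print(precision_at_k(ranked_relevance, 3))
print(ndcg_at_k(ranked_relevance, 3))
```

Both functions score a single ranked list; in practice they would be averaged over many users or sessions.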
  • 19. Correlation vs Causation. Offline evaluation ● Uses historical data ● Cheap, fast, risk free ● Correlation based ● Counterfactuality of rewards: does not capture what would have happened if? Online evaluation ● Randomized experiment (A/B tests) ● Wait for days to compute the reward ● Reliable but expensive
  • 20. Dynamic Environment ● Recommender dynamics can affect the performance in ways not captured by the offline metrics. E.g. impression caps. ● Recommendations can influence user preferences in ways not captured by offline metrics. E.g. Did you watch a title because it was recommended? ● User dynamics and confounding factors can influence the watch behavior in ways not captured by offline metrics. E.g. Watching a title because it was recommended by a friend.
  • 21. Counterfactual evaluation ● Estimate the potential outcome of a policy offline using logged data. ● Inverse Propensity Scoring (IPS): Importance weighting to account for the mismatch between the distribution of the logged data and the policy to evaluate. ● Several variants: CIPS, SNIPS, etc.
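The IPS idea on slide 21 can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: the log format `(context, action, reward, logging_prob)`, the function names, and the toy data are all hypothetical. The optional `clip` argument caps the importance weights, in the spirit of clipped/capped IPS, and `snips_estimate` is the self-normalized variant.

```python
def ips_estimate(logs, target_prob, clip=None):
    """Inverse propensity scoring estimate of a target policy's average reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy chose the action.
    target_prob: function (context, action) -> probability under the new policy.
    clip: optional upper bound on the importance weight.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_prob(context, action) / logging_prob
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
    return total / len(logs)

def snips_estimate(logs, target_prob):
    """Self-normalized IPS: divide by the sum of weights instead of the log size."""
    weighted_reward = 0.0
    weight_sum = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_prob(context, action) / logging_prob
        weighted_reward += weight * reward
        weight_sum += weight
    return weighted_reward / weight_sum if weight_sum > 0 else 0.0

# Invented logs: a uniform logging policy over two titles, "a" and "b".
logs = [("u1", "a", 1.0, 0.5), ("u2", "b", 0.0, 0.5),
        ("u3", "a", 1.0, 0.5), ("u4", "b", 0.0, 0.5)]

def always_a(context, action):
    """Hypothetical target policy that always recommends title "a"."""
    return 1.0 if action == "a" else 0.0

print(ips_estimate(logs, always_a))
print(snips_estimate(logs, always_a))
print(ips_estimate(logs, always_a, clip=1.0))
```

Clipping trades variance for bias: with `clip=1.0` the weights in this toy log are halved, so the clipped estimate is lower than the unclipped one.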
  • 22. 02 Feedback loops; 03 Changing User tastes & Catalog
  • 23. Feedback loops ● Different algorithms on the homepage influence one another ● Underlying data influences the algorithms ● Recommendations influence the watch behavior ● Watch behavior influences the data. (Diagram label: Observational Data)
  • 24. Feedback loops ● Typical offline training signals: clicks, watches, plays ● Implicit feedback has inherent biases ● Position/recommendations influence the data collection
  • 25. Changing User tastes & Catalog ● Users adapt and change their preferences over time. ● New titles come into the system while others leave the service. ● Uncertainty around new users and titles ● Outside trends influence the watch behavior.
  • 26. Exploration and bandits ● Tradeoff: Explore unknown choices to gather information vs exploit known preferences. ● Exploration helps break feedback loops and helps with uncertainty around new items/users. ● Caveat: Designing good exploration that works in practice is hard due to non-stationarity of the data and large dynamic action spaces. ● Reward is myopic ● RL: Optimize long term
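The explore/exploit tradeoff on slide 26 can be illustrated with epsilon-greedy, one of the simplest bandit strategies. The talk does not say which exploration method is used, so this is only a sketch: the class name, arm names, and click rates are invented, and the simulated environment is stationary, which the slide notes real data is not.

```python
import random

class EpsilonGreedy:
    """Pick a random arm with probability epsilon, else the best arm so far."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}    # pulls per arm
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))    # explore
        return max(self.values, key=self.values.get)     # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the sample mean for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Invented play rates for an established title and a newly added one.
true_rates = {"catalog_favorite": 0.30, "new_title": 0.50}
bandit = EpsilonGreedy(list(true_rates), epsilon=0.1, seed=0)
environment = random.Random(1)

for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if environment.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)

print(bandit.counts)
print(bandit.values)
```

The sample-mean update assumes rewards do not drift over time; under the non-stationarity the slide warns about, a constant step size or a sliding window is a common substitute.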
  • 27. 04 Algorithmic fairness and biases
  • 28. Ranking systems ● Ranking function: Rank items for a given context → Learning to Rank ● Slate recommendations: slate of recommended titles ● Are traditional methods fair?
  • 29. Fairness ● Many aspects of discrimination relate to correlation vs causation. ● Gender and race may be correlated with factors that shift the distributions. ● Sources of bias: skewed examples, tainted examples, limited features, sample size disparity.
  • 30. Fairness ● Pre-processing: Distributions of specific sensitive or protected variables are biased. Train a model on “repaired” data. ● In-processing: Modeling techniques often become biased by dominant features and other distributional effects. Incorporate fairness metrics into the model optimization. ● Post-processing: Apply transformations to improve prediction fairness. (Source: Caton & Haas, Fairness in ML survey, 2020)
  • 31. 04 Conclusion
  • 32. Conclusion & Acknowledgements ● Lots of challenging problems in content understanding, users and recommendations. ● The road ahead is very exciting and we are hiring! ● All the work is done by my awesome Machine Learning team @Tubi.
  • 33. Thank you. Jaya Kawale. Email: jkawale@tubi.tv Twitter: jayakawale