Parmeshwar Khurd, Ehsan Saberian & Maryam Esmaeili
ML Platform Meetup, 6/20/2019
Missing Values in
Recommender Models
Talk Outline
● Problem Statement: Missing Features in Recommender Systems (RS)
● Handling Missing Features in GBDTs
● Handling Missing Features in NNs
● Conclusion
01
Problem Statement
Missing Observations vs. Missing Features
● Scientists and engineers in the mathematical sciences have long dealt with
the problem of missing observations
● Typical patterns in physics:
a. Astronomers fill in missing observations for orbits via least-squares:
the Ceres example
b. Models to explain all observations, including missing / future ones
i. A physicist proposes a new model explaining past observations that previous models
cannot adequately explain
ii. She realizes the new model predicts events for which past observations do not exist
iii. New observations are collected to validate the new model
Physics Example from 100 Years Ago
● Einstein proposed general relativity model for
gravitation in 1915, an improvement over Newtonian
models, with two striking examples:
○ It better explained known observed shifts in
perihelion (closest point to Sun) of Mercury’s orbit
○ It predicted the as-yet-unmeasured bending of light
from distant stars, e.g., during a solar eclipse:
bending of ~1.75 arc-seconds, twice the Newtonian
prediction. An arms race to validate it experimentally
followed; Eddington succeeded in May 1919
Non-parametric/ big-data Correlational Models
● We have already talked about several complex models:
○ Correlational: assume a time-dependent elliptical functional form for the planetary orbit, then fit/regress
its parameters assuming normal noise to fill in missing past coordinates and predict future motion
○ Causal: Newton's and Einstein's general causal models (PDEs) for gravitation yielded the
functional forms of orbits / perihelion shifts and suggested new observations no one had thought to measure
● In the rest of the talk, we focus on correlational models, but ours are statistical and more complex:
○ trained on more data (both more features and more samples)
○ non-parametric (decision trees) or with many parameters (neural networks)
● Moreover, the observation is not missing, only a part of it:
○ an incomplete observation is called an observation with missing data
○ if the input is incomplete, it is an observation with missing features
Improving correlational ML Models in RS
● Given context, predictive ML model in recommender system (RS) needs
to match users with items they might enjoy
● Thankfully, as ML engineers in the recommendation space, we need less
creativity and labor than Einstein / Eddington to improve models
● In supervised ML models, we can time-travel our (features, labels) to see
if our newer predictive models improve performance on historical offline
metrics [Netflix Delorean ML blog]
● Model improvements come from leveraging
○ business information (more appropriate metrics or inputs)
○ ML models: BERT, CatBoost, Factorization Machines, etc.
Problem of Missing Data in RS - I
● ML models in RS need to deal with missing data patterns for cases such as:
○ New users
○ New contexts (e.g., country, time-zone, language, device, row-type)
○ New items
○ Timeouts and failures in data microservices
○ Modeling causal impact of recommendations
○ Intent-to-treat
● Unfortunately, the last two problems are similar to the Einstein/Eddington example:
solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk]
● We also do not handle missing labels: optimizing RS for longer-term reward (the label) is a harder
problem [Netflix talk]
Problem of Missing Data in RS - II
Guiding principles in this talk for RS cold-start and other correlational missing-feature
problems:
● Let ML models handle missing values rather than imputing and/or adding
features (via models or simple statistics)
○ Both GBDTs and NNs allow this
● ML models generally better at interpolation than extrapolation
○ Many past examples of service handling new users, items and contexts
○ For robust extrapolation during timeouts or data service failures, add simulated
examples in training and/or impose feature monotonicity constraints
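The simulated-examples idea above might be implemented as a simple training-set augmentation; the function name, the NaN-as-missing convention, and the sampling fraction are all illustrative assumptions, not a specific Netflix recipe:

```python
import numpy as np

def add_timeout_examples(X, y, cols, frac=0.05, rng=None):
    """Augment training data with simulated service-timeout rows: copy a
    random fraction of examples and blank out the features served by a
    given microservice, so the model sees that failure mode in training."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    X_sim = X[idx].copy()
    X_sim[:, cols] = np.nan          # these features "timed out"
    # Labels are unchanged: the world is the same, only our view of it isn't
    return np.vstack([X, X_sim]), np.concatenate([y, y[idx]])
```

Pairing this with a model that natively accepts NaN features (e.g., a GBDT with missing-value splits) teaches robust behavior under data-service failures without a separate fallback model.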
New Users - I
● New users join Netflix every minute
New Users - II
● We get some taste information in
the sign-up flow
● But clearly, we don’t know enough
(what have they watched
elsewhere, broader tastes, etc.) to
personalize well
● Rather than try to extrapolate into
the past, personalize progressively
better as they interact with our
service
New Contexts
● ML models in search / recommender systems need to respect user
language choice
● As new languages are supported, these choices will grow
New Items - I
● New items are added to the Netflix service every day
New items - II
● New items lack any features based on engagement data
● The “Coming Soon” tab shows trailers
○ This tab needs a personalized ranker as well
02
Handling Missing Features in
GBDTs
GBDT for RS
● Several packages to train GBDTs: XGBoost, R’s GBM, CatBoost,
LightGBM, Cognitive Foundry, sklearn, etc.
● XGBoost won several structured data Kaggle competitions
● Netflix talk on fast scoring of XGBoost models
● Dwell-time for Yahoo homepage recommender (RecSys 2014 Best Paper)
Source: XGBoost
(S)GBDT Background - I
Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss
minimization consists of one main algorithm (greedily learn the ensemble) and two
sub-algorithms (learn an individual tree, learn the split at each node of a tree):
● Get the gradient of the (logistic) loss per example w.r.t. the current ensemble
● Learn the tree structure
● Learn each leaf coefficient by one iteration of Newton-Raphson
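The main loop above can be sketched as follows. This is a hedged illustration, not any package's implementation: `fit_tree` and the tree's `leaves()` / `predict()` interface are hypothetical placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad_hess(y, f):
    """Per-example gradient and Hessian of logistic loss w.r.t. current score f."""
    p = sigmoid(f)
    return p - y, p * (1.0 - p)

def boost_round(X, y, f, fit_tree, lr=0.1):
    """One boosting round: fit a tree to the negative gradients, then set each
    leaf value by a single Newton-Raphson step: w = -sum(g) / sum(h)."""
    g, h = logistic_grad_hess(y, f)
    tree = fit_tree(X, -g)                    # learn tree structure on residuals
    for leaf in tree.leaves():                # hypothetical tree interface
        idx = leaf.example_indices
        leaf.value = -g[idx].sum() / (h[idx].sum() + 1e-12)  # Newton step
    return f + lr * tree.predict(X)           # update the ensemble's scores
```

The "stochastic" part of SGBDT would subsample rows (and possibly columns) before each `boost_round` call.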
(S)GBDT Background - II
● Find the best split via variance reduction
● Learn the left and right subtrees recursively
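As a rough illustration of the Best-Split sub-algorithm, here is a minimal exhaustive search on one feature, maximizing variance reduction via the standard identity that (up to an additive constant) the gain is the increase in squared sums over counts:

```python
import numpy as np

def best_split(x, g):
    """Find the threshold on feature x that maximizes variance reduction of
    the targets g (the per-example negative gradients in GBDT training)."""
    order = np.argsort(x)
    xs, gs = x[order], g[order]
    total, n = gs.sum(), len(gs)
    best = (0.0, None)                        # (gain, threshold)
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += gs[i]
        if xs[i] == xs[i + 1]:
            continue                          # cannot split between equal values
        nl, nr = i + 1, n - i - 1
        right_sum = total - left_sum
        # variance reduction up to a constant:
        # sum_L^2/n_L + sum_R^2/n_R - sum^2/n
        gain = left_sum**2 / nl + right_sum**2 / nr - total**2 / n
        if gain > best[0]:
            best = (gain, (xs[i] + xs[i + 1]) / 2.0)
    return best
```

A real implementation loops this over all (sampled) features and uses histograms rather than full sorts, but the scoring logic is the same.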
Missing Value Handling w GBDTs: Taxonomy
● ESL-II (Section 9.6) mentions three ways to handle missing values:
○ Discard observations with any missing values
○ Impute all missing values before training via models or simple statistics:
item popularities may be initialized randomly, to zero, or via weighted averaging, where
weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its training
phase, e.g., via surrogate splits in trees:
■ Categoricals can include one more “missing” category
■ Continuous / categorical:
● Send examples with missing values left or right appropriately (XGBoost)
● Use a ternary split with a missing branch (R’s GBM)
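The weighted-averaging imputation option mentioned above might look like the following minimal sketch; the similarity matrix (and its construction from item meta-data) is an assumption for illustration:

```python
import numpy as np

def impute_popularity(sim, pop, known):
    """Impute missing item popularities as a similarity-weighted average of
    the known ones. `sim` is an (items x items) meta-data similarity matrix,
    `pop` the popularity vector, `known` a boolean mask of observed items."""
    w = sim[:, known]                               # weights toward known items
    est = w @ pop[known] / np.maximum(w.sum(axis=1), 1e-12)
    out = pop.copy()
    out[~known] = est[~known]                       # fill only the missing slots
    return out
```

This is the "impute before training" branch of the taxonomy; the rest of this section covers the alternative of letting the training algorithm handle the missingness itself.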
Missing Value Handling w R’s GBM
● Use a ternary split with a missing branch:
○ The weighted variance reduction in the Best-Split algorithm is updated to
include the variance of the missing branch
Missing Value Handling w XGBoost
Always send examples with missing values left or right appropriately:
● Evaluate the best threshold and variance reduction in the Best-Split
algorithm from sending missing values left or right (post-hoc),
then pick the better choice
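A toy version of this post-hoc search follows. It is not XGBoost's actual implementation (which also uses second-order statistics and regularization); it only illustrates trying both default directions for the NaN block at each candidate threshold:

```python
import numpy as np

def best_split_with_missing(x, g):
    """XGBoost-style split search: examples with NaN feature values are sent
    left or right as a block, and we keep whichever default direction gives
    higher gain. Targets g are per-example (negative) gradients."""
    miss = np.isnan(x)
    gm, nm = g[miss].sum(), int(miss.sum())   # gradient mass / count of missing
    order = np.argsort(x[~miss])
    xs, gs = x[~miss][order], g[~miss][order]
    total, n = gs.sum(), len(gs)
    best = (0.0, None, None)                  # (gain, threshold, default_dir)
    left = 0.0
    for i in range(n - 1):
        left += gs[i]
        if xs[i] == xs[i + 1]:
            continue
        thr = (xs[i] + xs[i + 1]) / 2.0
        for direction in ("left", "right"):   # try both defaults for NaNs
            ls, nl = (left + gm, i + 1 + nm) if direction == "left" else (left, i + 1)
            rs, nr = total + gm - ls, n + nm - nl
            parent = (total + gm) ** 2 / (n + nm)
            gain = ls**2 / nl + rs**2 / nr - parent
            if gain > best[0]:
                best = (gain, thr, direction)
    return best
```

At scoring time, an example with a missing value for this feature simply follows the learned default direction, so no imputation is needed online either.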
03
Handling Missing Features in
NNs
Recurrent Neural Network (NN) for RS
● Youtube latent cross
recurrent NN, WSDM 2018
● Trained with
TensorFlow/Keras
○ Other options include
PyTorch, MxNet,
CNTK, etc.
Missing Value Handling w NNs: Taxonomy
● Similar taxonomy as in the case of GBDTs:
○ Discard observations with any missing values
■ Dropout variant: drop connections with missing values, scale up the others
○ Impute all missing values before training via models or simple
statistics: item embeddings may be initialized randomly, to zeros, or via weighted
averaging, where weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its
training phase, via the hidden layers:
■ Categoricals: a single “missing” item hidden embedding, or DropoutNet (NIPS 2017)
■ Continuous / categorical: impute continuous features and include a “missing” embedding, or
let the first hidden layer compute an “average” activation for the missing feature or item (NIPS 2018)
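A minimal sketch of the reserved "missing" embedding for categoricals; the table sizes, the index-0 convention, and the `lookup` helper are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, DIM, MISSING_ID = 1000, 16, 0      # reserve row 0 for "missing"
embeddings = rng.normal(size=(NUM_ITEMS + 1, DIM))  # trainable in a real model

def lookup(item_ids):
    """Map raw item ids to embedding rows; None or out-of-range ids hit the
    learned 'missing' row instead of crashing or requiring imputation."""
    idx = np.array([i + 1 if (i is not None and 0 <= i < NUM_ITEMS) else MISSING_ID
                    for i in item_ids])
    return embeddings[idx]
```

Because the "missing" row receives gradient updates whenever a cold-start example appears in training, it converges to a sensible prior rather than staying at its random initialization.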
Missing Value Handling w DropoutNet
Auto-encoder with item/user-vec randomly retained or set to zero/average
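The input-dropout idea can be sketched as follows; this is a simplification of the paper's training scheme, and the function and parameter names are ours:

```python
import numpy as np

def dropoutnet_batch(user_vec, item_vec, p_drop=0.5, rng=None):
    """DropoutNet-style training batch: randomly zero the collaborative
    user/item vectors for a fraction of examples, so the network learns to
    fall back on content features alone (the cold-start condition)."""
    rng = rng or np.random.default_rng()
    u, v = user_vec.copy(), item_vec.copy()
    u[rng.random(len(u)) < p_drop] = 0.0      # simulate brand-new users
    v[rng.random(len(v)) < p_drop] = 0.0      # simulate brand-new items
    return u, v
```

At serving time a genuinely new user or item is fed exactly this zeroed representation, so the train-time and cold-start distributions match.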
Missing Value Handling w Hidden “Average”
A partly closed-form “average” of the first hidden-layer activation, with missing values modeled by a GMM
Conclusion
● There are a variety of ways to handle missing values in recommender models
● We only presented the subset of approaches that do not modify / impute
inputs and instead treat missing values within the training algorithm
● The optimal approach for a given problem is likely dataset-dependent!
References
● How Gauss determined the orbit of Ceres, J. Tennenbaum et al.
● Why beauty is truth: a history of symmetry, I. Stewart
● May 29, 1919: A major eclipse, relatively speaking, L. Buchen, Wired
● Delorean, H. Taghavi et al., Netflix
● Bandits for Recommendations, J. Kawale et al., Netflix
● Longer-term outcomes, B. Rostykus et al., Netflix
● Speeding up XGBoost Scoring, D. Parekh et al., Netflix
● Beyond clicks: Dwell-time for personalization, X. Yi et al.
● Latent Cross: Making Use of Context in Recurrent Recommender Systems, A. Beutel et al.
● ESL-II: Elements of Statistical Learning, Hastie, Tibshirani, Friedman
● R GBM
● XGBoost
● Processing of missing data by neural networks, M. Śmieja et al.
● DropoutNet: Addressing Cold Start in Recommender Systems, M. Volkovs et al.
● Inference and missing data, Biometrika 63, 581–592, D. Rubin
Acknowledgments
The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several
others for discussions and contributions
Thank You !

 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Missing values in recommender models

  • 1. Missing Values in Recommender Models
       Parmeshwar Khurd, Ehsan Saberian & Maryam Esmaeili
       ML Platform Meetup, 6/20/2019
  • 2. Talk Outline
       ● Problem Statement: Missing Features in Recommender Systems (RS)
       ● Handling Missing Features in GBDTs
       ● Handling Missing Features in NNs
       ● Conclusion
  • 4. Missing Observations vs. Missing Features
       ● Scientists and engineers in the mathematical sciences have long dealt with the problem of missing observations
       ● Typical patterns in physics:
         a. Astronomers fill in missing observations for orbits via least-squares: Ceres example
         b. Models explain all observations, including missing / future ones:
            i. A physicist proposes a new model explaining past observations that previous models cannot adequately explain
            ii. She realizes the new model predicts events for which past observations do not exist
            iii. New observations are collected to validate the new model
  • 5. Physics Example from 100 Years Ago
       ● Einstein proposed the general relativity model for gravitation in 1915, an improvement over Newtonian models, with two striking examples:
         ○ It better explained the known observed shifts in the perihelion (closest point to the Sun) of Mercury’s orbit
         ○ It predicted the as-yet-unmeasured bending of light from distant stars, e.g., during a solar eclipse: bending ~ 1.75 arc-seconds, twice the Newtonian prediction. An arms race to validate it experimentally followed; Eddington succeeded in May 1919
  • 6. Non-parametric / Big-data Correlational Models
       ● We have already talked about several complex models:
         ○ Correlational: Assume a time-dependent elliptical functional form for the planetary orbit; fit/regress parameters assuming normal noise to fill in missing past coordinates and predict future motion
         ○ Causal: Newton’s or Einstein’s general causal models for gravitation → PDEs for planetary motion → functional forms of orbits / perihelion shifts, which also suggested new observations no one had thought to measure
       ● In the rest of the talk, we focus on correlational models, but they are statistical and more complex:
         ○ trained on more data (both more features and more samples)
         ○ non-parametric (decision trees) or with many parameters (neural networks)
       ● Here the observation is not missing, only a part of it:
         ○ an incomplete observation is called an observation with missing data
         ○ if the input is incomplete, it is an observation with missing features
  • 7. Improving Correlational ML Models in RS
       ● Given context, a predictive ML model in a recommender system (RS) needs to match users with items they might enjoy
       ● Thankfully, as ML engineers in the recommendation space, we need less creativity and labor than Einstein / Eddington to improve models
       ● In supervised ML models, we can time-travel our (features, labels) to see if newer predictive models improve performance on historical offline metrics [Netflix Delorean ML blog]
       ● Model improvements come from leveraging:
         ○ business information (more appropriate metrics or inputs)
         ○ ML models: BERT, CatBoost, Factorization Machines, etc.
  • 8. Problem of Missing Data in RS - I
       ● ML models in RS need to deal with missing-data patterns in cases such as:
         ○ New users
         ○ New contexts (e.g., country, time-zone, language, device, row-type)
         ○ New items
         ○ Timeouts and failures in data microservices
         ○ Modeling the causal impact of recommendations
         ○ Intent-to-treat
       ● Unfortunately, the last two problems are similar to the Einstein/Eddington example: solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk]
       ● We also do not handle missing labels: optimizing an RS for longer-term reward (label) is a harder problem [Netflix talk]
  • 9. Problem of Missing Data in RS - II
       Guiding principles in this talk for RS cold-start and other correlational missing-feature problems:
       ● Let ML models handle missing values rather than imputing and/or adding features (via models or simple statistics)
         ○ Both GBDTs and NNs allow this
       ● ML models are generally better at interpolation than extrapolation
         ○ The service has handled many past examples of new users, items and contexts
         ○ For robust extrapolation during timeouts or data-service failures, add simulated examples in training and/or impose feature monotonicity constraints
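The "simulated examples" idea can be sketched directly: augment the training set with copies of real rows whose service-provided features are masked out, so the model sees timeout-like inputs during training. A minimal sketch assuming NumPy feature matrices; the function name and the masking fraction are illustrative, not any production API.

```python
# Augment training data with simulated service failures: copies of real
# examples whose service-provided columns are set to NaN.
import numpy as np

def add_simulated_timeouts(X, y, service_cols, frac=0.1, rng=None):
    """Append `frac` * len(X) rows with `service_cols` masked to NaN."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    X_sim = X[idx].copy()
    X_sim[:, service_cols] = np.nan      # mimic a timed-out / failed service
    return np.vstack([X, X_sim]), np.concatenate([y, y[idx]])
```

The model (e.g., a GBDT with native NaN handling, as discussed later) then learns sensible behavior for these degraded inputs instead of extrapolating blindly.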
  • 10. New Users - I
        ● New users join Netflix every minute
  • 11. New Users - II
        ● We get some taste information in the sign-up flow
        ● But clearly, we don’t know enough (what they have watched elsewhere, broader tastes, etc.) to personalize well
        ● Rather than try to extrapolate into the past, personalize progressively better as they interact with our service
  • 12. New Contexts
        ● ML models in search / recommender systems need to respect the user’s language choice
        ● As new languages are supported, these choices will grow
  • 13. New Items - I
        ● New items are added to the Netflix service every day
        [Image: SNL]
  • 14. New Items - II
        ● New items lack any features based on engagement data
        ● The “Coming Soon” tab shows trailers
          ○ This tab needs a personalized ranker as well
  • 16. GBDT for RS
        ● Several packages train GBDTs: XGBoost, R’s GBM, CatBoost, LightGBM, Cognitive Foundry, sklearn, etc.
        ● XGBoost won several structured-data Kaggle competitions
        ● Netflix talk on fast scoring of XGBoost models
        ● Dwell-time for the Yahoo homepage recommender (RecSys 2014 Best Paper)
        Source: XGBoost
  • 17. (S)GBDT Background - I
        Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss minimization consists of one main algorithm (greedily learn the ensemble) and two sub-algorithms (learn an individual tree; learn the split at each node of a tree). Each boosting round:
        a. Get the gradient of the (logistic) loss per example w.r.t. the current ensemble
        b. Learn the tree structure
        c. Learn each leaf coefficient by one iteration of Newton-Raphson
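The boosting round described above can be sketched in a few lines. This is a minimal illustration, not any package's implementation: the function name and hyperparameters are illustrative, and sklearn's `DecisionTreeRegressor` stands in for the tree-learning sub-algorithm.

```python
# Sketch of one (S)GBDT boosting round for logistic loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_round(X, y, raw_score, learning_rate=0.1, subsample=0.8, rng=None):
    """y in {0,1}; raw_score is the current ensemble's log-odds prediction."""
    rng = rng or np.random.default_rng(0)
    p = 1.0 / (1.0 + np.exp(-raw_score))      # current probability estimate
    grad = p - y                              # gradient of logistic loss
    hess = p * (1.0 - p)                      # Hessian, for the Newton step

    # "Stochastic" part: fit each tree on a random row subsample
    rows = rng.random(len(y)) < subsample

    # Sub-algorithm 1: learn tree structure by regressing the negative gradient
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X[rows], -grad[rows])

    # One Newton-Raphson iteration per leaf: value = -sum(grad) / sum(hess)
    leaf = tree.apply(X)
    update = np.zeros(len(y))
    for l in np.unique(leaf):
        idx = leaf == l
        update[idx] = -grad[idx].sum() / (hess[idx].sum() + 1e-12)
    return raw_score + learning_rate * update
```

Iterating this function greedily grows the ensemble; the logistic loss on the training data should decrease round over round.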
  • 18. (S)GBDT Background - II
        ● Learn left and right sub-trees recursively
        ● Find the best split via variance reduction
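The Best-Split sub-algorithm can be sketched for a single continuous feature: sort by the feature, then pick the threshold maximizing the variance reduction of the regression targets. A minimal sketch; the cumulative-sum trick makes the scan O(n log n) overall.

```python
# Best split for one feature via variance reduction (sum-of-squared-errors).
import numpy as np

def best_split(x, r):
    """x: feature values, r: regression targets (e.g., negative gradients)."""
    order = np.argsort(x)
    x, r = x[order], r[order]
    n = len(r)
    total_sse = n * r.var()                 # parent sum of squared deviations

    best_gain, best_thr = 0.0, None
    csum, csq = np.cumsum(r), np.cumsum(r ** 2)
    for i in range(1, n):                   # left child = first i samples
        if x[i] == x[i - 1]:
            continue                        # can't split between equal values
        left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
        right_sum = csum[-1] - csum[i - 1]
        right_sse = (csq[-1] - csq[i - 1]) - right_sum ** 2 / (n - i)
        gain = total_sse - (left_sse + right_sse)   # variance reduction
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[i - 1]) / 2
    return best_thr, best_gain
```

The tree-learning algorithm runs this over all features, applies the winning split, and recurses into the left and right children.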
  • 19. Missing Value Handling w GBDTs: Taxonomy
        ● ESL-II (Section 9.6) mentions three ways to handle missing values:
          ○ Discard observations with any missing values
          ○ Impute all missing values before training via models or simple statistics: item popularities may be initialized randomly, to zero, or via weighted averaging, where weights may indicate similarity determined via meta-data
          ○ Rely on the learning algorithm to deal with missing values in its training phase, via surrogate splits or non-strict usage in the tree:
            ■ Categoricals can include one more “missing” category
            ■ Continuous / categorical:
              ● Send an example with a missing value left or right, whichever is better (XGBoost)
              ● Use a ternary split with a missing branch (R’s GBM)
  • 20. Missing Value Handling w R’s GBM
        ● Use a ternary split with a missing branch:
          ○ The weighted variance reduction in the Best-Split algorithm is updated to include the missing branch’s variance
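The ternary-split idea amounts to adding a third term to the variance-reduction objective. A minimal sketch, not GBM's actual code: NaN marks missing values, and the gain subtracts the within-group sums of squared errors of all three branches.

```python
# Ternary split gain with a dedicated "missing" branch, GBM-style.
import numpy as np

def ternary_split_gain(x, r, thr):
    """Variance reduction of splitting into {left, right, missing} at thr."""
    miss = np.isnan(x)
    left = (~miss) & (x <= thr)
    right = (~miss) & (x > thr)

    def sse(idx):                       # within-group sum of squared errors
        return ((r[idx] - r[idx].mean()) ** 2).sum() if idx.any() else 0.0

    parent = sse(np.ones_like(miss))
    return parent - (sse(left) + sse(right) + sse(miss))
```

At scoring time, examples with a missing value for the split feature simply follow the missing branch.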
  • 21. Missing Value Handling w XGBoost
        Always send an example with a missing value left or right, whichever is better:
        ● In the Best-Split algorithm, evaluate the best threshold and variance reduction from sending missing values left or right (post hoc), and then pick the better choice
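XGBoost's "default direction" can be illustrated with the same variance-reduction machinery: for a candidate threshold, compute the gain with NaN rows routed left and with them routed right, and keep the better option. A minimal sketch, not XGBoost's implementation (which also uses second-order statistics and regularization):

```python
# Learn the default direction for missing values at one candidate split.
import numpy as np

def sse(r):
    """Sum of squared errors around the group mean."""
    return ((r - r.mean()) ** 2).sum() if len(r) else 0.0

def split_with_default(x, r, thr):
    miss = np.isnan(x)
    parent = sse(r)
    gains = {}
    for direction in ("left", "right"):
        go_left = np.where(miss, direction == "left", x <= thr)
        gains[direction] = parent - (sse(r[go_left]) + sse(r[~go_left]))
    best = max(gains, key=gains.get)
    return best, gains[best]
```

The learned direction is stored with the split, so at scoring time missing values deterministically follow it.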
  • 23. Recurrent Neural Network (NN) for RS
        ● YouTube latent-cross recurrent NN, WSDM 2018
        ● Trained with TensorFlow/Keras
          ○ Other options include PyTorch, MXNet, CNTK, etc.
  • 24. Missing Value Handling w NNs: Taxonomy
        ● Similar taxonomy as in the case of GBDTs:
          ○ Discard observations with any missing values
            ■ Dropout: drop connections with missing values, scale up the others
          ○ Impute all missing values before training via models or simple statistics: item embeddings may be initialized randomly, to zeros, or via weighted averaging, where weights may indicate similarity determined via meta-data
          ○ Rely on the learning algorithm to deal with missing values in its training phase, via hidden layers:
            ■ Categoricals: a single “missing” item embedding, or DropoutNet (NIPS17)
            ■ Continuous / categorical: impute continuous values and include a “missing” embedding, or have the hidden layer compute an “average” activation for the missing feature or item (NIPS18)
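The single "missing" embedding trick is simple: reserve one extra row in the embedding table and map absent categorical values to that index, so the network learns a dedicated representation for "unknown". A minimal NumPy sketch, framework-agnostic; index 0 as the missing slot is an illustrative convention.

```python
# Embedding lookup with a reserved "missing" row.
import numpy as np

MISSING = 0                      # row 0 reserved for missing / unknown items

def lookup(table, item_ids):
    """item_ids may contain None for missing items; real ids shift by +1."""
    idx = np.array([MISSING if i is None else i + 1 for i in item_ids])
    return table[idx]

n_items, dim = 5, 4
rng = np.random.default_rng(0)
table = rng.normal(size=(n_items + 1, dim))   # +1 row for "missing"
vecs = lookup(table, [2, None, 0])            # middle lookup hits the missing row
```

During training, the missing row receives gradients whenever an example has an absent item, so it converges to a useful "average unknown" representation.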
  • 25. Missing Value Handling w DropoutNet
        ● Auto-encoder in which the item/user preference vector is randomly retained or set to zero/average during training
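The core DropoutNet idea is input dropout on the collaborative signal: during training, randomly zero out the preference vector so the network learns to predict from content features alone, which is exactly the input it will see for cold-start users/items. A minimal sketch; the function name and concatenation layout are illustrative, not the paper's API.

```python
# DropoutNet-style input preparation: zero the preference part with some probability.
import numpy as np

def make_input(pref_vec, content_vec, drop_pref, rng):
    """Concatenate preference + content; zero the preference part w.p. drop_pref."""
    if rng.random() < drop_pref:
        pref_vec = np.zeros_like(pref_vec)
    return np.concatenate([pref_vec, content_vec])

# Cold-start scoring corresponds to drop_pref = 1.0 (preference always zeroed).
```

Because the network is trained on both masked and unmasked inputs, it degrades gracefully instead of failing when collaborative data is absent.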
  • 26. Missing Value Handling w Hidden “Average”
        ● Partly closed-form “average” of the first hidden-layer activation for a missing feature modeled by a GMM
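The key ingredient behind this approach is that the expected activation has a closed form for Gaussian inputs (the GMM case is a mixture of such terms). As a standard fact, independent of the paper's full machinery: if a missing input makes a neuron's pre-activation Gaussian, z ~ N(m, s²), then E[ReLU(z)] = m·Φ(m/s) + s·φ(m/s), with φ and Φ the standard normal pdf and cdf. A sketch:

```python
# Closed-form expected ReLU activation under a Gaussian pre-activation.
import math

def expected_relu(m, s):
    """E[max(z, 0)] for z ~ N(m, s^2)."""
    if s == 0:
        return max(m, 0.0)
    a = m / s
    pdf = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1 + math.erf(a / math.sqrt(2)))          # standard normal cdf
    return m * cdf + s * pdf
```

Replacing the missing feature's activation by this expectation lets the network run a forward pass without ever imputing a concrete input value.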
  • 27. Conclusion
        ● There are a variety of ways to handle missing values in recommender models
        ● We only presented the subset of approaches that do not modify / impute inputs and instead treat missing values within the training algorithm
        ● The optimal approach for a given problem is likely dataset-dependent!
  • 28. References
        ● How Gauss Determined the Orbit of Ceres, J. Tennenbaum et al.
        ● Why Beauty Is Truth: A History of Symmetry, I. Stewart
        ● May 29, 1919: A Major Eclipse, Relatively Speaking, L. Buchen, Wired
        ● Delorean, H. Taghavi et al., Netflix
        ● Bandits for Recommendations, J. Kawale et al., Netflix
        ● Longer-term Outcomes, B. Rostykus et al., Netflix
        ● Speeding up XGBoost Scoring, D. Parekh et al., Netflix
        ● Beyond Clicks: Dwell Time for Personalization, X. Yi et al.
        ● Latent Cross: Making Use of Context in Recurrent Recommender Systems, A. Beutel et al.
        ● ESL-II: The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
        ● R GBM
        ● XGBoost
        ● Processing of Missing Data by Neural Networks, M. Smieja et al.
        ● DropoutNet: Addressing Cold Start in Recommender Systems, M. Volkovs et al.
        ● Inference and Missing Data, Biometrika, 63, 581–592, D. Rubin
  • 29. Acknowledgments
        The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several others for discussions and contributions.
        Thank You!