Parmeshwar Khurd, Ehsan Saberian & Maryam Esmaeili
ML Platform Meetup, 6/20/2019
Missing Values in
Recommender Models
Talk Outline
● Problem Statement: Missing Features in Recommender Systems (RS)
● Handling Missing Features in GBDTs
● Handling Missing Features in NNs
● Conclusion
01
Problem Statement
Missing Observations vs. Missing Features
● Scientists and engineers in the mathematical sciences have long dealt
with the problem of missing observations
● Typical patterns in physics:
a. Astronomers fill in missing observations for orbits via least-squares:
Ceres example
b. Models to explain all observations including missing / future ones
i. Physicist proposes a new model explaining past observations that previous models
cannot adequately explain
ii. She realizes the new model predicts events for which past observations do not exist
iii. New observations are collected to validate the new model
Physics Example from 100 Years Ago
● Einstein proposed general relativity model for
gravitation in 1915, an improvement over Newtonian
models, with two striking examples:
○ It better explained known observed shifts in
perihelion (closest point to Sun) of Mercury’s orbit
○ It predicted as yet unmeasured bending of light
from distant stars, e.g., during solar eclipse,
bending ~ 1.75 arc-seconds, twice Newtonian
prediction. Arms race to validate experimentally:
Eddington succeeded in May 1919
Non-parametric/ big-data Correlational Models
● We have already talked about several complex models:
○ Correlational: Assume a time-dependent elliptical functional form for the planetary orbit, then fit/regress its parameters
assuming normal noise to fill in missing past coordinates and predict future motion
○ Causal: Newton's or Einstein's general causal models for gravitation lead to PDEs for planetary motion, which give
functional forms of orbits / perihelion shifts and suggested new observations no one had thought to measure
● In the rest of the talk, we focus on correlational models, but ones that are statistical and more complex:
○ trained on more data (both more features and samples)
○ non-parametric (decision trees) or with many parameters (neural networks)
● But here the observation is not missing, only a part of it is:
○ an incomplete observation is called an observation with missing data
○ if the input is incomplete, it is an observation with missing features
Improving correlational ML Models in RS
● Given context, predictive ML model in recommender system (RS) needs
to match users with items they might enjoy
● Thankfully, as ML engineers in the recommendation space, we need less
creativity and labor than Einstein / Eddington to improve models
● In supervised ML models, we can time-travel our (features, labels) to see
if our newer predictive models improve performance on historical offline
metrics [Netflix Delorean ML blog]
● Model improvements come from leveraging
○ business information (more appropriate metrics or inputs)
○ ML models: BERT, CatBoost, Factorization Machines, etc.
Problem of Missing Data in RS - I
● ML models in RS need to deal with missing data patterns for cases such as:
○ New users
○ New contexts (e.g., country, time-zone, language, device, row-type)
○ New items
○ Timeouts and failures in data microservices
○ Modeling causal impact of recommendations
○ Intent-to-treat
● Unfortunately, the last two problems are similar to the Einstein/Eddington example:
solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk]
● Not handled here: missing labels. Optimizing RS for a longer-term reward (label) is a harder problem
[Netflix talk]
Problem of Missing Data in RS - II
Guiding principles in this talk for RS cold-start + other correlational missing
feature problems
● Let ML models handle missing values rather than imputing and/or adding
features (via models or simple statistics)
○ Both GBDTs and NNs allow this
● ML models generally better at interpolation than extrapolation
○ Many past examples of service handling new users, items and contexts
○ For robust extrapolation during timeouts or data service failures, add simulated
examples in training and/or impose feature monotonicity constraints
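One way to read the last guiding principle: alongside the real examples, add copies in which features that a data service could fail to return are masked as missing, so the model sees the failure pattern during training. The sketch below is only an illustration, not the presenters' pipeline. The function name `add_simulated_missing`, the dict-based example format, NaN as the missing marker and the `drop_prob` default are all assumptions.

```python
import math
import random


def add_simulated_missing(examples, maskable, drop_prob=0.2, seed=0):
    """Return the original examples plus masked copies.

    examples : list of (features_dict, label) pairs (hypothetical format)
    maskable : feature names that a timeout / service failure could remove
    drop_prob: probability of masking each maskable feature in a copy
    """
    rng = random.Random(seed)
    augmented = list(examples)  # keep every real example untouched
    for features, label in examples:
        masked = dict(features)  # copy so the original is not modified
        changed = False
        for name in maskable:
            if name in masked and rng.random() < drop_prob:
                masked[name] = math.nan  # NaN stands for "feature missing"
                changed = True
        if changed:
            augmented.append((masked, label))  # the label is kept as-is
    return augmented
```

The masked copies keep their labels, so a learner with native missing-value handling (as in the GBDT and NN sections) can learn what to predict when those features are absent.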
New Users - I
● New users join Netflix every minute
New Users - II
● We get some taste information in the sign-up flow
● But clearly, we don’t know enough (what have they watched elsewhere, broader tastes, etc.) to personalize well
● Rather than try to extrapolate into the past, personalize progressively better as they interact with our service
New Contexts
● ML models in search / recommender systems need to respect user
language choice
● As new languages are supported, these choices will grow
New Items - I
● New items are added to the Netflix service every day
SNL
New items - II
● New items lack any features based on engagement data
● “Coming Soon” tab shows trailers
○ This tab needs a personalized ranker as well
02
Handling Missing Features in
GBDTs
GBDT for RS
● Several packages to train GBDTs: XGBoost, R’s GBM, CatBoost,
LightGBM, Cognitive Foundry, sklearn, etc.
● XGBoost won several structured data Kaggle competitions
● Netflix talk on fast scoring of XGBoost models
● Dwell-time for Yahoo homepage recommender (RecSys 2014 Best Paper)
Source: XGBoost
(S)GBDT Background - I
Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss
minimization consists of one main algorithm (greedily learn ensemble) and two
sub-algorithms (learn individual tree, learn split at each node of tree).
Steps of the main algorithm in each boosting iteration:
● Get gradient of (logistic) loss per example w.r.t. current ensemble
● Learn tree structure
● Learn leaf coefficient by one iteration of Newton-Raphson
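The gradient and leaf-coefficient steps can be sketched for the logistic loss as follows. This is a minimal illustration assuming labels in {0, 1} and raw ensemble scores before the sigmoid; the function names are hypothetical and not taken from any particular GBDT package. For this loss the per-example gradient is p - y and the second derivative is p(1 - p), so one Newton-Raphson step for a leaf is minus the sum of gradients over the sum of second derivatives of the examples in that leaf.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad_hess(labels, scores):
    """Per-example first and second derivatives of the logistic loss
    with respect to the current ensemble score."""
    grads, hess = [], []
    for y, f in zip(labels, scores):
        p = sigmoid(f)
        grads.append(p - y)          # dL/dF
        hess.append(p * (1.0 - p))   # d2L/dF2
    return grads, hess


def newton_leaf_value(grads, hess, eps=1e-12):
    """One Newton-Raphson step for the coefficient of a single leaf,
    given the derivatives of the examples that fall into it."""
    return -sum(grads) / (sum(hess) + eps)  # eps guards against division by zero
```

For example, a leaf holding labels [1, 1, 0] with all current scores at 0 gets a coefficient of about 0.667, pushing the score towards the majority label.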
(S)GBDT Background - II
● Find best split via variance reduction
● Learn left and right trees recursively
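A best-split search by variance reduction on a single continuous feature, with no missing values yet, might look like the sketch below. It is deliberately simple (quadratic time, one feature), and the names `sse` and `best_split` are hypothetical. The targets would typically be the per-example negative gradients from the main algorithm.

```python
def sse(values):
    """Sum of squared deviations from the mean (count times variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, targets):
    """Return (threshold, gain) maximising the reduction in squared error
    when examples with x < threshold go left and the rest go right.
    Returns (None, 0.0) if no split improves on the parent node."""
    pairs = sorted(zip(xs, targets))
    parent = sse([t for _, t in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        gain = parent - sse(left) - sse(right)
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A tree learner would call this for every candidate feature at a node, keep the feature with the largest gain, and then recurse on the left and right subsets.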
Missing Value Handling w GBDTs: Taxonomy
● ESL-II (Section 9.6) mentions 3 ways to handle missing values:
○ Discard observations with any missing values
○ Impute all missing values before training via models or simple statistics:
item popularities may be initialized randomly or to zero or via weighted averaging, where
weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its training
phase (via surrogate splits in ESL-II; the category is used non-strictly here).
In a tree:
■ Categoricals can include one more “missing” category
■ Continuous / categorical:
● Send example left or right for missing value appropriately (XGBoost)
● Use ternary split with missing branch (R’s GBM)
Missing Value Handling w R’s GBM
● Use ternary split with missing branch:
○ Weighted variance reduction in Best-Split algorithm updated to include missing variance
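The ternary-split idea can be sketched by adding a third, "missing" child to the variance-reduction gain. This is an illustration in the spirit of the slide, not the code of R's gbm package. NaN as the missing marker and the names `sse` and `best_ternary_split` are assumptions.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (count times variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_ternary_split(xs, targets):
    """Return (threshold, gain) for a split into left / right / missing.

    Examples whose feature value is NaN always go to the missing branch,
    so its squared error is the same for every candidate threshold."""
    missing = [t for x, t in zip(xs, targets) if math.isnan(x)]
    present = sorted((x, t) for x, t in zip(xs, targets) if not math.isnan(x))
    parent = sse(list(targets))   # all examples at the node, missing included
    missing_sse = sse(missing)    # contribution of the third branch
    best = (None, 0.0)
    for i in range(1, len(present)):
        if present[i][0] == present[i - 1][0]:
            continue
        threshold = 0.5 * (present[i - 1][0] + present[i][0])
        left = [t for _, t in present[:i]]
        right = [t for _, t in present[i:]]
        gain = parent - sse(left) - sse(right) - missing_sse
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

Because the missing branch gets its own leaf value, a feature can be chosen partly because its absence is itself informative.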
Missing Value Handling w XGBoost
Always send example left or right for missing value appropriately:
● Evaluate best threshold and variance reduction in Best-Split algorithm from sending
missing values left or right (post-hoc) and then pick better choice
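The "try both directions, keep the better one" rule can be sketched as below. Following the slide, the sketch scores splits by variance reduction. XGBoost itself scores splits with a gain computed from gradient and Hessian sums, so this is a simplified stand-in, not its implementation. NaN as the missing marker and the function names are assumptions.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (count times variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_default(xs, targets):
    """Return (threshold, default_direction, gain).

    For each candidate threshold, all examples with a missing (NaN) value
    are sent left and then right; the better of the two is recorded."""
    missing = [t for x, t in zip(xs, targets) if math.isnan(x)]
    present = sorted((x, t) for x, t in zip(xs, targets) if not math.isnan(x))
    parent = sse(list(targets))
    best = (None, None, 0.0)
    for i in range(1, len(present)):
        if present[i][0] == present[i - 1][0]:
            continue
        threshold = 0.5 * (present[i - 1][0] + present[i][0])
        left = [t for _, t in present[:i]]
        right = [t for _, t in present[i:]]
        for direction in ("left", "right"):
            l = left + missing if direction == "left" else left
            r = right + missing if direction == "right" else right
            gain = parent - sse(l) - sse(r)
            if gain > best[2]:
                best = (threshold, direction, gain)
    return best
```

At scoring time, an example with a missing value for the split feature simply follows the stored default direction, so no imputation is needed.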
03
Handling Missing Features in
NNs
Recurrent Neural Network (NN) for RS
● YouTube latent cross recurrent NN, WSDM 2018
● Trained with TensorFlow/Keras
○ Other options include PyTorch, MXNet, CNTK, etc.
Missing Value Handling w NNs: Taxonomy
● Similar taxonomy as in the case of GBDTs
○ Discard observations with any missing values
■ Dropout: Drop connections w missing values, scale up others
○ Impute all missing values before training via models or simple
statistics: Item embeddings may be initialized randomly or to zeros or via weighted
averaging, where weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its
training phase via hidden layers
■ Categoricals: Single “missing” item hidden embedding or DropoutNet (NIPS17)
■ Continuous / Categorical: Impute continuous + include “missing” embedding, or
hidden layer reaches “average” for missing feature or item (NIPS18)
Missing Value Handling w DropoutNet
Auto-encoder with item/user-vec randomly retained or set to zero/average
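The input-dropout step described above can be sketched as follows: during training, each user or item preference vector is either kept or replaced by zeros or by the batch average, so the network learns to fall back on the remaining inputs when the vector is unavailable (cold start). This is only a sketch of that one step under assumed names and defaults (`dropout_preference_inputs`, `drop_prob`, `fill`), not the DropoutNet reference code.

```python
import random


def dropout_preference_inputs(pref_vecs, drop_prob=0.5, fill="zero", seed=0):
    """Randomly retain each preference vector or replace it.

    pref_vecs: list of equal-length lists of floats (one per user or item)
    fill     : "zero" replaces with zeros, "average" with the batch mean
    """
    rng = random.Random(seed)
    dim = len(pref_vecs[0])
    if fill == "zero":
        replacement = [0.0] * dim
    elif fill == "average":
        n = len(pref_vecs)
        replacement = [sum(v[k] for v in pref_vecs) / n for k in range(dim)]
    else:
        raise ValueError("fill must be 'zero' or 'average'")
    return [
        list(replacement) if rng.random() < drop_prob else list(v)
        for v in pref_vecs
    ]
```

At serving time, a new user or item with no preference vector would be fed the same replacement value that was used in training.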
Missing Value Handling w Hidden “Average”
Partly closed-form “average” of the first hidden layer activation, with missing inputs modeled by a GMM
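To make the "average" concrete, the sketch below treats the simplest case: a single Gaussian with diagonal covariance for the missing inputs and a ReLU unit. This is a simplification of the GMM setting in the cited paper, which would average such terms over mixture components. If the pre-activation z is normal with mean mu and standard deviation sigma, then E[ReLU(z)] = mu * Phi(mu / sigma) + sigma * phi(mu / sigma), where Phi and phi are the standard normal CDF and density. Function and parameter names are hypothetical, and NaN marks a missing input.

```python
import math


def expected_relu(mu, sigma):
    """E[max(z, 0)] for z ~ Normal(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu, 0.0)  # no uncertainty: ordinary ReLU
    r = mu / sigma
    pdf = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def first_layer_unit(x, weights, bias, means, variances):
    """Expected ReLU activation of one first-layer unit.

    Observed inputs contribute w * x to the mean; missing (NaN) inputs
    contribute w * mean to the mean and w^2 * variance to the variance."""
    mu, var = bias, 0.0
    for xj, wj, mj, vj in zip(x, weights, means, variances):
        if math.isnan(xj):
            mu += wj * mj
            var += wj * wj * vj
        else:
            mu += wj * xj
    return expected_relu(mu, math.sqrt(var))
```

With no missing inputs the variance is zero and the unit reduces to a plain ReLU, so the same layer handles complete and incomplete examples.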
Conclusion
● A variety of ways to handle missing values in recommender models
● Only presented the subset of approaches that do not modify / impute
inputs and instead treat missing values within the training algorithm
● Optimal approach for a problem is likely dataset-dependent!
References
● How Gauss determined the orbit of Ceres, J. Tennenbaum, et al.
● Why beauty is truth: a history of symmetry, I. Stewart
● May 29, 1919: A major eclipse, relatively speaking, L. Buchen, Wired
● Delorean, H. Taghavi, et al., Netflix
● Bandits for Recommendations, J. Kawale, et al., Netflix
● Longer-term outcomes, B. Rostykus, et al., Netflix
● Speeding up XGBoost Scoring, D. Parekh, et al., Netflix
● Beyond clicks: Dwell-time for personalization, X. Yi, et al.
● Latent Cross: Making Use of Context in Recurrent Recommender Systems, Beutel, et al.
● ESL-II: Elements of Statistical Learning, Hastie, Tibshirani, Friedman
● R GBM
● XGBoost
● Processing of missing data via neural networks, Smieja, et al.
● DropoutNet: Addressing Cold Start in Recommender Systems, Volkovs, et al.
● Inference and missing data, D. B. Rubin, Biometrika, 63, 581–592
Acknowledgments
The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several
others for discussions and contributions
Thank You !

More Related Content

What's hot

Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at NetflixLinas Baltrunas
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveJustin Basilico
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the WorldYves Raimond
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Fernando Amat
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
 
Reward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfactionReward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfactionJiangwei Pan
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated RecommendationsHarald Steck
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at NetflixLinas Baltrunas
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with MLMegan Neider
 
Personalization at Netflix - Making Stories Travel
Personalization at Netflix -  Making Stories Travel Personalization at Netflix -  Making Stories Travel
Personalization at Netflix - Making Stories Travel Sudeep Das, Ph.D.
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 

What's hot (20)

Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 
Reward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfactionReward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfaction
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with ML
 
Personalization at Netflix - Making Stories Travel
Personalization at Netflix -  Making Stories Travel Personalization at Netflix -  Making Stories Travel
Personalization at Netflix - Making Stories Travel
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 

Similar to Missing values in recommender models

Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityAlberto Danese
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Dori Waldman
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummiesMichael Winer
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessingKnoldus Inc.
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIStudio Synthesis
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemMarsan Ma
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2OSri Ambati
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and TextNBER
 
PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018 PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018 Natalia Díaz Rodríguez
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsBigML, Inc
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In IndustryXavier Amatriain
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1NBER
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBigML, Inc
 

Similar to Missing values in recommender models (20)

Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML Interpretability
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummies
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessing
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
 
Data reduction
Data reductionData reduction
Data reduction
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018 PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In Industry
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unleash Your Potential - Namagunga Girls Coding Club
Missing values in recommender models

  • 1. Parmeshwar Khurd, Ehsan Saberian & Maryam Esmaeili ML Platform Meetup, 6/20/2019 Missing Values in Recommender Models
  • 2. Talk Outline ● Problem Statement: Missing Features in Recommender Systems (RS) ● Handling Missing Features in GBDTs ● Handling Missing Features in NNs ● Conclusion
  • 4. Missing Observations vs. Missing Features ● Scientists and engineers in the mathematical sciences have long dealt with the problem of missing observations ● Typical patterns in physics: a. Astronomers fill in missing observations for orbits via least-squares: Ceres example b. Models to explain all observations including missing / future ones i. Physicist proposes a new model explaining past observations that previous models cannot adequately explain ii. She realizes new model predicts events for which past observations do not exist iii. New observations are collected to validate new model
  • 5. Physics Example from 100 Years Ago ● Einstein proposed general relativity model for gravitation in 1915, an improvement over Newtonian models, with two striking examples: ○ It better explained known observed shifts in perihelion (closest point to Sun) of Mercury’s orbit ○ It predicted as yet unmeasured bending of light from distant stars, e.g., during solar eclipse, bending ~ 1.75 arc-seconds, twice Newtonian prediction. Arms race to validate experimentally: Eddington succeeded in May 1919
  • 6. Non-parametric / big-data Correlational Models ● We have already talked about several complex models: ○ Correlational: Assume time-dependent elliptical functional form for planetary orbit, fit/regress parameters assuming normal noise to fill in missing past coordinates and predict future motion ○ Causal: Newton’s or Einstein’s general causal models for gravitation → PDEs for planetary motion → functional forms of orbits / perihelion shifts, plus suggested new observations no one had thought to measure ● In the rest of the talk, we focus on correlational models, but ones that are statistical and more complex: ○ trained on more data (both more features and samples) ○ non-parametric (decision trees) or many parameters (neural networks) ● But the observation is not missing, only a part of it: ○ an incomplete observation is called an observation with missing data ○ if the input is incomplete, it is an observation with missing features
  • 7. Improving correlational ML Models in RS ● Given context, predictive ML model in recommender system (RS) needs to match users with items they might enjoy ● Thankfully, as ML engineers in the recommendation space, we need less creativity and labor than Einstein / Eddington to improve models ● In supervised ML models, we can time-travel our (features, labels) to see if our newer predictive models improve performance on historical offline metrics [Netflix Delorean ML blog] ● Model improvements come from leveraging ○ business information (more appropriate metrics or inputs) ○ ML models: BERT, CatBoost, Factorization Machines, etc.
  • 8. Problem of Missing Data in RS - I ● ML models in RS need to deal with missing data patterns for cases such as: ○ New users ○ New contexts (e.g., country, time-zone, language, device, row-type) ○ New items ○ Timeouts and failures in data microservices ○ Modeling causal impact of recommendations ○ Intent-to-treat ● Unfortunately, the last two problems are similar to the Einstein/Eddington example: solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk] ● Missing labels are not covered here: optimizing RS for longer-term reward (label) is a harder problem [Netflix talk]
  • 9. Problem of Missing Data in RS - II Guiding principles in this talk for RS cold-start + other correlational missing feature problems ● Let ML models handle missing values rather than imputing and/or adding features (via models or simple statistics) ○ Both GBDTs and NNs allow this ● ML models generally better at interpolation than extrapolation ○ Many past examples of service handling new users, items and contexts ○ For robust extrapolation during timeouts or data service failures, add simulated examples in training and/or impose feature monotonicity constraints
  • 10. New Users - I ● New users join Netflix every minute
  • 11. New Users - II ● We get some taste information in the sign-up flow ● But clearly, we don’t know enough (what have they watched elsewhere, broader tastes, etc.) to personalize well ● Rather than try to extrapolate into the past, personalize progressively better as they interact with our service
  • 12. New Contexts ● ML models in search / recommender systems need to respect user language choice ● As new languages are supported, these choices will grow
  • 13. New Items - I ● New items are added to the Netflix service every day
  • 14. New Items - II ● New items miss any features based on engagement data ● “Coming Soon” tab shows trailers ○ This tab needs a personalized ranker as well
  • 16. GBDT for RS ● Several packages to train GBDTs: XGBoost, R’s GBM, CatBoost, LightGBM, Cognitive Foundry, sklearn, etc. ● XGBoost won several structured data Kaggle competitions ● Netflix talk on fast scoring of XGBoost models ● Dwell-time for Yahoo homepage recommender (RecSys 2014 Best Paper) Source: XGBoost
  • 17. (S)GBDT Background - I Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss minimization consists of one main algorithm (greedily learn ensemble) and two sub-algorithms (learn individual tree, learn split at each node of tree). Annotated steps: ● Get gradient of (logistic) loss per example w.r.t. current ensemble ● Learn tree structure ● Learn leaf coefficient by one iteration of Newton-Raphson
  • 18. (S)GBDT Background - II ● Find best split via variance reduction ● Learn left and right trees recursively
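The two sub-steps named on these background slides can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' code: the function names `newton_leaf_value` and `best_split` are hypothetical, labels are assumed to be in {0, 1}, and the split search is the plain exhaustive scan over one feature.

```python
import numpy as np

def newton_leaf_value(y, p):
    """One Newton-Raphson step for a leaf under logistic loss.

    y: labels in {0, 1}; p: probabilities predicted by the current ensemble.
    """
    grad = y - p              # negative gradient of log-loss w.r.t. the logit
    hess = p * (1.0 - p)      # second derivative of log-loss w.r.t. the logit
    return float(grad.sum() / max(hess.sum(), 1e-12))

def best_split(x, r):
    """Threshold on feature x that maximises variance reduction of targets r."""
    order = np.argsort(x)
    x, r = x[order], r[order]
    parent_sse = ((r - r.mean()) ** 2).sum()
    best_thr, best_gain = None, 0.0
    for i in range(1, len(r)):
        if x[i] == x[i - 1]:
            continue          # no threshold separates equal feature values
        left, right = r[:i], r[i:]
        child_sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        gain = float(parent_sse - child_sse)
        if gain > best_gain:
            best_thr, best_gain = 0.5 * (x[i] + x[i - 1]), gain
    return best_thr, best_gain
```

In a full trainer, `best_split` would be called on the per-example gradients for every candidate feature, and the tree would then recurse into the left and right partitions.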
  • 19. Missing Value Handling with GBDTs: Taxonomy ● ESL-II (Section 9.6) mentions 3 ways to handle missing values: ○ Discard observations with any missing values ○ Impute all missing values before training via models or simple statistics: Item popularities may be initialized randomly or to zero or via weighted averaging, where weights may indicate similarity determined via meta-data ○ Rely on the learning algorithm to deal with missing values in its training phase via surrogate splits (non-strict usage in trees): ■ Categoricals can include one more “missing” category ■ Continuous / categorical: ● Send example left or right for missing value appropriately (XGBoost) ● Use ternary split with missing branch (R’s GBM)
  • 20. Missing Value Handling with R’s GBM ● Use ternary split with missing branch: ○ Weighted variance reduction in Best-Split algorithm updated to include missing variance
  • 21. Missing Value Handling with XGBoost Always send example left or right for missing value appropriately: ● Evaluate best threshold and variance reduction in Best-Split algorithm from sending missing values left or right (post-hoc) and then pick better choice
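A ternary split can be sketched by extending the usual variance-reduction scan with a third child that collects the examples whose feature is missing. The code below is a simplified illustration under that reading of the slide, not R GBM's actual implementation; `best_ternary_split` and `sse` are hypothetical names, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def sse(v):
    """Sum of squared deviations from the mean (0 for an empty child)."""
    return float(((v - v.mean()) ** 2).sum()) if len(v) else 0.0

def best_ternary_split(x, r):
    """Best threshold when missing values (NaN) get their own third branch."""
    miss = np.isnan(x)
    x_obs, r_obs, r_miss = x[~miss], r[~miss], r[miss]
    order = np.argsort(x_obs)
    x_obs, r_obs = x_obs[order], r_obs[order]
    parent = sse(r)                     # includes the missing examples
    missing_sse = sse(r_miss)           # variance left inside the missing branch
    best_thr, best_gain = None, 0.0
    for i in range(1, len(x_obs)):
        if x_obs[i] == x_obs[i - 1]:
            continue
        gain = parent - (sse(r_obs[:i]) + sse(r_obs[i:]) + missing_sse)
        if gain > best_gain:
            best_thr, best_gain = 0.5 * (x_obs[i] + x_obs[i - 1]), gain
    return best_thr, best_gain
```

Because the missing branch holds the same examples for every threshold, it does not change which threshold wins on this feature, but it does change the gain, and so it affects which feature is chosen for the split.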
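The default-direction idea can be sketched as follows: for each candidate threshold on the observed values, score the split twice, once with the missing examples appended to the left child and once to the right, and keep the better option. This is an illustrative sketch that uses the variance-reduction criterion from the slides rather than XGBoost's regularized gain; `best_split_default_direction` is a hypothetical name and NaN is assumed to mark missing values.

```python
import numpy as np

def sse(v):
    """Sum of squared deviations from the mean (0 for an empty child)."""
    return float(((v - v.mean()) ** 2).sum()) if len(v) else 0.0

def best_split_default_direction(x, r):
    """Return (threshold, default_direction, gain) for a feature with NaNs."""
    miss = np.isnan(x)
    x_obs, r_obs, r_miss = x[~miss], r[~miss], r[miss]
    order = np.argsort(x_obs)
    x_obs, r_obs = x_obs[order], r_obs[order]
    parent = sse(r)
    best = (None, None, 0.0)
    for i in range(1, len(x_obs)):
        if x_obs[i] == x_obs[i - 1]:
            continue
        thr = 0.5 * (x_obs[i] + x_obs[i - 1])
        for direction in ("left", "right"):
            if direction == "left":
                left, right = np.concatenate([r_obs[:i], r_miss]), r_obs[i:]
            else:
                left, right = r_obs[:i], np.concatenate([r_obs[i:], r_miss])
            gain = parent - sse(left) - sse(right)
            if gain > best[2]:
                best = (thr, direction, gain)
    return best
```

At scoring time, an example with a missing value for the split feature simply follows the stored default direction, so no imputation is needed.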
  • 23. Recurrent Neural Network (NN) for RS ● YouTube latent cross recurrent NN, WSDM 2018 ● Trained with TensorFlow/Keras ○ Other options include PyTorch, MXNet, CNTK, etc.
  • 24. Missing Value Handling with NNs: Taxonomy ● Similar taxonomy as in the case of GBDTs ○ Discard observations with any missing values ■ Dropout: Drop connections with missing values, scale up others ○ Impute all missing values before training via models or simple statistics: Item embeddings may be initialized randomly or to zeros or via weighted averaging, where weights may indicate similarity determined via meta-data ○ Rely on the learning algorithm to deal with missing values in its training phase via hidden layers ■ Categoricals: Single “missing” item hidden embedding or DropoutNet (NIPS17) ■ Continuous / Categorical: Impute continuous + include “missing” embedding, or hidden layer reaches “average” for missing feature or item (NIPS18)
  • 25. Missing Value Handling with DropoutNet Auto-encoder with item/user-vec randomly retained or set to zero/average
  • 26. Missing Value Handling with Hidden “Average” Partly closed-form “average” of the first hidden layer activation, with missing values modeled by a GMM
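Two of the input encodings in this taxonomy are simple enough to sketch directly: a reserved "missing" row in an embedding table for categoricals, and imputation plus a missing-indicator column for continuous features. The names below (`MISSING_ID`, `encode_categorical`, `encode_continuous`, `embed`) are hypothetical, and the embedding table is random rather than learned.

```python
import numpy as np

MISSING_ID = 0  # assumed convention: row 0 of the embedding table means "missing"

def encode_categorical(values, vocab):
    """Map raw values to ids; unknown or absent values share the missing id."""
    return np.array([vocab.get(v, MISSING_ID) for v in values])

def encode_continuous(x, fill_value):
    """Impute NaNs with fill_value and append a 0/1 missing-indicator column."""
    miss = np.isnan(x)
    return np.stack([np.where(miss, fill_value, x), miss.astype(float)], axis=1)

def embed(ids, table):
    """Embedding lookup; the missing row is trained like any other row."""
    return table[ids]

# Example: three known items plus the reserved missing row.
vocab = {"item_a": 1, "item_b": 2, "item_c": 3}
table = np.random.default_rng(0).normal(size=(len(vocab) + 1, 4))
ids = encode_categorical(["item_b", None, "brand_new_item"], vocab)
vectors = embed(ids, table)   # rows 1 and 2 both use the missing embedding
```

The indicator column lets the network learn a different response for an imputed value than for a genuinely observed value of the same magnitude.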
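The input-dropout step can be sketched as below: during training, each user's (or item's) preference vector is either kept or replaced wholesale by zeros or by the batch average, so that the network learns to fall back on content features when preferences are absent. This is a minimal sketch of that one step, not the DropoutNet reference implementation; `drop_preference_rows` and its parameters are hypothetical names.

```python
import numpy as np

def drop_preference_rows(pref, drop_prob, rng, fill="zero"):
    """Randomly replace whole preference vectors to simulate cold start.

    pref: (n, d) array of user or item preference vectors.
    Returns the corrupted batch and the boolean mask of dropped rows.
    """
    drop = rng.random(pref.shape[0]) < drop_prob
    if fill == "zero":
        fill_row = np.zeros(pref.shape[1])
    else:
        fill_row = pref.mean(axis=0)      # "average" variant
    out = np.where(drop[:, None], fill_row, pref)
    return out, drop
```

The corrupted batch is fed to the network together with the untouched content features, while the training target is still computed from the original preference vectors.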
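For a single Gaussian component and a ReLU unit, the "average" activation has a closed form: if the pre-activation is distributed as N(μ, σ²), then E[ReLU] = μ·Φ(μ/σ) + σ·φ(μ/σ), where Φ and φ are the standard normal CDF and PDF. The sketch below applies this to one hidden unit, assuming a diagonal covariance over the missing features; the function names are hypothetical, and a full GMM would average the result over components using the mixture weights.

```python
import math
import numpy as np

def expected_relu(mu, sigma):
    """E[max(z, 0)] for z ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu, 0.0)
    t = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return mu * cdf + sigma * pdf

def expected_first_layer_unit(x, w, b, mean, var):
    """Expected ReLU output of one hidden unit when some inputs are NaN.

    mean, var: per-feature Gaussian parameters used only where x is missing.
    """
    miss = np.isnan(x)
    x_fill = np.where(miss, mean, x)
    mu = float(w @ x_fill + b)
    sigma = math.sqrt(float((w ** 2 * np.where(miss, var, 0.0)).sum()))
    return expected_relu(mu, sigma)
```

When nothing is missing, σ is zero and the unit reduces to an ordinary ReLU, so the same layer serves complete and incomplete inputs.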
  • 27. Conclusion ● A variety of ways to handle missing values in recommender models ● Only presented subset of approaches that do not modify / impute inputs and treat missing values within training algorithm ● Optimal approach for a problem likely dataset-dependent!
  • 28. References ● How Gauss determined the orbit of Ceres, J. Tennenbaum, et al. ● Why beauty is truth: a history of symmetry, I. Stewart ● May 29, 1919: A Major Eclipse, Relatively Speaking, L. Buchen, Wired ● Delorean, H. Taghavi, et al., Netflix ● Bandits for Recommendations, J. Kawale, et al., Netflix ● Longer-term outcomes, B. Rostykus, et al., Netflix ● Speeding up XGBoost Scoring, D. Parekh, et al., Netflix ● Beyond clicks: Dwell-time for personalization, X. Yi, et al. ● Latent Cross: Making Use of Context in Recurrent Recommender Systems, Beutel, et al. ● ESL-II: Elements of Statistical Learning, Hastie, Tibshirani, Friedman ● R GBM ● XGBoost ● Processing of missing data via neural networks, Smieja, et al. ● DropoutNet: Addressing Cold Start in Recommender Systems, Volkovs, et al. ● Inference and missing data, D. B. Rubin, Biometrika, 63, 581–592
  • 29. Acknowledgments The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several others for discussions and contributions. Thank You!