Deep
Recommendations
in PyTorch
Setup
Follow along with instructions here:
cemoody.github.io/simple_mf
About
@chrisemoody
Caltech Physics
PhD in Astrophysics + Supercomputing
Scikit-Learn t-SNE Contributor
Stitch Fix AI Team
Stitch Fix
AI at Stitch Fix
Capture · Joint Annotation · Segmentation · Model Fitting
If you’re interested, ask me about this later!
Matrix Factorization → “Latent” Style Space (t-SNE viz)
100+ Data Scientists
Increased Personalization
Decreased Client Churn
Increased Item Sales
Better Merch Buying
Better Stylist Relationships
Lessons Learned → Goals for Today
1. More data means more personalization.
We’ll use rec engines to drive personalization.
2. Recommendation engines are instruments of your business.
The latent factors they reveal enable new science!
3. Custom models respect your heterogeneous and unique system.
We’ll explore 8 different ways of building deep recommendation engines.
Why PyTorch?
This gets very deep.
…PyTorch computes everything at run time… so debug & investigate!
Setup
Follow along with instructions here: cemoody.github.io/simple_mf
Start Jupyter Notebook
Start Tensorboard
There are eight Jupyter Notebooks in this tutorial. We’ll go through them one-by-one.
Feel free to run them in advance -- they’ll start training models in the background.
… or start them up as I go along.
Matrix Factorization
Fundamentals
Notebooks we’ll be using:
01 Training a simple MF model.ipynb
02 Simple MF Model with Biases.ipynb
[Figure: the ratings matrix R, an (n_users × n_items) grid of +1 and 0 entries]
R
The ratings matrix
(n_users x n_items)
Extremely large
(Easily tens of billions of elements)
Mostly zeroes
(Typical sparsity is 0.01% - 1%)
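Because R is huge and mostly zeroes, you only ever store the observed entries. A minimal sketch of that storage with scipy; the triples here are made-up illustrative data, not from the notebooks:

import numpy as np
from scipy.sparse import coo_matrix

n_users, n_items = 5, 10  # tiny toy sizes; real matrices are vastly larger

# One entry per observed (user, item, rating) triple
users = np.array([0, 0, 3])
items = np.array([1, 7, 2])
ratings = np.array([1.0, 1.0, 1.0])

# Store only the ~0.01% - 1% of entries that are nonzero
R = coo_matrix((ratings, (users, items)), shape=(n_users, n_items))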
R: the ratings matrix (n_users, n_items)
P: the user matrix (n_users, k)
Q: the item matrix (n_items, k)
P and Q are much smaller (millions or hundreds of thousands of rows), compact & dense: no zeroes, efficient storage.
Rui ≈ pu ⋅ qi
Rui: a single +1 or 0 rating (known)
pu: user vector (to be estimated)
qi: item vector (to be estimated)
Minimize L = (Rui − pu ⋅ qi)²
Minimize L = (Rui − pu ⋅ qi)² + c|P|² + c|Q|²
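A minimal sketch of this model and objective in PyTorch; names like n_users, k, and the L2 weight c are placeholders, not values from the slides:

import torch
import torch.nn as nn

class MF(nn.Module):
    """Plain matrix factorization: R_ui ≈ p_u · q_i."""
    def __init__(self, n_users, n_items, k=10):
        super().__init__()
        self.user = nn.Embedding(n_users, k)  # rows of P
        self.item = nn.Embedding(n_items, k)  # rows of Q

    def forward(self, u, i):
        # Dot product between the selected user and item vectors
        return (self.user(u) * self.item(i)).sum(dim=1)

def loss_fn(model, u, i, rating, c=1e-6):
    pred = model(u, i)
    mse = ((rating - pred) ** 2).mean()
    # L2 penalty on both factor matrices, as in the objective above
    reg = c * (model.user.weight.pow(2).sum() + model.item.weight.pow(2).sum())
    return mse + reg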
Baseline model: Rui = pu ⋅ qi
- Only captures interactions.
- What if a user generally likes everything?
- What if an item is generally popular?
Let’s add biases.
Model + global bias + user biases + item biases
Now we learn how much a user generally likes things, and how generally liked an item is.
Rui = b + ωu + μi + pu ⋅ qi
Rui = b + ωu + μi + pu ⋅ qi
Rui: a single +1 or 0 rating
b: global bias
ωu: user bias (one number per user)
μi: item bias (one number per item)
pu: user vector; qi: item vector
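Extending the earlier sketch with the bias terms; the initialization and exact wiring are assumptions, not necessarily the notebook’s:

class MFBias(nn.Module):
    """Matrix factorization with global, user, and item biases."""
    def __init__(self, n_users, n_items, k=10):
        super().__init__()
        self.user = nn.Embedding(n_users, k)
        self.item = nn.Embedding(n_items, k)
        self.user_bias = nn.Embedding(n_users, 1)        # ω_u
        self.item_bias = nn.Embedding(n_items, 1)        # μ_i
        self.global_bias = nn.Parameter(torch.zeros(1))  # b

    def forward(self, u, i):
        interaction = (self.user(u) * self.item(i)).sum(dim=1)
        return (self.global_bias
                + self.user_bias(u).squeeze(1)
                + self.item_bias(i).squeeze(1)
                + interaction)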
Recommendation engines are instruments for doing science
User & item vectors are not black boxes. They are instruments into your space.
Stitch Fix vectors stand for clothing style.
Movielens vectors yield movie genres.
At Spotify, they represent latent musical tastes.
At Amazon, they represent latent categories.
Let’s PCA the Stitch Fix vectors and take a look. We’ll sort by the first eigenvector.
[Example item vectors:
+1.2 −0.3 +0.4 −4.0 +0.9
+6.7 −0.44 +0.9 +0.58 −0.7
+3.8 −0.9 −2.4 +0.8 +0.3]
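A sketch of that sort; item_vectors stands in for the trained (n_items, k) item matrix (e.g. model.item.weight.detach().numpy()), and the random data here is purely illustrative:

import numpy as np

item_vectors = np.random.randn(1000, 10)  # stand-in for the trained item matrix
centered = item_vectors - item_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)  # PCA via SVD
first_pc = centered @ vt[0]       # projection onto the first eigenvector
order = np.argsort(first_pc)      # items sorted from one style extreme to the other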
At the two ends of the first eigenvector:
One end: striped, polka dotted, fitted, button-up, high chroma. This must be: preppy.
The other end: tile-printed, paisley, bell-sleeved, fringed, tasseled, maxi. This must be: boho.
We did not define boho or preppy -- we discovered it!
This is the cornerstone of our ‘style space’ that segments our clients and our merch.
We don’t have just one dimension, however… we have many more. What does each of those look like?
Internally, we use 18 dimensions. The first three we call trendiness, femininity, and end use.
For more, search “Understanding Latent Style”:
https://multithreaded.stitchfix.com/blog/2018/06/28/latent-style/
Advanced Matrix
Factorization
Notebooks we’ll be using:
01 Training a simple MF model.ipynb
02 Simple MF Model with Biases.ipynb
This is now a typical matrix-factorization recommender.
Rui = b + ωu + μi + pu ⋅ qi
What if we know more “side” features, like the user’s occupation?
Great for when we want to “cold start” a new user.
We have two choices for side features (see the sketch after this list):
- Add them as a bias
→ Choose this if you think occupation changes the overall like rate, but not which movies
→ e.g. “artists like movies more than other occupations”
- Add them as a user vector
→ Choose this if you think occupation changes which items are liked
→ e.g. “realtors love real estate shows”
Rui = b + do + ωu + μi + (pu + to) ⋅ qi
*Note that to interacts with qi
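A sketch of that model with both the occupation bias d_o and the occupation vector t_o; n_occupations and the wiring are assumptions, not from the notebooks:

class MFSide(nn.Module):
    """MF with an occupation bias d_o and an occupation vector t_o
    added to the user vector before the dot product."""
    def __init__(self, n_users, n_items, n_occupations, k=10):
        super().__init__()
        self.user = nn.Embedding(n_users, k)
        self.item = nn.Embedding(n_items, k)
        self.occ_vec = nn.Embedding(n_occupations, k)    # t_o
        self.user_bias = nn.Embedding(n_users, 1)        # ω_u
        self.item_bias = nn.Embedding(n_items, 1)        # μ_i
        self.occ_bias = nn.Embedding(n_occupations, 1)   # d_o
        self.global_bias = nn.Parameter(torch.zeros(1))  # b

    def forward(self, u, i, o):
        p = self.user(u) + self.occ_vec(o)               # (p_u + t_o)
        interaction = (p * self.item(i)).sum(dim=1)
        return (self.global_bias
                + self.occ_bias(o).squeeze(1)
                + self.user_bias(u).squeeze(1)
                + self.item_bias(i).squeeze(1)
                + interaction)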
Rui = b + ωu + μi + pu⋅ qi
What about when features change in time?
You’ve seen biases.
You’ve seen user-item interactions.
Rui = b + ωu + μi + pu ⋅ qi + mu ⋅ nt
What does a latent temporal vector encode? Here’s a toy example.
[Figure: average user rating plotted against time t = 0…6 for two users]
nt = (1 − t, t), so n0 = (1.0, 0.0), n1 = (0.9, 0.1), …
A user is a mixture of time-series: user 0 has a 100% mixture of the 1st time series, and user 1 has a 100% mixture of the 2nd.
Rui = b + ωu + μi + pu ⋅ qi + mu ⋅ nt
We can enforce that nearby times have similar components:
|nt − nt-1| should be small. (See the sketch below.)
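One way to implement that smoothness penalty, assuming time is discretized into the rows of an nn.Embedding; the name time_emb and the weight 0.1 are illustrative:

def temporal_smoothness(time_emb, weight=0.1):
    """Penalize |n_t - n_{t-1}| so nearby times share components."""
    n = time_emb.weight            # (n_times, k) matrix of n_t vectors
    diffs = n[1:] - n[:-1]         # deltas between consecutive time steps
    return weight * diffs.pow(2).sum()

# e.g.: loss = mse + temporal_smoothness(model.time_emb)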
Word2Vec is
actually a rec engine!
Notebooks we’ll be using:
03 Simple MF Biases is actually word2vec.ipynb
“ITEM_92 I think this fabric is wonderful
(rayon & spandex). like the lace/embroidery
accents”
Instead of (users, items) we have (token1, token2).
And instead of a rating we have a skipgram count.
But it’s still the same model!
Rui = b + ωu + μi + pu ⋅ qi
Rui = ωu + μi + pu ⋅ qi
Rui: a single +1 or 0 rating → a log skipgram count
ωu: user bias → token1 bias (how frequent is this word?)
μi: item bias → token2 bias (how frequent is this word?)
pu: user vector → token1 vector
qi: item vector → token2 vector
pu ⋅ qi: do token1 & token2 have a special interaction?
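The same biased-MF sketch renamed for token pairs; training against log skipgram counts in a GloVe/word2vec-like way. The count-matrix construction is assumed, not shown here:

class SkipgramMF(nn.Module):
    """Factorize log skipgram counts: log X_12 ≈ ω_1 + μ_2 + p_1 · q_2."""
    def __init__(self, n_tokens, k=128):
        super().__init__()
        self.vec1 = nn.Embedding(n_tokens, k)
        self.vec2 = nn.Embedding(n_tokens, k)
        self.bias1 = nn.Embedding(n_tokens, 1)
        self.bias2 = nn.Embedding(n_tokens, 1)

    def forward(self, t1, t2):
        interaction = (self.vec1(t1) * self.vec2(t2)).sum(dim=1)
        return (interaction
                + self.bias1(t1).squeeze(1)
                + self.bias2(t2).squeeze(1))

# train with MSE against torch.log(count) for observed (token1, token2) pairs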
In recommendation engines, there are also directions in the latent space. What are the directions in your space?
[Figure: style space with axes MORE TRENDY and MORE PROFESSIONAL]
Variational Matrix
Factorization
Notebooks we’ll be using:
08 Variational MF.ipynb
Practical reasons to go variational:
1. Alternative regularization
2. Measure what your model doesn’t know.
3. Help explain your data.
4. Short & fits in a tweet!
Rui = b + ωu + μi + pu ⋅ qi
Let’s make this variational:
1. Replace point estimates with samples from a distribution.
2. Replace regularizing that point with regularizing that distribution.
WITHOUT VARIATIONAL
embeddings = nn.Embedding(n_users, k)
c_vector = embeddings(c_index)
WITH VARIATIONAL
embeddings_mu = nn.Embedding(n_users, k)
embeddings_lv = nn.Embedding(n_users, k)
...
vector_mu = embeddings_mu(c_index)
vector_lv = embeddings_lv(c_index)
c_vector = sample_gaussian(vector_mu, vector_lv)
def sample_gaussian(mu, lv):
    # lv is a log-variance, so exp(0.5 * lv) is the standard deviation
    std = torch.exp(0.5 * lv)
    return mu + torch.randn_like(mu) * std
WITHOUT VARIATIONAL
loss += c_vector.pow(2.0).sum()

WITH VARIATIONAL
loss += gaussian_kldiv(vector_mu, vector_lv)
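gaussian_kldiv isn’t spelled out in the slides; the standard closed-form KL divergence against a unit Gaussian would look like this sketch, not necessarily the notebook’s exact helper:

def gaussian_kldiv(mu, lv):
    # KL( N(mu, exp(lv)) || N(0, 1) ), summed over the batch
    return -0.5 * torch.sum(1 + lv - mu.pow(2) - lv.exp())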
At the Frontier
More Things to Try
FMs (factorization machines)
● Useful when you have many kinds of interactions, not just user-item and user-time.
Mixture-of-Tastes
● Like “attention” for recommendations; allows users to have multiple “tastes.”
Extra: Non-Euclidean Spaces
Poincaré Spaces
“Zelda” is a game set in 2D Euclidean space (roughly).
● Infinity can’t be seen
● Area is homogeneous
● Geodesics are straight lines
“Hyper Rogue” is a game set in hyperbolic space.
● Radius = 1 is infinitely far away, but visible
● Volume increases towards the boundary
Editor's Notes

  1. *Only* captures interactions. That means it captures all kinds of effects like:
  2. *Only* captures interactions. That means it captures all kinds of effects like:
  3. It powers: What are the design characteristics best liked by a trendy professional? We can share with our design team and vendors that black cloth, button-downs, and blazers would be good features to leverage. How many trendy boho items should we buy? We could share with our planning team what fraction of our clientele live in this region of style space, and stock accordingly. Or how about deciding whether two items are duplicative and should not both be purchased? We can calculate the item pair’s cosine similarity and surface these scores to our buyers.
  4. Try: lassie vs reservoir dogs vs star trek
  5. All deep learning on text runs some sort of co-occurrence model. It’s a simple BOW baseline model that still gives you useful vectors, even if they aren’t immediately interpretable. It embeds both words & items in the same space. Let’s remove the fluff & simplify…
  6. X is a 2D matrix where each element is just the count of how many times I saw that pair of words near each other. Called “skip-grams” because we’re skipping tokens between both words.
  7. This is a minor modification of GloVe & word2vec-like algos. LHS: take the log of that count (words are log-distributed).
  8. This is a minor modification of GloVe & word2vec-like algos. LHS: take the log of that count (words are log-distributed).
  9. This is a minor modification of GloVe & word2vec-like algos. LHS: take the log of that count (words are log-distributed).
  10. You do get hints that these directions exist outside of t-SNE: the difference between man & woman is the same as between king & queen (the arrows are the same size & direction).
  11. Not just man & woman, king & queen … this is linear regularity.
  12. We have hundreds of dimensions in word vectors, hundreds of potential directions.
  13. Most ML & deep learning is about optimizing a point estimate of your model parameters; variational methods are about optimizing a posterior, which loosely speaking expresses a spectrum of model configurations consistent with my data. 1. Never seen a client? Highly uncertain. Having a model that knows when it doesn’t know something is key to variational methods. 3. Variational methods can reveal entailment, a natural way of grouping data, sometimes hierarchical. Var w2v: entailment, a brand-new property of word vectors not in vanilla w2v. Var FMs: capture what they don’t know, prioritize discovery, e.g. new drugs in medicine or new items in rec engines. Var t-SNE: build trust in the model, visualize what’s going on.
  14. Of course, we have this extra log-variance matrix hanging around: every word (king or queen or man or woman) has a row in this matrix. What does this mean?
  15. Not just at a point any more: polysemous words have lots of different contexts. man: 1) human, male; 2) human beings in general; 3) ‘sticking it to the man’. woman: has fewer. king: ruler or chess piece. The variance is not just a regularizer, and not just measuring ‘noise’; the variance estimate tells me what my model claims is consistent with my data. Variance is not just uncertainty: it actually gives me representations with novel properties.
  16. word2gauss has entailment
  17. Now we’re not just capturing similarities and linear regularities, but entailment: the property that words may be subsumed entirely by other tokens. We didn’t have this in OG word2vec. An example of learning hierarchical representations.
  18. You get a different sample every time; backprop will affect both mean & variance.
  19. Now replace with two matrices / two row vectors: a mean matrix and a log-variance matrix. Until recently (2013), the latter required formidable variational calculus and all manner of integral tricks and specialized math. But the beautiful reparameterization trick unifies both approaches, and is at the core of Bayesian Deep Learning methods. What’s more, it’s elegant and can be implemented in Python code fitting inside of a single tweet.
  20. 2nd piece: regularizing that distribution. Instead of a point, penalize the distribution.
  21. 2nd piece: regularizing that distribution. Instead of a point, penalize the distribution.
  22. Can use video games to tell the story of Euclidean vs Poincaré. 3 points to make.
  23. Hyper Rogue is a game set in hyperbolic space. Read the first two; next slide for geodesics.
  24. Similar to a binary tree: the shortest path involves going up the tree and back down.