Diamond: Mixed Effects Models in Python
Timothy Sweetser
Stitch Fix
http://github.com/stitchfix/diamond
tsweetser@stitchfix.com
@hacktuarial
November 27, 2017
Timothy Sweetser (Stitch Fix) Diamond November 27, 2017 1 / 32
Overview
1 context and motivation
2 what is the mixed effects model
3 application to recommender systems
4 computation
5 diamond
6 appendix
context and motivation
Stitch Fix
what is the mixed effects model
Refresher: Linear Model
$$y \sim N(X\beta, \sigma^2 I)$$
y is n x 1
X is n x p
β is an unknown vector of length p
$\sigma^2$ is an unknown, nonnegative constant
Mixed Effects Model
$$y \mid b \sim N(X\beta + Zb, \sigma^2 I)$$
We have a second set of features, Z, n x q
the coefficients on Z are b ∼ N(0, Σ)
Σ is q x q
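A consequence worth spelling out: if b and the residual noise are independent (the usual convention, assumed here rather than stated on the slide), integrating out b gives the marginal distribution of y. The symbol $\varepsilon$ for the residual is introduced here for illustration.

```latex
y = X\beta + Zb + \varepsilon, \qquad
b \sim N(0, \Sigma), \qquad
\varepsilon \sim N(0, \sigma^2 I)
\;\Longrightarrow\;
y \sim N\!\left(X\beta,\; Z \Sigma Z^{\top} + \sigma^2 I\right)
```

Observations that share a random effect are therefore correlated through the $Z \Sigma Z^{\top}$ term, while the mean is still $X\beta$.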
simple example of a mixed effects model
You think there is some relationship between a woman’s height and the
ideal length of jeans for her:
$\text{length} = \alpha + \beta \cdot \text{height} + \varepsilon$
But, you think the length might need to be shorter or longer, depending
on the silhouette of the jeans. In other words, you want α to vary by
silhouette.
why might silhouette affect length ∼ height?
Skinny
Bootcut
linear model: formula
Linear models can be expressed in formula notation, used by patsy,
statsmodels, and R
import statsmodels.formula.api as smf
lm = smf.ols('length ~ 1 + height', data=train_df).fit()
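in math, this means $\text{length} = X\beta + \varepsilon$
for example, one row of X is $X_i = [1.0, 64.0]$: the intercept term and that customer's height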
β is what we want to learn, using (customer, item) data from jeans
that fit well
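To make the link between the formula and the matrix form concrete, here is a minimal sketch that fits `length ~ 1 + height` by solving the normal equations directly. The data are simulated, and the true coefficients (10.0 and 0.35) are made up for illustration; nothing here comes from Stitch Fix data.

```python
import random

def fit_simple_ols(heights, lengths):
    """Least-squares fit of length ~ 1 + height (closed form for one feature)."""
    n = len(heights)
    mean_h = sum(heights) / n
    mean_l = sum(lengths) / n
    sxx = sum((h - mean_h) ** 2 for h in heights)
    sxy = sum((h - mean_h) * (l - mean_l) for h, l in zip(heights, lengths))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_h
    return intercept, slope

# Simulated training data; the coefficients below are assumptions.
rng = random.Random(0)
true_intercept, true_slope = 10.0, 0.35
heights = [rng.uniform(58.0, 72.0) for _ in range(2000)]
lengths = [true_intercept + true_slope * h + rng.gauss(0.0, 1.0) for h in heights]

intercept, slope = fit_simple_ols(heights, lengths)

# The row of X for a customer with height 64.0 is [1.0, 64.0];
# her predicted length is the dot product of that row with beta.
x_i = [1.0, 64.0]
predicted = x_i[0] * intercept + x_i[1] * slope
```

The leading 1.0 in each row of X is what the `1` in the formula contributes, so the first entry of β is the intercept.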
linear model: illustration
mixed effects: formula
Now, allow the intercept to vary by silhouette
mix = smf.mixedlm('length ~ 1 + height',
                  data=train_df,
                  re_formula='1',
                  groups='silhouette',
                  use_sparse=True).fit()
illustration
mixed effects regularization
$$y \mid b \sim N(X\beta + Zb, \sigma^2 I)$$
Sort by silhouette:
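$$
Z = \begin{bmatrix}
\mathbf{1}_{\text{bootcut}} & 0 & 0 & 0 \\
0 & \mathbf{1}_{\text{skinny}} & 0 & 0 \\
0 & 0 & \mathbf{1}_{\text{straight}} & 0 \\
0 & 0 & 0 & \mathbf{1}_{\text{wide}}
\end{bmatrix}
$$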
X is n x 2
Z is n x 4
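The indicator structure of Z can be sketched in a few lines. The rows and the offset values below are hypothetical; the helper name `indicator_matrix` is introduced only for this example.

```python
def indicator_matrix(groups):
    """Return (levels, Z) where Z[i][k] = 1.0 if row i belongs to levels[k]."""
    levels = sorted(set(groups))
    column = {level: k for k, level in enumerate(levels)}
    Z = [[0.0] * len(levels) for _ in groups]
    for i, g in enumerate(groups):
        Z[i][column[g]] = 1.0
    return levels, Z

# Hypothetical rows, already sorted by silhouette as on the slide.
silhouettes = ["bootcut", "bootcut", "skinny", "straight", "straight", "wide"]
levels, Z = indicator_matrix(silhouettes)

# Multiplying Z by a vector of per-silhouette offsets (values made up here)
# picks out the offset that applies to each row.
b = [0.5, -1.0, 0.0, 1.5]
Zb = [sum(z_ik * b_k for z_ik, b_k in zip(row, b)) for row in Z]
```

Each row of Z has exactly one nonzero entry, which is why a sparse representation is a natural fit for large n.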
matrices and formulas - mixed effects
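$$
Zb = \begin{bmatrix}
\mathbf{1}_{\text{bootcut}} & 0 & 0 & 0 \\
0 & \mathbf{1}_{\text{skinny}} & 0 & 0 \\
0 & 0 & \mathbf{1}_{\text{straight}} & 0 \\
0 & 0 & 0 & \mathbf{1}_{\text{wide}}
\end{bmatrix}
\begin{bmatrix}
\mu_{\text{bootcut}} \\
\mu_{\text{skinny}} \\
\mu_{\text{straight}} \\
\mu_{\text{wide}}
\end{bmatrix}
$$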
Each $\mu_{\text{silhouette}}$ is drawn from $N(0, \sigma^2_{\text{silhouette}})$, a variance separate from the residual $\sigma^2$
This allows for deviations from the average effects, µ and β, by
silhouette, to the extent that the data support it
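The phrase about deviations being allowed only as far as the data support them can be made concrete for the random-intercept case. This is a sketch under simplifying assumptions: the fixed effects, the residual variance, and the variance of the silhouette offsets are all treated as known. Under those assumptions the conditional mean of a silhouette's offset is its mean residual multiplied by a weight between 0 and 1. The function name and the numbers are hypothetical.

```python
def shrunken_offset(residuals, resid_var, group_var):
    """Conditional mean of a group offset given that group's residuals.

    Assumes the fixed effects and both variances are known, which is a
    simplification made for this sketch.
    """
    n = len(residuals)
    if n == 0:
        return 0.0  # no data: fall back to the prior mean
    mean_resid = sum(residuals) / n
    weight = n / (n + resid_var / group_var)
    return weight * mean_resid

# Made-up residuals (observed length minus alpha + beta * height).
few = [2.0, 2.0]
many = [2.0] * 200

small_group = shrunken_offset(few, resid_var=4.0, group_var=1.0)
large_group = shrunken_offset(many, resid_var=4.0, group_var=1.0)
```

With only two observations the estimate is pulled most of the way toward zero (about 0.67), while 200 observations leave it close to the raw mean of 2.0.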
application to recommender systems
a basic model
rating ∼ 1 + (1 | user_id) + (1 | item_id)
In math, this means
$$r_{ui} = \mu + \alpha_u + \beta_i + \varepsilon_{ui}$$
where
µ is an unknown constant
$\alpha_u \sim N(0, \sigma^2_{\text{user}})$
$\beta_i \sim N(0, \sigma^2_{\text{item}})$
some items are more popular than others
some users are more picky than others
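Once the model is fitted, a prediction is just the sum of three numbers. The sketch below uses made-up effect values and a hypothetical helper, `predict_rating`. Falling back to 0 (the prior mean of each random effect) for a user or item not seen in training is a choice made for this example, not something the slides specify.

```python
def predict_rating(user_id, item_id, mu, user_effects, item_effects):
    """r_ui = mu + alpha_u + beta_i, using 0 (the prior mean) for unseen ids."""
    alpha_u = user_effects.get(user_id, 0.0)
    beta_i = item_effects.get(item_id, 0.0)
    return mu + alpha_u + beta_i

# Made-up fitted values for illustration.
mu = 3.5
user_effects = {"picky_user": -0.8, "easy_user": 0.6}
item_effects = {"popular_item": 0.7, "niche_item": -0.3}

known = predict_rating("picky_user", "popular_item", mu, user_effects, item_effects)
new_user = predict_rating("someone_new", "popular_item", mu, user_effects, item_effects)
```

A negative user effect corresponds to a picky user and a positive item effect to a popular item.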
add features
rating ∼ 1 + (1 + item_feature1 + item_feature2 | user_id) +
(1 + user_feature1 + user_feature2 | item_id)
Now,
$\alpha_u \sim N(0, \Sigma_{\text{user}})$
$\beta_i \sim N(0, \Sigma_{\text{item}})$
the good: we’re using features! learn individual and shared preferences
helps with new items, new users
the bad: scales as $O(p^2)$ in the number of features $p$
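One way to read the quadratic scaling remark (an interpretation, not something the slide spells out) is that each covariance matrix has one free parameter per pair of random terms. A symmetric q x q matrix has q(q+1)/2 of them; the helper name below is hypothetical.

```python
def covariance_parameter_count(num_random_terms):
    """Free parameters in a symmetric q x q covariance matrix: q(q+1)/2."""
    q = num_random_terms
    return q * (q + 1) // 2

# (1 + item_feature1 + item_feature2 | user_id) has q = 3 random terms per user.
counts = {q: covariance_parameter_count(q) for q in (1, 3, 10, 100)}
```

Three random terms need 6 covariance parameters, while 100 need 5050.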
comments
rating ∼ 1 + (1 + item_feature1 + item_feature2 | user_id) +
(1 + user_feature1 + user_feature2 | item_id)
this is a parametric model, and much less flexible than trees, neural
networks, or matrix factorization
but you don’t have to choose!
you can use an ensemble, or use this as a feature in another model
computation
How can you fit models like this? We were using R’s lme4 package
Maximum likelihood computation works like this:
Estimate covariance structure of random effects, Σ
given Σ, estimate coefficients β and b
with these, compute loglikelihood
repeat until convergence
Doesn’t scale well with number of observations, n
lme4 supports a variety of generalized linear models, but is not
optimized for any one in particular
Is it really necessary to update hyperparameters Σ every time you
estimate the coefficients?
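The loop described above can be illustrated on the simplest possible case, a Gaussian random-intercept model with equal group sizes. This is a sketch under several assumptions that are not from the slides: the residual variance is treated as known, the search over the random-effect variance is a coarse grid rather than the numerical optimizer a package like lme4 would use, and the random effects are integrated out analytically instead of being estimated. All names and numbers are made up.

```python
import math
import random

def profile_loglik(groups, tau2, sigma2):
    """Marginal log-likelihood of y_gi = mu + a_g + e_gi for a given tau2.

    a_g ~ N(0, tau2), e_gi ~ N(0, sigma2). Assumes every group has the same
    size, so the estimate of mu given tau2 is simply the grand mean.
    """
    n = len(groups[0])
    mu = sum(sum(g) for g in groups) / (n * len(groups))
    total = 0.0
    for g in groups:
        dev = [y - mu for y in g]
        ss = sum(d * d for d in dev)
        s = sum(dev)
        # For V = sigma2 * I + tau2 * 1 1^T:
        #   log det V = (n - 1) log sigma2 + log(sigma2 + n tau2)
        #   dev' V^{-1} dev = (ss - tau2 * s^2 / (sigma2 + n tau2)) / sigma2
        logdet = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
        quad = (ss - tau2 * s * s / (sigma2 + n * tau2)) / sigma2
        total += -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return total

# Simulated data: 500 groups of 10 observations each.
rng = random.Random(1)
true_tau2, sigma2 = 1.0, 1.0
groups = []
for _ in range(500):
    a_g = rng.gauss(0.0, math.sqrt(true_tau2))
    groups.append([5.0 + a_g + rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(10)])

# "Repeat until convergence" is replaced here by a coarse grid over tau2.
grid = [0.25, 0.5, 1.0, 2.0, 4.0]
logliks = {tau2: profile_loglik(groups, tau2, sigma2) for tau2 in grid}
best_tau2 = max(logliks, key=logliks.get)
```

Every candidate value of the variance requires a full pass over the data, which hints at why the procedure becomes expensive as n grows.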
diamond
Diamond solves a similar problem using these tricks:
Input Σ. Conditional on Σ, the optimization problem is convex
Use Hessian of L2 penalized loglikelihood function (pencil + paper)
logistic regression
cumulative logistic regression, for ordinal responses
if $Y \in \{1, 2, 3, \ldots, J\}$,
$$\log \frac{\Pr(Y \le j)}{1 - \Pr(Y \le j)} = \alpha_j + \beta^{\top} x \quad \text{for } j = 1, 2, \ldots, J - 1$$
quasi-Newton optimization techniques from Minka 2003
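To illustrate the first two tricks, here is a minimal sketch of logistic regression with a fixed Gaussian prior, fitted with plain Newton steps using the analytic Hessian. It is not Diamond's implementation, and it uses ordinary Newton rather than the quasi-Newton variants compared by Minka. The function name, the simulated data, and the prior precision matrix `P` are all assumptions made for the example.

```python
import numpy as np

def fit_penalized_logistic(X, y, prior_precision, tol=1e-8, max_its=50):
    """Newton's method for logistic regression with a Gaussian prior.

    Minimizes  -loglik(w) + 0.5 * w' P w  where P = prior_precision
    (the inverse of the prior covariance), which is held fixed.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_its):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + prior_precision @ w
        # Hessian: X' diag(p(1-p)) X + P, positive definite when P is.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + prior_precision
        step = np.linalg.solve(hess, grad)
        w = w - step
        if np.max(np.abs(step)) < tol:
            break
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_w = np.array([-0.5, 1.0, -2.0])  # made up for the simulation
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

P = np.diag([1e-6, 1.0, 1.0])  # nearly flat prior on the intercept
w_hat = fit_penalized_logistic(X, y, P)

# At the optimum the gradient of the penalized objective vanishes.
p_hat = 1.0 / (1.0 + np.exp(-X @ w_hat))
grad_at_solution = X.T @ (p_hat - y) + P @ w_hat
```

Because the prior precision is fixed, the objective is convex and each Newton step only needs a linear solve.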
other solvers
How else could you fit mixed effects models?
"Exact" methods
Full Bayes: MCMC. e.g. PyStan, PyMC3, Edward
diamond, but you must specify the hyperparameters Σ
statsmodels only supports linear regression for Gaussian-distributed
outcomes
R/lme4
Approximate methods
Simple, global L2 regularization
Full Bayes: Variational Inference
moment-based methods
diamond
Speed test
MovieLens, 20M observations like (userId, movieId, rating)
binarize (ordinal!) rating → 1(rating > 3.5)
this is well-balanced
Fit a model like
rating ∼ 1 + (1 | user_id) + (1 | item_id)
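The binarization step is small enough to show directly. The rows below are hypothetical and only mimic the (userId, movieId, rating) layout; the column name `liked` matches the response used in the Diamond formula later in the talk.

```python
# Hypothetical (userId, movieId, rating) rows in the MovieLens layout.
ratings = [
    (1, 10, 5.0),
    (1, 20, 3.5),
    (2, 10, 4.0),
    (2, 30, 1.5),
]

# liked = 1(rating > 3.5): a rating of exactly 3.5 counts as not liked.
rows = [
    {"userId": u, "movieId": m, "liked": int(r > 3.5)}
    for u, m, r in ratings
]
liked = [row["liked"] for row in rows]
```

Thresholding discards the ordering among the original rating levels, which is the information an ordinal model would keep.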
diamond
import numpy as np
import pandas as pd
from diamond.glms.logistic import LogisticRegression

# train_df needs the columns used in the formula: liked, userId, movieId
train_df = ...
# one row per random-effect variance; vcov holds the variance
# of the random intercept for each grouping factor
priors_df = pd.DataFrame({
    'group': ['userId', 'movieId'],
    'var1': ['intercept'] * 2,
    'var2': [np.nan, np.nan],
    'vcov': [0.9, 1.0]
})
m = LogisticRegression(train_df=train_df, priors_df=priors_df)
results = m.fit('liked ~ 1 + (1 | userId) + (1 | movieId)',
                tol=1e-5, max_its=200, verbose=True)
Speed test vs. sklearn
Diamond
estimate covariance on sample of 1M observations in R. 1-time, 60
minutes
$\sigma^2_{\text{user}} = 0.9$, $\sigma^2_{\text{movie}} = 1.0$
Takes 83 minutes on my laptop to fit in diamond
sklearn LogisticRegression
use cross validation to estimate regularization. 1-time, takes 24
minutes
grid search would be a fairer comparison
refit takes 1 minute
diamond vs. sklearn predictions
Global L2 regularization is a good approximation for this problem, but may
not work as well when $\sigma^2_{\text{user}} \gg \sigma^2_{\text{item}}$ (or vice versa), or for models with
more features
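The reason global L2 works here can be written out. For the random-intercept model, with the variances held fixed, the mixed effects penalty weights users and items separately, while global L2 with strength $\lambda$ (notation assumed here, matching the appendix) uses one weight for everything:

```latex
\underbrace{\frac{1}{2}\left(
  \frac{1}{\sigma^2_{\text{user}}} \sum_u \alpha_u^2
  + \frac{1}{\sigma^2_{\text{item}}} \sum_i \beta_i^2
\right)}_{\text{mixed effects penalty}}
\qquad \text{versus} \qquad
\underbrace{\frac{\lambda}{2}\left(
  \sum_u \alpha_u^2 + \sum_i \beta_i^2
\right)}_{\text{global } L_2}
```

The two coincide exactly when $\sigma^2_{\text{user}} = \sigma^2_{\text{item}} = 1/\lambda$. With variances of 0.9 and 1.0 they are nearly equal, so a single $\lambda$ loses very little.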
diamond vs. R
lme4 takes more than 360 minutes to fit
diamond vs. moment-based
active area of research by statisticians at Stanford, NYU, elsewhere
very fast to fit simple models using method of moments
e.g. rating ∼ 1 + (1 + x | user_id)
or rating ∼ 1 + (1 | user_id) + (1 | item_id)
Fitting this to MovieLens 20M took 4 minutes
but not rating ∼ 1 + (1 + x | user_id) + (1 | item_id)
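To give a flavor of why moment-based fitting is fast, here is the classical one-way ANOVA estimator for a Gaussian model with a single random intercept. This is a textbook estimator chosen for illustration, not the estimators from the papers cited in this talk, and it assumes equal group sizes. The function name and the simulated variances are made up.

```python
import random

def one_way_moments(groups):
    """ANOVA method-of-moments estimates for y_gi = mu + a_g + e_gi.

    Returns (group_var, resid_var). Assumes equal group sizes.
    """
    G = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / G
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (G * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (G - 1)
    # E[msw] = resid_var and E[msb] = resid_var + n * group_var.
    return max((msb - msw) / n, 0.0), msw

# Simulated data: 400 groups of 20 observations each.
rng = random.Random(2)
groups = []
for _ in range(400):
    a_g = rng.gauss(0.0, 1.0)  # true group variance 1.0
    groups.append([3.0 + a_g + rng.gauss(0.0, 0.5) for _ in range(20)])  # residual variance 0.25

group_var_hat, resid_var_hat = one_way_moments(groups)
```

The estimates come from two sums of squares computed in a single pass, with no iterative optimization.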
diamond vs. variational inference
I fit this model in under 5 minutes using Edward, and didn’t have to
input Σ.
VI is very promising!
why use diamond?
http://github.com/stitchfix/diamond
scales well with number of observations (compared to pure R, MCMC)
solves the exact problem (compared to variational, moment-based)
scales ok with P (compared to simple global L2)
supports ordinal logistic regression
if $Y \in \{1, 2, 3, \ldots, J\}$,
$$\log \frac{\Pr(Y \le j)}{1 - \Pr(Y \le j)} = \alpha_j + \beta^{\top} x \quad \text{for } j = 1, 2, \ldots, J - 1$$
Reference: Agresti, Categorical Data Analysis
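The cumulative logit formula gives Pr(Y ≤ j); category probabilities follow by differencing. The sketch below assumes increasing cutpoints and uses made-up parameter values. The function name is hypothetical and is not part of Diamond's API.

```python
import math

def category_probabilities(alphas, beta, x):
    """Pr(Y = j) for j = 1..J from the cumulative logit model.

    alphas: increasing cutpoints alpha_1 < ... < alpha_{J-1}
    beta, x: coefficient and feature vectors of equal length
    """
    eta = sum(b * v for b, v in zip(beta, x))
    # Pr(Y <= j) = logistic(alpha_j + beta' x), and Pr(Y <= J) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Made-up parameters for a 4-category response.
probs = category_probabilities(alphas=[-1.0, 0.0, 1.5], beta=[0.4, -0.2], x=[1.0, 2.0])
```

All J − 1 equations share the same β, so only the cutpoints differ between categories.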
summary
mixed effects models are useful for recommender systems and other
data science applications
they can be hard to fit for large datasets
they play well with other kinds of models
diamond, moment-based approaches, and variational inference are
good ways to estimate models quickly
discussion
References I
Patrick Perry (2015)
Moment Based Estimation for Hierarchical Models
https://arxiv.org/abs/1504.04941
Alan Agresti (2012)
Categorical Data Analysis, 3rd Ed.
ISBN-13 978-0470463635
Gao and Owen (2016)
Estimation and Inference for Very Large Linear Mixed Effects Models
https://arxiv.org/abs/1610.08088
Edward
A Library for probabilistic modeling, inference, and criticism.
https://github.com/blei-lab/edward
References II
Minka (2003)
A comparison of numerical optimizers for logistic regression
https://tminka.github.io/papers/logreg/minka-logreg.pdf
lme4
https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
appendix
regularization
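Usual L2 regularization. If each $\beta_i \sim N(0, \frac{1}{\lambda})$:
$$\underset{\beta}{\text{minimize}} \quad \text{loss} + \frac{1}{2} \beta^{\top} (\lambda I_p) \beta$$
Here, the four $b$ coefficient vectors are samples from $N(0, \Sigma)$. If we knew
$\Sigma$, the regularization would be
$$
\underset{b}{\text{minimize}} \quad \text{loss} + \frac{1}{2} b^{\top}
\begin{bmatrix}
\Sigma^{-1} & 0 & 0 & 0 \\
0 & \Sigma^{-1} & 0 & 0 \\
0 & 0 & \Sigma^{-1} & 0 \\
0 & 0 & 0 & \Sigma^{-1}
\end{bmatrix}
b
$$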
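The block-diagonal penalty in the appendix can be checked numerically. In this sketch the 2 x 2 covariance and the coefficient values are made up, and the four coefficient vectors are stacked group by group into one long vector.

```python
import numpy as np

Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])  # made-up 2 x 2 covariance
Sigma_inv = np.linalg.inv(Sigma)

rng = np.random.default_rng(0)
b_groups = rng.normal(size=(4, 2))  # four coefficient vectors, one per group
b = b_groups.reshape(-1)            # stacked into one vector of length 8

# Block-diagonal precision matrix with four copies of Sigma^{-1}.
precision = np.kron(np.eye(4), Sigma_inv)
penalty_stacked = 0.5 * b @ precision @ b

# Equivalent form: sum the quadratic form group by group.
penalty_by_group = 0.5 * sum(b_k @ Sigma_inv @ b_k for b_k in b_groups)
```

The group-by-group form never builds the full matrix, which matters when the number of groups is large.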

 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Recently uploaded (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Diamond mixed effects models in Python

  • 1. Diamond: Mixed Effects Models in Python. Timothy Sweetser, Stitch Fix. http://github.com/stitchfix/diamond, tsweetser@stitchfix.com, @hacktuarial. November 27, 2017.
  • 2. Overview: (1) context and motivation; (2) what is the mixed effects model; (3) application to recommender systems; (4) computation; (5) diamond; (6) appendix.
  • 3. context and motivation: Stitch Fix
  • 4. what is the mixed effects model. Refresher: Linear Model. $y \sim N(X\beta, \sigma^2 I)$, where $y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is an unknown vector of length $p$, and $\sigma^2$ is an unknown, nonnegative constant.
  • 5. Mixed Effects Model. $y \mid b \sim N(X\beta + Zb, \sigma^2 I)$. We have a second set of features, $Z$, which is $n \times q$. The coefficients on $Z$ are $b \sim N(0, \Sigma)$, where $\Sigma$ is $q \times q$.
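The slide states the model conditionally on $b$. As a worked step that is not on the slide, integrating $b$ out of the two Gaussian assumptions gives the marginal distribution of $y$, which shows that the random effects act as extra structured covariance between observations that share a group:

```latex
\begin{aligned}
y \mid b &\sim N(X\beta + Zb,\ \sigma^2 I), \qquad b \sim N(0, \Sigma) \\
\Rightarrow\quad y &\sim N\!\left(X\beta,\ Z \Sigma Z^{\top} + \sigma^2 I\right)
\end{aligned}
```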
  • 7. Simple example of a mixed effects model. You think there is some relationship between a woman's height and the ideal length of jeans for her: $\text{length} = \alpha + \beta \cdot \text{height} + \epsilon$. But you think the length might need to be shorter or longer, depending on the silhouette of the jeans. In other words, you want $\alpha$ to vary by silhouette.
  • 8. Why might silhouette affect length ∼ height? [Figure: skinny vs. bootcut jeans]
  • 9. Linear model: formula. Linear models can be expressed in formula notation, used by patsy, statsmodels, and R:
        import statsmodels.formula.api as smf
        lm = smf.ols('length ~ 1 + height', data=train_df).fit()
    In math, this means $\text{length} = X\beta + \epsilon$, where a row of $X$ looks like $X_i = [1.0, 64.0]$ (intercept, height). $\beta$ is what we want to learn, using (customer, item) data from jeans that fit well.
  • 10. Linear model: illustration [figure]
  • 11. Mixed effects: formula. Now, allow the intercept to vary by silhouette:
        mix = smf.mixedlm('length ~ 1 + height', data=train_df,
                          re_formula='1', groups='silhouette',
                          use_sparse=True).fit()
  • 12. Mixed effects: illustration [figure]
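  • 13. Mixed effects regularization. $y \mid b \sim N(X\beta + Zb, \sigma^2 I)$. Sort the observations by silhouette; then $Z = \begin{bmatrix} \mathbf{1}_{\text{bootcut}} & 0 & 0 & 0 \\ 0 & \mathbf{1}_{\text{skinny}} & 0 & 0 \\ 0 & 0 & \mathbf{1}_{\text{straight}} & 0 \\ 0 & 0 & 0 & \mathbf{1}_{\text{wide}} \end{bmatrix}$, where each $\mathbf{1}_s$ is a column of ones with one entry per observation of silhouette $s$. $X$ is $n \times 2$ and $Z$ is $n \times 4$.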
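To make the shapes of $X$ and $Z$ concrete, here is a minimal sketch that builds both design matrices by hand. The six (height, silhouette) rows are made-up toy data, and in practice patsy or statsmodels would construct these matrices from the formula.

```python
# Hypothetical toy data: (height in inches, silhouette) for six pairs of jeans.
rows = [
    (64.0, "skinny"),
    (70.0, "bootcut"),
    (62.0, "wide"),
    (66.0, "straight"),
    (68.0, "bootcut"),
    (65.0, "skinny"),
]

# Sort by silhouette so that Z takes the block form shown on the slide.
rows.sort(key=lambda r: r[1])
levels = sorted({s for _, s in rows})  # bootcut, skinny, straight, wide

# Fixed-effects design: intercept column plus height, so X is n x 2.
X = [[1.0, height] for height, _ in rows]

# Random-effects design: one indicator column per silhouette, so Z is n x 4.
Z = [[1.0 if s == level else 0.0 for level in levels] for _, s in rows]

for x_row, z_row in zip(X, Z):
    print(x_row, z_row)
```

Each row of Z has exactly one nonzero entry, which is why a sparse representation (the `use_sparse=True` option in the statsmodels call) pays off when there are many groups.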
  • 14. Matrices and formulas: mixed effects. $Zb = \begin{bmatrix} \mathbf{1}_{\text{bootcut}} & 0 & 0 & 0 \\ 0 & \mathbf{1}_{\text{skinny}} & 0 & 0 \\ 0 & 0 & \mathbf{1}_{\text{straight}} & 0 \\ 0 & 0 & 0 & \mathbf{1}_{\text{wide}} \end{bmatrix} \begin{bmatrix} \mu_{\text{bootcut}} \\ \mu_{\text{skinny}} \\ \mu_{\text{straight}} \\ \mu_{\text{wide}} \end{bmatrix}$. Each $\mu_{\text{silhouette}}$ is drawn from $N(0, \sigma^2)$. This allows for deviations from the average effects, $\mu$ and $\beta$, by silhouette, to the extent that the data support it.
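"To the extent that the data support it" can be made concrete for a random intercept. The sketch below assumes the fixed effects are already subtracted and both variances are known, and writes `tau2` for the variance of the silhouette effects to keep it apart from the residual variance `sigma2`. Under those assumptions the posterior mean of a group's intercept is its average residual multiplied by $n\tau^2 / (n\tau^2 + \sigma^2)$. The function name and numbers are illustrative only.

```python
def shrunken_intercept(residuals, tau2, sigma2):
    """Posterior mean of one group's random intercept.

    residuals: y minus the fixed-effects prediction, for one group
    tau2:      variance of the random intercepts (assumed known)
    sigma2:    residual variance (assumed known)
    """
    n = len(residuals)
    if n == 0:
        return 0.0  # no data: fall back to the prior mean
    group_mean = sum(residuals) / n
    weight = n * tau2 / (n * tau2 + sigma2)  # between 0 and 1
    return weight * group_mean


# Hypothetical residuals (inches) for two silhouettes with the same average.
few = [2.0, 2.0]
many = [2.0] * 50

print(shrunken_intercept(few, tau2=1.0, sigma2=4.0))
print(shrunken_intercept(many, tau2=1.0, sigma2=4.0))
```

Both groups average two inches above the fixed-effects fit, but the group with only two observations is pulled much closer to zero than the group with fifty.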
  • 16. application to recommender systems. A basic model: rating ∼ 1 + (1 | user_id) + (1 | item_id). In math, this means $r_{ui} = \mu + \alpha_u + \beta_i + \epsilon_{ui}$, where $\mu$ is an unknown constant, $\alpha_u \sim N(0, \sigma^2_{\text{user}})$, and $\beta_i \sim N(0, \sigma^2_{\text{item}})$. Some items are more popular than others; some users are more picky than others.
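Once the effects are estimated, scoring a (user, item) pair under this basic model is a sum of three numbers. The sketch below is illustrative only: `predict_rating` and the fitted values are hypothetical, and unseen users or items are given an effect of zero, the mean of the distribution their effect is assumed to be drawn from.

```python
def predict_rating(mu, user_effects, item_effects, user_id, item_id):
    """Predicted rating mu + alpha_u + beta_i.

    Users or items absent from training get effect 0.0, the prior mean of
    the N(0, sigma^2) distribution their effect is assumed to be drawn from.
    """
    return mu + user_effects.get(user_id, 0.0) + item_effects.get(item_id, 0.0)


# Hypothetical fitted values, for illustration only.
mu = 3.5
user_effects = {"u1": -0.4, "u2": 0.3}   # u1 is pickier than average
item_effects = {"i1": 0.6, "i2": -0.2}   # i1 is more popular than average

print(predict_rating(mu, user_effects, item_effects, "u1", "i1"))
print(predict_rating(mu, user_effects, item_effects, "new_user", "i1"))
```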
  • 17. Add features. rating ∼ 1 + (1 + item_feature1 + item_feature2 | user_id) + (1 + user_feature1 + user_feature2 | item_id). Now $\alpha_u \sim N(0, \Sigma_{\text{user}})$ and $\beta_i \sim N(0, \Sigma_{\text{item}})$. The good: we're using features! The model learns individual and shared preferences, and it helps with new items and new users. The bad: it scales as $O(p^2)$.
  • 18. Comments. rating ∼ 1 + (1 + item_feature1 + item_feature2 | user_id) + (1 + user_feature1 + user_feature2 | item_id). This is a parametric model, and much less flexible than trees, neural networks, or matrix factorization. But you don't have to choose! You can use an ensemble, or use this as a feature in another model.
  • 21. computation. How can you fit models like this? We were using R's lme4 package. Maximum likelihood computation works like this: (1) estimate the covariance structure of the random effects, $\Sigma$; (2) given $\Sigma$, estimate the coefficients $\beta$ and $b$; (3) with these, compute the log-likelihood; (4) repeat until convergence. This doesn't scale well with the number of observations, $n$. lme4 supports a variety of generalized linear models, but is not optimized for any one in particular. Is it really necessary to update the hyperparameters $\Sigma$ every time you estimate the coefficients?
  • 22. diamond. Diamond solves a similar problem using these tricks: input $\Sigma$ (conditional on $\Sigma$, the optimization problem is convex); use the Hessian of the L2-penalized log-likelihood function, derived with pencil and paper, for logistic regression and for cumulative logistic regression (for ordinal responses): if $Y \in \{1, 2, 3, \ldots, J\}$, then $\log \frac{\Pr(Y \le j)}{1 - \Pr(Y \le j)} = \alpha_j + \beta^\top x$ for $j = 1, 2, \ldots, J - 1$; and quasi-Newton optimization techniques from Minka 2003.
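To illustrate why fixing $\Sigma$ helps, here is a small sketch of Newton's method on an L2-penalized logistic log-likelihood. It is not diamond's implementation: it uses plain Newton steps rather than the quasi-Newton techniques cited on the slide, and it assumes a single grouping factor, a fixed global intercept `mu`, and a known prior variance `tau2`. All names and data are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_random_intercepts(y, group, n_groups, mu, tau2, n_iter=25):
    """Newton's method for the L2-penalized logistic log-likelihood

        sum_i [y_i * eta_i - log(1 + exp(eta_i))] - sum_j b_j^2 / (2 * tau2),

    with eta_i = mu + b[group[i]]. mu and tau2 are held fixed. With a single
    grouping factor the Hessian is diagonal, so each b_j is updated separately.
    """
    b = [0.0] * n_groups
    for _ in range(n_iter):
        grad = [-bj / tau2 for bj in b]       # gradient of the penalty
        curv = [1.0 / tau2] * n_groups        # negative Hessian of the penalty
        for yi, g in zip(y, group):
            p = sigmoid(mu + b[g])
            grad[g] += yi - p                 # gradient of the log-likelihood
            curv[g] += p * (1.0 - p)          # negative Hessian of the log-likelihood
        b = [bj + gj / cj for bj, gj, cj in zip(b, grad, curv)]
    return b


# Hypothetical data: group 0 has 4 likes out of 5, group 1 has 1 like out of 5.
y = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
group = [0] * 5 + [1] * 5
b = fit_random_intercepts(y, group, n_groups=2, mu=0.0, tau2=1.0)
print(b)
```

The penalty term adds `1 / tau2` to the curvature, so the objective is strictly concave and the estimates are pulled toward zero relative to the unpenalized log-odds of each group.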
  • 24. Other solvers: how else could you fit mixed effects models? "Exact" methods: full Bayes via MCMC (e.g. PyStan, PyMC3, Edward); diamond, but you must specify the hyperparameters $\Sigma$; statsmodels, which only supports linear regression for Gaussian-distributed outcomes; R/lme4. Approximate methods: simple, global L2 regularization; full Bayes via variational inference; moment-based methods.
  • 25. diamond. Speed test: MovieLens, 20M observations like (userId, movieId, rating). Binarize the (ordinal!) rating: rating → 1(rating > 3.5); this is well-balanced. Fit a model like rating ∼ 1 + (1 | user_id) + (1 | item_id).
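The binarization step can be sketched as follows. The four rows are made-up stand-ins for MovieLens records, and the `liked` column name is taken from the formula used in the diamond call on the next slide.

```python
# Hypothetical (userId, movieId, rating) rows in the MovieLens layout.
ratings = [
    (1, 10, 4.0),
    (1, 11, 3.5),
    (2, 10, 5.0),
    (2, 12, 2.0),
]

# liked = 1(rating > 3.5): a rating of exactly 3.5 counts as not liked.
train = [
    {"userId": u, "movieId": m, "liked": int(r > 3.5)}
    for u, m, r in ratings
]

# Share of positive labels, a quick check of class balance.
share_liked = sum(row["liked"] for row in train) / len(train)
print(train)
print(share_liked)
```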
  • 26. diamond usage:
        import numpy as np
        import pandas as pd
        from diamond.glms.logistic import LogisticRegression

        train_df = ...
        # variances of the random intercepts for userId and movieId
        priors_df = pd.DataFrame({
            'group': ['userId', 'movieId'],
            'var1': ['intercept'] * 2,
            'var2': [np.nan, np.nan],
            'vcov': [0.9, 1.0]
        })
        m = LogisticRegression(train_df=train_df, priors_df=priors_df)
        results = m.fit('liked ~ 1 + (1 | userId) + (1 | movieId)',
                        tol=1e-5, max_its=200, verbose=True)
  • 27. Speed test vs. sklearn. Diamond: estimate the covariance on a sample of 1M observations in R (one-time, 60 minutes), giving $\sigma^2_{\text{user}} = 0.9$ and $\sigma^2_{\text{movie}} = 1.0$; the fit then takes 83 minutes on my laptop in diamond. sklearn LogisticRegression: use cross-validation to estimate the regularization (one-time, takes 24 minutes). Grid search would be a fairer comparison. The refit takes 1 minute.
  • 28. diamond vs. sklearn predictions. Global L2 regularization is a good approximation for this problem, but may not work as well when $\sigma^2_{\text{user}} \gg \sigma^2_{\text{item}}$ or vice versa, or for models with more features.
  • 29. diamond vs. R: lme4 takes more than 360 minutes to fit.
  • 30. diamond vs. moment-based methods. This is an active area of research by statisticians at Stanford, NYU, and elsewhere. It is very fast to fit simple models using the method of moments, e.g. rating ∼ 1 + (1 + x | user_id) or rating ∼ 1 + (1 | user_id) + (1 | item_id). Fitting this to MovieLens 20M took 4 minutes. But not rating ∼ 1 + (1 + x | user_id) + (1 | item_id).
  • 31. diamond vs. variational inference. I fit this model in under 5 minutes using Edward, and didn't have to input $\Sigma$. VI is very promising!
  • 32. Why use diamond? http://github.com/stitchfix/diamond. It scales well with the number of observations (compared to pure R, MCMC); solves the exact problem (compared to variational, moment-based); scales OK with P (compared to simple global L2); and supports ordinal logistic regression: if $Y \in \{1, 2, 3, \ldots, J\}$, then $\log \frac{\Pr(Y \le j)}{1 - \Pr(Y \le j)} = \alpha_j + \beta^\top x$ for $j = 1, 2, \ldots, J - 1$. Reference: Agresti, Categorical Data Analysis.
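The cumulative logit formula defines $\Pr(Y \le j)$, so the probability of each rating is a difference of neighbouring cumulative probabilities. A minimal sketch of that conversion follows; the function name and cutpoints are hypothetical and this is not diamond's API.

```python
import math


def category_probabilities(alphas, eta):
    """Pr(Y = 1), ..., Pr(Y = J) under the cumulative logit model.

    alphas: the J - 1 increasing cutpoints alpha_1 < ... < alpha_{J-1}
    eta:    the linear predictor beta^T x
    """
    # Pr(Y <= j) = sigmoid(alpha_j + eta) for j < J, and Pr(Y <= J) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    # Pr(Y = j) = Pr(Y <= j) - Pr(Y <= j - 1), with Pr(Y <= 0) = 0.
    return [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]


# Hypothetical cutpoints for a J = 5 star rating scale.
alphas = [-2.0, -1.0, 0.0, 1.5]
probs = category_probabilities(alphas, eta=0.0)
print(probs)
```

Because the cutpoints are increasing, every category probability is positive and the five values sum to one.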
  • 33. Summary. Mixed effects models are useful for recommender systems and other data science applications; they can be hard to fit for large datasets; they play well with other kinds of models; diamond, moment-based approaches, and variational inference are good ways to estimate models quickly.
  • 34. Discussion
  • 35. References I. Patrick Perry (2015), Moment Based Estimation for Hierarchical Models, https://arxiv.org/abs/1504.04941. Alan Agresti (2012), Categorical Data Analysis, 3rd Ed., ISBN-13 978-0470463635. Gao + Owen (2016), Estimation and Inference for Very Large Linear Mixed Effects Models, https://arxiv.org/abs/1610.08088. Edward, A Library for probabilistic modeling, inference, and criticism, https://github.com/blei-lab/edward.
  • 36. References II. Minka, A comparison of numerical optimizers for logistic regression, https://tminka.github.io/papers/logreg/minka-logreg.pdf. lme4, https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf.
  • 37. appendix: regularization. Usual L2 regularization: if each $\beta_i \sim N(0, \frac{1}{\lambda})$, solve $\min_{\beta}\ \text{loss} + \frac{1}{2} \beta^\top (\lambda I_p) \beta$. Here, the four $b$ coefficient vectors are samples from $N(0, \Sigma)$. If we knew $\Sigma$, the regularization would be $\min_{b}\ \text{loss} + \frac{1}{2} b^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 & 0 \\ 0 & \Sigma^{-1} & 0 & 0 \\ 0 & 0 & \Sigma^{-1} & 0 \\ 0 & 0 & 0 & \Sigma^{-1} \end{bmatrix} b$.
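The link between the two penalties can be checked numerically. The sketch below assumes each of the four coefficient vectors has length two (an intercept and one slope), which the slide does not specify, and uses made-up numbers. It evaluates the block-diagonal penalty and confirms that with $\Sigma = \frac{1}{\lambda} I$ it reduces to the usual L2 penalty.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quadratic_form(v, m):
    """v^T m v for a vector v and a square matrix m."""
    return sum(v[i] * m[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


def block_penalty(coef_vectors, sigma):
    """0.5 * b^T blockdiag(Sigma^-1, ..., Sigma^-1) b, summed block by block."""
    sigma_inv = inverse_2x2(sigma)
    return 0.5 * sum(quadratic_form(v, sigma_inv) for v in coef_vectors)


# Hypothetical coefficient vectors (intercept, slope) for four groups.
b = [[0.5, -0.1], [-0.3, 0.2], [0.1, 0.0], [-0.2, 0.4]]

# With Sigma = (1 / lam) * I the block penalty equals the usual L2 penalty.
lam = 2.0
sigma_iso = [[1.0 / lam, 0.0], [0.0, 1.0 / lam]]
l2_penalty = 0.5 * lam * sum(x * x for v in b for x in v)
print(block_penalty(b, sigma_iso), l2_penalty)

# A non-diagonal Sigma penalizes correlated coefficients differently.
sigma_corr = [[1.0, 0.5], [0.5, 1.0]]
print(block_penalty(b, sigma_corr))
```

A single global $\lambda$ is therefore the special case in which every random effect shares one variance and the effects are uncorrelated.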