Scaling & Transforming Stitch Fix's Visibility into What Folks will love

Scaling and Transforming Visibility into What People Will Love
June Andrews
Lunch n Learn April 28 2021

Agenda
Design the Line Architecture
Story of Development
- Ways of Working
- Component Level Learnings
Elevate

Matching Service Between People & Fashion

Transforming Stitch Fix’s Visibility into What People Will Love
Design the Line

[Architecture diagram: Design the Line. Sections: Training, Inference.
Components: Client Sales, Client Metadata, Labeler, Image Service, Client Input, Priors, Featurization, Data Set Construction, Model Training (Dimensionality Reduction, Hyperparameter Optimization, {GBDT, Regression, etc}, Calibration), Model Evaluation, Model Deployment, Model Storage, Upload New Styles, Online Featurization, Model Router, Prediction Storage, Prediction Reports]

[Architecture diagram repeated as section divider: Ways of Working]

Project Management Guide to Stages of ML

2020: Hired the first person into the ML Integration
role. The role has been a foundational unlock in
designing ML systems.
About the Role
This role is responsible for unlocking business
opportunities for Stitch Fix to more efficiently grow
merchandise by leveraging in-house ML products. On a
daily basis, this may involve researching how merchandise
is purchased for Stitch Fix or coding customizations to our
existing ML products to enable new use cases. This role
will involve both a solid understanding of machine learning
products from features to evaluation, and the creativity to
see how ML can be integrated into Stitch Fix for better
buying decisions and more efficient operations.
ML Integration

Set a Standard of Development
The standard doesn’t have to be the highest bar, but uniformity is a good baseline
Code Standards:
○ PEP 8/Black/Lint/etc
○ Google Python Style Guide
○ Documentation/Sphinx
Testing:
○ Unit/Integration/%
○ Deployment processes
Code Reviews:
○ Primary/Secondary Reviewers
○ Size of a Code Review
Blocker Resolution & Feedback Processes

Steel Thread v Modular Development
Modular Development
○ Create an overall architecture map
○ Mock out endpoints
○ Build deep within each module
○ Connect modules all together at the end
○ Release with a fully fledged product
Steel Thread
○ Create an overall architecture map
○ Mock out endpoints
○ Build bare minimum for each module
○ Connect modules as quickly as possible
○ Release with a ‘make it barely work’ product
○ Rapid tuning of bottlenecks for an ‘it works’
product
○ Long term investment in upgrading modules
Boehm's Spiral???
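
The Steel Thread steps above can be sketched in code. This is a minimal illustration, not the Stitch Fix implementation: every function name and record field here (mock_labeler, mock_featurizer, mean_trainer, run_thread, units_kept) is hypothetical. It shows mocked endpoints wired end to end first, then a single module upgraded without touching the others.

```python
from typing import Callable, Dict, List


def mock_labeler(sales: List[Dict]) -> List[float]:
    """Mocked endpoint: one constant label per sales record."""
    return [0.0 for _ in sales]


def mock_featurizer(styles: List[Dict]) -> List[List[float]]:
    """Mocked endpoint: one fixed-width zero vector per style."""
    return [[0.0, 0.0] for _ in styles]


def mean_trainer(
    features: List[List[float]], labels: List[float]
) -> Callable[[List[float]], float]:
    """Bare-minimum model: always predicts the mean training label."""
    mean = sum(labels) / len(labels) if labels else 0.0
    return lambda _row: mean


def run_thread(styles, sales, labeler=mock_labeler,
               featurizer=mock_featurizer, trainer=mean_trainer):
    """Connect every module end to end, however thin each one is."""
    labels = labeler(sales)
    features = featurizer(styles)
    model = trainer(features, labels)
    return [model(row) for row in features]


def units_kept_labeler(sales: List[Dict]) -> List[float]:
    """First upgrade: a real count-level label replaces the mock."""
    return [float(record["units_kept"]) for record in sales]


styles = [{"style_id": "a"}, {"style_id": "b"}]
sales = [{"units_kept": 2}, {"units_kept": 4}]

barely_works = run_thread(styles, sales)                         # all mocks
upgraded = run_thread(styles, sales, labeler=units_kept_labeler)  # one real module
```

Because each stage only depends on the agreed input and output shapes, swapping in a deeper module later is a one-argument change.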

Modular Development
○ Great for known complexity
○ Good ROI of development
○ Increasingly Available
Steel Thread
○ Quicker release of major milestones
○ Laser focus
○ Requires
Steel Thread v Modular Development

Enable Focus:
○ Daily Stand Ups
○ Complete List of Everything that Needed to Be Built
○ Steel Thread naturally lends itself to bite-sized
tasks
○ Use low uncertainty solutions
○ Increased Pair Coding over Code Reviews
○ Clear Cross Functional Buy In
○ Mocked Out Endpoints Between Teams
Take Care of People:
○ Rotating ‘adjustment’ PTO
○ Mental Health Days
○ Pre-emptive No Meeting Days
○ Customize to what people need
○ “It’s okay to be happy at work. It’s okay to enjoy
being good at what you do.”
○ Increased online social interactions, lunches, etc
2020 Steel Thread Support

[Architecture diagram repeated as section divider: Component Level Learnings]

Stages of Development:
- Count-level metrics
- Ratio metrics
- Domain-specific business value metrics
- Historically corrected labels
- [Wishlist] Labels that capture the distribution of a metric
Gotchas: Expect rapid schema changes as client
metadata, business context, and metrics evolve
with the business.
Labeler
[Diagram: Labeler, Client Sales, Client Metadata]
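
As a rough illustration of the first two labeler stages, the sketch below derives a count-level label and a ratio label from one sales record. The record type and its fields (StyleSales, units_shipped, units_kept) are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StyleSales:
    """Hypothetical per-style sales summary."""
    style_id: str
    units_shipped: int
    units_kept: int


def count_label(sales: StyleSales) -> float:
    """Stage 1: a count-level metric (how many units were kept)."""
    return float(sales.units_kept)


def ratio_label(sales: StyleSales) -> Optional[float]:
    """Stage 2: a ratio metric; undefined when nothing was shipped."""
    if sales.units_shipped == 0:
        return None
    return sales.units_kept / sales.units_shipped


example = StyleSales(style_id="a", units_shipped=8, units_kept=6)
not_shipped = StyleSales(style_id="b", units_shipped=0, units_kept=0)
```

Moving from counts to ratios already introduces an undefined case, which is one small reason the label schema tends to change as the metrics evolve.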

Metric stability is a function of different levels of certainty. Fashion (like the stock market) has a high level of chaotic
influence, much higher than many areas of tech.
Manage this by adding 2nd and 3rd moment metrics to gauge the stability of predictions in production, i.e., not just absolute
loss, but also the standard deviation of error and higher moments.
Labeler
Columns: Deterministic Influence | Probabilistic Influence | Chaotic Influence
Known: Victory Lap | Continuous Development | Use in Confidence Bounds
Unknown - measurable: Roadmap | Roadmap
Unknown - unmeasurable:
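
A minimal sketch of tracking higher moments of prediction error alongside absolute loss. The function name error_moments and the choice of skewness as the third-moment metric are assumptions made for illustration.

```python
from statistics import fmean, pstdev
from typing import Dict, Sequence


def error_moments(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Summarize prediction error by absolute loss, spread, and asymmetry."""
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mean_abs_error = fmean([abs(e) for e in errors])
    mean_error = fmean(errors)
    std_error = pstdev(errors)  # 2nd moment: how much the error varies
    if std_error == 0:
        skew_error = 0.0
    else:
        # 3rd standardized moment: positive when large over-predictions dominate
        skew_error = fmean([((e - mean_error) / std_error) ** 3 for e in errors])
    return {"mae": mean_abs_error, "std": std_error, "skew": skew_error}


symmetric = error_moments([0, 0, 0, 0], [1, -1, 1, -1])
one_big_miss = error_moments([0, 0, 0, 0], [0, 0, 0, 4])
```

Both example sets have the same mean absolute error, but the second has a larger spread and a positive skew, which is the kind of instability a single loss number hides.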

In Steel Thread development, pick a feature family covering each of the main
types of data {categorical, numerical, image, text} to put strong connectors
in place between each of the components. If the connectors are strong, then
additional feature families can be added at a later date without breaking
downstream data type assumptions.
Gotchas: Client Input features are calculated on a different timeline than
ML-computed features. Handle this by allowing null features to be returned
and accounting for them at the model routing stage.
Featurization
[Diagram: Featurization, Image Service, Client Input, Priors]
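
One way to picture the null-feature handling is sketched below, under the assumption that the router picks between two models. All feature names and model names are hypothetical; the feature row carries one family per data type, and a Client Input feature that is allowed to be missing.

```python
from typing import Any, Dict, Optional

FeatureRow = Dict[str, Optional[Any]]


def featurize(style: Dict[str, Any]) -> FeatureRow:
    """Return one feature family per data type; missing values stay None."""
    return {
        "category": style.get("category"),                # categorical
        "price": style.get("price"),                      # numerical
        "image_embedding": style.get("image_embedding"),  # image
        "description": style.get("description"),          # text
        "client_input_score": style.get("client_input_score"),  # may arrive later
    }


def route_model(row: FeatureRow) -> str:
    """Pick a model based on which features are available right now."""
    if row["client_input_score"] is None:
        return "model_without_client_input"
    return "model_with_client_input"


early_row = featurize({"category": "dress", "price": 59.0})
late_row = featurize({"category": "dress", "price": 59.0, "client_input_score": 0.8})
```

Returning None instead of raising keeps the connectors between components stable while the slower feature source catches up.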

Why do embeddings work?
○ There’s a lot of space in high dimensions. The
probability that adding a set of vectors together lands
near a given point by chance is extraordinarily low.
What is a meaningful level of near in a high dimensional
space?
○ Use the variation of known similar vectors to
create a localized meaningful distance threshold.
[Tunkelang]
Featurization - Embeddings
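
A minimal sketch of a localized distance threshold: measure how far apart vectors that are already known to be similar sit from each other, then treat anything within that spread as "near". The mean-plus-k-standard-deviations rule and the num_std parameter are assumptions for illustration, not the method from the cited source.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_threshold(similar_vectors: List[Sequence[float]], num_std: float = 2.0) -> float:
    """Distance threshold derived from the variation among known-similar vectors."""
    distances = [euclidean(a, b) for a, b in combinations(similar_vectors, 2)]
    return fmean(distances) + num_std * pstdev(distances)


def is_near(query: Sequence[float], anchor: Sequence[float], threshold: float) -> bool:
    return euclidean(query, anchor) <= threshold


known_similar = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
threshold = local_threshold(known_similar)
```

The threshold is local by construction: a different group of known-similar items in another region of the space would produce its own value.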

To prevent time travel, create a “memoryless circuit” at training time:
the training data may carry only the information that would be known at
inference time.
Common Forms of Time Travel:
- Randomly assigned test & train data sets
- Duplicate records of varying degrees
- Features calculated from current tables instead of historical snapshots
Data Set Creation
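
The three forms of time travel above can each be guarded against with a small amount of code. The sketch below is an assumption-laden illustration: the record fields (style_id, event_date, snapshot_date, value) are hypothetical, and only exact duplicates are removed, so near-duplicates would need a domain-specific key.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def temporal_split(records: List[Record], cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Split by time instead of at random, dropping exact duplicate records."""
    seen = set()
    train, test = [], []
    for record in sorted(records, key=lambda r: r["event_date"]):
        key = (record["style_id"], record["event_date"])
        if key in seen:
            continue  # a duplicate could otherwise land in both train and test
        seen.add(key)
        if record["event_date"] < cutoff:
            train.append(record)
        else:
            test.append(record)
    return train, test


def feature_as_of(snapshots: List[Record], as_of: date) -> Optional[Any]:
    """Latest snapshot value dated on or before `as_of`, else None."""
    eligible = [s for s in snapshots if s["snapshot_date"] <= as_of]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s["snapshot_date"])["value"]


records = [
    {"style_id": "a", "event_date": date(2020, 1, 5)},
    {"style_id": "a", "event_date": date(2020, 1, 5)},  # exact duplicate
    {"style_id": "b", "event_date": date(2020, 3, 1)},
]
train, test = temporal_split(records, cutoff=date(2020, 2, 1))

snapshots = [
    {"snapshot_date": date(2020, 1, 1), "value": 10},
    {"snapshot_date": date(2020, 6, 1), "value": 99},  # not yet known in February
]
february_value = feature_as_of(snapshots, date(2020, 2, 1))
```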

This is the fun stuff. Try to build with an interchangeable parts
mindset to enable rapid iteration.
Gotchas
- Choosing a common set of tooling and approaches will
enable more dynamic resourcing for sprints
- Using default parameters that slow down the pipeline
for no improvement in accuracy
Model Training
[Diagram: Model Training with Dimensionality Reduction, Hyperparameter Optimization, {GBDT, Regression, etc}, Calibration]

UMAP
○ Faster than t-SNE
○ Biased towards preserving short distances at the
expense of ignoring large distances
t-SNE
○ Groundbreaking way to visualize high dimensional data
over large datasets
○ Preserves local neighborhoods at the expense of large
distances
PCA
○ Doesn’t do well with the cloudiness of large, high
dimensional data sets. If the dimensionality is large
enough nearly all points are equidistant.
LDA (may not be applicable)
Dimensionality Reduction

In practice, the dimensionality reduction step is a hybrid
approach, with features grouped for different levels of
compression.
For example, price features should not be compressed, but embedding
features should be.
Dimensionality Reduction (in practice)
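
A minimal sketch of the hybrid idea: pass the price feature through untouched and compress only the embedding. A Gaussian random projection is used here purely as a dependency-free stand-in for whichever reducer (PCA, UMAP, and so on) is actually chosen; the feature names and function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def random_projection_matrix(in_dim: int, out_dim: int, seed: int = 0) -> List[List[float]]:
    """Fixed Gaussian projection matrix, a stand-in for a fitted reducer."""
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def project(vector: Sequence[float], matrix: List[List[float]]) -> List[float]:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def reduce_row(row: Dict[str, object], matrix: List[List[float]]) -> List[float]:
    """Keep price as-is; compress only the embedding group."""
    return [row["price"]] + project(row["embedding"], matrix)


matrix = random_projection_matrix(in_dim=4, out_dim=2)
reduced = reduce_row({"price": 59.0, "embedding": [0.1, 0.2, 0.3, 0.4]}, matrix)
```

The output row has one uncompressed price value followed by the two compressed embedding dimensions.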

Grid Search
○ In higher dimensional spaces, most of the time is
spent searching the boundary of the parameter
space
Random Search
○ Searches the space more evenly
Bayesian Optimization
○ At least as good as random … but so much quicker
It’s Free
○ Stop spending so many resources re-coding a free
solution you won’t be able to beat
○ …...SigOpt
Hyper Parameter Optimization (HPO)
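
The grid search point can be checked with a little arithmetic: with k points per dimension and d dimensions, the share of grid points on the boundary is 1 - ((k - 2) / k)^d. The sketch below computes that share and shows a bare-bones random search for comparison. The objective function and parameter names are hypothetical, and this is not a substitute for a Bayesian optimization service.

```python
import random
from typing import Callable, Dict, Tuple


def grid_boundary_fraction(points_per_dim: int, num_dims: int) -> float:
    """Share of grid points with at least one coordinate on an edge of its range."""
    interior = max(points_per_dim - 2, 0) ** num_dims
    return 1.0 - interior / points_per_dim ** num_dims


def random_search(
    objective: Callable[[Dict[str, float]], float],
    bounds: Dict[str, Tuple[float, float]],
    num_trials: int,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Sample parameters uniformly within bounds and keep the lowest score."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_score = float("inf")
    for _ in range(num_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, float]) -> float:
    """Hypothetical loss surface with its minimum at x=1, y=-2."""
    return (params["x"] - 1.0) ** 2 + (params["y"] + 2.0) ** 2


bounds = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}
best_params, best_score = random_search(toy_objective, bounds, num_trials=200)
```

With 10 points per dimension, about 89% of a 10-dimensional grid lies on the boundary, compared with 36% in two dimensions.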

Many model types are primarily good at ranking items
in a dataset, but struggle with biases that can cause
absolute accuracy to suffer.
Calibration corrects for known biases to improve
absolute accuracy.
Calibration
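
A minimal sketch of calibration, assuming the bias is roughly linear: fit a slope and intercept that map raw predictions onto observed outcomes with ordinary least squares. This linear correction is only one simple choice (Platt scaling and isotonic regression are common alternatives), and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_linear_calibration(preds: Sequence[float], actuals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept mapping predictions to outcomes."""
    mean_pred, mean_actual = fmean(preds), fmean(actuals)
    variance = sum((p - mean_pred) ** 2 for p in preds)
    covariance = sum((p - mean_pred) * (a - mean_actual) for p, a in zip(preds, actuals))
    slope = covariance / variance if variance else 1.0
    intercept = mean_actual - slope * mean_pred
    return slope, intercept


def calibrate(pred: float, slope: float, intercept: float) -> float:
    return slope * pred + intercept


# The raw model ranks these items correctly but under-predicts each by 0.1.
raw_preds = [0.2, 0.4, 0.6]
observed = [0.3, 0.5, 0.7]
slope, intercept = fit_linear_calibration(raw_preds, observed)
calibrated = [calibrate(p, slope, intercept) for p in raw_preds]
```

With a positive slope the ranking of items is unchanged; only the absolute values move closer to what was observed.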

...a quick aside about ...
Model Evaluation

It is hard to get enough points of measurement to know
when the chasm is crossed.
Measure 2 points:
○ Current system performance (Human
Accuracy)
○ Perfect performance
Rule of Thumb:
○ Release once ML accuracy is greater than the
split (the midpoint between the two points)
Split the Difference
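
The rule of thumb reduces to a one-line calculation. The sketch below reads "the split" as the midpoint between human accuracy and perfect performance, which is an interpretation of the slide; the function names and example numbers are hypothetical.

```python
def release_threshold(human_accuracy: float, perfect: float = 1.0) -> float:
    """Midpoint between current (human) performance and perfect performance."""
    return (human_accuracy + perfect) / 2.0


def ready_to_release(ml_accuracy: float, human_accuracy: float, perfect: float = 1.0) -> bool:
    """True once the ML system beats the midpoint."""
    return ml_accuracy > release_threshold(human_accuracy, perfect)


# Example: if humans are right half the time, the bar is 0.75.
threshold = release_threshold(human_accuracy=0.5)
```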

[Architecture diagram repeated as section divider: Maximizing Reuse]

Common System Parameters
○ Client Segment
○ Business Context
○ Target Metric
○ Time Scale
Software Engineering Best Practices:
○ Every Degree of Freedom in a system has a
cost in maintenance and design complexity.
○ Adding Degrees of Freedom often requires a
refactor
Set Flexible & Narrow System Parameters
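
One way to keep the parameter set narrow is to make the four common parameters an explicit, validated type. This is a sketch under stated assumptions: the allowed values below are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass

# Hypothetical allowed values; a real system would define its own.
ALLOWED_TARGET_METRICS = {"units_kept", "keep_rate"}
ALLOWED_TIME_SCALES_DAYS = {30, 90}


@dataclass(frozen=True)
class SystemParams:
    """The four common system parameters, with narrow allowed ranges."""
    client_segment: str
    business_context: str
    target_metric: str
    time_scale_days: int

    def __post_init__(self) -> None:
        if self.target_metric not in ALLOWED_TARGET_METRICS:
            raise ValueError(f"unsupported target metric: {self.target_metric}")
        if self.time_scale_days not in ALLOWED_TIME_SCALES_DAYS:
            raise ValueError(f"unsupported time scale: {self.time_scale_days}")


params = SystemParams(
    client_segment="womens",
    business_context="new_styles",
    target_metric="keep_rate",
    time_scale_days=30,
)
```

Adding a new allowed value is a small, visible change, while adding a fifth field is the kind of new degree of freedom that tends to require a refactor.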

When predictions are used in multiple contexts,
providing each prediction in context is
important for enabling strong decision making.
Examples of setting context
○ Provide a summary recommendation of buy,
unknown, or don’t buy
○ Provide a historical baseline of performance
with those predictions
○ Provide an example of the next best or most
similar item already in the system
Context Context Context
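
The three kinds of context listed above can travel with the prediction as a single record. In this sketch the field names, the keep-rate metric, and the buy and skip cutoffs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictionInContext:
    style_id: str
    predicted_keep_rate: float
    recommendation: str                   # "buy", "unknown", or "don't buy"
    historical_accuracy: Optional[float]  # how past predictions like this performed
    most_similar_style_id: Optional[str]  # closest item already in the system


def recommend(predicted: float, buy_above: float, skip_below: float) -> str:
    """Collapse a numeric prediction into a summary recommendation."""
    if predicted >= buy_above:
        return "buy"
    if predicted <= skip_below:
        return "don't buy"
    return "unknown"


def with_context(style_id: str, predicted: float,
                 historical_accuracy: Optional[float],
                 most_similar_style_id: Optional[str],
                 buy_above: float = 0.6, skip_below: float = 0.4) -> PredictionInContext:
    return PredictionInContext(
        style_id=style_id,
        predicted_keep_rate=predicted,
        recommendation=recommend(predicted, buy_above, skip_below),
        historical_accuracy=historical_accuracy,
        most_similar_style_id=most_similar_style_id,
    )


report_row = with_context("new-style-1", 0.72, historical_accuracy=0.81,
                          most_similar_style_id="style-042")
```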

Elevate Program was designed to give a leg up to
emerging BIPOC designers at a time when it was
needed.
Access to data insights, predictions, and early
product-market fit indicators for scaling helps
emerging brands plan supply chains, highlight growth
areas, and optimize their digital presence.
Building recommender systems is expensive, reusing
them is cheap. I encourage folks to think about how
their work can be reused by building up compassion
for what will help others.
Elevate Program
