Scaling and Transforming
Visibility into
What People Will Love
June Andrews
Lunch n Learn April 28 2021
Agenda
Design the Line Architecture
Story of Development
- Ways of Working
- Component Level Learnings
Elevate
Matching Service Between People & Fashion
Transforming Stitch Fix’s Visibility
into What People Will Love
Design the Line
[Figure: Design the Line architecture diagram, with Training and Inference sides. Components shown: Upload New Styles, Image Service, Client Sales, Client Metadata, Client Input Priors, Labeler, Featurization, Online Featurization, Data Set Construction, Model Training (Dimensionality Reduction, {GBDT, Regression, etc}, Calibration, Hyperparameter Optimization), Model Evaluation, Model Deployment, Model Storage, Model Router, Prediction Storage, Prediction Reports]
Ways of Working
Project Management Guide to Stages of ML
2020: Hired the first person into the role of ML
Integration. This role has been a foundational unlock in
designing ML systems.
About the Role
This role is responsible for unlocking business
opportunities for Stitch Fix to more efficiently grow
merchandise by leveraging in-house ML products. On a
daily basis, this may involve researching how merchandise
is purchased for Stitch Fix or coding customizations to our
existing ML products to enable new use cases. This role
will involve both a solid understanding of machine learning
products, from features to evaluation, and the creativity to
see how ML can be integrated into Stitch Fix for better
buying decisions and more efficient operations.
ML Integration
Set a Standard of Development
The standard doesn’t have to be the highest bar, but uniformity is a good baseline
Code Standards:
○ PEP 8/Black/Lint/etc
○ Google Python Style Guide
○ Documentation/Sphinx
Testing:
○ Unit/Integration/%
○ Deployment processes
Code Reviews:
○ Primary/Secondary Reviewers
○ Size of a Code Review
Blocker Resolution & Feedback Processes
Steel Thread v Modular Development
Modular Development
○ Create an overall architecture map
○ Mock out endpoints
○ Build deep within each module
○ Connect modules all together at the end
○ Release with a fully fledged product
Steel Thread
○ Create an overall architecture map
○ Mock out endpoints
○ Build bare minimum for each module
○ Connect modules as quickly as possible
○ Release with a ‘make it barely work’ product
○ Rapid tuning of bottlenecks for an ‘it works’
product
○ Long term investment in upgrading modules
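The Steel Thread steps above can be sketched as a pipeline in which every module starts as the barest stub behind a fixed interface. This is a minimal sketch only: the module names loosely follow the architecture diagram, and every function name, signature, and value here is hypothetical rather than taken from the Stitch Fix system.

```python
from typing import Callable, Dict, List

# Hypothetical bare-minimum modules. Each one honors its interface but does
# almost nothing, so the whole thread can be connected right away.
def featurize(style: Dict[str, float]) -> List[float]:
    # Bare minimum: pass one numeric feature through untouched.
    return [style.get("price", 0.0)]

def predict(features: List[float]) -> float:
    # Bare minimum: a constant baseline instead of a trained model.
    return 0.5

def report(style_id: str, score: float) -> Dict[str, object]:
    # Bare minimum: a plain dict instead of a formatted prediction report.
    return {"style_id": style_id, "score": score}

def run_thread(style_id: str, style: Dict[str, float],
               featurizer: Callable = featurize,
               model: Callable = predict) -> Dict[str, object]:
    """Connect the modules end to end; each stage can be upgraded later."""
    return report(style_id, model(featurizer(style)))

result = run_thread("style-001", {"price": 48.0})
print(result)
```

Because the stages are passed in as arguments, a stub can later be swapped for a real module without touching the connectors.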
Boehm's Spiral???
Modular Development
○ Great for known complexity
○ Good ROI of development
○ Increasingly Available
Steel Thread
○ Quicker release of major milestones
○ Laser focus
○ Requires
Steel Thread v Modular Development
Enable Focus:
○ Daily Stand Ups
○ Complete List of Everything that Needed to Be Built
○ Steel Thread naturally lends itself to bite sized
tasks
○ Use low uncertainty solutions
○ Increased Pair Coding over Code Reviews
○ Clear Cross Functional Buy In
○ Mocked Out Endpoints Between Teams
Take Care of People:
○ Rotating ‘adjustment’ PTO
○ Mental Health Days
○ Pre-emptive No Meeting Days
○ Customize to what people need
○ “It’s okay to be happy at work. It’s okay to enjoy
being good at what you do.”
○ Increased online social interactions, lunches, etc
2020 Steel Thread Support
Component Level Learnings
Stages of Development:
- Count level metrics
- Ratio metrics
- Domain Specific Business Value metrics
- Historically corrected labels
- [Wishlist] Labels based on the distribution of a metric
Gotchas: Expect rapid schema changes as client
metadata, business context, and metrics evolve
with the business.
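The first two labeler stages above can be sketched as follows. This is an illustration under assumptions: the record fields (`shipped`, `kept`) and the style IDs are hypothetical and are not the actual Stitch Fix schema.

```python
from typing import Dict, List, Optional

# Hypothetical sales records: one row per style for some period.
sales: List[Dict[str, object]] = [
    {"style_id": "A", "shipped": 40, "kept": 30},
    {"style_id": "B", "shipped": 10, "kept": 2},
    {"style_id": "C", "shipped": 0, "kept": 0},
]

def count_label(row: Dict[str, object]) -> int:
    """Stage 1: a count-level metric (units kept)."""
    return int(row["kept"])

def ratio_label(row: Dict[str, object]) -> Optional[float]:
    """Stage 2: a ratio metric (kept / shipped); None when undefined."""
    shipped = int(row["shipped"])
    if shipped == 0:
        return None
    return int(row["kept"]) / shipped

labels = {
    str(row["style_id"]): {"count": count_label(row), "ratio": ratio_label(row)}
    for row in sales
}
print(labels)
```

Keeping each label as its own small function is one way to absorb the schema changes mentioned in the gotcha: a new metric becomes a new function rather than a rewrite.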
Labeler
[Figure: diagram excerpt showing Labeler, Client Sales, Client Metadata]
Metric stability is a function of different levels of certainty. Fashion (like the stock market) has high levels of chaotic
influence, much higher than many areas of tech.
Manage this by adding 2nd and 3rd moment metrics to gauge the stability of predictions in production, i.e., not just absolute
loss, but also the standard deviation of the error and higher moments.
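As a rough sketch of what tracking higher moments could look like, the function below reports mean absolute error alongside the standard deviation and skewness of the error. The function name and sample numbers are hypothetical, and population (not sample) moments are used for simplicity.

```python
import math
from typing import Dict, List

def error_moments(predicted: List[float], actual: List[float]) -> Dict[str, float]:
    """Mean absolute error plus 2nd and 3rd moment summaries of the error."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mean_abs = sum(abs(e) for e in errors) / n
    mean = sum(errors) / n
    # Second central moment (population variance) and its square root.
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    # Standardized third central moment (skewness); 0 if there is no spread.
    skew = 0.0 if std == 0 else sum((e - mean) ** 3 for e in errors) / n / std ** 3
    return {"mean_abs_error": mean_abs, "error_std": std, "error_skewness": skew}

stats = error_moments([0.5, 0.7, 0.2, 0.9], [0.4, 0.9, 0.2, 0.5])
print(stats)
```

Two models with the same absolute loss can differ sharply in `error_std`, which is the kind of instability the slide suggests watching in production.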
Labeler
Columns: Deterministic Influence | Probabilistic Influence | Chaotic Influence
Known: Victory Lap | Continuous Development | Use in Confidence Bounds
Unknown - measurable: Roadmap | Roadmap
Unknown - unmeasurable:
In Steel Thread development, pick a feature family covering each of the main
types of data {categorical, numerical, image, text} to put strong connectors
in place between each of the components. If the connectors are strong, then
additional feature families can be added at a later date without breaking
downstream data type assumptions.
Gotchas: Client Input features are calculated on a different timeline than
ML computed features. Handle by allowing null features to be returned and
taken into account at the model routing stage.
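One way to picture the null-feature handling described above: featurization returns `None` for any family that is not ready yet, and the router chooses a model based on what is present. This is a sketch under assumptions; the feature-family keys and model names are hypothetical.

```python
from typing import Dict, List, Optional

# A feature family maps to its vector, or to None if not yet computed.
Features = Dict[str, Optional[List[float]]]

def route_model(features: Features) -> str:
    """Pick a (hypothetical) model name from the feature families present."""
    available = {name for name, value in features.items() if value is not None}
    if {"image", "client_input"} <= available:
        return "model_image_plus_client_input"
    if "image" in available:
        return "model_image_only"
    return "model_fallback_baseline"

# Client input has not arrived yet, so its family is returned as null.
early = {"image": [0.1, 0.4], "client_input": None}
later = {"image": [0.1, 0.4], "client_input": [1.0]}
print(route_model(early), route_model(later))
```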
Featurization
[Figure: diagram excerpt showing Image Service, Featurization, Client Input Priors]
Why do embeddings work?
○ There’s a lot of space in high dimensions. The
probability that adding a set of vectors together lands
near a given point by chance is extraordinarily low.
What is a meaningful level of near in a high dimensional
space?
○ Use the variation of known similar vectors to
create a localized meaningful distance threshold.
[Tunkelang]
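One possible reading of the localized threshold idea is sketched below: measure the distances between pairs already known to be similar, then treat anything within the mean plus a few standard deviations as "near". The choice of cosine distance, the mean-plus-k-sigma rule, and the sample vectors are all assumptions for illustration, not the method from the cited source.

```python
import math
from typing import List, Tuple

Vector = List[float]

def cosine_distance(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def local_threshold(similar_pairs: List[Tuple[Vector, Vector]], k: float = 2.0) -> float:
    """Threshold = mean + k * std of distances between known similar pairs."""
    distances = [cosine_distance(u, v) for u, v in similar_pairs]
    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    return mean + k * std

def is_near(u: Vector, v: Vector, threshold: float) -> bool:
    return cosine_distance(u, v) <= threshold

# Hypothetical pairs that a domain expert has marked as similar.
known_similar = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([0.0, 1.0], [0.1, 1.0]),
    ([1.0, 1.0], [1.0, 0.9]),
]
threshold = local_threshold(known_similar)
print(threshold, is_near([1.0, 0.0], [1.0, 0.05], threshold))
```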
Featurization - Embeddings
To prevent time travel, create a “memoryless circuit” at training time
where only as much information as would be known at inference time is
known about the training data.
Common Forms of Time Travel:
- Randomly assigned test & train data sets
- Duplicate records of varying degrees
- Features calculated from current tables rather than historical snapshots
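The three forms of time travel listed above each have a simple guard, sketched here. The record layout, field names, and dates are hypothetical; exact-key deduplication is shown, and near-duplicates would need a fuzzier match.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

def dedupe(records: List[Dict], key: str) -> List[Dict]:
    """Keep the first record per key so duplicates cannot straddle train/test."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique

def time_split(records: List[Dict], cutoff: date) -> Tuple[List[Dict], List[Dict]]:
    """Split by time instead of at random: train strictly before the cutoff."""
    train = [r for r in records if r["observed_on"] < cutoff]
    test = [r for r in records if r["observed_on"] >= cutoff]
    return train, test

def feature_as_of(snapshots: List[Tuple[date, float]], as_of: date) -> Optional[float]:
    """Read a feature from the latest snapshot known on or before `as_of`."""
    known = [s for s in snapshots if s[0] <= as_of]
    return max(known, key=lambda s: s[0])[1] if known else None

records = [
    {"style_id": "A", "observed_on": date(2021, 1, 5)},
    {"style_id": "A", "observed_on": date(2021, 1, 5)},  # duplicate
    {"style_id": "B", "observed_on": date(2021, 2, 10)},
    {"style_id": "C", "observed_on": date(2021, 3, 1)},
]
train, test = time_split(dedupe(records, "style_id"), date(2021, 2, 1))
price_snapshots = [(date(2021, 1, 1), 40.0), (date(2021, 3, 1), 55.0)]
print(len(train), len(test), feature_as_of(price_snapshots, date(2021, 2, 1)))
```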
Data Set Creation
This is the fun stuff. Try to build with an interchangeable parts
mindset to enable rapid iteration.
Gotchas
- Choosing a common set of tooling and approaches will
enable more dynamic resourcing for sprints
- Using default parameters that slow down the pipeline
for no improvement in accuracy
Model Training
[Figure: diagram excerpt showing Model Training with Dimensionality Reduction, Calibration, {GBDT, Regression, etc}, Hyperparameter Optimization]
UMAP
○ Faster than t-SNE
○ Biased towards preserving short distances at the
expense of ignoring large distances
t-SNE
○ Groundbreaking way to visualize high dimensional data
over large datasets
○ Preserves local neighborhoods at the expense of large
distances
PCA
○ Doesn’t do well with the cloudiness of large, high
dimensional data sets. If the dimensionality is large
enough nearly all points are equidistant.
LDA (may not be applicable)
Dimensionality Reduction
In practice, the dimensionality reduction step is a hybrid
approach with features being grouped for different levels of
compression.
I.e., price features should not be compressed, but embedding
features should be.
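The hybrid approach above can be sketched as a per-group compression plan. To keep the example dependency-free, a seeded Gaussian random projection stands in for whatever reducer (PCA, UMAP, etc.) would be used in practice; the group names, dimensions, and plan format are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

def random_projection(vector: List[float], out_dim: int, seed: int = 0) -> List[float]:
    """Stand-in compressor: project onto `out_dim` seeded Gaussian directions."""
    rng = random.Random(seed)  # same seed gives the same projection every call
    scale = 1.0 / math.sqrt(out_dim)
    return [scale * sum(rng.gauss(0.0, 1.0) * x for x in vector)
            for _ in range(out_dim)]

# Target dimension per feature group; None means "do not compress".
COMPRESSION_PLAN: Dict[str, Optional[int]] = {"price": None, "embedding": 4}

def reduce_features(groups: Dict[str, List[float]],
                    plan: Dict[str, Optional[int]]) -> List[float]:
    reduced: List[float] = []
    for name in sorted(groups):  # fixed order keeps the output layout stable
        out_dim = plan.get(name)  # groups missing from the plan pass through
        values = groups[name]
        reduced.extend(values if out_dim is None
                       else random_projection(values, out_dim))
    return reduced

groups = {"price": [48.0, 0.2], "embedding": [0.1] * 16}
reduced = reduce_features(groups, COMPRESSION_PLAN)
print(len(reduced), reduced[-2:])
```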
Dimensionality Reduction (in practice)
Grid Search
○ Higher dimensional spaces lead to spending most
of the time searching the boundary of the
parameter space
Random Search
○ Better distribution of evenly searching the space
Bayesian Optimization
○ At least as good as random … but so much quicker
It’s Free
○ Stop spending so many resources re-coding a free
solution you won’t be able to beat
○ …...SigOpt
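The grid search point above can be checked with arithmetic. On a grid with k values per axis in d dimensions, a point is interior only if every coordinate avoids the two end values, so the share of grid points on the boundary is 1 - ((k - 2) / k)^d. The function name below is just for illustration.

```python
def boundary_fraction(points_per_axis: int, dims: int) -> float:
    """Share of grid points with at least one coordinate at an end value."""
    interior = max(points_per_axis - 2, 0) ** dims
    return 1.0 - interior / points_per_axis ** dims

for dims in (1, 2, 5, 10):
    print(dims, round(boundary_fraction(10, dims), 4))
```

With 10 values per axis, the boundary share grows from 20% in one dimension to roughly 89% in ten dimensions, which is why a grid spends most of its budget at the edges of the parameter space.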
Hyper Parameter Optimization (HPO)
Many model types are primarily good at sorting
datasets, but struggle with biases that can cause
absolute accuracy to suffer.
Calibration corrects for known biases to improve
absolute accuracy.
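As one simple illustration of calibration (not necessarily the technique used in the system described here), histogram binning maps each raw score to the observed outcome rate of its score bin. Scores are assumed to lie in [0, 1]; the function names and sample data are hypothetical.

```python
from typing import List, Optional

def fit_bins(scores: List[float], outcomes: List[int],
             n_bins: int = 5) -> List[Optional[float]]:
    """Observed positive rate per equal-width score bin (None if bin is empty)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for score, outcome in zip(scores, outcomes):
        i = min(int(score * n_bins), n_bins - 1)
        sums[i] += outcome
        counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]

def calibrate(score: float, bin_rates: List[Optional[float]]) -> float:
    """Replace a raw score with its bin's observed rate; keep it if unseen."""
    i = min(int(score * len(bin_rates)), len(bin_rates) - 1)
    rate = bin_rates[i]
    return score if rate is None else rate

# A model that ranks well but is overconfident at the top of its range.
raw_scores = [0.9, 0.95, 0.85, 0.92, 0.1, 0.15]
observed = [1, 0, 1, 0, 0, 0]
bin_rates = fit_bins(raw_scores, observed)
print(calibrate(0.9, bin_rates))
```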
Calibration
...a quick aside about ...
Model Evaluation
Hard to get multiple points of measurement to know
when the chasm is crossed.
Measure 2 points:
○ Current system performance (Human
Accuracy)
○ Perfect performance
Rule of Thumb:
○ Release once ML accuracy is greater than the
split
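Reading "the split" as the midpoint between current (human) accuracy and perfect performance, the rule of thumb reduces to a one-line check. The function names and the sample accuracies are hypothetical.

```python
def release_threshold(human_accuracy: float, perfect: float = 1.0) -> float:
    """Midpoint between current system performance and perfect performance."""
    return (human_accuracy + perfect) / 2.0

def ready_to_release(ml_accuracy: float, human_accuracy: float,
                     perfect: float = 1.0) -> bool:
    return ml_accuracy > release_threshold(human_accuracy, perfect)

print(release_threshold(0.5), ready_to_release(0.8, 0.5))
```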
Split the Difference
Maximizing Reuse
Common System Parameters
○ Client Segment
○ Business Context
○ Target Metric
○ Time Scale
Software Engineering Best Practices:
○ Every Degree of Freedom in a system has a
cost in maintenance and design complexity.
○ Adding Degrees of Freedom often requires a
refactor
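The four common system parameters above could be captured as a small, validated request type: flexible in that each parameter can vary, narrow in that only a short list of values is accepted. The allowed values and field names below are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical narrow sets of supported values.
ALLOWED_TARGET_METRICS = {"sales_count", "keep_rate"}
ALLOWED_TIME_SCALES_DAYS = {30, 90}

@dataclass(frozen=True)
class PredictionRequest:
    client_segment: str
    business_context: str
    target_metric: str
    time_scale_days: int

    def __post_init__(self) -> None:
        # Reject values outside the narrow supported set up front, rather
        # than letting each new degree of freedom leak into the pipeline.
        if self.target_metric not in ALLOWED_TARGET_METRICS:
            raise ValueError(f"unsupported target metric: {self.target_metric}")
        if self.time_scale_days not in ALLOWED_TIME_SCALES_DAYS:
            raise ValueError(f"unsupported time scale: {self.time_scale_days}")

request = PredictionRequest("womens", "new_styles", "keep_rate", 30)
print(request)
```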
Set Flexible & Narrow System Parameters
When predictions are used in multiple
contexts, providing them in context is
important for enabling strong decision making.
Examples of setting context
○ Provide a summary recommendation of buy,
unknown, or don’t buy
○ Provide a historical baseline of performance
with those predictions
○ Provide an example of the next best or most
similar item already in the system
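The three kinds of context listed above can travel together with the raw prediction. This is a sketch under assumptions: the score cutoffs, field names, and sample values are hypothetical.

```python
from typing import Dict

def summarize(score: float, buy_at: float = 0.7, dont_buy_at: float = 0.3) -> str:
    """Summary recommendation: buy, unknown, or don't buy (cutoffs assumed)."""
    if score >= buy_at:
        return "buy"
    if score <= dont_buy_at:
        return "don't buy"
    return "unknown"

def prediction_in_context(style_id: str, score: float,
                          baseline_accuracy: float,
                          most_similar_style: str) -> Dict[str, object]:
    return {
        "style_id": style_id,
        "score": score,
        "recommendation": summarize(score),
        # How well past predictions of this kind have performed.
        "historical_baseline_accuracy": baseline_accuracy,
        # The closest item already in the system, for comparison.
        "most_similar_style": most_similar_style,
    }

report = prediction_in_context("style-001", 0.82, 0.74, "style-117")
print(report)
```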
Context Context Context
The Elevate Program was designed to give a leg up to
emerging BIPOC designers at a time when it was
needed.
Access to data insights, predictions, and early
product-market fit indicators for scaling helps plan
supply chains, highlights growth areas, and helps
emerging brands optimize their digital presence.
Building recommender systems is expensive; reusing
them is cheap. I encourage folks to think about how
their work can be reused by building up compassion
for what will help others.
Elevate Program
Thanks!

Scaling & Transforming Stitch Fix's Visibility into What Folks will love

  • 1. Scaling and Transforming Visibility into What People Will Love June Andrews Lunch n Learn April 28 2021
  • 2. Agenda Design the Line Architecture Story of Development - Ways of Working - Component Level Learnings Elevate
  • 3. Matching Service Between People & Fashion
  • 4. Transforming Stitch Fix’s Visibility into What People Will Love Design the Line
  • 5. Model Training Data Set Construction Online Featurization Model Router Prediction Storage Model Storage Inference Training Image Service Upload New Styles Prediction Reports Featurization Labeler Client Sales Client Input Priors Dimensionality Reduction Calibration {GBDT, Regression, etc} Model Evaluation Model Deployment Client Metadata Hyperparameter Optimization Design the Line
  • 6. Model Training Data Set Construction Online Featurization Model Router Prediction Storage Model Storage Inference Training Image Service Upload New Styles Prediction Reports Featurization Labeler Client Sales Client Input Priors Dimensionality Reduction Calibration {GBDT, Regression, etc} Model Evaluation Model Deployment Client Metadata Ways of Working Hyperparameter Optimization
  • 7. Project Management Guide to Stages of ML
  • 8. 2020 Hired the first person into the role of ML Integration. This role has been a foundational unlock in designing ML systems. About the Role This role is responsible for unlocking business opportunities for Stitch Fix to more efficiently grow merchandise by leveraging in house ML products. On a daily basis, this may involve researching how merchandise is purchased for Stitch Fix or coding customizations to our existing ML products to enable new use cases. This role will involve both a solid understanding of machine learning products from features to evaluation, and the creativity to see how ML can be integrated into Stitch Fix for better buying decisions and more efficient operations. ML Integration
  • 9. Set a Standard of Development The standard doesn’t have to be the highest bar, but uniformity is a good baseline Code Standards: ○ PEP 8/Black/Lint/etc ○ Google Python Style Guide ○ Documentation/Sphinx Testing: ○ Unit/Integration/% ○ Deployment processes Code Reviews: ○ Primary/Secondary Reviewers ○ Size of a Code Review Blocker Resolution & Feedback Processes
  • 10. Steel Thread v Modular Development Modular Development ○ Create an overall architecture map ○ Mock out endpoints ○ Build deep within each module ○ Connect modules all together at the end ○ Release with a fully fledged product Steel Thread ○ Create an overall architecture map ○ Mock out endpoints ○ Build bare minimum for each module ○ Connect modules as quickly as possible ○ Release with a ‘make it barely work’ product ○ Rapid tuning of bottlenecks for a ‘it works’ product ○ Long term investment in upgrading modules Boehm's Spiral???
  • 11. Modular Development ○ Great for known complexity ○ Good ROI of development ○ Increasingly Available Steel Thread ○ Quicker release of major milestones ○ Laser focus ○ Requires Steel Thread v Modular Development
  • 12. Enable Focus: ○ Daily Stand Ups ○ Complete List of Everything that Needed to Be Built ○ Steel Thread naturally lends itself to bite sized tasks ○ Use low uncertainty solutions ○ Increased Pair Coding over Code Reviews ○ Clear Cross Functional Buy In ○ Mocked Out Endpoints Between Teams Take Care of People: ○ Rotating ‘adjustment’ PTO ○ Mental Health Days ○ Pre-emptive No Meeting Days ○ Customize to what people need ○ “It’s okay to be happy at work. It’s okay to enjoy being good at what you do.” ○ Increased online social interactions, lunches, etc 2020 Steel Thread Support
  • 13. Model Training Data Set Construction Online Featurization Model Router Prediction Storage Model Storage Inference Training Image Service Upload New Styles Prediction Reports Featurization Labeler Client Sales Client Input Priors Dimensionality Reduction Calibration {GBDT, Regression, etc} Model Evaluation Model Deployment Client Metadata Component Level Learnings Hyperparameter Optimization
  • 14. Stages of Development: - Count level metrics - Ratio metrics - Domain Specific Business Value metrics - Historically corrected labels - [Wishlist] Distribution of a metric labels Gotchas: Expect rapid schema changes as client metadata, business context, and metrics evolve with the business. Labeler Labeler Client Sales Client Metadata
  • 15. Metric Stability is a function of different levels of certainty. Fashion (and the stock market) have high levels of chaotic influence, much higher than many areas of tech. Manage this by adding 2nd and 3rd moment metrics for gauging the stability of predictions in production, i.e., not just absolute loss, but also the standard deviation of error and higher moments. Labeler Deterministic Influence Probabilistic Influence Chaotic Influence Known Victory Lap Continuous Development Use in Confidence Bounds Unknown - measurable Roadmap Roadmap Unknown - unmeasurable
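The moment metrics on slide 15 can be sketched in a few lines. This is a minimal illustration, not the production metric code: the function name `error_moments` and the sample numbers are assumptions made for the example.

```python
import math

def error_moments(y_true, y_pred):
    """Return mean absolute error plus std and skewness of the signed error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mean = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Second central moment (population variance) of the signed error.
    var = sum((e - mean) ** 2 for e in errors) / n
    std = math.sqrt(var)
    # Skewness is the standardized third central moment; 0.0 when there is no spread.
    skew = 0.0 if std == 0 else sum((e - mean) ** 3 for e in errors) / n / std ** 3
    return {"mae": mae, "std": std, "skew": skew}

# Made-up actuals and predictions, for illustration only.
stats = error_moments([10, 12, 9, 11], [11, 11, 10, 10])
```

Tracking `std` and `skew` next to the absolute loss makes it visible when a model's average error is steady but its spread or asymmetry is drifting.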
  • 16. In Steel Thread development, pick a feature family covering each of the main types of data {categorical, numerical, image, text} to put strong connectors in place between each of the components. If the connectors are strong, then additional feature families can be added at a later date without breaking downstream data type assumptions. Gotchas: Client Input features are calculated on a different timeline than ML computed features. Handle by allowing null features to be returned and taken into account at the model routing stage. Featurization Image Service Featurization Client Input Priors
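The null-feature handling on slide 16 might look roughly like the sketch below. The feature family names, the model names, and the `route` function are all hypothetical; the slides do not describe the actual Model Router interface.

```python
from typing import Dict, List, Optional

# Hypothetical registry: which model serves which set of available feature families.
MODELS = {
    frozenset({"image", "client_input"}): "full_model",
    frozenset({"image"}): "image_only_model",
}

def route(features: Dict[str, Optional[List[float]]]) -> str:
    """Pick a model based on which feature families came back non-null."""
    available = frozenset(name for name, value in features.items() if value is not None)
    if available in MODELS:
        return MODELS[available]
    raise LookupError(f"no model registered for feature families {sorted(available)}")

# Client Input features have not been computed yet, so they arrive as None.
chosen = route({"image": [0.1, 0.2], "client_input": None})
```

Letting featurization return `None` keeps the connectors between components stable, and pushes the decision about missing data to a single place.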
  • 17. Why do embeddings work? ○ There’s a lot of space in high dimensions. The probability that adding a set of vectors together lands near any given point is extraordinarily low. What is a meaningful level of near in a high dimensional space? ○ Use the variation of known similar vectors to create a localized meaningful distance threshold. [Tunkelang] Featurization - Embeddings
  • 18. To prevent time travel, create a “memoryless circuit” at training time, where only as much information as would be known at inference time is known about the training data. Common Forms of Time Travel: - Randomly assigned test & train data sets - Duplicate records of varying degrees - Features calculated off of current tables v historical snapshots Data Set Creation
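One way to read the localized threshold on slide 17 is sketched below. It assumes cosine distance and a quantile of the pairwise distances among vectors already known to be similar; both choices, and the function names, are assumptions for illustration rather than the method used in the talk.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def local_threshold(known_similar, quantile=0.9):
    """Estimate what 'near' means from distances among known-similar vectors."""
    dists = sorted(
        cosine_distance(a, b)
        for i, a in enumerate(known_similar)
        for b in known_similar[i + 1:]
    )
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]

def is_near(a, b, threshold):
    return cosine_distance(a, b) <= threshold

# Toy 2-d vectors standing in for embeddings of items known to be similar.
similar = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
threshold = local_threshold(similar)
```

The threshold is local in the sense that a different group of known-similar items would produce a different value.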
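Two of the time travel forms on slide 18 can be guarded against with a split by time and an "as of" feature lookup. The sketch below is illustrative only; the record layout, the `observed_on` field, and both function names are assumptions.

```python
from datetime import date

def time_split(records, cutoff):
    """Train only on records observed strictly before the cutoff; test on the rest."""
    train = [r for r in records if r["observed_on"] < cutoff]
    test = [r for r in records if r["observed_on"] >= cutoff]
    return train, test

def feature_as_of(snapshots, as_of):
    """Return the latest snapshot value taken on or before `as_of`, else None."""
    eligible = [(taken_on, value) for taken_on, value in snapshots if taken_on <= as_of]
    if not eligible:
        return None
    return max(eligible, key=lambda snapshot: snapshot[0])[1]

records = [
    {"style": "A", "observed_on": date(2020, 1, 10)},
    {"style": "B", "observed_on": date(2020, 3, 5)},
    {"style": "C", "observed_on": date(2020, 6, 1)},
]
train, test = time_split(records, cutoff=date(2020, 4, 1))

# Historical price snapshots: (date taken, value).
price_snapshots = [(date(2020, 1, 1), 40.0), (date(2020, 5, 1), 55.0)]
price_at_training_time = feature_as_of(price_snapshots, date(2020, 3, 5))
```

Here the feature for style B uses the January snapshot, not the May value that a lookup against the current table would return.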
  • 19. This is the fun stuff. Try to build with an interchangeable parts mindset to enable rapid iteration. Gotchas - Choosing a common set of tooling and approaches will enable more dynamic resourcing for sprints - Using default parameters that slow down the pipeline for no improvement in accuracy Model Training Model Training Dimensionality Reduction Calibration {GBDT, Regression, etc} Hyperparameter Optimization
  • 20. UMAP ○ Faster than t-SNE ○ Biased towards preserving short distances at the expense of ignoring large distances t-SNE ○ Groundbreaking way to visualize high dimensional data over large datasets ○ Preserves local distances at the expense of large distances PCA ○ Doesn’t do well with the cloudiness of large, high dimensional data sets. If the dimensionality is large enough nearly all points are equidistant. LDA (may not be applicable) Dimensionality Reduction
  • 21. In practice, the dimensionality reduction step is a hybrid approach with features being grouped for different levels of compression. I.e., price features should not be compressed, but embedding features should be. Dimensionality Reduction (in practice)
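The grouping on slide 21 can be sketched as a per-group switch between pass-through and compression. To stay dependency free, a seeded Gaussian random projection stands in for whichever reducer (PCA, UMAP, etc.) is actually used; the group layout and function names are assumptions.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Build a fixed random projection matrix with out_dim rows of in_dim weights."""
    rng = random.Random(seed)
    scale = out_dim ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]

def reduce_row(row, groups, projections):
    """groups maps a name to (column indexes, compress flag)."""
    out = []
    for name, (cols, compress) in groups.items():
        values = [row[c] for c in cols]
        if compress:
            # Compressed group: one output value per projection row.
            out.extend(
                sum(w * v for w, v in zip(weights, values))
                for weights in projections[name]
            )
        else:
            # Uncompressed group (e.g. price): pass values through unchanged.
            out.extend(values)
    return out

groups = {"price": ([0], False), "embedding": ([1, 2, 3, 4], True)}
projections = {"embedding": make_projection(in_dim=4, out_dim=2)}
reduced = reduce_row([49.99, 0.1, 0.2, 0.3, 0.4], groups, projections)
```

The price column survives untouched while the four embedding columns are compressed to two.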
  • 22. Grid Search ○ Higher dimensional spaces lead to spending most of the time searching the boundary of the parameter space Random Search ○ Better distribution of evenly searching the space Bayesian Optimization ○ At least as good as random … but so much quicker It’s Free ○ Stop spending so many resources re-coding a free solution you won’t be able to beat ○ …...SigOpt Hyper Parameter Optimization (HPO)
  • 23. Many model types are primarily good at sorting datasets, but struggle with biases that can cause absolute accuracy to suffer. Calibration corrects for known biases to improve absolute accuracy. Calibration
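For reference, the random search baseline from slide 22 fits in a dozen lines. This sketch uses a toy objective and hypothetical parameter names; it is the baseline that Bayesian optimization is compared against, not a replacement for a dedicated HPO service.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each parameter uniformly from its (low, high) range; keep the best score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    # Stand-in for a validation score; peaks at learning_rate=0.1, subsample=0.8.
    return -((params["learning_rate"] - 0.1) ** 2) - ((params["subsample"] - 0.8) ** 2)

space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_score = random_search(toy_objective, space)
```

Each trial draws every parameter independently, so no trial budget is wasted re-testing the same value of one parameter the way a grid does.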
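A simple form of the calibration on slide 23 is histogram binning: map each score bin to the outcome rate actually observed in that bin. The talk does not say which calibration method was used, so this is only one possible sketch. It assumes scores lie in [0, 1].

```python
def fit_bin_calibrator(scores, outcomes, n_bins=5):
    """Return the observed outcome rate for each equal-width score bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for score, outcome in zip(scores, outcomes):
        b = min(int(score * n_bins), n_bins - 1)
        sums[b] += outcome
        counts[b] += 1
    # Empty bins fall back to the bin midpoint, i.e. no correction.
    return [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]

def calibrate(score, bin_rates):
    """Replace a raw score with the observed rate of its bin."""
    n_bins = len(bin_rates)
    return bin_rates[min(int(score * n_bins), n_bins - 1)]

# An overconfident model: scores around 0.9, but only half the outcomes are positive.
bin_rates = fit_bin_calibrator([0.9, 0.95, 0.85, 0.92], [1, 0, 1, 0])
corrected = calibrate(0.9, bin_rates)
```

The ordering between bins is left to the model; only the absolute level of each bin is corrected.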
  • 24. ...a quick aside about ... Model Evaluation
  • 37. Hard to get multiple points of measurement to know when the chasm is crossed. Measure 2 points: ○ Current system performance (Human Accuracy) ○ Perfect performance Rule of Thumb: ○ Release once ML accuracy is greater than the split Split the Difference
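The rule of thumb on slide 37 reduces to one comparison. The sketch below reads "the split" as the midpoint between current (human) accuracy and perfect accuracy; that reading, and the function names, are assumptions.

```python
def release_threshold(human_accuracy, perfect_accuracy=1.0):
    """Midpoint between current system performance and perfect performance."""
    return (human_accuracy + perfect_accuracy) / 2

def ready_to_release(ml_accuracy, human_accuracy, perfect_accuracy=1.0):
    """Release once ML accuracy is greater than the split."""
    return ml_accuracy > release_threshold(human_accuracy, perfect_accuracy)

# With human accuracy at 0.6, the split sits at about 0.8.
decision = ready_to_release(ml_accuracy=0.85, human_accuracy=0.6)
```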
  • 38. Model Training Data Set Construction Online Featurization Model Router Prediction Storage Model Storage Inference Training Image Service Upload New Styles Prediction Reports Featurization Labeler Client Sales Client Input Priors Dimensionality Reduction Calibration {GBDT, Regression, etc} Model Evaluation Model Deployment Client Metadata Component Level Learnings Hyperparameter Optimization
  • 39. Model Training Data Set Construction Online Featurization Model Router Prediction Storage Model Storage Inference Training Image Service Upload New Styles Prediction Reports Featurization Labeler Client Sales Client Input Priors Dimensionality Reduction Calibration {GBDT, Regression, etc} Model Evaluation Model Deployment Client Metadata Maximizing Reuse Hyperparameter Optimization
  • 40. Common System Parameters ○ Client Segment ○ Business Context ○ Target Metric ○ Time Scale Software Engineering Best Practices: ○ Every Degree of Freedom in a system has a cost in maintenance and design complexity. ○ Adding Degrees of Freedom often requires a refactor Set Flexible & Narrow System Parameters
  • 41. When predictions are used in multiple contexts, providing them in context is important for enabling strong decision making. Examples of setting context ○ Provide a summary recommendation of buy, unknown, or don’t buy ○ Provide a historical baseline of performance with those predictions ○ Provide an example of the next best or most similar item already in the system Context Context Context
  • 42. The Elevate Program was designed to give a leg up to emerging BIPOC designers at a time when it was needed. Access to data insights, predictions, and early product market fit indicators for scaling helps emerging brands plan supply chains, highlight growth areas, and optimize their digital presence. Building recommender systems is expensive; reusing them is cheap. I encourage folks to think about how their work can be reused by building up compassion for what will help others. Elevate Program
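One way to keep the degrees of freedom on slide 40 explicit is to name them in a single immutable record. The class name, field types, and example values below are hypothetical; only the four parameter names come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemParameters:
    """The four common system parameters; anything else is deliberately not configurable."""
    client_segment: str
    business_context: str
    target_metric: str
    time_scale_days: int

# Example values are made up for illustration.
params = SystemParameters(
    client_segment="new_clients",
    business_context="buying",
    target_metric="sell_through",
    time_scale_days=90,
)
```

Because the record is frozen and hashable, it can also serve as a key when storing or looking up predictions per context.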
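The first example on slide 41, a summary recommendation of buy, unknown, or don’t buy, could be derived from a prediction's confidence bounds. The thresholds and the function name below are assumptions chosen for illustration.

```python
def summary_recommendation(lower, upper, buy_above=0.6, dont_buy_below=0.4):
    """Map a prediction interval to buy / don't buy / unknown."""
    if lower >= buy_above:
        # Even the pessimistic bound clears the bar.
        return "buy"
    if upper <= dont_buy_below:
        # Even the optimistic bound falls short.
        return "don't buy"
    # The interval straddles the thresholds, so the prediction is not decisive.
    return "unknown"

labels = [
    summary_recommendation(0.7, 0.9),
    summary_recommendation(0.1, 0.3),
    summary_recommendation(0.3, 0.8),
]
```

Returning "unknown" for wide intervals tells the decision maker that the model has no strong opinion, which is different from a weak "buy".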