Inventory As Pure Functions
Sky Yin
Data scientist
● What do we do at Stitch Fix?
● Why does inventory matter?
● The design of Tracer
● The implementation of Tracer
Agenda
We provide a personalized styling service.
What do we do at Stitch Fix?
“AI”
We provide a personalized styling service through a combination of
algorithmic recommendations and stylist curation.
http://algorithms-tour.stitchfix.com/
What do we do at Stitch Fix?
We need good inventory to serve good recommendations.
Recommendation algorithms work both ways.
(“Buyers” here means the people who buy clothes from vendors to fill our warehouses.)
Why does inventory matter?
(Diagram labels: Stylists, Buyers)
We need good personalized inventory to serve good recommendations for each client.
Tracer: a time series database providing precise personalized inventory states at any given point in time.
Why does inventory matter?
Imagine we have a time series of SKU counts:
(count1, t1), (count1, t2), (count1, t3)...
Q: How could we know the count at any t within the range?
The design of Tracer
(count1, t1), (count1, t2), (count1, t3)...
Q: How could we know the count at any t within the range?
● This is asking too much! Let’s use a predefined interval to generate counts, say every 10 minutes.
● Problems:
○ Tons of things can happen within 10 minutes during peak hours.
○ We’d like to know exactly what stylists saw when they started working. 10-minute snapshots just aren’t accurate enough.
The design of Tracer
(count1, t1), (count1, t2), (count1, t3)...
Q: How could we know the count at any t within the range?
● OK, let’s generate the counts every second!
● Problems:
○ It is not realistic to aggregate that often in the engineering DB, where every item is a row.
○ Even if engineering maintains a count table, should we snapshot it every second?
○ It wastes space on counts that do not move, for example in the middle of the night.
The design of Tracer
(count1, t1), (count1, t2), (count1, t3)...
Q: How could we know the count at any t within the range?
● Let’s do away with the fixed interval and only generate a count event when the count changes!
○ Problems:
■ Again, the engineering DB works at the item level.
■ If t1 is far from t2, then to know the count at tx (t1 < tx < t2) we may need to walk through tons of other events. Indexing could solve this, but an index for each SKU is too much.
The design of Tracer
(s11 -> s12, t1), (s12 -> s13, t2), (s13 -> s14, t3)...
Q: How could we know the count at any t within the range?
● Let’s tweak this idea a bit and generate events of item state transitions.
● This gives us the flexibility to process item state however we want. To compute SKU counts, we can transform these events into SKU count changes:
(delta1, t1), (delta2, t2), (delta1, t3)...
The design of Tracer
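The transformation from item state transitions to SKU count changes can be pictured with the small Python sketch below. The event fields (`sku`, `from_state`, `to_state`, `ts`), the state names, and the rule that only a hypothetical `"available"` state counts toward inventory are all assumptions made for illustration; the slides do not describe Tracer's actual schema.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    """One item-level state transition (hypothetical schema)."""
    sku: str
    from_state: str
    to_state: str
    ts: int  # event time, e.g. epoch seconds


class Delta(NamedTuple):
    """A change in the count of one SKU at one point in time."""
    sku: str
    change: int
    ts: int


# Assumption: only items in these states count toward inventory.
COUNTED_STATES = {"available"}


def to_delta(event: Transition) -> Optional[Delta]:
    """Map a transition to a SKU count change, or None if the count is unaffected."""
    before = event.from_state in COUNTED_STATES
    after = event.to_state in COUNTED_STATES
    if before == after:
        return None
    return Delta(event.sku, 1 if after else -1, event.ts)


events = [
    Transition("sku-1", "received", "available", ts=10),
    Transition("sku-2", "available", "shipped", ts=20),
    Transition("sku-1", "shipped", "returned", ts=30),  # never counted: no delta
]
deltas = [d for d in map(to_delta, events) if d is not None]
```

Because the raw events keep the full transition, the same stream could feed other derived views simply by changing which states are counted.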
(delta1, t1), (delta2, t2), (delta1, t3)...
Q: How could we know the count at any t within the range?
● One missing piece: we still need an initial state to apply the deltas to.
● This can be addressed by creating a state snapshot at the very beginning.
The design of Tracer
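As a rough illustration of why the initial snapshot is needed, the sketch below replays deltas on top of a starting state to answer "what was the count at time t?". The tuple layout `(sku, change, ts)` and the sample numbers are hypothetical, not taken from Tracer.

```python
from collections import Counter


def replay(initial, deltas, t):
    """Return SKU counts at time t: the initial snapshot plus every delta with ts <= t."""
    counts = Counter(initial)  # copy, so the snapshot itself is never mutated
    for sku, change, ts in deltas:
        if ts <= t:
            counts[sku] += change
    return dict(counts)


initial_snapshot = {"sku-1": 5, "sku-2": 3}  # state at the very beginning
deltas = [("sku-1", +1, 10), ("sku-2", -1, 20), ("sku-1", -1, 30)]

counts_at_25 = replay(initial_snapshot, deltas, t=25)
```

Without `initial_snapshot`, the deltas alone would only say how counts changed, not what they were.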
Now the whole design can be summarized as two pure functions:
● Inventory state function: I(t)
● Difference function: D(t1, t2) = I(t2) - I(t1) = -D(t2, t1)
● Inventory state reasoning: I(t2) = I(t1) + D(t1, t2) = I(t3) - D(t2, t3)
The design of Tracer
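The two functions and the identities relating them can be checked on toy data. In this sketch `inventory_at` plays the role of I(t) and `difference` the role of D(t1, t2); the names, the module-level sample data, and the dict-based state representation are assumptions for illustration only.

```python
# Hypothetical data: a snapshot taken at T0 plus a log of (sku, change, ts) deltas.
T0 = 0
SNAPSHOT = {"sku-1": 5, "sku-2": 3}
DELTAS = (("sku-1", +1, 10), ("sku-2", -1, 20), ("sku-1", -1, 30))


def difference(t1, t2):
    """D(t1, t2): net count change per SKU going from t1 to t2."""
    sign = 1 if t1 <= t2 else -1
    lo, hi = min(t1, t2), max(t1, t2)
    diff = {}
    for sku, change, ts in DELTAS:
        if lo < ts <= hi:
            diff[sku] = diff.get(sku, 0) + sign * change
    return diff


def apply(state, diff, sign=1):
    """Return state + sign * diff as a new dict; the inputs are not mutated."""
    out = dict(state)
    for sku, change in diff.items():
        out[sku] = out.get(sku, 0) + sign * change
    return out


def inventory_at(t):
    """I(t): inventory state at time t, derived from the snapshot at T0."""
    return apply(SNAPSHOT, difference(T0, t))


# I(t2) = I(t1) + D(t1, t2) = I(t3) - D(t2, t3), with t1=10, t2=25, t3=30
forward = apply(inventory_at(10), difference(10, 25))
backward = apply(inventory_at(30), difference(25, 30), sign=-1)
```

Both `forward` and `backward` should equal `inventory_at(25)`: the state at any time can be reached from an earlier state by adding a difference, or from a later one by subtracting it.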
● As we consume the item event stream, we continuously build delta blocks:
(delta1, t1), (delta2, t2), (delta1, t3)...
● We create a SKU count snapshot every hour, so that we don’t always need to go back to the very start and apply deltas all the way from there.
● To speed up searching for a particular snapshot, we index the snapshots. With hourly snapshots, there are only 24 to index per day.
● This is all built on Spark; deltas and snapshots are stored as Spark DataFrames.
● We provide both Scala and Python APIs to query the inventory state.
The implementation of Tracer
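The lookup path described above (find the nearest earlier snapshot through an index, then apply only the deltas after it) can be sketched in plain Python. This is not Tracer's implementation, which the slides say is built on Spark; the in-memory lists, the sample data, and the use of `bisect` as the "index" are stand-ins chosen to keep the example self-contained.

```python
from bisect import bisect_right

HOUR = 3600

# Hypothetical data: deltas sorted by time, snapshots keyed by their timestamp.
DELTAS = [("sku-1", +1, 600), ("sku-2", -1, 4000), ("sku-1", -1, 7500)]
SNAPSHOTS = {
    0: {"sku-1": 5, "sku-2": 3},
    HOUR: {"sku-1": 6, "sku-2": 3},
    2 * HOUR: {"sku-1": 6, "sku-2": 2},
}
SNAPSHOT_TIMES = sorted(SNAPSHOTS)         # the snapshot index
DELTA_TIMES = [ts for _, _, ts in DELTAS]  # lets us bisect into the delta log


def inventory_at(t):
    """Counts at time t: nearest earlier snapshot plus the deltas in (snapshot, t]."""
    i = bisect_right(SNAPSHOT_TIMES, t) - 1
    if i < 0:
        raise ValueError("t is earlier than the first snapshot")
    base_ts = SNAPSHOT_TIMES[i]
    counts = dict(SNAPSHOTS[base_ts])  # copy so the stored snapshot stays intact
    start = bisect_right(DELTA_TIMES, base_ts)
    stop = bisect_right(DELTA_TIMES, t)
    for sku, change, _ in DELTAS[start:stop]:
        counts[sku] = counts.get(sku, 0) + change
    return counts


state = inventory_at(5000)  # uses the snapshot at 3600 and one delta
```

With hourly snapshots, any query replays at most one hour of deltas, which is the trade-off the slide describes between snapshot storage and replay cost.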
Thank You
@piggybox
sky.yin@gmail.com
