Inventory As Pure Functions
Sky Yin
Data scientist
Agenda
● What do we do at Stitch Fix?
● Why does inventory matter?
● The design of Tracer
● The implementation of Tracer
What do we do at Stitch Fix?
We provide a personalized styling service through a combination of
algorithmic recommendations (“AI”) and stylist curation.
http://algorithms-tour.stitchfix.com/
Why does inventory matter?
We need good inventory to serve good recommendations.
Recommendation algorithms work both ways: for stylists and for buyers.
(Buyers here means the people who buy clothes from vendors to fill our warehouses.)
We need good personalized inventory to serve good
recommendations for each client.
Tracer: a time series database providing precise personalized inventory
states at any given point in time.
The design of Tracer
Imagine we have a time series of SKU counts:
$(\text{count}_1, t_1), (\text{count}_1, t_2), (\text{count}_1, t_3), \ldots$
Q: How could we know the count at any t within the range?
● This is asking too much! Let’s use a predefined interval to
generate counts, say every 10 minutes.
● Problems:
○ Tons of things can happen within 10 minutes during peak
hours.
○ We’d like to know exactly what stylists saw when they
started working. 10-minute snapshots just aren’t accurate
enough.
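The staleness problem with fixed-interval snapshots can be illustrated with a small sketch. The change history below is made up for illustration, and `true_count` and `snapshot_count` are hypothetical helper names, not part of Tracer.

```python
# Hypothetical count changes for one SKU: (seconds since start, count after the change).
changes = [(0, 8), (130, 7), (250, 5), (590, 6), (610, 4)]
INTERVAL = 600  # snapshot every 10 minutes


def true_count(t):
    """Count actually in effect at time t (last change at or before t)."""
    return [count for ts, count in changes if ts <= t][-1]


def snapshot_count(t):
    """Count reported by the most recent fixed-interval snapshot before or at t."""
    last_snapshot_time = (t // INTERVAL) * INTERVAL
    return true_count(last_snapshot_time)


# A stylist who starts working at t=300 is shown the t=0 snapshot,
# although two changes have happened since then.
stale, actual = snapshot_count(300), true_count(300)
```

Here the snapshot answers 8 while the real count is 5, which is exactly the gap the slide describes.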
● OK, let’s generate the counts every second!
● Problems:
○ It’s not realistic to aggregate that often in the engineering DB,
where every item is a row.
○ Even if engineering maintains a count table, should we snapshot
it every second?
○ It wastes space on counts that don’t change overnight.
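A quick back-of-the-envelope calculation shows why per-second snapshots are wasteful. The number of SKUs below is an arbitrary assumption for illustration, not a Stitch Fix figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Snapshot rows produced per SKU per day at each interval.
rows_every_second = SECONDS_PER_DAY          # one row per second
rows_every_10_min = SECONDS_PER_DAY // 600   # one row per 10 minutes

# Hypothetical catalog size, chosen only to make the scale visible.
num_skus = 10_000
total_rows_per_day = rows_every_second * num_skus
```

Each SKU produces 86,400 rows a day at one-second resolution versus 144 at ten-minute resolution, and most of those rows repeat the previous count.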
● Let’s do away with the fixed interval and only generate a count
event when the count changes!
○ Problems
■ Again, the engineering DB works at the item level.
■ Say $t_1$ is far from $t_2$. To know the count at $t_x$
($t_1 < t_x < t_2$), we may need to walk through tons of other
events. This could be solved by indexing, but an index
for each SKU is too much.
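As a rough sketch of the change-event idea, the lookup for a single SKU can be done with a binary search over sorted event times. The event list and the `count_at` helper are hypothetical; the sketch also makes the indexing cost visible, since it needs one sorted list per SKU.

```python
from bisect import bisect_right

# Hypothetical change events for one SKU: (timestamp in seconds, count after the change).
events = [(0, 12), (600, 11), (4200, 9), (90000, 10)]
times = [t for t, _ in events]  # the per-SKU "index": sorted event times


def count_at(t):
    """Return the count in effect at time t (the last change at or before t)."""
    i = bisect_right(times, t)
    if i == 0:
        raise ValueError("t precedes the first recorded event")
    return events[i - 1][1]
```

For example, `count_at(50000)` falls in the long quiet stretch between the events at 4200 and 90000 and returns 9.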
● Let’s tweak this idea a bit and generate events of item state
transitions:
$(s_{11} \to s_{12}, t_1), (s_{12} \to s_{13}, t_2), (s_{13} \to s_{14}, t_3), \ldots$
● This gives us the flexibility to process item state as we want. In
the case of computing SKU counts, we can transform these
events into SKU count changes:
$(\text{delta}_1, t_1), (\text{delta}_2, t_2), (\text{delta}_1, t_3), \ldots$
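The transformation from item state transitions to SKU count changes might look like the sketch below. The state names, the `AVAILABLE` set, and the `Transition` record are assumptions made for illustration; the slides do not say which states count toward inventory.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """Hypothetical item-level event: one item moving between two states."""
    item_id: str
    sku: str
    from_state: str
    to_state: str
    t: int


# Assumption: only items in these states count toward the SKU count.
AVAILABLE = {"in_stock"}


def to_delta(ev):
    """Map a transition to (sku, delta, t): +1 entering, -1 leaving, 0 otherwise."""
    delta = int(ev.to_state in AVAILABLE) - int(ev.from_state in AVAILABLE)
    return (ev.sku, delta, ev.t)


transitions = [
    Transition("item-1", "sku-A", "received", "in_stock", 10),
    Transition("item-2", "sku-B", "in_stock", "picked", 20),
    Transition("item-1", "sku-A", "in_stock", "picked", 30),
    Transition("item-2", "sku-B", "picked", "shipped", 40),
]

# Transitions that do not change the count are dropped.
deltas = [d for d in map(to_delta, transitions) if d[1] != 0]
```

Because the raw events carry full state transitions, a different definition of `AVAILABLE` yields a different count series from the same event stream.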
● One missing piece: we still need an initial state to apply the deltas to.
● This can be addressed by creating a state snapshot at the very
beginning.
Now the whole design can be summarized as two pure functions:
● Inventory state function
$I(t)$
● Difference function
$D(t_1, t_2) = I(t_2) - I(t_1) = -D(t_2, t_1)$
● Inventory state reasoning
$I(t_2) = I(t_1) + D(t_1, t_2) = I(t_3) - D(t_2, t_3)$
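A minimal sketch of the two functions, assuming inventory state is a mapping from SKU to count and that one initial snapshot plus a list of deltas is available. The data, the `add` helper, and the in-memory representation are illustrative assumptions, not Tracer's actual code.

```python
from collections import Counter

# Hypothetical initial snapshot and delta log: (time, sku, delta).
INITIAL_TIME = 0
INITIAL_STATE = {"sku-A": 5, "sku-B": 2}
DELTAS = [(10, "sku-A", +1), (20, "sku-B", -1), (30, "sku-A", -1), (45, "sku-B", +3)]


def D(t1, t2):
    """Difference function: net count change per SKU over (t1, t2]."""
    if t1 > t2:
        # D(t1, t2) = -D(t2, t1)
        return {sku: -n for sku, n in D(t2, t1).items()}
    total = Counter()
    for t, sku, delta in DELTAS:
        if t1 < t <= t2:
            total[sku] += delta
    return dict(total)


def add(state, diff):
    """Apply a difference to a state without mutating either argument."""
    out = dict(state)
    for sku, n in diff.items():
        out[sku] = out.get(sku, 0) + n
    return out


def I(t):
    """Inventory state function: the initial snapshot plus all deltas up to t."""
    return add(INITIAL_STATE, D(INITIAL_TIME, t))


# I(t2) can be reached from an earlier state or from a later one.
from_earlier = add(I(5), D(5, 25))
from_later = add(I(40), D(40, 25))
```

Both `from_earlier` and `from_later` equal `I(25)`, which is the "inventory state reasoning" identity with $t_1 = 5$, $t_2 = 25$, $t_3 = 40$.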
The implementation of Tracer
● As we consume the item event stream, we continuously build
delta blocks:
$(\text{delta}_1, t_1), (\text{delta}_2, t_2), (\text{delta}_1, t_3), \ldots$
● We create a SKU count snapshot every hour, so that
we don’t always need to go back to the very start and apply deltas all
the way from there.
● To speed up searching for a certain snapshot, we index
snapshots. With hourly snapshots, there are only 24
to index per day.
● This is all built on Spark; deltas and snapshots are
stored as Spark DataFrames.
● We provide both Scala and Python APIs to query the inventory
state.
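The snapshot-plus-delta lookup can be sketched in plain Python. This is not the Spark implementation or the real Tracer API: `build_snapshots`, `state_at`, and the in-memory lists are hypothetical stand-ins for the stored snapshots, the snapshot index, and the delta blocks.

```python
from bisect import bisect_right

SNAPSHOT_INTERVAL = 3600  # hourly snapshots, in seconds


def build_snapshots(initial_state, deltas, end_time, interval=SNAPSHOT_INTERVAL):
    """Fold time-sorted (t, sku, delta) records into periodic snapshots.

    Returns (snapshot_times, snapshots); snapshot_times acts as the index.
    """
    times, snaps = [0], [dict(initial_state)]
    state = dict(initial_state)
    i = 0
    for boundary in range(interval, end_time + 1, interval):
        while i < len(deltas) and deltas[i][0] <= boundary:
            _, sku, d = deltas[i]
            state[sku] = state.get(sku, 0) + d
            i += 1
        times.append(boundary)
        snaps.append(dict(state))
    return times, snaps


def state_at(t, times, snaps, deltas):
    """State at time t >= 0: nearest earlier snapshot plus the deltas after it."""
    k = bisect_right(times, t) - 1  # latest snapshot at or before t
    state = dict(snaps[k])
    delta_times = [dt for dt, _, _ in deltas]
    lo = bisect_right(delta_times, times[k])
    hi = bisect_right(delta_times, t)
    for _, sku, d in deltas[lo:hi]:
        state[sku] = state.get(sku, 0) + d
    return state


# Hypothetical data: one SKU, three deltas, two hours of snapshots.
deltas = [(100, "sku-A", -1), (3700, "sku-A", 2), (7300, "sku-A", -3)]
times, snaps = build_snapshots({"sku-A": 5}, deltas, end_time=7200)
```

A query such as `state_at(3700, times, snaps, deltas)` starts from the 3600-second snapshot and applies only the single delta recorded after it, rather than replaying the stream from the beginning.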
Thank You
@piggybox
sky.yin@gmail.com

Inventory As Pure Functions: How Stitch Fix Built Tracer to Track Precise Inventory States
