Inventory as Pure Functions
Sky Yin
Data scientist

Agenda
● What do we do at Stitch Fix?
● Why does inventory matter?
● The design of Tracer
● The implementation of Tracer

What do we do at Stitch Fix?
We provide a personalized styling service.
“AI”

We provide a personalized styling service through a combination of
algorithmic recommendations and stylist curation.
http://algorithms-tour.stitchfix.com/

Why does inventory matter?
We need good inventory to serve good recommendations.
Recommendation algorithms work both ways, for stylists and for buyers.
(Buyers here means the people who buy clothes from vendors to fill our warehouses.)

We need good personalized inventory to serve good
recommendations for each client.
Tracer
A time series database providing precise personalized inventory
states at any given point in time

The design of Tracer
Imagine we have a time series of SKU counts
(count1, t1), (count1, t2), (count1, t3)...
Q: How could we know the count at any t within the range?

(count1, t1), (count1, t2), (count1, t3)...
● This is asking too much! Let’s use a predefined interval to
generate counts, say every 10 minutes.
● Problems:
○ Tons of things can happen within 10 minutes during peak
hours.
○ We’d like to know exactly what stylists saw when they
started working. 10-minute snapshots just aren’t accurate
enough.

(count1, t1), (count1, t2), (count1, t3)...
● OK, let’s generate the counts every second!
● Problems:
○ It is not realistic to aggregate that often in the engineering DB,
where every item is a row.
○ Even if engineering maintains a count table, should we snapshot
it every second?
○ It wastes space on counts that do not move overnight.

(count1, t1), (count1, t2), (count1, t3)...
● Let’s do away with the fixed interval and only generate a count
event when the count changes!
○ Problems:
■ Again, the engineering DB works at the item level.
■ Say t1 is far away from t2. To know the count at tx
(t1 < tx < t2), we may need to walk through tons of other
events. This could be solved by indexing, but indexing
for each SKU is too much.

(s11 -> s12, t1), (s12 -> s13, t2), (s13 -> s14, t3)...
● Let’s tweak this idea a bit and generate events of item state
transitions.
● This gives us the flexibility to process item state as we want. In
the case of computing SKU counts, we can transform these
events into SKU count changes:
(delta1, t1), (delta2, t2), (delta1, t3)...
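The transformation from item state transitions to SKU count changes might look roughly like the sketch below. The `Transition` record, the state names, and the rule that only `in_stock` items count toward a SKU are all assumptions made for illustration; the slides do not show Tracer's actual event schema.

```python
from collections import namedtuple

# Hypothetical event shape: one item moved from one state to another at time t.
Transition = namedtuple("Transition", ["sku", "from_state", "to_state", "t"])

# Assumption: only items in these states count toward a SKU's inventory.
AVAILABLE = {"in_stock"}


def to_delta(event):
    """Map one item state transition to a (sku, delta, t) change, or None."""
    before = event.from_state in AVAILABLE
    after = event.to_state in AVAILABLE
    delta = int(after) - int(before)  # +1 entering stock, -1 leaving, 0 otherwise
    return (event.sku, delta, event.t) if delta != 0 else None


def to_deltas(events):
    """Keep only the transitions that actually change a SKU count."""
    return [d for d in map(to_delta, events) if d is not None]


events = [
    Transition("sku-1", "received", "in_stock", 10),
    Transition("sku-1", "in_stock", "shipped", 25),
    Transition("sku-2", "received", "inspected", 30),  # count unaffected
]
deltas = to_deltas(events)
```

Because the raw transitions are kept, the same event stream could be mapped to other views of item state by swapping out `to_delta`.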

(delta1, t1), (delta2, t2), (delta1, t3)...
● One missing piece: we still need an initial state to apply the
deltas to.
● This can be addressed by creating a state snapshot at the very
beginning.

Now the whole design can be summarized as two pure functions:
● Inventory state function
I(t)
● Difference function
D(t1, t2) = I(t2) - I(t1) = -D(t2, t1)
● Inventory state reasoning
I(t2) = I(t1) + D(t1, t2) = I(t3) - D(t2, t3)
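As a minimal sketch of these two functions, assume the inventory state is a plain dict of SKU counts, the initial snapshot is taken at time 0, and deltas are `(t, sku, delta)` tuples sorted by time. The data, the `add` helper, and the half-open interval convention are illustrative assumptions, not Tracer's real implementation.

```python
# Assumed data: snapshot at t = 0 and count changes sorted by time.
INITIAL = {"sku-1": 5, "sku-2": 2}
DELTAS = [(10, "sku-1", -1), (20, "sku-2", 1), (30, "sku-1", 2), (40, "sku-2", -3)]


def D(t1, t2):
    """Net count change per SKU over the interval (t1, t2]."""
    if t1 > t2:
        # Antisymmetry: D(t1, t2) = -D(t2, t1)
        return {sku: -n for sku, n in D(t2, t1).items()}
    diff = {}
    for t, sku, delta in DELTAS:
        if t1 < t <= t2:
            diff[sku] = diff.get(sku, 0) + delta
    return diff


def add(state, diff, sign=1):
    """Return a new state with diff applied; inputs are left untouched."""
    out = dict(state)
    for sku, n in diff.items():
        out[sku] = out.get(sku, 0) + sign * n
    return out


def I(t):
    """Inventory state at time t: initial snapshot plus all deltas up to t."""
    return add(INITIAL, D(0, t))


# Reason forward from an earlier state or backward from a later one.
forward = add(I(25), D(25, 35))
backward = add(I(45), D(35, 45), sign=-1)
```

Both `forward` and `backward` reconstruct `I(35)`, which is the inventory state reasoning identity in code form.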

The implementation of Tracer
● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta1, t3)...

● We also create a SKU count snapshot every hour, so that we
don’t always need to go back to the very start and apply deltas
all the way from there.

● To speed up the search for a particular snapshot, we index the
snapshots. With hourly snapshots, there are only 24 to index
per day.
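The lookup this enables can be sketched for a single SKU as follows. It is plain Python rather than Spark, and it assumes times in seconds starting at 0, a sorted in-memory list as the snapshot index, and made-up delta values; none of these details come from the slides.

```python
from bisect import bisect_right

HOUR = 3600

# Assumed data for one SKU: (t, delta) pairs sorted by time, in seconds.
DELTAS = [(600, +3), (4000, -1), (7300, +2), (9000, -4)]
INITIAL_COUNT = 10


def build_snapshots(initial, deltas, end, step=HOUR):
    """Return (times, counts): the count at every step boundary from 0 to end."""
    times, counts = [], []
    count, i = initial, 0
    for t in range(0, end + 1, step):
        while i < len(deltas) and deltas[i][0] <= t:
            count += deltas[i][1]
            i += 1
        times.append(t)
        counts.append(count)
    return times, counts


SNAP_TIMES, SNAP_COUNTS = build_snapshots(INITIAL_COUNT, DELTAS, end=3 * HOUR)
DELTA_TIMES = [t for t, _ in DELTAS]


def count_at(t):
    """Count at time t >= 0: nearest earlier snapshot plus the deltas after it."""
    # Index lookup: latest snapshot taken at or before t.
    s = bisect_right(SNAP_TIMES, t) - 1
    count = SNAP_COUNTS[s]
    # Apply only the deltas in (snapshot time, t].
    lo = bisect_right(DELTA_TIMES, SNAP_TIMES[s])
    hi = bisect_right(DELTA_TIMES, t)
    for _, delta in DELTAS[lo:hi]:
        count += delta
    return count
```

With hourly snapshots, a query never replays more than one hour of deltas, and the snapshot index stays small enough to search cheaply.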

● This is all built on Spark; deltas and snapshots are stored as
Spark DataFrames.
● We provide both Scala and Python APIs to query the inventory
state.

Thank You
@piggybox
sky.yin@gmail.com
