life-state-predications

Life-stage Prediction for Product Recommendation in E-commerce
Authored By:

•  Peng Jiang - Alibaba Inc
•  Yadong Zhu - Alibaba Inc
•  Quan Yuan† - Alibaba Inc
•  Yi Zhang - University of California, Santa Cruz
KDD 2015, August 10-13, 2015, Sydney, Australia
© 2015 ACM

Overview
•  A model-based recommender
•  Incorporates several ideas from the ML and IR domains
•  Tries to model the life stage of consumers
•  Tested on http://www.taobao.com

http://www.taobao.com/market/baobao/2014/
•  Taobao was founded by Alibaba Group in 2003
•  ~760 million product listings as of 2013
•  One of the world’s top 10 most visited websites (Alexa)

Key contributions
1.  Introduction of the life-stage concept into e-commerce systems
2.  A Maximum Entropy Semi Markov Model for segmentation and prediction
3.  An efficient large-scale solution
4.  A solution for modeling the multi-kids scenario via Gaussian mixture models
5.  Verification of the effectiveness in both offline and online scenarios

Core idea
The importance of life stages for consumers’ purchasing behavior
•  Bachelor stage
•  Newly married couples (young, no children)
•  Full nest (married couple with dependent children)
•  Empty nest (older married couples with no children living with them)
o  Head in labor force
o  Retired
o  Solitary survivors
o  Etc…

Markov models
•  Hidden semi-Markov model
•  Maximum-entropy Markov model
Hidden semi-Markov model + Maximum-entropy Markov model = Maximum Entropy Semi Markov Model (MESMM)

Hidden Markov Models
•  Discrete and Continuous versions
•  The Viterbi algorithm finds the most probable state sequence

Viterbi algorithm
•  A dynamic programming algorithm for finding the
most likely sequence of hidden states
Vt,k = the probability of the most probable state sequence that accounts for the first t observations and has k as its final state
ax,k = the transition probability from state x to state k
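
The definitions above can be turned into a short dynamic program. The following is a minimal sketch of the standard Viterbi recursion, not code from the paper; the state names, observation names, and all probabilities are made-up toy values chosen only to illustrate the algorithm.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # V[t][k]: probability of the best path that explains observations[0..t]
    # and ends in state k (the slide's Vt,k, with 0-based t).
    V = [{k: start_p[k] * emit_p[k][observations[0]] for k in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for k in states:
            # Best predecessor x maximizes V[t-1][x] * a[x][k].
            best_x = max(states, key=lambda x: V[t - 1][x] * trans_p[x][k])
            V[t][k] = (V[t - 1][best_x] * trans_p[best_x][k]
                       * emit_p[k][observations[t]])
            back[t][k] = best_x
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path


# Toy model (hypothetical numbers, for illustration only).
states = ["pregnancy", "newborn"]
start_p = {"pregnancy": 0.6, "newborn": 0.4}
trans_p = {
    "pregnancy": {"pregnancy": 0.7, "newborn": 0.3},
    "newborn": {"pregnancy": 0.4, "newborn": 0.6},
}
emit_p = {
    "pregnancy": {"maternity_wear": 0.5, "vitamins": 0.4, "diapers": 0.1},
    "newborn": {"maternity_wear": 0.1, "vitamins": 0.3, "diapers": 0.6},
}
prob, path = viterbi(["maternity_wear", "vitamins", "diapers"],
                     states, start_p, trans_p, emit_p)
print(path, prob)
```

With these toy numbers the best path stays in "pregnancy" for two steps and then switches to "newborn".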

Hidden semi-Markov model
•  The probability of a change in the hidden state depends on the amount of time that has elapsed since entry into the current state
•  In a hidden Markov model, by contrast, the probability of leaving a state is constant, regardless of how long the process has been in it
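
To make the contrast concrete: a constant exit probability implies a geometric distribution over how long an ordinary HMM stays in a state, whereas a semi-Markov model can use any explicit duration distribution. The sketch below illustrates this; the self-transition probability and the explicit duration table are hypothetical values.

```python
def hmm_duration_pmf(self_transition, d):
    """P(staying exactly d steps) in an HMM state with self-transition a.

    The state is kept d - 1 times and then left once, so the duration is
    geometric: a ** (d - 1) * (1 - a).
    """
    return self_transition ** (d - 1) * (1.0 - self_transition)


def explicit_duration_pmf(duration_table, d):
    """A semi-Markov state may carry an arbitrary duration distribution."""
    return duration_table.get(d, 0.0)


a = 0.9  # hypothetical self-transition probability
print([round(hmm_duration_pmf(a, d), 4) for d in (1, 2, 10)])

# Hypothetical explicit durations (e.g. in months) peaked around 9-10.
table = {8: 0.1, 9: 0.5, 10: 0.4}
print([explicit_duration_pmf(table, d) for d in (1, 9, 10)])
```

The geometric form always puts the most mass on the shortest duration, which is a poor fit for stages with a typical length.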

Maximum-entropy Markov model
•  A sequence of observations: O1, …, On
•  Tag it with the labels S1, …, Sn
•  Such that P(S1, …, Sn | O1, …, On) is maximized
•  Parameters λ can be learned with generalized iterative scaling, or with a Baum–Welch (EM) variant when labels are incomplete
•  The optimal state sequence is found with the Viterbi algorithm
•  Main advantage over HMMs: overlapping/non-independent features
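
As a rough illustration of the last point, the sketch below computes a maximum-entropy (softmax) distribution over the next state from weighted, overlapping binary features. The feature templates, weights, and state names are hypothetical and are not taken from the paper.

```python
import math


def memm_next_state_probs(prev_state, observation, states, weights, features):
    """P(s | prev_state, observation) proportional to exp(sum of weights
    of the features that fire for candidate state s)."""
    scores = {}
    for s in states:
        active = features(prev_state, observation, s)
        scores[s] = sum(weights.get(name, 0.0) for name in active)
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def toy_features(prev_state, observation, state):
    # Overlapping features: one per (previous state, state) pair and one
    # per (query token, state) pair. Nothing requires them to be independent.
    names = [f"prev={prev_state}&cur={state}"]
    names += [f"token={tok}&cur={state}" for tok in observation]
    return names


toy_weights = {  # hypothetical learned parameters (the λ of the slide)
    "prev=pregnancy&cur=pregnancy": 1.0,
    "prev=pregnancy&cur=newborn": 0.5,
    "token=newborn&cur=newborn": 1.5,
    "token=diaper&cur=newborn": 2.0,
}
probs = memm_next_state_probs("pregnancy", ["newborn", "diaper"],
                              ["pregnancy", "newborn"],
                              toy_weights, toy_features)
print(probs)
```

Both tokens push the score toward "newborn" at once, something an HMM with a single emission distribution per state cannot express directly.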

Maximum Entropy Semi Markov Model
•  The probability of life stage yt at time t depends on:
o  The previous life stage yt−1 at time t−1
o  How long the user has been in the previous life stage
o  The observed user behavior sequence
•  The duration variable d changes deterministically
o  When the life stage changes, d is reset to 0. Otherwise d increases as time goes on

MESMM cont...
Our goal: given an observed behavior sequence X, find the best underlying life-stage sequence y1,…,yk and the corresponding durations d1,…,dk

Xt: the observed behavior sequence at time t
dt: the duration of a life stage at time t
yt: the life stage label at time t
lmin, lmax: the minimum and maximum lengths of a life stage
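
Joint inference over labels and durations can be sketched as a segment-level Viterbi search. This is a generic semi-Markov dynamic program under assumed names, not the paper's implementation: `segment_score` stands in for the model's log-score of one segment, and the toy scoring function and behavior sequence are invented for illustration.

```python
NEG_INF = float("-inf")


def segment_viterbi(n, labels, lmin, lmax, segment_score):
    """Best segmentation of positions 0..n-1 into labeled segments whose
    lengths lie in [lmin, lmax]. Returns (score, [(label, start, length)])."""
    # best[end][y]: best score of a segmentation of 0..end-1 whose last
    # segment carries label y.
    best = [{y: NEG_INF for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for y in labels:
            for d in range(lmin, min(lmax, end) + 1):
                start = end - d
                # The first segment has no predecessor label.
                prevs = [None] if start == 0 else labels
                for y_prev in prevs:
                    base = 0.0 if y_prev is None else best[start][y_prev]
                    if base == NEG_INF:
                        continue
                    cand = base + segment_score(y_prev, y, start, end)
                    if cand > best[end][y]:
                        best[end][y] = cand
                        back[end][y] = (start, y_prev)
    y = max(labels, key=lambda lab: best[n][lab])
    if best[n][y] == NEG_INF:
        return NEG_INF, []
    score, segments, end = best[n][y], [], n
    while end > 0:
        start, y_prev = back[end][y]
        segments.append((y, start, end - start))
        end, y = start, y_prev
    segments.reverse()
    return score, segments


# Toy behavior sequence and scoring function (hypothetical).
X = ["maternity"] * 3 + ["diaper"] * 4
EXPECTED = {"pregnancy": "maternity", "infant": "diaper"}


def toy_score(y_prev, y, start, end):
    if y_prev == y:  # a new segment must change the life stage
        return NEG_INF
    return float(sum(1 if x == EXPECTED[y] else -1 for x in X[start:end]))


score, segments = segment_viterbi(len(X), ["pregnancy", "infant"], 2, 5, toy_score)
print(score, segments)
```

The four nested loops show where the cost comes from: every end position is combined with every label, every admissible duration, and every previous label.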

Problem?
The inference process is computationally expensive!

The model has to predict both the next state label and the duration of the period.
But in the mom-baby domain…

When the baby’s birthdate is known, transitions and durations become deterministic.
Additionally, because of the one-child policy in China, the default model assumes every family has a single child.
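
With a known birthdate, the life-stage label at any date reduces to a table lookup, which is why no duration inference is needed. A minimal sketch follows; the stage names and the day boundaries are assumptions made for illustration and are not the stage template used in the paper.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical stage boundaries, in days relative to the birthdate.
BOUNDS = [-280, 0, 365, 1095]
LABELS = ["before pregnancy", "pregnancy", "0-1 year", "1-3 years", "3+ years"]


def life_stage(birthdate, on_date):
    """Deterministic life-stage label for a single-child family."""
    age_days = (on_date - birthdate).days
    # bisect_right returns how many boundaries are <= age_days, which is
    # exactly the index of the stage the age falls into.
    return LABELS[bisect_right(BOUNDS, age_days)]


birth = date(2014, 6, 1)
for day in (date(2014, 3, 1), date(2015, 1, 1), date(2016, 6, 1)):
    print(day, life_stage(birth, day))
```

Labels produced this way can also serve as training targets for a classifier applied to users whose birthdate is unknown.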

Simplified model
•  A logistic regression model predicts yt from Xt.
•  The model is trained offline, using the behavior sequences and the provided birthdates.
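
The simplified model can be sketched as a multi-class (softmax) logistic regression trained by plain gradient descent. This is a dependency-free illustration, not the production setup: the three features, the labels, the learning rate, and the epoch count are all hypothetical.

```python
import math


def softmax(zs):
    top = max(zs)
    es = [math.exp(z - top) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Batch gradient descent on the multinomial logistic loss."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]  # last entry = bias
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(Wc, xb)) for Wc in W])
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(xb):
                    grad[c][j] += err * v
        for c in range(n_classes):
            for j in range(n_feat + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def predict(W, x):
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(Wc, xb)) for Wc in W]
    return max(range(len(W)), key=lambda c: scores[c])


# Toy features: weights of [maternity wear, diapers, toddler shoes].
X_train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0],
           [0.1, 0.9, 0.0], [0.0, 0.0, 1.0], [0.0, 0.2, 0.8]]
y_train = [0, 0, 1, 1, 2, 2]  # 0 = pregnancy, 1 = infant, 2 = toddler
W = train_softmax(X_train, y_train, n_classes=3)
print([predict(W, x) for x in X_train])
```

In practice the labels yt would come from the known birthdates, and Xt from the feature families described on the following slides.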

Logistic regression classifier
•  Instead of items, categories are matched against user behavior sequences
o  Available items change frequently
o  Purchasing behaviors are more consistent at the category level
•  Categories are weighted using TF-IDF to reduce the influence of popular categories
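
One way to realize the category weighting is to treat each user as a "document" and each category as a "term". The slides do not state which TF-IDF variant was used, so the formula below (relative frequency times log inverse user frequency) is an assumption, and the users and categories are invented.

```python
import math
from collections import Counter


def tfidf_category_features(user_categories):
    """user_categories: dict mapping user -> list of category names,
    one entry per behavior event. Returns user -> {category: weight}."""
    n_users = len(user_categories)
    # Document frequency: number of users who touched each category.
    df = Counter()
    for cats in user_categories.values():
        df.update(set(cats))
    features = {}
    for user, cats in user_categories.items():
        tf = Counter(cats)
        total = len(cats)
        features[user] = {
            c: (count / total) * math.log(n_users / df[c])
            for c, count in tf.items()
        }
    return features


toy = {
    "u1": ["wipes", "maternity wear", "maternity wear"],
    "u2": ["wipes", "diapers"],
    "u3": ["wipes", "toddler shoes", "diapers"],
}
features = tfidf_category_features(toy)
print(features["u1"])
```

A category that every user touches gets weight zero, so only the categories that discriminate between life stages remain influential.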

More on features…
•  User search queries
o  Examples: “3-year-old children’s garments”, “large-size diapers”, …
o  Carry a lot of information
o  Pre-processed using Chinese word segmentation -> word vectors
•  Product labels and titles
o  Size: “M” or “L”
o  “Newborn”, “1-3 years”, etc.
•  Temporal effect of features

Predicting
•  Probability of a user purchasing a product at a specific baby age a: P(p_productj, a)
o  p_productj: probability of purchasing product j
o  a: baby’s age
•  For users without age information: estimate the baby’s age distribution Pu(a), then apply the same prediction under that distribution.
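
The slide's formula for users without age information is not preserved. One natural reading, stated here as an assumption, is to average the age-specific purchase probability over the estimated age distribution Pu(a). The probabilities below are invented toy values.

```python
def purchase_probability(p_given_age, age_dist):
    """Expected purchase probability under an age distribution.

    p_given_age: dict age -> P(purchase of product j | baby age)
    age_dist:    dict age -> Pu(age); a known age is the special case
                 where one age has probability 1.
    """
    return sum(p_given_age.get(a, 0.0) * w for a, w in age_dist.items())


# Hypothetical values for one product (ages in years).
p_diapers = {0: 0.30, 1: 0.10, 2: 0.02}

known_age = purchase_probability(p_diapers, {1: 1.0})
unknown_age = purchase_probability(p_diapers, {0: 0.5, 1: 0.5})
print(known_age, unknown_age)
```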

Multi-kids scenario
•  With the recent relaxation of the one-child policy in China, the number of multi-kid families has increased (~10% according to purchasing statistics)
•  Uses a Gaussian mixture model
•  MLE/EM for parameter (w, sigma, mu) estimation
•  BIC/AIC for choosing the best K
aj,t = purchasing age of the baby at time t
C = the index of the child
K = the total number of children
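
The mixture idea can be sketched in one dimension: each child contributes one Gaussian component over inferred purchasing ages, EM estimates (w, mu, sigma), and BIC compares candidate values of K. This is a dependency-free illustration with an assumed initialization and variance floor, and the ages are invented; it is not the paper's implementation.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm_1d(xs, k, n_iter=100, min_sigma=0.1):
    """EM for a 1-D Gaussian mixture. Returns (weights, means, sigmas, loglik)."""
    data = sorted(xs)
    n = len(data)
    # Assumed initialization: means at evenly spaced quantiles.
    mus = [data[int((i + 0.5) * n / k)] for i in range(k)]
    sigmas = [max(min_sigma, (data[-1] - data[0]) / (2 * k))] * k
    ws = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(ws, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means, and standard deviations.
        for c in range(k):
            nc = sum(r[c] for r in resp) or 1e-300
            ws[c] = nc / n
            mus[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, data)) / nc
            sigmas[c] = max(min_sigma, math.sqrt(var))  # floor avoids collapse
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(ws, mus, sigmas)) or 1e-300)
        for x in data
    )
    return ws, mus, sigmas, loglik


def bic(loglik, k, n):
    n_params = 3 * k - 1  # k means, k sigmas, k - 1 free weights
    return n_params * math.log(n) - 2.0 * loglik


# Hypothetical inferred baby ages (months) for a family with two children.
ages = [5, 5.5, 6, 6, 6.5, 7, 39, 39.5, 40, 40, 40.5, 41]
fits = {k: fit_gmm_1d(ages, k) for k in (1, 2, 3)}
scores = {k: bic(fit[3], k, len(ages)) for k, fit in fits.items()}
best_k = min(scores, key=scores.get)
print(best_k, fits[2][1])
```

Lower BIC is better; the penalty term grows with K, so extra components are kept only when they raise the likelihood enough.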

Implementation
http://www.taobao.com/market/baobao/2014/

Evaluation
•  Birthdate information for ~8 million children

Classification accuracy
•  With 5-fold cross-validation
Basic – logistic regression model with only product category features
Prop – product metadata
Title – segmented data from the title
Temp – temporal effects
Seg – a fixed baby life-stage template is introduced for segmentation

Evaluating temporal effects
•  Temporal information plays a very important role
•  Window sizes that are too small or too large hurt accuracy; 60 days is optimal
•  More windows (more history) is better
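
Windowed temporal features can be sketched as per-window category counts, looking back from a reference date. The 60-day default follows the slide; the function name, the event format, and the number of windows are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def windowed_counts(events, ref_date, window_days=60, n_windows=3):
    """events: iterable of (date, category) pairs.

    Returns a Counter keyed by (window_index, category), where window 0
    covers the most recent `window_days` days before ref_date, window 1
    the period before that, and so on. Older events are dropped.
    """
    feats = Counter()
    for day, category in events:
        age = (ref_date - day).days
        if age < 0:
            continue  # ignore events after the reference date
        w = age // window_days
        if w < n_windows:
            feats[(w, category)] += 1
    return feats


ref = date(2015, 1, 1)
events = [
    (ref - timedelta(days=10), "diapers"),
    (ref - timedelta(days=20), "diapers"),
    (ref - timedelta(days=70), "maternity wear"),
    (ref - timedelta(days=200), "vitamins"),  # outside 3 x 60 days
]
feats = windowed_counts(events, ref)
print(dict(feats))
```

Keeping the window index in the feature key lets a classifier weigh recent behavior differently from older behavior.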

Online experiments (A/B testing)
•  2 buckets – same mom-baby products
•  Bucket A – existing rec. system (CF, brand preference)
•  Bucket B – new system
•  Evaluating P(p_productj = yes)

Online experiments (A/B testing)
uCTR = Click Through Rate
CVR = Click Conversion Rate
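
For reference, a small sketch of how two buckets might be compared. It assumes the common definitions CTR = clicks / impressions and CVR = purchases / clicks, which the slides do not spell out; the counts are invented toy numbers and are not results from the experiment.

```python
def rates(impressions, clicks, purchases):
    """Return (CTR, CVR) for one bucket under the assumed definitions."""
    ctr = clicks / impressions if impressions else 0.0
    cvr = purchases / clicks if clicks else 0.0
    return ctr, cvr


def relative_lift(baseline, treatment):
    """Relative improvement of the treatment over the baseline."""
    return (treatment - baseline) / baseline


# Toy counts (hypothetical): bucket A = existing system, bucket B = new system.
ctr_a, cvr_a = rates(impressions=10000, clicks=300, purchases=15)
ctr_b, cvr_b = rates(impressions=10000, clicks=360, purchases=27)
print(relative_lift(ctr_a, ctr_b), relative_lift(cvr_a, cvr_b))
```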

Summary
•  Complex model and high computational cost
•  Not recommended as a first recommendation system
•  Fits very well for deterministic transition scenarios
•  Generalization is questionable…
