This document summarizes a research paper on using life-stage prediction to improve product recommendations in e-commerce. Researchers at Alibaba developed a Maximum Entropy Semi-Markov Model (MESMM) that predicts a user's life stage from purchase history and incorporates the predicted life stage into recommendations. The model segments users into life stages such as bachelor, married with young children, and empty nest. Evaluation on Taobao data showed that the MESMM approach improved recommendation accuracy and online metrics such as click-through rate over the existing collaborative-filtering-based recommender.
2. Overview
• A model-based recommender
• Incorporates several ideas from the ML and IR domains
• Models the life stage of consumers
• Evaluated on http://www.taobao.com
4. Key contributions
1. Introducing the concept of life stages into e-commerce systems
2. A Maximum Entropy Semi-Markov Model for life-stage segmentation and prediction
3. An efficient large-scale solution
4. A solution for modeling the multi-kid scenario with Gaussian mixture models
5. Verification of effectiveness in both offline and online scenarios
5. Core idea
Life stages strongly influence consumers' purchasing behavior:
• Bachelor stage
• Newly married couples (young, no children)
• Full nest (married couples with dependent children)
• Empty nest (older married couples with no children living at home)
o Head of household in the labor force
o Retired
o Solitary survivors
o Etc…
8. Markov models
• Hidden semi-Markov model
• Maximum-entropy Markov model
Hidden semi-Markov model + Maximum-entropy Markov model = Maximum Entropy Semi-Markov Model (MESMM)
9. Hidden Markov Models
• Discrete and continuous versions
• The Viterbi algorithm finds the most probable hidden-state sequence
10. Viterbi algorithm
• A dynamic programming algorithm for finding the most likely sequence of hidden states
V_{t,k} = the probability of the most probable state sequence responsible for the first t observations that has k as its final state
a_{x,k} = the transition probability from state x to state k
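For reference, the standard recurrence behind these definitions is V_{1,k} = P(o_1 | k)·π_k and V_{t,k} = max_x P(o_t | k)·a_{x,k}·V_{t-1,x}, where π_k is the initial probability of state k and P(o_t | k) is the emission probability (neither symbol appears on the slide). Below is a minimal Python sketch of that recurrence with back-pointers; the two states, the observations, and all probabilities are hypothetical toy values, not taken from the paper.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence for obs."""
    # V[t][k]: probability of the best state sequence for obs[0..t] that ends in state k
    V = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for k in states:
            # Best predecessor x maximizes V[t-1][x] * a_{x,k}
            prob, prev = max((V[t - 1][x] * trans_p[x][k], x) for x in states)
            V[t][k] = prob * emit_p[k][obs[t]]
            back[t][k] = prev
    # Choose the best final state, then follow the back-pointers to recover the path
    prob, last = max((V[-1][k], k) for k in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return prob, path[::-1]


# Toy two-state example with hypothetical probabilities
states = ("no_baby", "baby")
start = {"no_baby": 0.6, "baby": 0.4}
trans = {"no_baby": {"no_baby": 0.7, "baby": 0.3},
         "baby": {"no_baby": 0.4, "baby": 0.6}}
emit = {"no_baby": {"cosmetics": 0.5, "snacks": 0.4, "diapers": 0.1},
        "baby": {"cosmetics": 0.1, "snacks": 0.3, "diapers": 0.6}}
prob, path = viterbi(["cosmetics", "snacks", "diapers"], states, start, trans, emit)
print(path)  # ['no_baby', 'no_baby', 'baby']
```

Each step keeps only the best path into every state, so the cost is O(T·|S|²) instead of enumerating all |S|^T sequences.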
11. Hidden semi-Markov model
• The probability of a change in the hidden state depends on the amount of time that has elapsed since entry into the current state
• This is in contrast to hidden Markov models, where the transition probability does not depend on elapsed time (so state durations are geometrically distributed)
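To make the contrast concrete, here is a small illustrative sketch (not from the paper) that computes the hazard, i.e. the probability that a state ends now given how long it has already lasted. With a hypothetical HMM self-transition probability of 0.9, the hazard is constant; with a hypothetical explicit duration distribution, as an HSMM would use, the hazard depends on elapsed time.

```python
def geometric_pmf(p_stay, max_d):
    """Duration distribution implied by an HMM self-transition probability p_stay."""
    return {d: p_stay ** (d - 1) * (1 - p_stay) for d in range(1, max_d + 1)}


def hazard(duration_pmf, elapsed):
    """P(state ends after exactly `elapsed` steps | it has lasted at least `elapsed` steps)."""
    survive = sum(p for d, p in duration_pmf.items() if d >= elapsed)
    return duration_pmf.get(elapsed, 0.0) / survive if survive > 0 else 1.0


hmm = geometric_pmf(0.9, 500)    # HMM: hypothetical self-transition probability 0.9
hsmm = {3: 0.2, 4: 0.6, 5: 0.2}  # HSMM: hypothetical explicit duration distribution
for elapsed in range(1, 6):
    print(elapsed, round(hazard(hmm, elapsed), 3), round(hazard(hsmm, elapsed), 3))
```

The HMM column stays at 0.1 for every step, while the HSMM column goes 0.0, 0.0, 0.2, 0.75, 1.0: a state whose duration is 3 to 5 steps cannot end early and must end by step 5.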
12. Maximum-entropy Markov model
• Given a sequence of observations O_1, …, O_n
• Tag it with labels S_1, …, S_n
• Such that P(S_1, …, S_n | O_1, …, O_n) is maximized
• Parameters λ are typically learned with Generalized Iterative Scaling; a variant of Baum–Welch (EM) can be used when labels are incomplete or missing
• The optimal state sequence is found with the Viterbi algorithm
• Main advantage over HMMs: features can be overlapping and non-independent
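A minimal sketch of the MEMM idea: each previous state has its own maximum-entropy (softmax) distribution over the next state, computed from possibly overlapping binary observation features, and Viterbi then selects the most probable label sequence. The feature names, states, and weights λ below are hypothetical hand-set values, not parameters learned with GIS.

```python
import math


def memm_local(prev_state, obs_features, states, weights):
    """P(s | s_prev, o) as a maximum-entropy (softmax) model conditioned on the previous state.

    weights maps (prev_state, feature, state) -> lambda; missing keys count as 0.
    obs_features is the set of binary features firing for this observation; they may overlap.
    """
    scores = {s: sum(weights.get((prev_state, f, s), 0.0) for f in obs_features) for s in states}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())      # per-(previous state, observation) normalizer Z(o, s_prev)
    return {s: e / z for s, e in exps.items()}


def memm_viterbi(observations, states, weights, start="START"):
    """Most probable label sequence under the MEMM, found with Viterbi."""
    # delta[s] = (probability of the best path ending in s, that path)
    delta = {s: (p, [s]) for s, p in memm_local(start, observations[0], states, weights).items()}
    for obs in observations[1:]:
        local = {sp: memm_local(sp, obs, states, weights) for sp in states}
        delta = {s: max(((delta[sp][0] * local[sp][s], delta[sp][1] + [s]) for sp in states),
                        key=lambda cand: cand[0])
                 for s in states}
    return max(delta.values(), key=lambda cand: cand[0])


# Hypothetical weights and observations
weights = {("START", "buys_cosmetics", "single"): 3.0,
           ("single", "buys_diapers", "parent"): 3.0,
           ("parent", "buys_diapers", "parent"): 3.0}
obs = [{"buys_cosmetics"}, {"buys_diapers"}, {"buys_diapers", "buys_toys"}]
prob, path = memm_viterbi(obs, ["single", "parent"], weights)
print(path)
```

Because each local distribution is a conditional softmax, adding another overlapping feature only adds one more term to the score, with no independence assumption between features.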
13. Maximum Entropy Semi-Markov Model
• The probability of life stage y_t at time t depends on:
o The previous life stage y_{t−1} at time t−1
o How long the user has been in the previous life stage
o The observed user behavior sequence
• The duration variable d changes deterministically
o When the life stage changes, d is reset to 0; otherwise, d decreases as time goes on
14. MESMM (cont.)
Our goal: given an observed behavior sequence X, find the best underlying life-stage sequence y_1, …, y_k and the corresponding durations d_1, …, d_k
X_t: the observed behavior sequence at time t
d_t: the duration of a life stage at time t
y_t: the life-stage label at time t
l_min, l_max: the minimum and maximum lengths of a life stage
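The following sketch shows a generic semi-Markov (segmental) Viterbi search for the goal stated above: it jointly chooses each segment's label y and duration d, with l_min ≤ d ≤ l_max. The scoring functions seg_score and trans_score are hypothetical stand-ins for the MESMM's learned scores, not the paper's actual model. The loops over end time, label, duration, and previous label cost roughly O(T·|Y|²·(l_max − l_min + 1)), which illustrates why inference with both labels and durations is expensive.

```python
import math


def semi_markov_viterbi(T, labels, seg_score, trans_score, l_min, l_max):
    """Best segmentation of steps 0..T-1 into labeled segments with l_min <= d <= l_max.

    seg_score(y, start, end): log-score of label y covering steps [start, end).
    trans_score(prev_y, y): log-score of moving from prev_y to y (prev_y is None at the start).
    Returns (total_score, [(label, start, duration), ...]).
    """
    NEG = -math.inf
    # best[t][y]: best score for steps [0, t) whose last segment has label y
    best = [dict.fromkeys(labels, NEG) for _ in range(T + 1)]
    back = [{} for _ in range(T + 1)]
    for t in range(1, T + 1):
        for y in labels:
            for d in range(l_min, min(l_max, t) + 1):
                start = t - d
                for prev in ([None] if start == 0 else labels):
                    base = 0.0 if prev is None else best[start][prev]
                    if base == NEG:
                        continue
                    cand = base + trans_score(prev, y) + seg_score(y, start, t)
                    if cand > best[t][y]:
                        best[t][y] = cand
                        back[t][y] = (prev, start)
    y = max(labels, key=lambda lab: best[T][lab])
    if best[T][y] == NEG:
        return NEG, []  # no segmentation satisfies the length bounds
    score, t, segments = best[T][y], T, []
    while t > 0:
        prev, start = back[t][y]
        segments.append((y, start, t - start))
        t, y = start, prev
    return score, segments[::-1]


# Hypothetical example: "c" = cosmetics purchase, "d" = diaper purchase
X = ["c", "c", "d", "d", "d", "d"]
EMIT = {"single": {"c": 0.8, "d": 0.2}, "parent": {"c": 0.2, "d": 0.8}}


def seg_score(y, start, end):
    return sum(math.log(EMIT[y][x]) for x in X[start:end])


def trans_score(prev_y, y):
    return -math.inf if prev_y == y else 0.0  # a new segment must change the label


score, segments = semi_markov_viterbi(len(X), ["single", "parent"], seg_score, trans_score, 2, 4)
print(segments)
```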
15. Problem?
The inference process is computationally expensive!
The model has to predict both the next state label and the duration of the period.
But in the mom-baby domain…
Once the baby's birthdate is known, transitions and durations become deterministic.
Additionally, because of China's one-child policy, the default model assumes every family has a single child.
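A minimal sketch of why a known birthdate makes the problem deterministic: the life stage on any date, and the date of the next transition, reduce to a table lookup. The stage boundaries in STAGE_TEMPLATE are hypothetical placeholders; the paper's actual template is not given on these slides.

```python
from datetime import date, timedelta

# Hypothetical life-stage template: (exclusive upper bound on baby age in days, label).
# Any date before the birthdate maps to "pregnancy" in this sketch.
STAGE_TEMPLATE = [
    (0, "pregnancy"),
    (180, "0-6 months"),
    (365, "6-12 months"),
    (3 * 365, "1-3 years"),
    (6 * 365, "3-6 years"),
]


def life_stage(birthdate, on):
    """With a known birthdate, the life stage on any date is a deterministic lookup."""
    age_days = (on - birthdate).days
    for upper, label in STAGE_TEMPLATE:
        if age_days < upper:
            return label
    return "6+ years"


def next_transition(birthdate, on):
    """The date the current stage ends is equally deterministic (None for the last stage)."""
    age_days = (on - birthdate).days
    for upper, _ in STAGE_TEMPLATE:
        if age_days < upper:
            return birthdate + timedelta(days=upper)
    return None


birth = date(2015, 1, 1)
print(life_stage(birth, date(2015, 3, 1)), next_transition(birth, date(2015, 3, 1)))
```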
16. Simplified model
• A logistic regression model predicts y_t from X_t.
• The model is trained offline on behavior sequences labeled with the provided birthdates.
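A minimal Python sketch of such a classifier, assuming a multinomial (softmax) logistic regression over sparse feature dictionaries trained with plain stochastic gradient descent. The feature names, life-stage labels, and hyperparameters (epochs, lr, l2) are hypothetical, and the paper's training setup may differ.

```python
import math


def softmax(scores):
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def predict_proba(weights, feats):
    """P(y | X_t) for every life stage y, from a sparse feature dict."""
    return softmax({y: sum(wy.get(f, 0.0) * v for f, v in feats.items())
                    for y, wy in weights.items()})


def train_lr(examples, labels, epochs=200, lr=0.1, l2=1e-3):
    """Multinomial logistic regression trained offline by stochastic gradient descent.

    examples: list of (feats, y) pairs; feats maps feature name -> value for a
    behavior window X_t, and y is the life stage derived from the provided birthdate.
    """
    weights = {y: {} for y in labels}
    for _ in range(epochs):
        for feats, y_true in examples:
            probs = predict_proba(weights, feats)
            for y in labels:
                grad = probs[y] - (1.0 if y == y_true else 0.0)  # d(log loss)/d(score_y)
                wy = weights[y]
                for f, v in feats.items():
                    # L2 shrinkage is applied only to features active in this example
                    wy[f] = wy.get(f, 0.0) - lr * (grad * v + l2 * wy.get(f, 0.0))
    return weights


labels = ["no_baby", "0-1y", "1-3y"]
examples = [({"lipstick": 1.0}, "no_baby"),
            ({"diapers": 1.0, "formula": 1.0}, "0-1y"),
            ({"toys": 1.0, "diapers": 0.2}, "1-3y")]
model = train_lr(examples, labels)
probs = predict_proba(model, {"formula": 1.0})
print(max(probs, key=probs.get))
```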
17. Logistic regression classifier
• Categories, rather than individual items, are matched against user behavior sequences
o The set of available items changes frequently
o Purchasing behavior is more consistent at the category level
• Categories are weighted with TF-IDF to reduce the influence of popular categories
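A minimal sketch of TF-IDF weighting over category histories, assuming the textbook variant tf = count / total and idf = ln(N / df), where N is the number of users and df is the number of users whose history contains the category. The exact TF-IDF formula used in the paper is not stated on the slide, and the category names are made up.

```python
import math
from collections import Counter


def tfidf_category_features(histories):
    """Map each user to {category: tf * idf} from a list of purchased categories."""
    n_users = len(histories)
    df = Counter()  # number of users whose history contains each category
    for categories in histories.values():
        df.update(set(categories))
    features = {}
    for user, categories in histories.items():
        tf = Counter(categories)
        total = sum(tf.values())
        features[user] = {c: (count / total) * math.log(n_users / df[c])
                          for c, count in tf.items()}
    return features


histories = {
    "u1": ["tissue", "diapers", "diapers"],
    "u2": ["tissue", "lipstick"],
    "u3": ["tissue", "formula"],
}
features = tfidf_category_features(histories)
print(features["u1"])
```

A category present in every user's history (here, "tissue") gets an IDF of ln(1) = 0, which is the down-weighting of popular categories described above.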
18. More on features…
• User search queries
o e.g., "3-year-old children's garments", "large-size diapers", …
o A rich source of information
o Preprocessed with Chinese word segmentation, then converted to word vectors
• Product labels and titles
o Sizes: "M" or "L"
o Age ranges: "Newborn", "1-3 years", etc.
• Temporal effects of features
19. Predicting
• Probability of a user purchasing product j when the baby is at a specific age a: P(p_productj, a)
o p_productj: whether product j is purchased (yes/no)
o a: the baby's age
• For users without age information: estimate the baby's age distribution P_u(a), then apply the same computation as above.
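One plausible reading of "the same computation as above" is to marginalize over the estimated age distribution: P(p_productj = yes) = Σ_a P_u(a)·P(p_productj = yes | a). The sketch below assumes discretized ages (e.g., in years) and per-product purchase curves P(p_productj = yes | a); this formula is an assumption, and the product names and numbers are hypothetical.

```python
def expected_purchase_prob(p_buy_given_age, age_dist):
    """P(buy j) = sum over ages a of P_u(a) * P(buy j | a); unseen ages contribute 0."""
    return sum(p_a * p_buy_given_age.get(a, 0.0) for a, p_a in age_dist.items())


def rank_products(purchase_curves, age_dist, top_n=10):
    """Rank products for a user whose baby's age is only known as a distribution."""
    scores = {prod: expected_purchase_prob(curve, age_dist)
              for prod, curve in purchase_curves.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical purchase curves P(buy | age in years) and estimated age distribution P_u(a)
curves = {"newborn_diapers": {0: 0.5, 1: 0.4, 2: 0.1},
          "toddler_shoes": {1: 0.05, 2: 0.3, 3: 0.5}}
age_dist = {0: 0.6, 1: 0.4}
print(rank_products(curves, age_dist))
```

A user with a known age is the special case where P_u(a) puts all its mass on one age.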
20. Multi-kids scenario
• With the recent relaxation of China's one-child policy, the number of multi-kid families has increased (~10%, based on purchasing statistics)
• Uses a Gaussian mixture model
• MLE via EM to estimate the parameters (w, σ, μ)
• BIC/AIC to choose the best K
a_{j,t} = the purchasing age of the baby at time t
C = the index of the child
K = the total number of children
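A minimal sketch of the GMM step, under an assumption not stated on the slide: each purchase is mapped to an implied birth time (purchase time t minus the purchasing age a_{j,t}), so purchases for the same child cluster together and each mixture component corresponds to one child. EM estimates (w, μ, σ²) and BIC picks K; using AIC = 2p − 2 ln L instead would follow the same pattern. The data values are made up.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(xs, k, iters=200, min_var=1e-3):
    """Fit a 1-D Gaussian mixture (weights w, means mu, variances var) by EM (MLE)."""
    n = len(xs)
    ordered = sorted(xs)
    mu = [ordered[int((c + 0.5) * n / k)] for c in range(k)]  # deterministic quantile init
    mean = sum(xs) / n
    var = [max(sum((x - mean) ** 2 for x in xs) / n, min_var)] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component (child) for each observation
        resp = []
        for x in xs:
            dens = [w[c] * normal_pdf(x, mu[c], var[c]) for c in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            w[c] = nc / n
            mu[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            var[c] = max(sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, xs)) / nc, min_var)
    return w, mu, var


def log_likelihood(xs, w, mu, var):
    return sum(math.log(sum(wc * normal_pdf(x, m, v) for wc, m, v in zip(w, mu, var)) + 1e-300)
               for x in xs)


def choose_k(xs, k_max=3):
    """Pick the number of children K with the lowest BIC = p*ln(n) - 2*lnL."""
    best = None
    for k in range(1, k_max + 1):
        w, mu, var = fit_gmm(xs, k)
        n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
        bic = n_params * math.log(len(xs)) - 2 * log_likelihood(xs, w, mu, var)
        if best is None or bic < best[0]:
            best = (bic, k, sorted(mu))
    return best[1], best[2]


# Hypothetical implied birth times (in months) for one household's purchases
xs = [0.0, 0.5, -0.5, 0.2, -0.2, 0.1, -0.1, 24.0, 24.5, 23.5, 24.2, 23.8, 24.1, 23.9]
k, means = choose_k(xs)
print(k, [round(m, 1) for m in means])
```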
23. Classification accuracy
• Measured with 5-fold cross-validation
Basic – logistic regression model with only product category features
Prop – product metadata features
Title – segmented words from product titles
Temp – temporal effects
Seg – a fixed baby life-stage template is introduced for segmentation
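The 5-fold protocol can be sketched as follows. This is a generic illustration: folds are contiguous, so in practice the examples would be shuffled first, and train_fn/predict_fn are placeholders for any of the feature configurations listed above. It assumes there are at least k examples.

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over 0..n-1 (requires n >= k)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(examples, train_fn, predict_fn, k=5):
    """Mean held-out accuracy over k folds; examples is a list of (features, label)."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = train_fn([examples[i] for i in train_idx])
        correct = sum(predict_fn(model, examples[i][0]) == examples[i][1] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Running every configuration through the same folds keeps the comparison between them fair.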
24. Evaluating temporal effects
• Temporal information plays a very important role
• Window sizes that are too small or too large hurt performance; 60 days works best
• More windows (a longer history) are better
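A minimal sketch of window-based temporal features, assuming each feature is a purchase count keyed by (window index, category), with 60-day windows stepping back from the prediction date. The 60-day size comes from the slide; the number of windows and the count-based encoding are assumptions.

```python
from datetime import date


def windowed_category_counts(purchases, as_of, window_days=60, n_windows=3):
    """Count purchases per (window, category), with windows stepping back from `as_of`.

    purchases: list of (date, category). Window 0 covers the `window_days` days ending at
    `as_of`, window 1 the period before that, and so on. Purchases after `as_of` or older
    than n_windows * window_days days are ignored.
    """
    counts = {}
    for day, category in purchases:
        age = (as_of - day).days
        if 0 <= age < n_windows * window_days:
            key = (age // window_days, category)
            counts[key] = counts.get(key, 0) + 1
    return counts


purchases = [(date(2015, 2, 20), "diapers"),
             (date(2014, 12, 15), "formula"),
             (date(2014, 1, 1), "maternity_dress"),  # too old: dropped
             (date(2015, 3, 5), "toys")]             # after as_of: dropped
print(windowed_category_counts(purchases, as_of=date(2015, 3, 1)))
```

Increasing n_windows lengthens the history the classifier sees, which corresponds to the "more windows" finding above.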
25. Online experiments (A/B testing)
• Two buckets, both serving the same mom-baby products
• Bucket A – existing recommendation system (CF, brand preference)
• Bucket B – new system
• Evaluation metric: P(p_productj = yes)
27. Summary
• Complex model with high computational cost
• Not recommended as a first recommendation system
• Fits scenarios with deterministic transitions very well
• Generalization to other domains is questionable…