Life-stage Prediction for Product Recommendation in E-commerce

Authored By:
•  Peng Jiang - Alibaba Inc
•  Yadong Zhu - Alibaba Inc
•  Quan Yuan† - Alibaba Inc
•  Yi Zhang - University of California, Santa Cruz

KDD 2015, August 10-13, 2015, Sydney, Australia
© 2015 ACM
Overview
•  A model-based recommender
•  Incorporates several ideas from the ML and IR domains
•  Models the life stage of consumers
•  Evaluated on http://www.taobao.com
http://www.taobao.com/market/baobao/2014/
•  Founded by Alibaba Group in 2003
•  ~760 million product listings as of 2013
•  One of the world's top 10 most visited websites - Alexa
Key contributions
1.  Introduction of the concept of life stages into e-commerce recommender systems
2.  A Maximum Entropy Semi-Markov Model (MESMM) for segmentation and prediction
3.  An efficient large-scale solution
4.  A solution for modeling the multi-kid scenario via Gaussian mixture models
5.  Verification of the effectiveness in both offline and online scenarios
Core idea
The importance of life stages in consumers' purchasing behavior
•  Bachelor stage
•  Newly married couples (young, no children)
•  Full nest (married couple with dependent children)
•  Empty nest (i.e. older married couples with no children living at home)
o  Head in labor force
o  Retired
o  Solitary survivors
o  Etc…
Home remodeling domain
Mom-baby domain
Markov models
•  Hidden semi-Markov model
•  Maximum-entropy Markov model

Hidden semi-Markov model
+
Maximum-entropy Markov model
=
Maximum Entropy Semi-Markov Model (MESMM)
Hidden Markov Models
•  Discrete and continuous versions
•  The Viterbi algorithm is used to find the most probable state sequence
Viterbi algorithm
•  A dynamic programming algorithm for finding the most likely sequence of hidden states (see the sketch below)

V_{t,k} = the probability of the most probable state sequence responsible for the first t observations that has k as its final state
a_{x,k} = the transition probability from state x to k
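A minimal Viterbi sketch for a discrete HMM, included only to make the V_{t,k} recursion concrete; the matrices and variable names (start, trans, emit) are illustrative assumptions, not anything from the paper.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM.

    obs   : observation indices, length T
    start : start[k]    = P(state_0 = k)
    trans : trans[x, k] = a_{x,k}, transition probability from state x to k
    emit  : emit[k, o]  = P(observation o | state k)
    """
    T, K = len(obs), len(start)
    V = np.zeros((T, K))              # V[t, k]: best-path probability ending in state k at time t
    back = np.zeros((T, K), dtype=int)

    V[0] = start * emit[:, obs[0]]
    for t in range(1, T):
        for k in range(K):
            scores = V[t - 1] * trans[:, k] * emit[k, obs[t]]
            back[t, k] = np.argmax(scores)
            V[t, k] = scores[back[t, k]]

    # Backtrack from the best final state
    path = [int(np.argmax(V[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```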
Hidden semi-Markov model
•  The probability of a change in the hidden state depends on the amount of time that has elapsed since entry into the current state
•  This is in contrast to hidden Markov models, where transitions depend only on the current state
Maximum-entropy Markov model
•  A sequence of observations O_1, …, O_n
•  Tag with the labels S_1, …, S_n
•  Such that P(S_1, …, S_n | O_1, …, O_n) is maximized (per-step form below)
•  Parameters λ can be learned using EM (Baum–Welch)
•  Optimal state sequence via the Viterbi algorithm
•  Main advantage over HMMs: overlapping/non-independent features
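For context, the standard per-step distribution of a maximum-entropy Markov model (this is the generic MEMM form, not a formula taken from these slides): each transition is a log-linear model over overlapping features f_i of the previous state and the current observation.

```latex
P(S_t \mid S_{t-1}, O_t)
  = \frac{\exp\!\big(\sum_i \lambda_i\, f_i(S_t, S_{t-1}, O_t)\big)}
         {\sum_{s'} \exp\!\big(\sum_i \lambda_i\, f_i(s', S_{t-1}, O_t)\big)},
\qquad
P(S_1,\dots,S_n \mid O_1,\dots,O_n) = \prod_{t=1}^{n} P(S_t \mid S_{t-1}, O_t).
```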
Maximum Entropy Semi-Markov Model
•  The probability of life stage y_t at time t depends on:
o  The previous life stage y_{t−1} at time t−1
o  How long the user has been in the previous life stage
o  The observed user behavior sequence
•  Variable d changes deterministically
o  When the life stage changes, d is reset to 0; otherwise d decreases as time goes on
MESMM cont...
Our goal: given an observed behavior sequence X, find the best underlying life-stage sequence y_1, …, y_k and the corresponding durations d_1, …, d_k (a hedged sketch of the objective follows)

X_t: the observed behavior sequence at time t
d_t: the duration of a life stage at time t
y_t: the life-stage label at time t
l_min, l_max: the minimum and maximum lengths of a life stage
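The equation on this slide is an image; the following is a hedged reconstruction of the generic semi-Markov decoding objective consistent with the variables defined above. The paper's exact factorization may differ.

```latex
(\hat{y}_{1:k}, \hat{d}_{1:k})
  = \arg\max_{y_{1:k},\, d_{1:k}}
    \prod_{i=1}^{k} P\!\left(y_i, d_i \mid y_{i-1}, d_{i-1}, X\right),
\qquad l_{\min} \le d_i \le l_{\max}.
```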
Problem?
The inference process is computationally expensive!

We have to predict both the next state label and the duration of the period (see the duration-augmented Viterbi sketch below).
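A rough sketch of why exact inference is costly: a semi-Markov Viterbi must search over segment durations as well as labels, giving roughly O(T · L · |Y|²) work for T time steps, duration range L, and |Y| labels (plus the cost of scoring each segment). The scoring callables and names below are illustrative placeholders, not the paper's implementation; the sketch returns only the best score, without backpointers.

```python
def semi_markov_viterbi(T, labels, l_min, l_max, seg_score, trans_score):
    """Duration-augmented Viterbi sketch.

    seg_score(y, s, t)    : score of labeling segment [s, t) with life stage y
    trans_score(y_prev, y): score of transitioning from life stage y_prev to y
    Both scoring functions are assumed, illustrative callables.
    """
    NEG_INF = float("-inf")
    best = [{y: NEG_INF for y in labels} for _ in range(T + 1)]

    for t in range(1, T + 1):                        # segment end
        for d in range(l_min, min(l_max, t) + 1):    # segment duration
            s = t - d                                # segment start
            for y in labels:
                emit = seg_score(y, s, t)
                if s == 0:
                    prev = 0.0                       # no preceding segment
                else:
                    prev = max(best[s][y_prev] + trans_score(y_prev, y)
                               for y_prev in labels)
                best[t][y] = max(best[t][y], prev + emit)

    return max(best[T].values())                     # best total score
```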
But in the Mom-baby domain…

When the birthdate of the baby is known, transitions and durations become deterministic.
Additionally, due to the single-child policy in China, the default model assumes all families have a single child.
Simplified model
•  A logistic regression model predicts y_t based on X_t
•  The model is trained offline, using behavior sequences and the provided birthdates
Logistic regression classifier
•  Categories, rather than individual items, are matched against user behavior sequences
o  Available items change frequently
o  Purchasing behaviors are more consistent at the category level
•  Categories are weighted using TF-IDF to reduce the influence of popular categories (minimal sketch below)
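A minimal sketch of the simplified classifier as described on this slide: category-level behavior sequences weighted with TF-IDF and fed into a logistic regression. It uses scikit-learn; the category names, label set, and data shapes are illustrative assumptions, not the production pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each user's behavior sequence reduced to the product categories they interacted with
# (space-separated category IDs); labels are life-stage classes derived from birthdates.
user_category_seqs = [
    "diapers infant_formula strollers",
    "toddler_shoes picture_books diapers",
    "maternity_wear infant_formula",
]
life_stage_labels = ["0-1", "1-3", "pregnancy"]   # illustrative label set

# TF-IDF down-weights categories that are popular across all users,
# then logistic regression predicts the current life stage y_t.
clf = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"),
                    LogisticRegression(max_iter=1000))
clf.fit(user_category_seqs, life_stage_labels)

print(clf.predict(["diapers strollers"]))
```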
More on features…
•  User search queries
o  "3-year-old children's garments", "large-size diapers", …
o  Carry a lot of information
o  Pre-processed using Chinese word segmentation -> word vectors
•  Product labels and titles
o  Size – "M" or "L"
o  "Newborn", "1-3 years", etc.
•  Temporal effect of features (window sketch below)
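To make the temporal effect concrete, here is one plausible encoding sketch: split the behavior history into fixed-size time windows (the evaluation slides later report 60-day windows as optimal) and keep category counts per window, so recent and older activity become separate feature columns. The window size, helper name, and example data are illustrative assumptions.

```python
from collections import Counter
from datetime import date

def windowed_category_counts(events, ref_date, window_days=60, num_windows=4):
    """events: list of (event_date, category) pairs.
    Returns one Counter per window; window 0 is the most recent."""
    windows = [Counter() for _ in range(num_windows)]
    for event_date, category in events:
        age_days = (ref_date - event_date).days
        idx = age_days // window_days
        if 0 <= idx < num_windows:
            windows[idx][category] += 1
    return windows

# Example: a purchase 10 days ago lands in window 0, one 70 days ago in window 1.
events = [(date(2015, 6, 1), "diapers"), (date(2015, 4, 2), "infant_formula")]
print(windowed_category_counts(events, ref_date=date(2015, 6, 11)))
```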
Predicting
•  Probability of a user purchasing a product at a specific age a – P(p_product_j, a)
o  p_product_j – probability of purchasing product j
o  a – the baby's age
•  For users without age information: estimate the baby's age distribution P_u(a), then do the same as above (the slide's formula is an image; a hedged reconstruction follows):
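A hedged reconstruction consistent with the text: marginalize the per-age purchase probability over the estimated age distribution. The paper's exact notation may differ.

```latex
P_u(\text{p\_product}_j) \;=\; \sum_{a} P_u(a)\, P(\text{p\_product}_j, a)
```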
Multi-kids scenario
•  With recent relaxations of the one-child policy in China, there is an increased number of multi-kid families (~10% according to purchasing stats)
•  Uses a Gaussian mixture model (minimal sketch below)
•  MLE/EM for parameter (w, μ, σ) estimation
•  BIC/AIC to select the best K

a_{j,t} = purchasing age of the baby at time t
C = the index of the child
K = the total number of children
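A minimal sketch of the multi-kid idea using scikit-learn: fit Gaussian mixtures with K = 1..3 components to a user's observed purchasing ages and keep the K with the lowest BIC. The age values below are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Purchasing ages a_{j,t} (in years) inferred from one user's orders; two clusters
# around ~1 and ~4 years would suggest two children. Values are illustrative.
ages = np.array([0.8, 1.0, 1.2, 3.9, 4.1, 4.3]).reshape(-1, 1)

best_k, best_gmm, best_bic = None, None, np.inf
for k in range(1, 4):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(ages)  # EM for (w, mu, sigma)
    bic = gmm.bic(ages)
    if bic < best_bic:
        best_k, best_gmm, best_bic = k, gmm, bic

print("estimated number of children K =", best_k)
print("component means (approx. ages):", best_gmm.means_.ravel())
```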
Implementation
http://www.taobao.com/market/baobao/2014/
Evaluation
•  Birthdate information for ~8 million children
Classification accuracy
•  With 5-fold cross-validation

Basic – logistic regression model with only product category features
Prop – product metadata
Title – segmented data from the title
Temp – temporal effects
Seg – a fixed baby life-stage template is introduced for segmentation
Evaluating temporal effects
•  Temporal information plays a very important role
•  Too small or too large window sizes are bad; 60 days is optimal
•  More windows (more history) is better
Online experiments (A/B testing)
•  2 buckets, serving the same mom-baby products
•  Bucket A – the existing recommendation system (CF, brand preference)
•  Bucket B – the new system
•  Evaluating P(p_product_j = yes)
Online  experiments  (A/B  testing)	
uCTR = Click-Through Rate
CVR = Click Conversion Rate
Summary
•  Complex model with a high computational cost
•  Not recommended as a first recommendation system
•  Fits deterministic-transition scenarios very well
•  Generalization is questionable…
Thank  you!	
Questions?