When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question Recommendation
Idan Szpektor, Yoelle Maarek, Dan Pelleg (Yahoo! Research)

ABSTRACT
A good question recommendation system should be:
1. Designed around answerers, rather than exclusively for askers
2. Able to scale to many questions and users, and fast enough to serve recommendations in real time
3. Relevant to each user's interests
4. Diverse in what it recommends

INTRODUCTION
Common approach: route each question only to the best possible answerers ("experts")
This work: recommend questions to all potential answerers

INTRODUCTION
Relevance: to what degree the question matches the user's tastes
Beyond relevance: diversity and freshness needs
Three requirements:
1. Questions need to be recommended for all types of users
2. Recommended questions have to be diverse
3. Recommendations need to be fresh and served fast
   a) serve questions as recommendations immediately
   b) instantly adapt to changes in users' tastes

RELATED WORK
Limitations of prior work:
1. Real-time ranking is not supported
2. The needs of new users with very little historical data are not addressed well
3. Ranking is based on relevance only

Framework
Question profile:
1. LDA model
2. Lexical model
3. Category model
User profile
Question recommendation:
1. Matching question and user profiles
2. Proactive diversification
3. Recommendation merging

QUESTION PROFILE
Split the question space according to the 26 top categories in Yahoo! Answers
Two advantages:
1. Top categories represent disjoint users' interests
2. They help with word sense disambiguation
The profile is built from two signals:
1. The question's textual content (title and body)
2. Its category

QUESTION PROFILE
Each question profile is represented by three vectors:
1. a Latent Dirichlet Allocation (LDA) topic vector
2. a lexical vector
3. a category vector

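As a rough illustration (the class and field names below are mine, not the paper's), such a profile can be held as three sparse probability vectors:

from dataclasses import dataclass, field
from typing import Dict

# Sketch of the three-vector question profile described above.
# Names are illustrative, not taken from the paper.
@dataclass
class QuestionProfile:
    lda_topics: Dict[int, float] = field(default_factory=dict)  # topic id -> probability
    lexical: Dict[str, float] = field(default_factory=dict)     # unigram -> probability (L1-normalized tf-idf)
    category: Dict[str, float] = field(default_factory=dict)    # posted category -> 1.0
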
LDA Model
1. Initial training: a random sample of up to 2 million resolved questions
2. Incremental learning: a random sample of up to half a million questions per top category
3. Inference: keep topics holding at least 10% of the probability mass

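A minimal sketch of the inference-time pruning, assuming the retained topics are renormalized (the slide only states the 10% threshold):

def prune_topic_vector(topic_probs, min_mass=0.10):
    """Keep only LDA topics holding at least `min_mass` of the probability
    mass, then renormalize. Renormalization is an assumption."""
    kept = {topic: p for topic, p in topic_probs.items() if p >= min_mass}
    total = sum(kept.values()) or 1.0
    return {topic: p / total for topic, p in kept.items()}

# e.g. prune_topic_vector({0: 0.55, 3: 0.30, 7: 0.08, 9: 0.07}) keeps topics 0 and 3
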
Lexical Model
A unigram bag-of-words representation of the question
tf·idf scores, L1-normalized into a probability distribution
Category Model
A probability of 1 is assigned to the category in which the question was posted

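A small sketch of the lexical model, assuming the idf table is precomputed from the question corpus:

from collections import Counter

def lexical_vector(question_text, idf):
    """Unigram bag-of-words with tf*idf weights, L1-normalized into a
    probability distribution. `idf` maps term -> idf weight and is
    assumed to be precomputed; tokenization here is simplistic."""
    tf = Counter(question_text.lower().split())
    weights = {term: count * idf.get(term, 0.0) for term, count in tf.items()}
    total = sum(weights.values()) or 1.0
    return {term: w / total for term, w in weights.items() if w > 0}
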
USER PROFILE
Built from the questions the user answered in the past
The user representation is generated by aggregating signals over these questions
The user profile is a probability tree

Two steps:
1. Aggregate the profiles of the questions the user answered
2. Update the profile as the user answers new questions

At the first and third tree levels:
apply a decaying factor to past questions
At the second level:
1. Measure the similarity between the feature distribution of each model in the question and the corresponding feature distribution in the user profile
2. Normalize the results to a probability distribution

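A hedged sketch of the decayed aggregation at the first and third levels; the exponential form and the decay constant are my assumptions, the slide only says that past questions are decayed:

def aggregate_with_decay(question_vectors, decay=0.9):
    """Aggregate per-question feature distributions into a user-level
    distribution, down-weighting older questions. `question_vectors` is
    ordered oldest to newest; exponential decay is an assumption."""
    profile = {}
    for age, vec in enumerate(reversed(question_vectors)):  # age 0 = newest
        w = decay ** age
        for feature, p in vec.items():
            profile[feature] = profile.get(feature, 0.0) + w * p
    total = sum(profile.values()) or 1.0
    return {feature: v / total for feature, v in profile.items()}
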
QUESTION RECOMMENDATION
Matching Question and User Profiles
Output: a list of open questions ranked by a relevance score, calculated for each pair {question profile, user profile}
For question profiles:
1. Turn the three vectors forming the question profile into a single vector, multiplying the probability of each feature by 1/3 before storing it in the index
2. Index every question vector and build an inverted index

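A sketch of the indexing side, reusing the QuestionProfile sketch above; a plain Python dict stands in for a real inverted index:

def flatten_question_profile(profile):
    """Merge the LDA, lexical and category vectors into a single sparse
    vector, multiplying each probability by 1/3 as described above.
    Features are namespaced to keep the three models distinct."""
    flat = {}
    for prefix, vec in (("lda", profile.lda_topics),
                        ("lex", profile.lexical),
                        ("cat", profile.category)):
        for feature, p in vec.items():
            flat[(prefix, feature)] = p / 3.0
    return flat

def index_question(inverted_index, question_id, flat_vector):
    """Toy inverted index: feature -> list of (question_id, weight)."""
    for feature, weight in flat_vector.items():
        inverted_index.setdefault(feature, []).append((question_id, weight))
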
QUESTION RECOMMENDATION
For user profiles:
Associate with each user feature a score that is the product of the probability scores on the tree path leading to this feature
Ranking similarity: a simple dot product between the user vector and each indexed question vector

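And the ranking side, as a sketch over the toy inverted index above; the user vector maps each feature to its tree-path product:

def rank_questions(user_vector, inverted_index, top_k=20):
    """Score open questions by the dot product between the user vector
    (feature -> path-product score) and each indexed question vector."""
    scores = {}
    for feature, user_weight in user_vector.items():
        for question_id, q_weight in inverted_index.get(feature, []):
            scores[question_id] = scores.get(question_id, 0.0) + user_weight * q_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
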
QUESTION RECOMMENDATION
Proactive Diversification
Thematic sampling:
1. For each user vector u, generate N query vectors u1, u2, ..., uN
2. Retrieve N ranked lists
3. Blend them together into a final, diverse list
Two types of thematic constraints:
1. Specific top category: randomly select top categories as constraints by sampling without repetition, based on their distribution in the root node of the user's probability tree
2. Specific LDA topic: randomly sample LDA topics without repetition from the user profile by traversing the probability tree

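A sketch of the first constraint type, sampling top categories without repetition according to their probability in the root node of the user's tree; the use of numpy and the exact sampling call are my choices:

import numpy as np

def sample_top_categories(root_distribution, n_constraints):
    """Sample top categories without repetition, weighted by their
    probability in the root node of the user's probability tree."""
    categories = list(root_distribution.keys())
    probs = np.array([root_distribution[c] for c in categories], dtype=float)
    probs /= probs.sum()
    n = min(n_constraints, len(categories))
    return list(np.random.choice(categories, size=n, replace=False, p=probs))
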
QUESTION RECOMMENDATION
Recommendation Merging
Blending algorithm:
1. Each list is associated with a probability score
2. Sample an intermediate list, based on the assigned probabilities
3. Remove one recommendation from the sampled list and add it at the end of the final list
4. Repeat

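A sketch of this blending loop; skipping exhausted lists and de-duplicating recommendations are my assumptions, the slide only gives the sampling step:

import random

def blend(ranked_lists, list_probs, final_size):
    """Blend several ranked lists into one final list by repeatedly
    sampling a list according to `list_probs` and moving its top
    remaining recommendation to the end of the final list."""
    final, seen = [], set()
    lists = [list(l) for l in ranked_lists]
    while len(final) < final_size:
        candidates = [i for i, l in enumerate(lists) if l]
        if not candidates:
            break
        weights = [list_probs[i] for i in candidates]
        i = random.choices(candidates, weights=weights, k=1)[0]
        item = lists[i].pop(0)
        if item not in seen:
            seen.add(item)
            final.append(item)
    return final
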
QUESTION RECOMMENDATION
Non-Thematic LDA Topics
116 topics across 23 top categories; 34% of them are non-thematic
A logistic regression classifier is trained to detect non-thematic topics

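A hedged sketch of such a classifier with scikit-learn; the feature design is purely illustrative, since the slides do not show which features the paper uses:

from sklearn.linear_model import LogisticRegression

def train_non_thematic_classifier(X, y):
    """X: one row of (illustrative) features per LDA topic;
    y: 1 = non-thematic, 0 = thematic, from manual labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

# clf.predict(topic_features) can then flag non-thematic topics; how the
# flagged topics are handled downstream is not detailed on the slide.
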
EXPERIMENTS
Offline Experiment
8 different top categories
Active users: answered at least 21 questions as of January 2011
New users: answered at least two questions as of January 2011

EXPERIMENTS
Online Experiment
A/B test with four buckets:
Control bucket, CTL (n = 25093)
Relevance bucket, R (n = 5359)
Freshness bucket, F (n = 46228): 50% recent questions, 20% thematic sampling
Diversity bucket, D (n = 42041): 20% recent questions, 50% thematic sampling

CONCLUSIONS
Rank not only by relevance, but also by freshness and diversity
Several relevance models combined in a "question retrieval engine"
Diversity achieved through thematic sampling
On content: the system combines different factors, models and levels
On writing: the paper is clearly structured, with each part building on the previous one
