When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question Recommendation
IDAN SZPEKTOR, YOELLE MAAREK, DAN PELLEG
YAHOO! RESEARCH
ABSTRACT
A good question recommendation system should be:
1. Designed around answerers, rather than exclusively for askers
2. Scalable to many questions and users, and fast enough
3. Relevant to each user's interests
4. Diverse
INTRODUCTION
Common approach: recommend questions only to the best possible answerers ("experts")
This work: recommend questions to all potential answerers
INTRODUCTION
relevance: to what degree the question matches the user’s tastes
diversity and freshness needs
Three requirements:
1. questions need to be recommended for all types of users
2. questions have to be diverse
3. recommendations need to be fresh and be served fast
a) serve new questions as recommendations immediately
b) adapt instantly to users' changes in taste
RELATED WORK
Limitations of prior work:
1. Real-time ranking
2. The needs of new users with very little historical data are not addressed well
3. Focus only on relevance
Framework
Question profile:
1. LDA model
2. Lexical model
3. Category model

User profile

Question recommendation:
1. Matching question and user profiles
2. Proactive diversification
3. Recommendation merging
QUESTION PROFILE
Split the questions according to the 26 top categories in Yahoo! Answers
Two advantages:
1. Top categories represent disjoint user interests
2. Word sense disambiguation

Question signals:
1. Question textual content (title and body)
2. Category
QUESTION PROFILE
Build a profile, represented by three vectors:
1. a Latent Dirichlet Allocation (LDA) topic vector
2. a lexical vector
3. a category vector
LDA Model
1. Initial training: a random sample of up to 2 million resolved questions
2. Incremental learning: a random sample of up to half a million questions per top category
3. Inference: at least 10% of the probability mass
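
One possible reading of the inference step is that a question's topic vector keeps only the topics holding at least 10% of its probability mass. The sketch below illustrates that reading only. The function name, the dictionary representation, and the renormalization of the surviving topics are assumptions for illustration, not details given on the slide.

```python
def keep_dominant_topics(topic_probs, min_mass=0.10):
    """Keep topics whose probability is at least min_mass, then renormalize.

    topic_probs maps a topic id to its inferred probability for one question.
    Renormalizing the survivors is an assumption made for this sketch.
    """
    kept = {t: p for t, p in topic_probs.items() if p >= min_mass}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {t: p / total for t, p in kept.items()}


# Hypothetical topic distribution for a single question.
example = keep_dominant_topics({"t1": 0.6, "t2": 0.3, "t3": 0.06, "t4": 0.04})
```

With this input only `t1` and `t2` survive, and their weights are rescaled to sum to 1.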
Lexical Model
A unigram bag-of-words representation of a question
Each term is weighted by its tf·idf score, and the weights are L1-normalized into a probability distribution

Category Model
Assigns a probability of 1 to the category in which the question was posted
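
The lexical and category models can be sketched as follows. This is a minimal illustration, assuming a smoothed idf formula (one common variant, not stated on the slide) and hypothetical names such as `lexical_vector`, `doc_freq`, and `num_docs`.

```python
import math
from collections import Counter


def lexical_vector(question_tokens, doc_freq, num_docs):
    """Return an L1-normalized tf-idf distribution over unigrams.

    doc_freq maps a term to the number of questions containing it.
    The smoothed idf below is an assumed variant, chosen for illustration.
    """
    tf = Counter(question_tokens)
    weights = {
        term: count * math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    total = sum(weights.values())
    if total == 0:
        return {}
    # Dividing by the L1 norm turns the weights into a probability distribution.
    return {term: w / total for term, w in weights.items()}


def category_vector(category):
    """The category model puts all probability mass on the posting category."""
    return {category: 1.0}


# Hypothetical corpus statistics and question tokens.
vec = lexical_vector(
    ["how", "to", "train", "lda", "lda"],
    {"how": 900, "to": 950, "train": 50, "lda": 5},
    num_docs=1000,
)
```

Rare terms such as `lda` receive most of the mass, while frequent terms such as `how` receive little.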
USER PROFILE
Based on the questions the user answered in the past
The user representation is generated by aggregating signals over these questions
User profile: a probability tree
1. Build: aggregate the profiles of the questions the user answered
2. Update:
First and third tree levels: apply a decaying factor to past questions

Second level:
a) Measure the similarity between the feature distribution of each model in the question and the corresponding feature distribution in the user profile
b) Normalize the similarities to a probability distribution
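
The two update rules can be illustrated with a small sketch. The slides give neither the exact decay formula nor the similarity measure, so the multiplicative decay, the dot-product similarity, and all names below are assumptions chosen for illustration.

```python
def decayed_update(old_dist, question_dist, decay=0.9):
    """Blend a newly answered question into a stored distribution.

    Past mass is multiplied by `decay`, the new question's mass is added,
    and the result is renormalized to sum to 1 (assumed update form).
    """
    keys = set(old_dist) | set(question_dist)
    merged = {
        k: decay * old_dist.get(k, 0.0) + question_dist.get(k, 0.0) for k in keys
    }
    total = sum(merged.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in merged.items()}


def model_weights(question_models, user_models):
    """Weight each model by how well the question matches the user under it.

    Similarity is a dot product here (an assumption); the similarities are
    then normalized into a probability distribution over models.
    """
    sims = {
        model: sum(p * user_models.get(model, {}).get(f, 0.0) for f, p in dist.items())
        for model, dist in question_models.items()
    }
    total = sum(sims.values())
    if total == 0:
        return {model: 1.0 / len(sims) for model in sims} if sims else {}
    return {model: s / total for model, s in sims.items()}


# Hypothetical root-level (top category) update after answering a Sports question.
root = decayed_update({"Sports": 0.5, "Health": 0.5}, {"Sports": 1.0}, decay=0.5)

# Hypothetical second-level weights for two models.
weights = model_weights(
    {"lda": {"t1": 1.0}, "lexical": {"ball": 0.5, "goal": 0.5}},
    {"lda": {"t1": 0.4, "t2": 0.6}, "lexical": {"ball": 0.2, "run": 0.8}},
)
```

A smaller `decay` makes the profile adapt faster to recent answers, which relates to the freshness requirement stated earlier.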
QUESTION RECOMMENDATION
Matching Question and User Profiles
A list of open questions ranked by a relevance score, which is calculated for the pair {question profile, user profile}

For question profiles:
1. Turn the three vectors forming the question profile into a single vector, multiplying the probability of each feature by 1/3 before storing it in the index
2. Index every question vector and build an inverted index
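
The two indexing steps can be sketched as below. This is a minimal in-memory illustration, not the production index: the `(category, model, feature)` key layout and the function names are assumptions.

```python
from collections import defaultdict


def merge_profile(category, lda_vec, lexical_vec):
    """Combine the three profile vectors into one, each scaled by 1/3.

    Keys are (category, model, feature) tuples so features from different
    models cannot collide; this key layout is an assumption of the sketch.
    """
    # The category model puts probability 1 on the posting category.
    merged = {(category, "cat", category): 1.0 / 3.0}
    for model, vec in (("lda", lda_vec), ("lex", lexical_vec)):
        for feature, prob in vec.items():
            merged[(category, model, feature)] = prob / 3.0
    return merged


def build_inverted_index(question_vectors):
    """Map each feature to a posting list of (question_id, weight) pairs."""
    index = defaultdict(list)
    for qid, vec in question_vectors.items():
        for feature, weight in vec.items():
            index[feature].append((qid, weight))
    return index


# Hypothetical question posted in the Music category.
q1 = merge_profile("Music", {"t7": 1.0}, {"guitar": 0.5, "chord": 0.5})
index = build_inverted_index({"q1": q1})
```

Because each of the three input vectors sums to 1, the merged vector also sums to 1 after the 1/3 scaling.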
QUESTION RECOMMENDATION
For user profiles:
Associate with each user feature a score equal to the product of the probability scores on the tree path that leads to this feature

Ranking:
Similarity: a simple dot product
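
A minimal sketch of the path-product scoring and dot-product ranking follows. The three-level tree layout (top category, then model, then feature) and the matching `(category, model, feature)` keys on the question side are assumptions made so the example is self-contained.

```python
def flatten_user_tree(tree):
    """Score each leaf feature by the product of probabilities on its path.

    tree maps category -> (p_category, {model: (p_model, {feature: p_feature})}).
    This nesting is an assumed encoding of the probability tree.
    """
    user_vec = {}
    for category, (p_cat, models) in tree.items():
        for model, (p_model, features) in models.items():
            for feature, p_feat in features.items():
                user_vec[(category, model, feature)] = p_cat * p_model * p_feat
    return user_vec


def rank_questions(user_vec, question_vectors):
    """Return question ids sorted by dot product with the user vector, best first."""
    scores = {
        qid: sum(w * user_vec.get(f, 0.0) for f, w in vec.items())
        for qid, vec in question_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical user who mostly answers Music questions.
tree = {
    "Music": (0.75, {
        "lda": (0.5, {"t7": 1.0}),
        "lex": (0.5, {"guitar": 0.6, "piano": 0.4}),
    }),
    "Pets": (0.25, {"lex": (1.0, {"dog": 1.0})}),
}
user_vec = flatten_user_tree(tree)
questions = {
    "q_dog": {("Pets", "lex", "dog"): 1 / 3},
    "q_guitar": {("Music", "lex", "guitar"): 1 / 3, ("Music", "lda", "t7"): 1 / 3},
}
ranking = rank_questions(user_vec, questions)
```

In a real system the dot product would be evaluated through the inverted index rather than by scanning every question, as done here for brevity.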
QUESTION RECOMMENDATION
Proactive Diversification
Thematic sampling:
1. For each user vector u, generate N query vectors u_1, u_2, …, u_N
2. Retrieve N ranked lists, one per query vector
3. Blend them together into a final diverse list

Two types of thematic constraints:
1. Specific top category: randomly select top categories as constraints by sampling without repetition, based on their distribution in the root node of the user's probability tree
2. Specific LDA topic: randomly sample LDA topics without repetition from the user profile by traversing the probability tree
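
Sampling constraints without repetition can be sketched as below. The helper names are hypothetical, and building each constrained query vector by masking out features from other categories is an assumption about how a constraint is applied, not something the slide specifies.

```python
import random


def sample_without_repetition(weights, n, rng=None):
    """Draw up to n distinct keys, each draw proportional to the remaining weights."""
    rng = rng or random.Random()
    # Zero-weight entries can never be drawn, so drop them up front.
    remaining = {k: w for k, w in weights.items() if w > 0}
    chosen = []
    while remaining and len(chosen) < n:
        keys = list(remaining)
        pick = rng.choices(keys, weights=[remaining[k] for k in keys], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def constrained_queries(user_vec, categories):
    """Build one query vector per category by keeping only that category's features.

    Keys are assumed to be tuples whose first element is the top category.
    """
    return [
        {key: w for key, w in user_vec.items() if key[0] == category}
        for category in categories
    ]


# Hypothetical root-node distribution over top categories.
root = {"Music": 0.6, "Pets": 0.3, "Cars": 0.1}
picks = sample_without_repetition(root, 2, random.Random(0))
queries = constrained_queries(
    {("Music", "lex", "guitar"): 0.6, ("Pets", "lex", "dog"): 0.4}, ["Pets"]
)
```

Categories the user favors are more likely to be drawn, but less frequent ones still get a chance, which is what introduces diversity.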
QUESTION RECOMMENDATION
Recommendation Merging
Blending algorithm:
1. Associate each list with a probability score
2. Sample an intermediate list, based on the assigned probabilities
3. Remove one recommendation from the sampled list and add it at the end of the final list
4. Repeat
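
The blending loop can be sketched as follows. The slide does not say which recommendation is removed from the sampled list or how duplicates across lists are handled; this sketch assumes the top-ranked remaining item is taken and that items already in the final list are skipped. The function name and parameters are hypothetical.

```python
import random


def blend(ranked_lists, list_probs, k, rng=None):
    """Merge ranked lists by repeatedly sampling a list and taking its top item.

    list_probs holds one positive probability score per list. Exhausted lists
    are excluded from sampling; duplicates across lists are skipped (assumption).
    """
    rng = rng or random.Random()
    queues = [list(items) for items in ranked_lists]
    final, seen = [], set()
    while len(final) < k:
        live = [i for i, queue in enumerate(queues) if queue]
        if not live:
            break
        i = rng.choices(live, weights=[list_probs[j] for j in live], k=1)[0]
        item = queues[i].pop(0)  # assumed: take the best remaining item
        if item not in seen:
            seen.add(item)
            final.append(item)
    return final


# Hypothetical ranked lists: one relevance list and two thematic lists.
lists = [["a", "b", "c"], ["b", "d"], ["e"]]
blended = blend(lists, [0.5, 0.3, 0.2], k=10, rng=random.Random(1))
```

Within each source list the original order is preserved in the output, while the interleaving across lists is random and driven by the list probabilities.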
QUESTION RECOMMENDATION
Non-Thematic LDA Topics
116 topics, 23 top categories
34% non-thematic topics
A logistic regression classifier
EXPERIMENTS
Offline Experiment
8 different top categories
Active users: at least 21 questions as of January 2011
New users: at least two questions as of January 2011
EXPERIMENTS
Online Experiment
A/B test
Control bucket, CTL (n = 25093)
Relevance bucket, R (n = 5359)
Freshness bucket, F (n = 46228): 50% recent; 20% thematic sampling
Diversity bucket, D (n = 42041): 20% recent; 50% thematic sampling
CONCLUSIONS
Recommendation is driven by relevance, but also by freshness and diversity
Several relevance models
"Question retrieval engine"
Diversity: thematic sampling
Content: different factors/models/levels

Writing: clear structure, with each part building on the previous one
