SlideShare a Scribd company logo
1 of 37
Enlister
Baidu's Recommender System for
the Biggest Chinese Q&A Website
   JasonFu | fujuxuan@gmail.com

   2012-12-10, Monday
We are leaving the age of information and
entering the age of recommendation.
                          克里斯·安德森,《长尾理论》作者




                                          1/36
ABOUT THE PAPER




                  2/36
BACKGROUND




             3/36
BACKGROUND




             4/36
BACKGROUND




             5/36
BACKGROUND

   Baidu Knows offers a Q&A platform to its users for knowledge
    and experience sharing.

   Today, Baidu Knows website:
        There are over 400 million questions
        More than 170 million questions were answered by users
        Around 100 million users search for answers every day
        Over 12 new questions are posted online every second


   Baidu Knows is the biggest Q&A website in China.



                                                                  6/36
BACKGROUND
   An eco-system of knowledge sharing between the website’s users.
   As more questions have been answered, the search result quality of
    common users will be improved.
   “Answer” is contribution, not consumption.




                                                                         7/36
APPLICATION DESCRIPTION

   Baidu Knows builds an intelligent RS, Enlister, to provide the Baidu
    Knows users with questions that they may be willing to answer.


   Is based on the machine learning technology


   Apply a machine learning based CTR prediction methodology to
    improve the recommendation accuracy




                                                                           8/36
APPLICATION DESCRIPTION


     Previous                       Enlister
                                  Content-based
   Typical content-based
                                  CTR prediction
   Cosine similarity degree
                                  Stream computing




                                                      9/36
ALGORITHM

   We expect the user models to disclose the nature of our users’

    choices of answering questions.

   The models have to be simple enough to accommodate massive

    calculation and industrial adoption.




                                                                     10/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          11/36
1. User Model
                               Age, Gender,
                                Education


                  attributes        Tags


                                     …
     user model
                               Interest Term
                                   Vector

                                  Related
                   interest
                                 Questions

                                  Abstract
                               Interest Vector

                                                 12/36
1. User Model
   Interest Term Vector
       A vector contains weights of terms, which implicate the correlation between
        user and the term on sematic level.


   Related Questions
       Questions are browsed or answered by a user.


   Abstract Interest Vector
       A vector contains weights of terms, which implicate the correlation between
        a user and an abstract concept.
       Use PLSA technology to get a conceptual topic model from millions of
        question-answer pairs on the Baidu Knows website .


                                                                               13/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          14/36
2. Click Prediction



   The click model that we created is a kind of probabilistic
    classification model. It is a binary classification model to
    calculate the probability of a sample belonging to a class.




                                                                   15/36
2. Click Prediction

   Sample Collection
       Positive samples: questions that the user had examined and clicked.
       Negative samples: questions that randomly choose from the question pool.


   Feature Selection
       User Attributes: (1) basic user attributes (2) other features from the statistics
       Correlation Degree: the number of matched terms, cosine similarity and bm25
        between the user interest term vectors and question vectors

       Classification Algorithm: two principles (1) probabilistic classification (2) linear
        classifier




                                                                                            16/36
2. Click Prediction
   Classification Algorithm
       maximum entropy classifier
       The probability P of a user u will click the question q can be calculated as
        follows:




       Global optimization solution:limited-memory Broyden-Fletcher-
        Goldfarb-Shanno (L-BFGS) ; Stochastic Gradient Descent (SGD)


                                                                                  17/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          18/36
3. Diversity Adjustment

   The head part and the tail part of a list garnered most attention from the
    users.


   For the head part, we apply a loose filtering algorithm, which only deletes
    some apparent duplication in the list.


   For the tail part, we use a strict filtering algorithm to take out any
    questions that have noticeable semantic level similarity to each other in
    the list.




                                                                             19/36
SYSTEM SETUP
   The most important concept in the Enlister system design is real-time CTR
    prediction. The major data process can be described as follows :




                                                                                20/36
SYSTEM SETUP
   For building the data processing flow, we construct multiple logic queues between
    processing nodes.
   The processing nodes are grouped into several node groups. Each group
    represents a simple logic section.




                                                                                 21/36
22/36
Before login   After login




                             23/36
24/36
25/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          26/36
1. Evaluation Metrics
   Precision(查准率):tp/(tp+fp),识别出的真正的正面观点数/所有的识别为正面观点的条数
   Recall(查全率):tp/(tp+fn), 识别出的真正的正面观点数/样本中所有的真正正面观点的条数
   Accuracy(准确率): (tp + tn)/(tp + fn + fp + tn), 正确识别观点数/所有观点的条数




                               Confusion Matrix

                                                                    27/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          28/36
2. Experiment
   Sample Selection
        100,000 questions that had been viewed and clicked by users are selected from
         users’ logs as positive sample, which involves 10 thousands users with 10
         records per user on average.
        Negative samples: random negative samples




                                                                                 29/36
2. Experiment

   Sample Proportion




                        30/36
2. Experiment

   Optimization Algorithm
        LBFGS algorithms is chosen as the optimization algorithm in the maximum
         entropy model training.




                                                                              31/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          32/36
3. Online Evaluation

   Enlister was released to the Baidu Knows users and an online evaluation
    was conducted from Feb. 11th, 2012.




                                                                        33/36
3. Online Evaluation




                       34/36
CONCLUSION
① Have successfully built a real-time RS that serves millions of users every
     day.
② The algorithm and system design fit the recommendation scenario quite
     well.
③ Great improvement had been made on the accuracy and time-sensitive
     issues.
④ The number of active users had grown substantially after the system was
     officially launched.
   The future work : the timing of recommendation and the utilization of
    relationships between users



                                                                            35/36
Thank You !



              36/36

More Related Content

What's hot

Recommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association RulesRecommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association RulesIJARIIE JOURNAL
 
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...TELKOMNIKA JOURNAL
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive ModelDKALab
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation projectAbhishek Jaisingh
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...IJECEIAES
 
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural NetworkPrediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural Networkrahulmonikasharma
 
Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation enginehkbhadraa
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET Journal
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET Journal
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSujith Jayaprakash
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019loki536577
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Dr. Amarjeet Singh
 
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...ijaia
 
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance FeedbackIRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance FeedbackIRJET Journal
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataIRJET Journal
 
Multi-class Bio-images Classification
Multi-class Bio-images ClassificationMulti-class Bio-images Classification
Multi-class Bio-images ClassificationZhuo Li
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationToshihiro Kamishima
 
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONDEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONijaia
 

What's hot (20)

Recommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association RulesRecommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association Rules
 
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive Model
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation project
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
 
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural NetworkPrediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural Network
 
Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation engine
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection Analysis
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019
 
Cs583 recommender-systems
Cs583 recommender-systemsCs583 recommender-systems
Cs583 recommender-systems
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
 
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
 
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance FeedbackIRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Multi-class Bio-images Classification
Multi-class Bio-images ClassificationMulti-class Bio-images Classification
Multi-class Bio-images Classification
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
 
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONDEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
 

Similar to Enlister baidu's recommender system for the biggest chinese q&a website

Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems Badr Hirchoua
 
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...IRJET Journal
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
IRJET - Recommendation System using Big Data Mining on Social Networks
IRJET -  	  Recommendation System using Big Data Mining on Social NetworksIRJET -  	  Recommendation System using Big Data Mining on Social Networks
IRJET - Recommendation System using Big Data Mining on Social NetworksIRJET Journal
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Gabriel Moreira
 
Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...siramatu-lab
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationEvgeny Frolov
 
IRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User InterestIRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User InterestIRJET Journal
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsAxel de Romblay
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONIRJET Journal
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation SystemsAxel de Romblay
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4Salford Systems
 
Algorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docxAlgorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docxdaniahendric
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)es712
 
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...IRJET Journal
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxdongchangim30
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturergomathi chlm
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET Journal
 

Similar to Enlister baidu's recommender system for the biggest chinese q&a website (20)

Recommender-technology-ReColl08
Recommender-technology-ReColl08Recommender-technology-ReColl08
Recommender-technology-ReColl08
 
Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems
 
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
IRJET - Recommendation System using Big Data Mining on Social Networks
IRJET -  	  Recommendation System using Big Data Mining on Social NetworksIRJET -  	  Recommendation System using Big Data Mining on Social Networks
IRJET - Recommendation System using Big Data Mining on Social Networks
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação
 
Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
 
IRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User InterestIRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User Interest
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
 
Algorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docxAlgorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docx
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)
 
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturer
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence Matrix
 

Recently uploaded

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 

Recently uploaded (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 

Enlister baidu's recommender system for the biggest chinese q&a website

  • 1. Enlister Baidu's Recommender System for the Biggest Chinese Q&A Website  JasonFu | fujuxuan@gmail.com  2012-12-10, Monday
  • 2. We are leaving the age of information and entering the age of recommendation.  克里斯·安德森,《长尾理论》作者 1/36
  • 4. BACKGROUND 3/36
  • 5. BACKGROUND 4/36
  • 6. BACKGROUND 5/36
  • 7. BACKGROUND  Baidu Knows offers a Q&A platform to its users for knowledge and experience sharing.  Today, Baidu Knows website:  There are over 400 million questions  More than 170 million questions were answered by users  Around 100 million users search for answers every day  Over 12 new questions are posted online every second  Baidu Knows is the biggest Q&A website in China. 6/36
  • 8. BACKGROUND  An eco-system of knowledge sharing between the website’s users.  As more questions have been answered, the search result quality of common users will be improved.  “Answer” is contribution, not consumption. 7/36
  • 9. APPLICATION DESCRIPTION  Baidu Knows builds an intelligent RS, Enlister, to provide the Baidu Knows users with questions that they may be willing to answer.  Is based on the machine learning technology  Apply a machine learning based CTR prediction methodology to improve the recommendation accuracy 8/36
  • 10. APPLICATION DESCRIPTION Previous Enlister  Content-based  Typical content-based  CTR prediction  Cosine similarity degree  Stream computing 9/36
  • 11. ALGORITHM  We expect the user models to disclose the nature of our users’ choices of answering questions.  The models have to be simple enough to accommodate massive calculation and industrial adoption. 10/36
  • 12. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 11/36
  • 13. 1. User Model Age, Gender, Education attributes Tags … user model Interest Term Vector Related interest Questions Abstract Interest Vector 12/36
  • 14. 1. User Model  Interest Term Vector  A vector contains weights of terms, which implicate the correlation between user and the term on sematic level.  Related Questions  Questions are browsed or answered by a user.  Abstract Interest Vector  A vector contains weights of terms, which implicate the correlation between a user and an abstract concept.  Use PLSA technology to get a conceptual topic model from millions of question-answer pairs on the Baidu Knows website . 13/36
  • 15. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 14/36
  • 16. 2. Click Prediction  The click model that we created is a kind of probabilistic classification model. It is a binary classification model to calculate the probability of a sample belonging to a class. 15/36
  • 17. 2. Click Prediction  Sample Collection  Positive samples: questions that the user had examined and clicked.  Negative samples: questions that randomly choose from the question pool.  Feature Selection  User Attributes: (1) basic user attributes (2) other features from the statistics  Correlation Degree: the number of matched terms, cosine similarity and bm25 between the user interest term vectors and question vectors  Classification Algorithm: two principles (1) probabilistic classification (2) linear classifier 16/36
  • 18. 2. Click Prediction  Classification Algorithm  maximum entropy classifier  The probability P of a user u will click the question q can be calculated as follows:  Global optimization solution:limited-memory Broyden-Fletcher- Goldfarb-Shanno (L-BFGS) ; Stochastic Gradient Descent (SGD) 17/36
  • 19. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 18/36
  • 20. 3. Diversity Adjustment  The head part and the tail part of a list garnered most attention from the users.  For the head part, we apply a loose filtering algorithm, which only deletes some apparent duplication in the list.  For the tail part, we use a strict filtering algorithm to take out any questions that have noticeable semantic level similarity to each other in the list. 19/36
  • 21. SYSTEM SETUP  The most important concept in the Enlister system design is real-time CTR prediction. The major data process can be described as follows : 20/36
  • 22. SYSTEM SETUP  For building the data processing flow, we construct multiple logic queues between processing nodes.  The processing nodes are grouped into several node groups. Each group represents a simple logic section. 21/36
  • 23. 22/36
  • 24. Before login After login 23/36
  • 25. 24/36
  • 26. 25/36
  • 27. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 26/36
  • 28. 1. Evaluation Metrics  Precision(查准率):tp/(tp+fp),识别出的真正的正面观点数/所有的识别为正面观点的条数  Recall(查全率):tp/(tp+fn), 识别出的真正的正面观点数/样本中所有的真正正面观点的条数  Accuracy(准确率): (tp + tn)/(tp + fn + fp + tn), 正确识别观点数/所有观点的条数 Confusion Matrix 27/36
  • 29. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 28/36
  • 30. 2. Experiment  Sample Selection  100,000 questions that had been viewed and clicked by users are selected from users’ logs as positive sample, which involves 10 thousands users with 10 records per user on average.  Negative samples: random negative samples 29/36
  • 31. 2. Experiment  Sample Proportion 30/36
  • 32. 2. Experiment  Optimization Algorithm  LBFGS algorithms is chosen as the optimization algorithm in the maximum entropy model training. 31/36
  • 33. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 32/36
  • 34. 3. Online Evaluation  Enlister was released to the Baidu Knows users and an online evaluation was conducted from Feb. 11th, 2012. 33/36
  • 36. CONCLUSION ① Have successfully built a real-time RS that serves millions of users every day. ② The algorithm and system design fit the recommendation scenario quite well. ③ Great improvement had been made on the accuracy and time-sensitive issues. ④ The number of active users had grown substantially after the system was officially launched.  The future work : the timing of recommendation and the utilization of relationships between users 35/36
  • 37. Thank You ! 36/36

Editor's Notes

  1. 在这样的顶级国际会议上,也出现了中国互联网公司的身影,来自中国内地的百度是唯一参加这个会议的国内公司,也是第一家以论文作者的身份参加会议的国内公司。这份论文受到了国外同行的一致认可,并最终被大会录用。据悉,RecSys 2012此次共接收长论文24篇,录取率20.2%;接收短论文21篇,录取率31.8%。