SlideShare a Scribd company logo
Enlister
Baidu's Recommender System for
the Biggest Chinese Q&A Website
   JasonFu | fujuxuan@gmail.com

   2012-12-10, Monday
We are leaving the age of information and
entering the age of recommendation.
                          克里斯·安德森,《长尾理论》作者




                                          1/36
ABOUT THE PAPER




                  2/36
BACKGROUND




             3/36
BACKGROUND




             4/36
BACKGROUND




             5/36
BACKGROUND

   Baidu Knows offers a Q&A platform to its users for knowledge
    and experience sharing.

   Today, Baidu Knows website:
        There are over 400 million questions
        More than 170 million questions were answered by users
        Around 100 million users search for answers every day
        Over 12 new questions are posted online every second


   Baidu Knows is the biggest Q&A website in China.



                                                                  6/36
BACKGROUND
   An eco-system of knowledge sharing between the website’s users.
   As more questions have been answered, the search result quality of
    common users will be improved.
   “Answer” is contribution, not consumption.




                                                                         7/36
APPLICATION DESCRIPTION

   Baidu Knows builds an intelligent RS, Enlister, to provide the Baidu
    Knows users with questions that they may be willing to answer.


   Is based on the machine learning technology


   Apply a machine learning based CTR prediction methodology to
    improve the recommendation accuracy




                                                                           8/36
APPLICATION DESCRIPTION


     Previous                       Enlister
                                  Content-based
   Typical content-based
                                  CTR prediction
   Cosine similarity degree
                                  Stream computing




                                                      9/36
ALGORITHM

   We expect the user models to disclose the nature of our users’

    choices of answering questions.

   The models have to be simple enough to accommodate massive

    calculation and industrial adoption.




                                                                     10/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          11/36
1. User Model
                               Age, Gender,
                                Education


                  attributes        Tags


                                     …
     user model
                               Interest Term
                                   Vector

                                  Related
                   interest
                                 Questions

                                  Abstract
                               Interest Vector

                                                 12/36
1. User Model
   Interest Term Vector
       A vector contains weights of terms, which implicate the correlation between
        user and the term on sematic level.


   Related Questions
       Questions are browsed or answered by a user.


   Abstract Interest Vector
       A vector contains weights of terms, which implicate the correlation between
        a user and an abstract concept.
       Use PLSA technology to get a conceptual topic model from millions of
        question-answer pairs on the Baidu Knows website .


                                                                               13/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          14/36
2. Click Prediction



   The click model that we created is a kind of probabilistic
    classification model. It is a binary classification model to
    calculate the probability of a sample belonging to a class.




                                                                   15/36
2. Click Prediction

   Sample Collection
       Positive samples: questions that the user had examined and clicked.
       Negative samples: questions that randomly choose from the question pool.


   Feature Selection
       User Attributes: (1) basic user attributes (2) other features from the statistics
       Correlation Degree: the number of matched terms, cosine similarity and bm25
        between the user interest term vectors and question vectors

       Classification Algorithm: two principles (1) probabilistic classification (2) linear
        classifier




                                                                                            16/36
2. Click Prediction
   Classification Algorithm
       maximum entropy classifier
       The probability P of a user u will click the question q can be calculated as
        follows:




       Global optimization solution:limited-memory Broyden-Fletcher-
        Goldfarb-Shanno (L-BFGS) ; Stochastic Gradient Descent (SGD)


                                                                                  17/36
ALGORITHM

1. User Model
2. Click Prediction

3. Diversity Adjustment




                          18/36
3. Diversity Adjustment

   The head part and the tail part of a list garnered most attention from the
    users.


   For the head part, we apply a loose filtering algorithm, which only deletes
    some apparent duplication in the list.


   For the tail part, we use a strict filtering algorithm to take out any
    questions that have noticeable semantic level similarity to each other in
    the list.




                                                                             19/36
SYSTEM SETUP
   The most important concept in the Enlister system design is real-time CTR
    prediction. The major data process can be described as follows :




                                                                                20/36
SYSTEM SETUP
   For building the data processing flow, we construct multiple logic queues between
    processing nodes.
   The processing nodes are grouped into several node groups. Each group
    represents a simple logic section.




                                                                                 21/36
22/36
Before login   After login




                             23/36
24/36
25/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          26/36
1. Evaluation Metrics
   Precision(查准率):tp/(tp+fp),识别出的真正的正面观点数/所有的识别为正面观点的条数
   Recall(查全率):tp/(tp+fn), 识别出的真正的正面观点数/样本中所有的真正正面观点的条数
   Accuracy(准确率): (tp + tn)/(tp + fn + fp + tn), 正确识别观点数/所有观点的条数




                               Confusion Matrix

                                                                    27/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          28/36
2. Experiment
   Sample Selection
        100,000 questions that had been viewed and clicked by users are selected from
         users’ logs as positive sample, which involves 10 thousands users with 10
         records per user on average.
        Negative samples: random negative samples




                                                                                 29/36
2. Experiment

   Sample Proportion




                        30/36
2. Experiment

   Optimization Algorithm
        LBFGS algorithms is chosen as the optimization algorithm in the maximum
         entropy model training.




                                                                              31/36
EXPERIMENT & EVALUATION

1. Evaluation Metrics
2. Experiment

3. Online Evaluation




                          32/36
3. Online Evaluation

   Enlister was released to the Baidu Knows users and an online evaluation
    was conducted from Feb. 11th, 2012.




                                                                        33/36
3. Online Evaluation




                       34/36
CONCLUSION
① Have successfully built a real-time RS that serves millions of users every
     day.
② The algorithm and system design fit the recommendation scenario quite
     well.
③ Great improvement had been made on the accuracy and time-sensitive
     issues.
④ The number of active users had grown substantially after the system was
     officially launched.
   The future work : the timing of recommendation and the utilization of
    relationships between users



                                                                            35/36
Thank You !



              36/36

More Related Content

What's hot

Recommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association RulesRecommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association Rules
IJARIIE JOURNAL
 
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
TELKOMNIKA JOURNAL
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive Model
DKALab
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation project
Abhishek Jaisingh
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
IJECEIAES
 
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural NetworkPrediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural Network
rahulmonikasharma
 
Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation engine
hkbhadraa
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection Analysis
IRJET Journal
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET Journal
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
IJMIT JOURNAL
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Sujith Jayaprakash
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019
loki536577
 
Cs583 recommender-systems
Cs583 recommender-systemsCs583 recommender-systems
Cs583 recommender-systems
Aravindharamanan S
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Dr. Amarjeet Singh
 
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
ijaia
 
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance FeedbackIRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET Journal
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
IRJET Journal
 
Multi-class Bio-images Classification
Multi-class Bio-images ClassificationMulti-class Bio-images Classification
Multi-class Bio-images ClassificationZhuo Li
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
Toshihiro Kamishima
 
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONDEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
ijaia
 

What's hot (20)

Recommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association RulesRecommendation based on Clustering and Association Rules
Recommendation based on Clustering and Association Rules
 
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
Selecting User Influence on Twitter Data Using Skyline Query under MapReduce ...
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive Model
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation project
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
 
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural NetworkPrediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural Network
 
Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation engine
 
IRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection AnalysisIRJET- Credit Card Fraud Detection Analysis
IRJET- Credit Card Fraud Detection Analysis
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019
 
Cs583 recommender-systems
Cs583 recommender-systemsCs583 recommender-systems
Cs583 recommender-systems
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
 
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...
 
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance FeedbackIRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
IRJET-Semi-Supervised Collaborative Image Retrieval using Relevance Feedback
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Multi-class Bio-images Classification
Multi-class Bio-images ClassificationMulti-class Bio-images Classification
Multi-class Bio-images Classification
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
 
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONDEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
 

Similar to Enlister baidu's recommender system for the biggest chinese q&a website

Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems
Badr Hirchoua
 
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
IRJET Journal
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
Drjabez
 
IRJET - Recommendation System using Big Data Mining on Social Networks
IRJET -  	  Recommendation System using Big Data Mining on Social NetworksIRJET -  	  Recommendation System using Big Data Mining on Social Networks
IRJET - Recommendation System using Big Data Mining on Social Networks
IRJET Journal
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação
Gabriel Moreira
 
Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...
siramatu-lab
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
Evgeny Frolov
 
IRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User InterestIRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User Interest
IRJET Journal
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
Axel de Romblay
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
IRJET Journal
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
Axel de Romblay
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
Salford Systems
 
Algorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docxAlgorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docx
daniahendric
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)es712
 
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET Journal
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
dongchangim30
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturer
gomathi chlm
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence Matrix
IRJET Journal
 

Similar to Enlister baidu's recommender system for the biggest chinese q&a website (20)

Recommender-technology-ReColl08
Recommender-technology-ReColl08Recommender-technology-ReColl08
Recommender-technology-ReColl08
 
Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems
 
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
IRJET - Recommendation System using Big Data Mining on Social Networks
IRJET -  	  Recommendation System using Big Data Mining on Social NetworksIRJET -  	  Recommendation System using Big Data Mining on Social Networks
IRJET - Recommendation System using Big Data Mining on Social Networks
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação
 
Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...Tag-based Approaches to Sharing Background Information regarding Social Probl...
Tag-based Approaches to Sharing Background Information regarding Social Probl...
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
 
IRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User InterestIRJET- Analysis of Rating Difference and User Interest
IRJET- Analysis of Rating Difference and User Interest
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
 
Algorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docxAlgorithm ExampleFor the following taskUse the random module .docx
Algorithm ExampleFor the following taskUse the random module .docx
 
Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)
 
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
IRJET- Scalable Content Aware Collaborative Filtering for Location Recommenda...
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturer
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence Matrix
 

Recently uploaded

Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 

Recently uploaded (20)

Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 

Enlister baidu's recommender system for the biggest chinese q&a website

  • 1. Enlister Baidu's Recommender System for the Biggest Chinese Q&A Website  JasonFu | fujuxuan@gmail.com  2012-12-10, Monday
  • 2. We are leaving the age of information and entering the age of recommendation.  克里斯·安德森,《长尾理论》作者 1/36
  • 4. BACKGROUND 3/36
  • 5. BACKGROUND 4/36
  • 6. BACKGROUND 5/36
  • 7. BACKGROUND  Baidu Knows offers a Q&A platform to its users for knowledge and experience sharing.  Today, Baidu Knows website:  There are over 400 million questions  More than 170 million questions were answered by users  Around 100 million users search for answers every day  Over 12 new questions are posted online every second  Baidu Knows is the biggest Q&A website in China. 6/36
  • 8. BACKGROUND  An eco-system of knowledge sharing between the website’s users.  As more questions have been answered, the search result quality of common users will be improved.  “Answer” is contribution, not consumption. 7/36
  • 9. APPLICATION DESCRIPTION  Baidu Knows builds an intelligent RS, Enlister, to provide the Baidu Knows users with questions that they may be willing to answer.  Is based on the machine learning technology  Apply a machine learning based CTR prediction methodology to improve the recommendation accuracy 8/36
  • 10. APPLICATION DESCRIPTION Previous Enlister  Content-based  Typical content-based  CTR prediction  Cosine similarity degree  Stream computing 9/36
  • 11. ALGORITHM  We expect the user models to disclose the nature of our users’ choices of answering questions.  The models have to be simple enough to accommodate massive calculation and industrial adoption. 10/36
  • 12. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 11/36
  • 13. 1. User Model Age, Gender, Education attributes Tags … user model Interest Term Vector Related interest Questions Abstract Interest Vector 12/36
  • 14. 1. User Model  Interest Term Vector  A vector contains weights of terms, which implicate the correlation between user and the term on sematic level.  Related Questions  Questions are browsed or answered by a user.  Abstract Interest Vector  A vector contains weights of terms, which implicate the correlation between a user and an abstract concept.  Use PLSA technology to get a conceptual topic model from millions of question-answer pairs on the Baidu Knows website . 13/36
  • 15. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 14/36
  • 16. 2. Click Prediction  The click model that we created is a kind of probabilistic classification model. It is a binary classification model to calculate the probability of a sample belonging to a class. 15/36
  • 17. 2. Click Prediction  Sample Collection  Positive samples: questions that the user had examined and clicked.  Negative samples: questions that randomly choose from the question pool.  Feature Selection  User Attributes: (1) basic user attributes (2) other features from the statistics  Correlation Degree: the number of matched terms, cosine similarity and bm25 between the user interest term vectors and question vectors  Classification Algorithm: two principles (1) probabilistic classification (2) linear classifier 16/36
  • 18. 2. Click Prediction  Classification Algorithm  maximum entropy classifier  The probability P of a user u will click the question q can be calculated as follows:  Global optimization solution:limited-memory Broyden-Fletcher- Goldfarb-Shanno (L-BFGS) ; Stochastic Gradient Descent (SGD) 17/36
  • 19. ALGORITHM 1. User Model 2. Click Prediction 3. Diversity Adjustment 18/36
  • 20. 3. Diversity Adjustment  The head part and the tail part of a list garnered most attention from the users.  For the head part, we apply a loose filtering algorithm, which only deletes some apparent duplication in the list.  For the tail part, we use a strict filtering algorithm to take out any questions that have noticeable semantic level similarity to each other in the list. 19/36
  • 21. SYSTEM SETUP  The most important concept in the Enlister system design is real-time CTR prediction. The major data process can be described as follows : 20/36
  • 22. SYSTEM SETUP  For building the data processing flow, we construct multiple logic queues between processing nodes.  The processing nodes are grouped into several node groups. Each group represents a simple logic section. 21/36
  • 23. 22/36
  • 24. Before login After login 23/36
  • 25. 24/36
  • 26. 25/36
  • 27. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 26/36
  • 28. 1. Evaluation Metrics  Precision(查准率):tp/(tp+fp),识别出的真正的正面观点数/所有的识别为正面观点的条数  Recall(查全率):tp/(tp+fn), 识别出的真正的正面观点数/样本中所有的真正正面观点的条数  Accuracy(准确率): (tp + tn)/(tp + fn + fp + tn), 正确识别观点数/所有观点的条数 Confusion Matrix 27/36
  • 29. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 28/36
  • 30. 2. Experiment  Sample Selection  100,000 questions that had been viewed and clicked by users are selected from users’ logs as positive sample, which involves 10 thousands users with 10 records per user on average.  Negative samples: random negative samples 29/36
  • 31. 2. Experiment  Sample Proportion 30/36
  • 32. 2. Experiment  Optimization Algorithm  LBFGS algorithms is chosen as the optimization algorithm in the maximum entropy model training. 31/36
  • 33. EXPERIMENT & EVALUATION 1. Evaluation Metrics 2. Experiment 3. Online Evaluation 32/36
  • 34. 3. Online Evaluation  Enlister was released to the Baidu Knows users and an online evaluation was conducted from Feb. 11th, 2012. 33/36
  • 36. CONCLUSION ① Have successfully built a real-time RS that serves millions of users every day. ② The algorithm and system design fit the recommendation scenario quite well. ③ Great improvement had been made on the accuracy and time-sensitive issues. ④ The number of active users had grown substantially after the system was officially launched.  The future work : the timing of recommendation and the utilization of relationships between users 35/36
  • 37. Thank You ! 36/36

Editor's Notes

  1. 在这样的顶级国际会议上,也出现了中国互联网公司的身影,来自中国内地的百度是唯一参加这个会议的国内公司,也是第一家以论文作者的身份参加会议的国内公司。这份论文受到了国外同行的一致认可,并最终被大会录用。据悉,RecSys 2012此次共接收长论文24篇,录取率20.2%;接收短论文21篇,录取率31.8%。