Usman Sharif

RECOMMENDATION SYSTEMS
Why recommendation systems?

• Provide a better experience to your users.
• Understand the behavior and patterns of
  users.
• Create opportunities to re-engage inactive
  users.
• Boost sales.
• Can serve users better than a search feature alone.
How some companies are using Recommendation Systems - Amazon
How some companies are using Recommendation Systems - Gmail
A simple recommendation system

• Consider the following scenario:
   • A library has books and members.
   • Members can borrow books.
   • The library wants to build a recommender system
     to recommend books to its members.
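Scoring Matrices
User-book matrix (X = the user has read the book):
         Book 1   Book 2   Book 3   Book 4
User 1   X                 X
User 2   X
User 3            X                 X
User 4   X                 X        X
User 5   X        X

Book-book matrix (each cell counts the users who have read both books;
the diagonal counts the readers of each book):
         Book 1   Book 2   Book 3   Book 4
Book 1   4        1        2        1
Book 2   1        2        0        1
Book 3   2        0        2        1
Book 4   1        1        1        2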
Using the scoring matrices

• If a user has read Book 1, recommend Books 3, 2, 4.
• If a user has read Book 2, recommend Books 1, 4, 3.
• If a user has read Book 3, recommend Books 1, 4, 2.
• If a user has read Book 4, recommend Books 1, 2, 3.
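The two matrices and the four rules above can be reproduced with a short Python sketch. The variable names and the tie-break (lower book number first when scores are equal) are assumptions made for illustration; the slides do not say how ties are ordered.

```python
from itertools import product

# Borrowing history from the user-book matrix (user -> set of books read).
history = {
    "User 1": {1, 3},
    "User 2": {1},
    "User 3": {2, 4},
    "User 4": {1, 3, 4},
    "User 5": {1, 2},
}
books = [1, 2, 3, 4]

# co[a][b] = number of users who have read both book a and book b.
co = {a: {b: 0 for b in books} for a in books}
for read in history.values():
    for a, b in product(read, repeat=2):
        co[a][b] += 1

def recommend(book):
    """Rank the other books by co-occurrence with `book`, highest first."""
    others = [b for b in books if b != book]
    # Assumed tie-break: equal scores are ordered by book number.
    return sorted(others, key=lambda b: (-co[book][b], b))

for book in books:
    print(f"Read Book {book}: recommend {recommend(book)}")
```

With this tie-break the output matches the orderings listed on the slide.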
Advantages

• Very simple to understand and implement.
• Works well when recommendations are based on
  a single user activity (one book the user
  has read).
Disadvantages

• Cannot work for a new user with no history.
• In a real-world scenario with thousands of
  books and thousands of members, most cells
  are bound to be zero (a sparse matrix).
• Considers only one item at a time.
Another Try
• Our book records might look like this:
BookId  Title                    Genre         Writer               Language
1       The Great Gatsby         Classic       F Scott Fitzgerald   English
2       Nine Stories             Short Stories J D Salinger         English
3       The Sun Also Rises       Classic       Ernest Hemingway     English
4       The Hunger Games         Action        Suzanne Collins      English
5       The Ambler Warning       Thriller      Robert Ludlum        English
6       The Catcher in the Rye   Classic       J D Salinger         English
7       To Kill a Mockingbird    Classic       Harper Lee           English
Create an Item Similarity Matrix
            Book 1     Book 2      Book 3     Book 4      Book 5     Book 6      Book 7
Book 1      3          1           2          1           1          2           2
Book 2      1          3           1          1           1          2           1
Book 3      2          1           3          1           1          2           2
Book 4      1          1           1          3           1          1           1
Book 5      1          1           1          1           3          1           1
Book 6      2          2           2          1           1          3           2
Book 7      2          1           2          1           1          2           3
• This is always a square (n x n) matrix.
• Each cell holds the number of attributes the two books share (Genre, Writer, Language), excluding unique attributes such as BookId and Title.
• In general, any similarity measure can be used here.
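As a rough illustration, the matrix above can be derived from the book records by counting matching attributes. This is only a sketch: the dictionary layout and the function name `similarity` are assumptions, and the Title column is skipped because it is unique to each book.

```python
# Book records from the slide: BookId -> (title, genre, writer, language).
books = {
    1: ("The Great Gatsby", "Classic", "F Scott Fitzgerald", "English"),
    2: ("Nine Stories", "Short Stories", "J D Salinger", "English"),
    3: ("The Sun Also Rises", "Classic", "Ernest Hemingway", "English"),
    4: ("The Hunger Games", "Action", "Suzanne Collins", "English"),
    5: ("The Ambler Warning", "Thriller", "Robert Ludlum", "English"),
    6: ("The Catcher in the Rye", "Classic", "J D Salinger", "English"),
    7: ("To Kill a Mockingbird", "Classic", "Harper Lee", "English"),
}

def similarity(a, b):
    """Count the attributes two books share, ignoring the unique title."""
    return sum(x == y for x, y in zip(books[a][1:], books[b][1:]))

ids = sorted(books)
matrix = [[similarity(a, b) for b in ids] for a in ids]

for book_id, row in zip(ids, matrix):
    print(f"Book {book_id}", row)
```

Any other similarity measure could be swapped in by replacing the body of `similarity`.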
To Recommend

• Look at what a user has previously read.
• Use the values from the similarity matrix and
  recommend books based on how similar they are
  to the book the user has already read.
Advantages

• Recommendations can be pre-computed for
  a very large item base.
• Fast lookups can be built to serve
  recommendations.
• For example, if a user is viewing the page for
  Book 3, you may want to recommend
  Books 1, 6 and 7.
• Would work for new/non-registered users.
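One way to picture the pre-computation is a lookup table built once from the similarity matrix and then read at page-view time. The sketch below hard-codes the matrix from the earlier slide; `build_lookup`, `top_n` and the tie-break by book number are illustrative assumptions.

```python
# Item similarity matrix from the slide (row/column i is Book i+1).
SIMILARITY = [
    [3, 1, 2, 1, 1, 2, 2],
    [1, 3, 1, 1, 1, 2, 1],
    [2, 1, 3, 1, 1, 2, 2],
    [1, 1, 1, 3, 1, 1, 1],
    [1, 1, 1, 1, 3, 1, 1],
    [2, 2, 2, 1, 1, 3, 2],
    [2, 1, 2, 1, 1, 2, 3],
]

def build_lookup(matrix, top_n=3):
    """Pre-compute the top_n most similar books for every book."""
    lookup = {}
    for i, row in enumerate(matrix):
        others = [j for j in range(len(row)) if j != i]
        # Highest similarity first; assumed tie-break by book number.
        others.sort(key=lambda j: (-row[j], j))
        lookup[i + 1] = [j + 1 for j in others[:top_n]]
    return lookup

LOOKUP = build_lookup(SIMILARITY)
print(LOOKUP[3])  # books to show on the page for Book 3
```

Because the table depends only on item attributes, it can be served to anonymous visitors as well.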
Disadvantage

• Does not consider the individual user’s history.
• Instead, it looks at a collective trend.
Another Approach - The Users

• Our user records might look like this:
 UserId     Gender    Age        Location
 1          Male      34         Pakistan
 2          Female    28         Pakistan
 3          Male      38         India
 4          Male      32         India
 5          Female    21         Pakistan
 6          Female    24         Pakistan
The User Borrowing
  UserId   BookId
  1        3
  1        7
  2        2
  3        1
  3        5
  3        7
  4        6
  4        7
  5        2
  6        4
  6        6
  6        7
Transforming User Borrowing
             User 1     User 2       User 3   User 4   User 5   User 6
   Book 1                            X
   Book 2               X                              X
   Book 3    X
   Book 4                                                       X
   Book 5                            X
   Book 6                                     X                 X
   Book 7    X                       X        X                 X


• Problem: too many zero values (30 of the 42 cells are empty).
• Any solutions?
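The transformation and its sparsity can be checked with a few lines of Python. This is a minimal sketch; the list-of-lists layout (rows are books, columns are users) mirrors the table above, and the names are assumptions.

```python
# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]
users = range(1, 7)
books = range(1, 8)

borrowed = set(borrowings)
# One row per book, one column per user; 1 marks an 'X' in the table.
matrix = [[1 if (u, b) in borrowed else 0 for u in users] for b in books]

zeros = sum(row.count(0) for row in matrix)
total = len(books) * len(users)
print(f"{zeros}/{total} cells are zero")
```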
Transform the User Records

• Treat Age as a discrete column with
  ranges like {0-10, 11-20, 21-30, 31-40, …} so
  that we can create partitions like this:
  PartitionId   Gender   AgeGroup   Location
  1             Male     31-40      Pakistan
  2             Female   21-30      Pakistan
  3             Male     31-40      India
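A small sketch of how users could be mapped to these partitions. The function `age_group` and the choice to number partitions in order of first appearance are assumptions; with the six users from the earlier slide, that numbering happens to match the table above.

```python
# User records from the slide: UserId -> (gender, age, location).
users = {
    1: ("Male", 34, "Pakistan"),
    2: ("Female", 28, "Pakistan"),
    3: ("Male", 38, "India"),
    4: ("Male", 32, "India"),
    5: ("Female", 21, "Pakistan"),
    6: ("Female", 24, "Pakistan"),
}

def age_group(age):
    """Map an age to the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    low = (age - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"

partitions = {}      # (gender, age group, location) -> PartitionId
user_partition = {}  # UserId -> PartitionId
for uid in sorted(users):
    gender, age, location = users[uid]
    key = (gender, age_group(age), location)
    # Assumed numbering: a new combination gets the next free id.
    user_partition[uid] = partitions.setdefault(key, len(partitions) + 1)

print(user_partition)
```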
Recreate User Borrowing using Partition Information
• Fewer zero-valued cells (11/21 compared to
  30/42 previously).
• Far fewer columns than we previously had!
• The notation has been changed from ‘X’ to
  a count.
           Partition 1   Partition 2   Partition 3
  Book 1                               1
  Book 2                 2
  Book 3   1
  Book 4                 1
  Book 5                               1
  Book 6                 1             1
  Book 7   1             1             2
To Recommend

• See which partition a user belongs to.
• Look at the column of that partition and sort
  the books in descending order based on their
  frequency count.
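The two steps above can be sketched as follows. The user-to-partition mapping is taken from the partition table; the function name `recommend` and the tie-break by BookId are assumptions for illustration.

```python
from collections import Counter, defaultdict

# UserId -> PartitionId, following the partition table on the earlier slide.
user_partition = {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 2}

# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]

# counts[partition][book] = how often members of the partition borrowed it.
counts = defaultdict(Counter)
for uid, book in borrowings:
    counts[user_partition[uid]][book] += 1

def recommend(uid):
    """Books borrowed in the user's partition, most frequent first."""
    column = counts[user_partition[uid]]
    # Assumed tie-break: equal counts are ordered by BookId.
    return [b for b, _ in sorted(column.items(), key=lambda kv: (-kv[1], kv[0]))]

print(recommend(4))  # User 4 belongs to Partition 3
```

A production version would probably also filter out books the user has already borrowed; the slide does not specify this, so the sketch leaves it out.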
Advantages

• Continues to improve over time as more
  borrowing data is collected.
• More partitions can be added over time.
• Instead of using a collective scoring, the
  technique partitions the user base into
  ‘similar’ users.
• The technique can easily be extended to the
  item side: rather than having books as
  rows, we can have book clusters.
Disadvantages

• Needs some seed data to start.
• Requires some transformations.
• Can become very complex as the number of
  users/items grows.
Evaluating Performance (Metrics)
• Almost any Information Retrieval metric can
  be used.
• Three interesting ones:
   • Accuracy
   • Coverage
   • Normalized Distance Based Performance Measure
     (NDPM)
Accuracy
• Takes into account the order in which recommendations are
  shown to users and how they responded to them.
• For rank position = 1:
   • Acc(1) = # of Positive responses with rank less than or
      equal to 1 / total recommendations with rank less than or
      equal to 1
   • Therefore, Acc(1) = 1 / 3 = 33.33%
• Similarly, Acc(2) = 2 / 6 = 33.33%
                        UserId     BookId    Rank       Response
                        1          3         1          Yes
                        1          2         2          No
                        2          7         1          No
                        2          5         2          Yes
                        3          3         1          No
                        3          7         2          No
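The calculation can be written out as a short function. This is a sketch of the definition given on this slide; the tuple layout of `log` and the name `accuracy` are assumptions.

```python
# Recommendation log from the slide: (UserId, BookId, Rank, positive response?).
log = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]

def accuracy(log, k):
    """Share of recommendations shown at rank <= k that got a positive response."""
    shown = [row for row in log if row[2] <= k]
    return sum(row[3] for row in shown) / len(shown)

print(f"Acc(1) = {accuracy(log, 1):.2%}")
print(f"Acc(2) = {accuracy(log, 2):.2%}")
```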
Coverage
• Shows the share of all items that appear in the
  recommendations across all users.
• For rank position = 1:
   • Cov(1) = unique items in recommendations with rank less
     than or equal to 1 / total items.
   • Therefore, Cov(1) = 2 / 7 = 28.57%
• Similarly, Cov(2) = 4 / 7 = 57.14%
                      UserId     BookId   Rank      Response
                      1          3        1         Yes
                      1          2        2         No
                      2          7        1         No
                      2          5        2         Yes
                      3          3        1         No
                      3          7        2         No
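A matching sketch for coverage, again following the definition on this slide. The total of 7 items is taken from the book records shown earlier; the names `coverage` and `TOTAL_ITEMS` are assumptions.

```python
# Recommendation log from the slide: (UserId, BookId, Rank, positive response?).
log = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]
TOTAL_ITEMS = 7  # the library catalogue in the running example has 7 books

def coverage(log, k, total_items):
    """Share of the catalogue that appears in recommendations at rank <= k."""
    unique_books = {book for _, book, rank, _ in log if rank <= k}
    return len(unique_books) / total_items

print(f"Cov(1) = {coverage(log, 1, TOTAL_ITEMS):.2%}")
print(f"Cov(2) = {coverage(log, 2, TOTAL_ITEMS):.2%}")
```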
Normalized Distance Based Performance Measure (NDPM)
   • Assesses the quality of a recommendation system, taking into account the
     order in which items are shown.
   • NDPM = (C- + 0.5 x C+) / Cu
   • C- is the number of recommended item pairs, taken in rank order, where the user responded (No, Yes).
   • C+ is the number of recommended item pairs, taken in rank order, where the user responded (Yes, No).
   • Cu is the number of all item pairs where the user’s responses differ.
   • In our example (the index in parentheses is the UserId):
       • C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%
       • C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%
       • NDPM = (0.75 + 0.5) / 2 = 62.5%
                                              UserId                 BookId       Rank   Response
                                              1                      3            1      Yes
                                              1                      2            2      No
                                              1                      7            3      No
                                              1                      5            4      Yes
                                              2                      3            1      Yes
                                              2                      7            2      No
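The per-user pair counting can be sketched in Python as below. It follows the definitions of C-, C+ and Cu exactly as given on this slide, which may differ from other formulations of NDPM; the function names and the choice to skip users with Cu = 0 are assumptions.

```python
from itertools import combinations

# Recommendation log from the slide: (UserId, BookId, Rank, positive response?).
log = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (1, 7, 3, False),
    (1, 5, 4, True),
    (2, 3, 1, True),
    (2, 7, 2, False),
]

def ndpm_for_user(responses):
    """responses: the user's Yes/No answers ordered by rank."""
    c_minus = c_plus = 0
    for earlier, later in combinations(responses, 2):
        if not earlier and later:      # (No, Yes) pair
            c_minus += 1
        elif earlier and not later:    # (Yes, No) pair
            c_plus += 1
    c_u = c_minus + c_plus             # pairs with differing responses
    return (c_minus + 0.5 * c_plus) / c_u if c_u else None

def ndpm(log):
    """Average the per-user scores, skipping users with no differing pairs."""
    by_user = {}
    for uid, _, rank, response in sorted(log, key=lambda row: (row[0], row[2])):
        by_user.setdefault(uid, []).append(response)
    scores = [s for s in map(ndpm_for_user, by_user.values()) if s is not None]
    return sum(scores) / len(scores)

print(f"NDPM = {ndpm(log):.1%}")
```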
How to improve results

• Maintain a list of recommendations each user
  has already seen, and don’t show them again
  for some time.
• Give users a way to tell you what they’re
  looking for.
• Alternatively, infer this from their searches.
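The first point can be illustrated with a simple cooldown filter. Everything here is hypothetical: the 7-day window, the `last_shown` mapping and the function name are assumptions, since the slide only says "for some time".

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=7)  # assumed value; not specified in the slides

def filter_recently_shown(candidates, last_shown, now):
    """Drop books recommended to this user within the cooldown window.

    last_shown maps BookId -> datetime of the last time it was recommended
    (a hypothetical per-user store).
    """
    return [
        book for book in candidates
        if book not in last_shown or now - last_shown[book] >= COOLDOWN
    ]

now = datetime(2024, 1, 15)
last_shown = {7: datetime(2024, 1, 12), 1: datetime(2024, 1, 1)}
print(filter_recently_shown([7, 1, 5, 6], last_shown, now))
```

Here Book 7 is held back because it was shown three days earlier, while Book 1 is eligible again.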
Some standard algorithms
• Item Hierarchy
   • You bought a printer, so you will also need ink.
• Attribute-based recommendations
   • You like reading classics and books written by Salinger, so you might like “Catcher in
     the Rye”.
• Collaborative Filtering – User-User Similarity
   • People like you who read “The Hunger Games” also read “The Ambler
     Warning”.
• Collaborative Filtering – Item-Item Similarity
   • You like “Catcher in the Rye” so you will like “Nine Stories”.
• Social + Interest Graph Based
   • Your friends like “The Great Gatsby” so you will like “The Great Gatsby”
     too.
• Model Based
   • Training SVM, LDA, SVD for implicit features.
Some Tools

• Apache Mahout (Java)
• Crab (Python)
• Easyrec (RESTful API)
Questions?
Thank you!

            www.usman-sharif.com
                  @sharif_usman
