People who liked this talk also liked …
Building Recommendation Systems
             Using Ruby

            Ryan Weald, @rweald
             LA RubyConf 2013




                                          1
Who is this guy?

 What does he know
about recommendation
       systems?

                       2
Data Scientist @Sharethrough




 Native advertising
     platform
                               3
4
Outline
1) What is a recommendation system?
2) Collaborative filtering based
   recommendations
3) Content based recommendations
4) Hybrid systems - the best of both worlds
5) Evaluating your recommendation system
6) Resources & existing libraries


                                              5
What this Talk is Not
• Everything there is to know about
  recommendation systems.
• Bleeding edge machine learning
• How to use a specific library




                                      6
What is a
recommendation system?



                         7
A program that predicts
a user’s preferences using information
 about the user, other users, and the
         items in your system.




                                         8
LinkedIn




           9
Netflix




         10
Spotify




          11
Amazon




         12
How do I build
recommendations?



                   13
Two Main Categories of Algorithm



1. Collaborative Filtering (CF)

2. Content Based - Classification




                                   14
Collaborative Filtering


Fill in missing user preferences using
         similar users or items




                                         15
Two Types of CF
1. Memory Based - Uses similarity
between users or items. Dataset
usually kept in memory

2. Model Based - Model generated
to “explain” observed ratings


                                    16
User Based CF


 (User x Item) Matrix + Similarity
Function = Top-K most similar users




                                      17
Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5

User 1      0          1          0           5         0

User 2      1          2          1           0         5

User 3      2          5          0           0         2

User 4      5          4          4           1         1

User 5      2          4          ?           ?         2

                   * 0 denotes not rated

                                                               18
Similarity Functions

• Pearson Correlation Coefficient
• Cosine Similarity




                                   19
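A minimal Ruby sketch of cosine similarity, the second function listed on the slide above. It is an illustration, not code from the talk: the method name `cosine_similarity` is hypothetical, and treating "not rated" (0) as an ordinary zero in the vector is a simplifying assumption.

```ruby
# Cosine similarity between two equal-length rating vectors.
# Returns 0.0 when either vector is all zeros (an assumption made here
# to avoid dividing by zero).
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Rows for User 3 and User 5 from the ratings table; User 5's two unknown
# ratings are entered as 0 (not rated).
user3 = [2, 5, 0, 0, 2]
user5 = [2, 4, 0, 0, 2]
puts cosine_similarity(user3, user5)
```

The result is close to 1 because the two users rated the same videos in nearly the same proportions.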
Pearson Correlation Coefficient




                                 20
Calculating PCC




                  21
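The formulas on the PCC slides did not survive extraction. As a stand-in, here is a sketch of the textbook Pearson correlation coefficient: the covariance of two users' ratings divided by the product of their standard deviations. Restricting the calculation to items both users have rated, and returning 0.0 when the coefficient is undefined, are assumptions of this sketch; the method name `pearson` is hypothetical.

```ruby
# Pearson correlation between two users' rating rows.
def pearson(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  # Keep only items both users rated (0 denotes "not rated" in the table).
  pairs = a.zip(b).reject { |x, y| x.zero? || y.zero? }
  return 0.0 if pairs.length < 2

  xs, ys = pairs.transpose
  mean_x = xs.sum.to_f / xs.length
  mean_y = ys.sum.to_f / ys.length

  covariance = pairs.sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = xs.sum { |x| (x - mean_x)**2 }
  spread_y = ys.sum { |y| (y - mean_y)**2 }
  denominator = Math.sqrt(spread_x * spread_y)
  return 0.0 if denominator.zero? # one user gave every item the same rating

  covariance / denominator
end

puts pearson([2, 5, 0, 0, 2], [2, 4, 0, 0, 2]) # User 3 vs. User 5
puts pearson([5, 4, 4, 1, 1], [2, 4, 0, 0, 2]) # User 4 vs. User 5
```

Unlike cosine similarity, the coefficient ranges from -1 to 1 and subtracts each user's mean, so a user who rates everything high and one who rates everything low can still come out as similar.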
27
Using similarity to
recommend items



                      28
Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5

User 1      0          1          0           5         0

User 2      1          2          1           0         5

User 3      2          5          0           0         2

User 4      5          4          4           1         1

User 5      2          4          ?           ?         2

                   * 0 denotes not rated

                                                               29
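One way to fill in User 5's two question marks, sketched in Ruby under stated assumptions: cosine similarity as the similarity function, the two nearest users as neighbors (k = 2), and a similarity-weighted average as the prediction. None of these choices is confirmed by the slides, and the names `RATINGS`, `top_k_neighbors`, and `predict` are hypothetical.

```ruby
# The ratings table from the slide; 0 denotes "not rated".
RATINGS = {
  "User 1" => [0, 1, 0, 5, 0],
  "User 2" => [1, 2, 1, 0, 5],
  "User 3" => [2, 5, 0, 0, 2],
  "User 4" => [5, 4, 4, 1, 1],
  "User 5" => [2, 4, 0, 0, 2] # Video 3 and Video 4 are the unknowns
}.freeze

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norms.zero? ? 0.0 : dot / norms
end

# The k users most similar to `target`, as [name, similarity] pairs,
# most similar first.
def top_k_neighbors(ratings, target, k)
  ratings
    .reject { |name, _| name == target }
    .map { |name, row| [name, cosine_similarity(ratings[target], row)] }
    .max_by(k) { |_, similarity| similarity }
end

# Similarity-weighted average of the neighbors' ratings for one item,
# skipping neighbors who have not rated it. Returns nil if none has.
def predict(ratings, target, item_index, k: 2)
  rated = top_k_neighbors(ratings, target, k)
          .select { |name, _| ratings[name][item_index] > 0 }
  weight_total = rated.sum { |_, similarity| similarity }
  return nil if rated.empty? || weight_total.zero?

  rated.sum { |name, similarity| similarity * ratings[name][item_index] } / weight_total
end

p top_k_neighbors(RATINGS, "User 5", 2).map(&:first)
puts predict(RATINGS, "User 5", 2) # Video 3
puts predict(RATINGS, "User 5", 3) # Video 4
```

With k = 2 the neighbors are User 3 and User 4. Only User 4 has rated Video 3 and Video 4, so the predictions simply follow User 4's ratings; a larger k would blend in more opinions.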
30
Problems With CF

• Cold Start
• Data Sparsity
• Resource expensive



                        31
Doesn’t the video
content matter for
recommendations?


                     32
Content Based Recommendations


  Classify items based on features of
   the item. Pick other items from
      same class to recommend.




                                        33
Content Based Algorithms
• K-means clustering
• Random Forest
• Support Vector Machines
• ...
• Insert your favorite ML algorithm

                                      34
Content Based Algorithms
          Type of content   Duration   Maturity Rating
Video 1   comedy               60      G
Video 2   action              120      G
Video 3   comedy               34      PG-13
Video 4   romantic             15      R
Video 5   sports              120      G




                                           35
K-means Clustering


  Group items into K clusters.
Assign new item to a cluster and
  pick items from that cluster




                                   36
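A compact Ruby sketch of k-means (Lloyd's algorithm) applied to the videos from the feature table. Several things here are assumptions for illustration only: each video is reduced to `[duration, maturity level]` with a hypothetical encoding of G = 0, PG-13 = 1, R = 2; the content type is ignored; and the first k points seed the centroids so the run is repeatable, where real implementations usually pick random seeds. The features are not rescaled, so duration dominates the distance.

```ruby
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def mean_point(points)
  points.transpose.map { |column| column.sum.to_f / column.length }
end

# Alternately assign each point to its nearest centroid, then move each
# centroid to the mean of its points, until assignments stop changing.
def k_means(points, k, max_iterations: 100)
  centroids = points.first(k).map { |point| point.map(&:to_f) }
  assignments = nil

  max_iterations.times do
    new_assignments = points.map do |point|
      (0...k).min_by { |i| distance(point, centroids[i]) }
    end
    break if new_assignments == assignments

    assignments = new_assignments
    centroids = (0...k).map do |i|
      members = points.each_index.select { |j| assignments[j] == i }.map { |j| points[j] }
      members.empty? ? centroids[i] : mean_point(members)
    end
  end

  [assignments, centroids]
end

# [duration, maturity level] for Video 1 through Video 5.
videos = [[60, 0], [120, 0], [34, 1], [15, 2], [120, 0]]
assignments, = k_means(videos, 2)

# Recommend for Video 1 (index 0): other videos in the same cluster.
similar = assignments.each_index.select { |j| j != 0 && assignments[j] == assignments[0] }
p similar.map { |j| "Video #{j + 1}" }
```

On this data the shorter videos (1, 3, 4) end up in one cluster and the two 120-unit videos in the other, so Video 3 and Video 4 are the candidates for a viewer of Video 1.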
K-means Clustering




                     37
Problems With Content Based
      Recommendations

• Unsupervised Learning is hard
• Training data limited or expensive
• Doesn’t take user into account
• Limited by features of content

                                       38
Hybrid Recommendations


Combine collaborative filtering with
content based algorithm to achieve
          greater results




                                      39
Hybrid Recommendations

Input -> CF Based Recommender ------+
                                    +-> Combiner -> Reco
Input -> Content Based Recommender -+




                                           40
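The combiner in the diagram above could take many forms. A weighted sum is perhaps the simplest, sketched here in Ruby. It assumes both recommenders emit `{item => score}` hashes on a comparable scale (for example 0 to 1); the method names, the `cf_weight` parameter, and the scores are all hypothetical.

```ruby
# Blend two {item => score} hashes. An item missing from one recommender
# contributes 0.0 from that side.
def combine(cf_scores, content_scores, cf_weight: 0.5)
  items = cf_scores.keys | content_scores.keys
  items.map do |item|
    blended = cf_weight * cf_scores.fetch(item, 0.0) +
              (1.0 - cf_weight) * content_scores.fetch(item, 0.0)
    [item, blended]
  end.to_h
end

# The n highest-scoring items, best first.
def recommend(scores, n)
  scores.max_by(n) { |_, score| score }.map(&:first)
end

# Made-up scores for illustration only.
cf_scores      = { "Video 3" => 0.9, "Video 4" => 0.2 }
content_scores = { "Video 1" => 0.8, "Video 3" => 0.4, "Video 4" => 0.5 }

p recommend(combine(cf_scores, content_scores), 2)
```

Raising `cf_weight` trusts the collaborative filter more; lowering it helps with new items that have no ratings yet.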
Hybrid Recommendations




                         41
Hybrid Recommendations



Input -> Content Recommender -> CF Recommender -> Reco




                                             42
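One reading of the chained layout above is a cascade: the content recommender narrows the catalog to candidates, and the collaborative filter ranks what is left. The Ruby sketch below assumes that reading. "Same content type as the seed item" stands in for the content stage, the CF scores are made up, and "Video 6" is a hypothetical extra item added so there is something to rank.

```ruby
# Stage 1 (content): keep items that share the seed item's content type.
# Stage 2 (CF): order those candidates by their CF score, best first.
def cascade(catalog, seed, cf_scores, n)
  seed_type = catalog.fetch(seed)
  candidates = catalog.select { |item, type| item != seed && type == seed_type }.keys
  candidates.max_by(n) { |item| cf_scores.fetch(item, 0.0) }
end

catalog = {
  "Video 1" => "comedy",
  "Video 2" => "action",
  "Video 3" => "comedy",
  "Video 6" => "comedy" # hypothetical item, not in the talk's table
}
cf_scores = { "Video 2" => 0.9, "Video 3" => 0.2, "Video 6" => 0.7 } # made up

p cascade(catalog, "Video 1", cf_scores, 2)
```

Video 2 has the highest CF score but never reaches the second stage because the content stage filtered it out, which is the defining trade-off of a cascade.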
Hybrid Recommendations


            CF
        Recommender
Input                        Reco
          Content
        Recommender




                                    43
Evaluating Recommendation Quality


• Precision vs. Recall
• Clicks
• Click through rate
• Direct user feedback


                                    44
Precision vs. Recall




                       45
Precision vs. Recall




                       46
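Precision is the fraction of recommended items that were relevant; recall is the fraction of relevant items that were recommended. A small Ruby sketch, where the method name and the sample item lists are hypothetical and "relevant" means whatever ground truth is available (for example, items the user later watched):

```ruby
# Returns [precision, recall] for one user's recommendation list.
def precision_and_recall(recommended, relevant)
  hits = (recommended & relevant).length
  precision = recommended.empty? ? 0.0 : hits.to_f / recommended.length
  recall = relevant.empty? ? 0.0 : hits.to_f / relevant.length
  [precision, recall]
end

recommended = ["Video 1", "Video 2", "Video 3", "Video 4"]
relevant    = ["Video 2", "Video 4", "Video 5"]

precision, recall = precision_and_recall(recommended, relevant)
puts "precision: #{precision}, recall: #{recall.round(3)}"
```

Recommending more items tends to raise recall and lower precision, which is why the two are usually reported together.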
Summary of What We’ve Learned


 • Collaborative Filtering using similar users
 • Content clustering using k-means
 • Combining 2 algorithms to boost quality
 • How to evaluate your recommender


                                                 47
Don’t Reinvent the Wheel

• Apache Mahout
• JRuby mahout gem
• SciRuby
• Recommenderlab for R


                             48
Resources & Further Reading
• Recommender Systems: An Introduction
• Linden, Greg, Brent Smith, and Jeremy York.
"Amazon.com recommendations: Item-to-item
collaborative filtering."
• Resnick, Paul, et al. "GroupLens: An open architecture
for collaborative filtering of netnews."
• ACM RecSys Conference Proceedings


                                                           49
We’re Hiring
http://bit.ly/str-engineering




                                50
Thanks!
        Twitter: @rweald
Email: ryan@sharethrough.com




                               51
