Recommender Systems
Learning to build a simple Book recommendation system
By
R Venkat Raman
WE GET RECOMMENDATIONS ALL THE TIME
TYPES OF RECOMMENDER SYSTEMS
Content Based: This technique recommends items that are similar in characteristics to an item the user has already liked.
Collaborative Filtering: This technique is based on the idea that similar users tend to share the same interests and therefore like similar items.
RECOMMENDER ENGINES ARE POWERFUL
THEY HELP IN CROSS SELLING AND UPSELLING OF PRODUCTS
RECOMMENDER SYSTEMS ARE NOT PERFECT
THERE ARE MINOR PROBLEMS LIKE COLD START AND IRRELEVANT RECOMMENDATIONS. THERE ARE
SOLUTIONS TO OVERCOME THESE PROBLEMS
KEY CONCEPTS TO UNDERSTAND : VECTORS
Vectors: Computers work with numbers (ultimately only binary, i.e. 0s and 1s). They do not understand words, and this is where the biggest problem for NLP starts.
A vector can be thought of as an arrow from the origin of the vector space, with a direction and a magnitude. Alternatively, it can be thought of as a point or coordinate in n-dimensional space.
Vectors are usually written as a collection of numbers, e.g. [2, 3].
The fundamental idea in NLP is to convert text or words into vectors and represent them in a vector space model.
In essence, this idea of vectors is what makes the rapid strides in NLP, machine learning and AI possible.
In fact, Geoffrey Hinton (“Father of Deep Learning”) acknowledged in an MIT Technology Review article that the AI institute in Toronto was named the “Vector Institute” owing to the beautiful properties of vectors that have helped them in deep learning and other variants of neural nets.
KEY CONCEPTS TO UNDERSTAND : TF –IDF (1/4)
TF-IDF stands for Term Frequency and Inverse Document Frequency. TF-IDF helps in evaluating the importance of a term (word) in a document.
TF (Term Frequency)
To determine how frequently a term appears in a document, and to represent the document in vector form, we break the process down into the following steps.
Step 1: Create a dictionary of the words (also known as a bag of words) present in the whole document space. We ignore common words, called stop words (e.g. the, of, a, an, is), since they appear almost everywhere and do not help us in our goal of choosing important words.
In this example I have used the file ‘test1.csv’, which contains the titles of 50 books. To drive home the point, consider just 3 book titles (documents) as making up the whole document space: B1 is one document, and B2 and B3 are the others.
KEY CONCEPTS TO UNDERSTAND : TF –IDF (2/4)
B1 — Recommender Systems
B2 — The Elements of Statistical Learning
B3 — Recommender Systems — Advanced
Now we create an index of these words (stop words ignored):
1. Recommender 2. Systems 3. Elements 4. Statistical 5. Learning 6. Advanced
Step 2: Forming the vector
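Step 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the deck's own code: it assumes titles are lowercased and split on whitespace, that the stop-word set contains only "the" and "of", and the helper name `term_counts` is made up for this example.

```python
# Index of terms from Step 1 (stop words already removed).
vocabulary = ["recommender", "systems", "elements", "statistical", "learning", "advanced"]

# Assumed stop-word set for this toy example only.
stop_words = {"the", "of"}

# The three titles; the dash in B3 is omitted to keep tokenization trivial.
documents = {
    "B1": "Recommender Systems",
    "B2": "The Elements of Statistical Learning",
    "B3": "Recommender Systems Advanced",
}

def term_counts(title):
    """Return the raw term-frequency vector of a title over the vocabulary."""
    tokens = [t for t in title.lower().split() if t not in stop_words]
    return [tokens.count(term) for term in vocabulary]

tf_vectors = {doc_id: term_counts(title) for doc_id, title in documents.items()}

for doc_id, vector in tf_vectors.items():
    print(doc_id, vector)
# B1 [1, 1, 0, 0, 0, 0]
# B2 [0, 0, 1, 1, 1, 0]
# B3 [1, 1, 0, 0, 0, 1]
```

Each position in a vector corresponds to one term of the index, so all documents live in the same six-dimensional space.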
KEY CONCEPTS TO UNDERSTAND : TF –IDF (3/4)
Term frequency tells us how many times a term appears in a document, but it has an inherent problem: TF gives more importance to frequently occurring terms while ignoring the importance of rare terms.
This is not ideal, as rare terms often carry more signal. This problem is resolved by IDF.
A term may also occur more often in longer documents than in shorter ones, so term frequency is normalized:
TFn = (Number of times term t appears in a document) / (Total number of terms in the document), where n stands for normalized.
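As a small worked example of the normalization, the sketch below applies TFn to the raw counts of B3. It assumes the total is taken after stop words are removed; the function name `normalized_tf` is illustrative.

```python
def normalized_tf(counts):
    """Divide each raw count by the total number of terms in the document."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [count / total for count in counts]

# Raw counts for B3 over the six-term index:
# recommender, systems, elements, statistical, learning, advanced
b3_counts = [1, 1, 0, 0, 0, 1]

b3_tfn = normalized_tf(b3_counts)
print([round(value, 4) for value in b3_tfn])
# [0.3333, 0.3333, 0.0, 0.0, 0.0, 0.3333]
```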
IDF (Inverse Document Frequency)
KEY CONCEPTS TO UNDERSTAND : TF –IDF (4/4)
A simple definition is: IDF(t) = ln (Total number of documents / Number of documents containing term t)
Now let’s take our own dictionary (bag of words) as an example and calculate the IDFs.
We had 6 terms, which are as follows:
1. Recommender 2. Systems 3. Elements 4. Statistical 5. Learning 6. Advanced
and our documents were :
B1 — Recommender Systems
B2 — The Elements of Statistical Learning
B3 — Recommender Systems — Advanced
Now IDF(w1) = ln(3/2); IDF(w2) = ln(3/2); IDF(w3) = ln(3/1); IDF(w4) = ln(3/1); IDF(w5) = ln(3/1); IDF(w6) = ln(3/1)
We then again get a vector as follows:
= (0.4054, 0.4054, 1.0986, 1.0986, 1.0986, 1.0986)
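The IDF vector can be checked with the standard library. This sketch assumes the natural logarithm and represents each title as the set of index terms it contains; the function name `idf` is illustrative. Rounded to four decimals, ln(3/2) prints as 0.4055, which is the same value as the 0.4054 above, only rounded instead of truncated.

```python
import math

vocabulary = ["recommender", "systems", "elements", "statistical", "learning", "advanced"]

# Each document as the set of index terms it contains (stop words removed).
documents = {
    "B1": {"recommender", "systems"},
    "B2": {"elements", "statistical", "learning"},
    "B3": {"recommender", "systems", "advanced"},
}

def idf(term):
    """ln(total number of documents / number of documents containing the term)."""
    containing = sum(1 for terms in documents.values() if term in terms)
    return math.log(len(documents) / containing)

idf_vector = [round(idf(term), 4) for term in vocabulary]
print(idf_vector)
# [0.4055, 0.4055, 1.0986, 1.0986, 1.0986, 1.0986]
```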
The final step is to compute the TF-IDF weight. The TF vectors and the IDF vector are arranged as matrices, and the weight of term t in document d, given the document space D, is:
TF-IDF Weight = TF(t, d) * IDF(t, D)
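Putting the two pieces together for the three titles gives one weight vector per document. The sketch below uses raw counts for TF (an assumption; the normalized TFn could be used instead) and the ln-based IDF defined earlier.

```python
import math

vocabulary = ["recommender", "systems", "elements", "statistical", "learning", "advanced"]

# Raw term-frequency vectors for the three titles over the vocabulary.
tf = {
    "B1": [1, 1, 0, 0, 0, 0],
    "B2": [0, 0, 1, 1, 1, 0],
    "B3": [1, 1, 0, 0, 0, 1],
}

n_docs = len(tf)

# Document frequency: in how many documents each term occurs at least once.
doc_freq = [sum(1 for vec in tf.values() if vec[i] > 0) for i in range(len(vocabulary))]
idf = [math.log(n_docs / df) for df in doc_freq]

# TF-IDF weight = TF(t, d) * IDF(t, D), computed element by element.
tfidf = {
    doc_id: [round(count * weight, 4) for count, weight in zip(vec, idf)]
    for doc_id, vec in tf.items()
}

for doc_id, weights in tfidf.items():
    print(doc_id, weights)
# B1 [0.4055, 0.4055, 0.0, 0.0, 0.0, 0.0]
# B2 [0.0, 0.0, 1.0986, 1.0986, 1.0986, 0.0]
# B3 [0.4055, 0.4055, 0.0, 0.0, 0.0, 1.0986]
```

The rare term "advanced" ends up with a larger weight in B3 than the shared terms "recommender" and "systems", which is the effect IDF is meant to produce.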
KEY CONCEPTS TO UNDERSTAND : COSINE SIMILARITY
Cosine similarity is a measure of similarity between two non-zero vectors. It measures orientation, not magnitude. It is obtained by taking the dot product of the two vectors and dividing it by the product of their magnitudes.
The dot product is given by the formula a · b = |a| |b| cos θ, so cos θ = (a · b) / (|a| |b|).
If you are wondering why the cosine of the angle comes into the picture, the adjacent diagram provides the intuition.
One of the beautiful things about vector representation is that we can now see how closely related two sentences are from the angle their respective vectors make. In general the cosine ranges from -1 to 1; for TF-IDF vectors, whose components are never negative, it ranges from 0 to 1.
So if two vectors make an angle of 0, the cosine is 1, which means that the sentences are closely related to each other.
If the two vectors are orthogonal, i.e. the angle is 90 degrees, the cosine is 0, which means that the sentences share no terms and are treated as unrelated.
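A minimal sketch of the calculation, using only the standard library: the cosine of the angle is the dot product divided by the product of the two magnitudes. The vectors are the raw term counts of the three example titles, and the function name `cosine_similarity` is chosen for this illustration.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

b1 = [1, 1, 0, 0, 0, 0]  # Recommender Systems
b2 = [0, 0, 1, 1, 1, 0]  # The Elements of Statistical Learning
b3 = [1, 1, 0, 0, 0, 1]  # Recommender Systems, Advanced

print(round(cosine_similarity(b1, b1), 4))  # 1.0    (same direction)
print(round(cosine_similarity(b1, b2), 4))  # 0.0    (orthogonal, no shared terms)
print(round(cosine_similarity(b1, b3), 4))  # 0.8165 (two of three terms shared)
```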
Let’s get to the code
CODE WALKTHROUGH
• First we import standard packages such as pandas, NumPy and sklearn
• We then read the CSV file, which contains the ‘Book Title’ column
• From the sklearn package, we import TfidfVectorizer.
• TfidfVectorizer computes the TF-IDF scores. We apply it to the ‘Book Title’ column to generate the TF-IDF scores
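These steps might look roughly like the sketch below. It is not the code from the gist: a three-row DataFrame stands in for reading ‘test1.csv’, the ‘ID’ column name is an assumption, and an explicit two-word stop list is passed so the toy result is predictable.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for pd.read_csv("test1.csv"); the real file holds 50 titles.
# The 'ID' column name is assumed for this illustration.
books = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "Book Title": [
            "Recommender Systems",
            "The Elements of Statistical Learning",
            "Recommender Systems Advanced",
        ],
    }
)

# Explicit stop-word list chosen for the toy data (an assumption).
vectorizer = TfidfVectorizer(stop_words=["the", "of"])
tfidf_matrix = vectorizer.fit_transform(books["Book Title"])

print(tfidf_matrix.shape)              # (3, 6): three titles, six index terms
print(sorted(vectorizer.vocabulary_))  # the learned terms, alphabetically
```

Note that with its default settings TfidfVectorizer smooths the IDF (ln((1 + n) / (1 + df)) + 1) and L2-normalizes each row, so its scores will not equal the hand-computed values from the earlier slides.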
CODE WALKTHROUGH
Next, we calculate the cosine similarity between every pair of documents (here, book titles).
We then store the corresponding IDs in descending order of cosine similarity score.
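The ranking step can be sketched without any third-party packages. The TF-IDF vectors below are hypothetical toy values (raw counts times ln-based IDF for the three example titles), and the names `cosine` and `results` are illustrative; excluding a book from its own list is also an assumption.

```python
import math

# Hypothetical TF-IDF vectors keyed by book ID (toy values, not library output).
tfidf = {
    1: [0.4055, 0.4055, 0.0, 0.0, 0.0, 0.0],
    2: [0.0, 0.0, 1.0986, 1.0986, 1.0986, 0.0],
    3: [0.4055, 0.4055, 0.0, 0.0, 0.0, 1.0986],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

# For every book, rank all other books by descending similarity score.
results = {}
for book_id, vector in tfidf.items():
    scored = [
        (cosine(vector, other), other_id)
        for other_id, other in tfidf.items()
        if other_id != book_id
    ]
    results[book_id] = sorted(scored, reverse=True)

print([other_id for _, other_id in results[1]])  # [3, 2]
```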
CODE WALKTHROUGH
Next, we define a function ‘item’ to get the book title for a given ID.
Finally, through the function ‘recommend’, we recommend similar books once the user supplies the arguments (ID, number of books to be recommended).
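A sketch of what the two functions could look like. The function names ‘item’ and ‘recommend’ come from the slide; the lookup table, the precomputed rankings, the return value and the output format are all assumptions made for this illustration.

```python
# Hypothetical lookup table; in the walkthrough the titles come from the CSV file.
titles = {
    1: "Recommender Systems",
    2: "The Elements of Statistical Learning",
    3: "Recommender Systems Advanced",
}

# Hypothetical precomputed rankings: for each ID, a list of
# (similarity score, other ID) pairs sorted by descending score.
results = {
    1: [(0.4627, 3), (0.0, 2)],
    2: [(0.0, 3), (0.0, 1)],
    3: [(0.4627, 1), (0.0, 2)],
}

def item(book_id):
    """Return the book title for the given ID."""
    return titles[book_id]

def recommend(book_id, num):
    """Print and return up to `num` (title, score) pairs most similar to `book_id`."""
    print(f"Recommending {num} books similar to {item(book_id)}:")
    recommendations = [
        (item(other_id), score) for score, other_id in results[book_id][:num]
    ]
    for title, score in recommendations:
        print(f"  {title} (score: {score})")
    return recommendations

recommend(1, 1)
```

Because the rankings are computed once up front, each call to `recommend` is just a dictionary lookup and a slice.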
CODE AND FILES
List of Books : https://gist.github.com/venkarafa/64df1ee21ae8d62bafe64a96f9bff881
Code : https://gist.github.com/venkarafa/0da815727f1ee098b201c371b60b2d72
RESOURCES
http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/
https://hbr.org/2018/07/to-see-the-future-of-competition-look-at-netflix
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://kavita-ganesan.com/tfidftransformer-tfidfvectorizer-usage-differences/#.XXILJygzbIU
https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/dot-and-cross-product-comparison-intuition
