What should I read next?
A Book Recommendation Engine
Based on GoodReads Ratings
Team 10:
Shravani Bheema, Coco Huang, Sharon Heber, Chen Zhou, Mohit Gupta
About the dataset
Goodreads is an American social cataloging website and a
subsidiary of Amazon that allows individuals to search its
database of books, annotations, quotes, and reviews.
With a Goodreads account, you can keep track of the books
you've read, the books you're reading, and the books you want
to read.
You can also follow friends and authors to see what they're
reading, leave reviews, and comment on reviews written by
others.
About the dataset
Three separate files:
● Users: Contains basic information regarding the reader.
○ UserID, Location, Age
● Books: Contains basic information regarding the books.
○ ISBN, Title, Author, Year of publication, Publisher
● Rating: Contains all of the user rating information.
○ UserID, ISBN, Rating (1-10, with 10 being the highest)
About the dataset
Full dataset:
● 1 million ratings
● 300k users
● 300k books
● 16k publishers
Columns: Name of the book, Author, Publisher, Published Year, ISBN, Rating, User ID, User Age, User Location
Each row is the rating given by a particular user for a specific book.
Books with fewer than 50 ratings and users who have rated fewer than 20 books
were filtered out for the recommender system, leaving:
● 200k ratings
● 6k users
● 5k books
● 600 publishers
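The filtering step can be sketched as follows. This is a minimal illustration, assuming ratings are held as `(user_id, isbn, rating)` tuples; the function name `filter_ratings` and the single-pass counting are assumptions, not the team's actual code.

```python
from collections import Counter

def filter_ratings(ratings, min_book_ratings=50, min_user_ratings=20):
    """Keep ratings whose book and user both meet the activity thresholds.

    `ratings` is a list of (user_id, isbn, rating) tuples. Counts are taken
    once on the full data, so this is a single pass, not iterative pruning.
    """
    book_counts = Counter(isbn for _, isbn, _ in ratings)
    user_counts = Counter(user for user, _, _ in ratings)
    return [
        (user, isbn, rating)
        for user, isbn, rating in ratings
        if book_counts[isbn] >= min_book_ratings
        and user_counts[user] >= min_user_ratings
    ]

# Tiny illustration with low thresholds so the effect is visible.
sample = [("u1", "b1", 8), ("u1", "b2", 6), ("u2", "b1", 9), ("u3", "b3", 4)]
kept = filter_ratings(sample, min_book_ratings=2, min_user_ratings=2)
print(kept)  # [('u1', 'b1', 8)]
```

Only the rating whose book and user both reach the threshold survives; the defaults mirror the 50-rating and 20-book cutoffs stated above.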
Project Goal
Explore and compare different approaches to recommending the
most relevant books to users based on their interactions
(ratings) with other books in the past or based on other similar
users’ interactions with books.
Preparing Data (Data clean up)
● The data was provided in 3 separate files, which had to be merged on ISBN and UserID before starting the analysis.
● The Age column had many null values (about ⅓ of users).
● The data contained entry errors. For example, some publisher names were entered in the Year of Publication column,
and some books had 0 entered as the Year of Publication.
● Location information was stored as one long string. It was split into separate City,
State, and Country columns for cleaner analysis.
● Each file included a lot of duplicate rows. All duplicates were removed before starting.
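The cleanup steps above could look roughly like this. It is a sketch under assumptions: the location string is assumed to be comma-separated ("city, state, country"), and the helper names `parse_year`, `split_location`, and `merge_ratings` are hypothetical.

```python
def parse_year(value):
    """Return the year as an int, or None for non-numeric entries and 0."""
    try:
        year = int(str(value).strip())
    except ValueError:
        return None  # e.g. a publisher name typed into the year column
    return year if year > 0 else None

def split_location(location):
    """Split 'city, state, country' into three fields, padding missing parts."""
    parts = [p.strip() or None for p in location.split(",")]
    parts = (parts + [None, None, None])[:3]
    return {"city": parts[0], "state": parts[1], "country": parts[2]}

def merge_ratings(ratings, books, users):
    """Join ratings to books (on ISBN) and users (on UserID), dropping duplicates."""
    merged, seen = [], set()
    for user_id, isbn, rating in ratings:
        key = (user_id, isbn, rating)
        if key in seen or isbn not in books or user_id not in users:
            continue
        seen.add(key)
        merged.append({"user_id": user_id, "isbn": isbn, "rating": rating,
                       **books[isbn], **users[user_id]})
    return merged

# Made-up records for illustration only.
books = {"111": {"title": "Example Book", "year": parse_year("0")}}
users = {7: {"age": None, **split_location("seattle, washington, usa")}}
rows = merge_ratings([(7, "111", 9), (7, "111", 9), (8, "111", 5)], books, users)
print(len(rows), rows[0]["country"], rows[0]["year"])
```

The duplicate rating and the rating from an unknown user are both dropped, and the invalid year 0 becomes a missing value rather than a number.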
Exploratory Data Analysis
The most popular book is
not the most well-liked book
Exploratory Data Analysis
The top 25 most popular authors show that not all
popular authors are well liked. Most authors who have
sold lots of books have very low ratings for those books.
For example, Rich Shapero’s book has a very low
rating because his style is loved by some
but hated by many.
Exploratory Data Analysis
Average rating of a book
per user is 7.62
Exploratory Data Analysis
62% of users are
between the ages
of 18 and 40
Recommendation systems
● A recommender system is a subclass of
information filtering system that seeks to
predict the “preference” a user would give to an
item.
What is a good recommendation?
● The one that is personalized (relevant to that
user)
● The one that is diverse (includes different user
interests)
● The one that doesn’t recommend the same
items to users for the second time
Collaborative Filtering
Collaborative filtering (CF) is a method for generating recommendations by calculating preference scores of a user for
an item using the historical preference for that item from other similar users in the database. This algorithm takes
account of the explicit interaction with the item irrespective of the attributes of the item, so it is domain agnostic as
long as we have sufficient historical interaction data.
Two types of collaborative filtering (CF):
1. Memory-based collaborative filtering: uses user rating data to compute the similarities between users or items.
This technique relies heavily on simple similarity measures, such as cosine similarity, to match similar people
or items together.
2. Model-based collaborative filtering: models are developed using different data mining and machine learning
algorithms to predict users’ ratings of unrated items. Popular techniques include Bayesian networks and singular
value decomposition.
Item Based Collaborative Filtering
“Since you liked this, you may also like...”
Item-item collaborative filtering is a form of collaborative filtering for
recommender systems based on the similarity between items calculated
using people's ratings of those items. In this method, similar items build
neighbourhoods based on the behaviour of users.
What makes 2 items or books similar?
If User_A likes 3 books (or rates them highly), these 3 books are
considered similar. This process is iterated across thousands of books and
users.
This method is not based on the features/contents of the items. Similarity
scores between items are calculated using a similarity
measure such as Euclidean distance, Pearson correlation, or cosine similarity.
The Sparse Matrix
A sparse matrix or sparse array is a matrix in which
most of the elements are zero.
There is no strict definition regarding the proportion
of zero-value elements for a matrix to qualify as
sparse, but a common criterion is that the number of
non-zero elements is roughly equal to the number of
rows or columns.
The sparsity of our
matrix is 98.3%
On average, a user has only read
28 books among 6k available.
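Sparsity here means the share of empty cells in the user-by-book matrix. A minimal sketch of that calculation, assuming ratings are stored as `(user_id, isbn, rating)` tuples (the function name and sample data are illustrative):

```python
def sparsity(ratings):
    """Fraction of empty cells in the user-by-book matrix implied by `ratings`."""
    users = {user for user, _, _ in ratings}
    books = {isbn for _, isbn, _ in ratings}
    filled = len({(user, isbn) for user, isbn, _ in ratings})
    return 1 - filled / (len(users) * len(books))

# 3 users x 3 books = 9 cells, of which 4 hold a rating.
sample = [("u1", "b1", 8), ("u1", "b2", 6), ("u2", "b1", 9), ("u3", "b3", 4)]
print(f"{sparsity(sample):.1%}")  # 55.6%
```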
Cosine Similarity: Evaluating Closeness of 2 Items or Users
Cosine similarity is a measure of similarity between two
sequences of numbers. The sequences here are viewed
as vectors.
The cosine similarity is defined as the cosine of the
angle between them, that is, the dot product of the
vectors divided by the product of their
lengths.
In our case, we can view the vectors in 2 ways. Each
book is a vector of n dimensions, where n is the number
of users, OR each user is a vector of n dimensions,
where n is the number of books.
(. = not rated)
Book                                User_85  User_2245  User_1134  User_92
The Da Vinci Code                   .        10         .          7
A Walk to Remember                  9        5          .          5
Angels & Demons                     5        7          .          .
Life of Pi                          .        .          .          8
The Alchemist                       7        .          6          .
The Hobbit                          .        .          10         .
Harry Potter & The Goblet of Fire   .        9          8          7
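A minimal sketch of item-item cosine similarity on the toy ratings from this slide. The placement of each rating under a specific user is an assumption, inferred from the mean-centered values shown later in the deck. Missing ratings count as 0, as in the plain (uncentered) method. The helper names are illustrative, not the team's code.

```python
import math

# Book -> {user: rating}. Cell placement is an assumption (see note above).
ratings = {
    "The Da Vinci Code": {"User_2245": 10, "User_92": 7},
    "A Walk to Remember": {"User_85": 9, "User_2245": 5, "User_92": 5},
    "Angels & Demons": {"User_85": 5, "User_2245": 7},
    "Life of Pi": {"User_92": 8},
    "The Alchemist": {"User_85": 7, "User_1134": 6},
    "The Hobbit": {"User_1134": 10},
    "Harry Potter & The Goblet of Fire": {"User_2245": 9, "User_1134": 8, "User_92": 7},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; missing entries are 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(title, ratings, top_n=3):
    """Rank every other book by cosine similarity to `title`."""
    scores = [
        (other, cosine_similarity(ratings[title], ratings[other]))
        for other in ratings if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

for book, score in most_similar("The Da Vinci Code", ratings):
    print(f"{score:.3f}  {book}")
```

Books with no raters in common, such as The Da Vinci Code and The Hobbit here, get a similarity of exactly 0.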
Cosine Similarity: Evaluating Closeness of 2 Items
The output is an array of similarity scores for
each book with every other book
Similarity Score Distributions for Sample Items
[Figure panels: Pride and Prejudice, The Da Vinci Code, The Hobbit]
k nearest neighbors
The k-nearest neighbors (KNN) algorithm is a data classification
method for estimating the likelihood that a data point will
become a member of one group or another based on what group
the data points nearest to it belong to.
The principle behind nearest neighbor methods is to find a
predefined number of samples closest in distance to the new
point, and predict the label from these.
The number of samples (k) can be a user-defined constant
(k-nearest neighbor learning), or vary based on the local density
of points (radius-based neighbor learning).
Working Principle of KNN
● Choose the value of K.
● Calculate the distance between the new data point and every training point.
● Sort the computed distances in ascending order.
● Choose the first K distances from the sorted list.
● Take the mode or mean of the labels of the training points associated with those K distances.
For classification, take the mode of the neighbors' classes; for regression, take the mean of the neighbors'
values.
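The steps above can be sketched in a few lines of plain Python. This is an illustrative from-scratch version using Euclidean distance; the function name `knn_predict` and the sample points are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3, task="classification"):
    """Predict a label for `query` from its k nearest training points."""
    # Distance from the query to every training point, sorted ascending.
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest = [label for _, label in distances[:k]]
    if task == "classification":
        return Counter(nearest).most_common(1)[0][0]  # mode of the classes
    return sum(nearest) / len(nearest)                # mean of the values

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
classes = ["low", "low", "low", "high", "high"]
values = [1.0, 2.0, 3.0, 10.0, 11.0]

print(knn_predict(points, classes, (2, 2), k=3))                    # low
print(knn_predict(points, values, (2, 2), k=3, task="regression"))  # 2.0
```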
Centered Cosine Similarity: Penalizing Opposite Ratings
Instead of treating missing ratings as zero
ratings, this method treats them as average
ratings (since the mean of each row is zero after centering).
It puts strict raters and liberal raters on a common scale.
Also known as the Pearson correlation.
Ratings are normalized by subtracting the row mean.
Each book rating is now centred around 0: positive
ratings indicate that the book was liked more than
average by the user, and negative ratings imply that it
was liked less than average, taking the user's own
personal rating system into consideration.
(. = not rated)
User       The Da Vinci Code  A Walk to Remember  Angels & Demons  Life of Pi  The Alchemist  The Hobbit  Harry Potter & The Goblet of Fire
User_85    .                  2                   -2               .           0              .           .
User_2245  2.25               -2.75               -0.75            .           .              .           1.25
User_1134  .                  .                   .                .           -2             2           0
User_92    0.25               -1.75               .                1.25        .              .           0.25
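A minimal sketch of the centering step and the resulting similarity. The centered values match the ones on this slide; which book each value belongs to is an assumption inferred from the earlier ratings table. Unrated books are treated as 0 after centering, which is the "centered cosine" described above rather than a Pearson correlation restricted to co-rated items.

```python
import math

# User -> {book: rating}. Book placement is an assumption (see note above).
user_ratings = {
    "User_85": {"A Walk to Remember": 9, "Angels & Demons": 5, "The Alchemist": 7},
    "User_2245": {"The Da Vinci Code": 10, "A Walk to Remember": 5,
                  "Angels & Demons": 7, "Harry Potter & The Goblet of Fire": 9},
    "User_1134": {"The Alchemist": 6, "The Hobbit": 10,
                  "Harry Potter & The Goblet of Fire": 8},
    "User_92": {"The Da Vinci Code": 7, "A Walk to Remember": 5,
                "Life of Pi": 8, "Harry Potter & The Goblet of Fire": 7},
}

def center(row):
    """Subtract the row mean so each user's ratings are centred around 0."""
    mean = sum(row.values()) / len(row)
    return {item: value - mean for item, value in row.items()}

def centered_cosine(a, b):
    """Cosine similarity of two mean-centered sparse rows (missing = 0)."""
    a, b = center(a), center(b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user who gives every book the same rating
    return dot / (norm_a * norm_b)

print(center(user_ratings["User_85"]))
print(round(centered_cosine(user_ratings["User_85"], user_ratings["User_2245"]), 3))
```

User_85 and User_2245 disagree on both books they share, so their centered similarity comes out negative, which is how opposite ratings get penalized.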
Pearson Correlation Matrix
Item Based Collaborative Filtering
Item-Based Filtering Pros and Cons
Pros
- The item-based method provides more
consistent recommendation results compared to
others, because there is high consistency among
similarities between books compared to that
between users.
- It can be used to recommend books for new
users and those with limited rating history.
Cons
- Item-based methods might sometimes
recommend obvious items or items that are
not novel from previous user experiences.
User-Based
Collaborative Filtering
“Users similar to you also liked...”
A technique used to predict the items
that a user might like based on ratings
given to that item by the other users who
have similar tastes.
Steps for User-Based Filtering - I
1. Filter out users with fewer than 50 ratings.
2. Create user-items matrix (pivot table).
Steps for User-Based Filtering - II
3. Calculate similarity scores for each pair of users.
4. Create a function to retrieve top three book choices from similar users.
Steps for User-Based Filtering - III
5. Use a predefined function to identify the users with the highest similarity scores to the target user and
return their top book choices.
Steps for User-Based Filtering - IV
5a. Alternatively, we can use KNN to identify the users with the highest similarity scores and their
“distances” to the target user.
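Steps 2 to 5 can be sketched as below. This is an illustration under assumptions: the user-items matrix is a dict of dicts instead of a pivot table, similarity is plain cosine, and the function name `recommend_from_similar_users` and the sample users are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse rating rows (missing = 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_from_similar_users(target, user_ratings, n_neighbors=2, n_books=3):
    """Return the top-rated books of the most similar users that `target` has not rated."""
    target_row = user_ratings[target]
    neighbors = sorted(
        (u for u in user_ratings if u != target),
        key=lambda u: cosine(target_row, user_ratings[u]),
        reverse=True,
    )[:n_neighbors]
    scores = {}
    for user in neighbors:
        for book, rating in user_ratings[user].items():
            if book not in target_row:
                scores[book] = max(scores.get(book, 0), rating)
    # Highest rating first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:n_books]

user_ratings = {
    "ann": {"Book A": 9, "Book B": 8},
    "bob": {"Book A": 10, "Book B": 7, "Book C": 9, "Book D": 4},
    "cat": {"Book E": 10},
}
print(recommend_from_similar_users("ann", user_ratings, n_neighbors=1))
# ['Book C', 'Book D']
```

Swapping `cosine` for a mean-centered similarity, or replacing the sorted neighbor list with a KNN index, would follow the same structure.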
Similarity Score Distributions for Sample Users
[Figure panels: User 187517 with 631 ratings, User 141902 with 200 ratings, User 153662 with 5814 ratings]
Pearson Correlation Matrix
User-Based Filtering Results Exhibition (I-IV)
User-Based Filtering Pros and Cons
Pros
- The performance of the recommendations
will keep improving as the size of the
neighborhood grows.
- It requires only user ratings to make
recommendations, so it is independent of
user demographic features.
- It tends to generate more diverse results
because users have varied tastes.
Cons
- Only a small percentage of users on
Goodreads provided rating scores.
- We have very limited information on new
users to calculate similarity scores.
- The computation of user neighborhoods
needs to be performed more frequently with
the addition of new users.
Item-Based vs. User-Based Methods
- In theory, user-user and item-item are dual approaches with similar expected performance. In
practice, item-based outperforms user-based in many cases.
- Users have changing tastes, while two items would always remain similar. Users have varied
tastes, while items belong to a small set of “genres”.
- Incremental maintenance of the recommendation model is more challenging in the case of
user-based methods compared to item-based methods.
Content Based Filtering
Content-based filtering uses item features to recommend
other items similar to what the user likes, based on their
previous actions or explicit feedback (in this case, the
rating).
Features used to develop a content-based model include
author and publisher. In contrast, collaborative filtering does
not take item attributes/features into consideration.
Content-based filtering works by representing a profile vector of the user in the
same dimensions as the item attribute vector and
calculating the weights based on users’ historical
interaction with the items.
Steps for Content-Based Method
1. Filter out books with fewer than 200 ratings.
2. Use a predefined function to identify items with similar author, publisher, and publishing year to
generate recommendation results.
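One way step 2 might look, as a rough sketch. The scoring rule (one point each for matching author, matching publisher, and a publishing year within a few years) is an assumption made for illustration; the deck does not specify how the three features are combined. All names and book records are made up.

```python
def feature_similarity(a, b, year_window=5):
    """Share of the three features (author, publisher, year) on which two books agree."""
    score = 0
    score += a["author"] == b["author"]
    score += a["publisher"] == b["publisher"]
    if a["year"] and b["year"] and abs(a["year"] - b["year"]) <= year_window:
        score += 1
    return score / 3

def similar_books(isbn, books, top_n=3):
    """Rank all other books by feature similarity to the book with this ISBN."""
    source = books[isbn]
    ranked = sorted(
        (other for other in books if other != isbn),
        key=lambda other: (-feature_similarity(source, books[other]),
                           books[other]["title"]),
    )
    return [books[other]["title"] for other in ranked[:top_n]]

books = {
    "1": {"title": "Book A", "author": "Author X", "publisher": "Pub 1", "year": 2000},
    "2": {"title": "Book B", "author": "Author X", "publisher": "Pub 1", "year": 2003},
    "3": {"title": "Book C", "author": "Author Y", "publisher": "Pub 1", "year": 1980},
    "4": {"title": "Book D", "author": "Author Z", "publisher": "Pub 2", "year": 1990},
}
print(similar_books("1", books, top_n=2))  # ['Book B', 'Book C']
```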
Content-Based Recommendation Results (I-IV)
Content-Based Filtering Pros and Cons
Pros
- We don’t need a long rating history of the
user to make the recommendation, nor do we
need any demographic info.
- It can capture the niche interests of a user.
Cons
- This technique requires a lot of domain
knowledge. The model can only be as good as
the hand-engineered features. In our case, for
example, content-based filtering wouldn’t be
as good as the other two techniques, because
we have very limited information on book
features.
K-means Clustering
Cluster 1: Mostly international users. Would recommend books that are popular worldwide, rather than just in the USA.
Cluster 3: Users who tend to read the most popular (most rated) books, even if those books are not highly rated (AvgBook.Rating = 3.2).
Would recommend the most popular books regardless of ratings.
K-means Clustering
Cluster 2: Users who are interested only in books that are highly rated, regardless of their popularity. Would recommend other highly rated
books to these users.
Cluster 5: Users who tend to read books that are published by the Big 5 publishing houses: Penguin Books, HarperCollins, Hachette Livre,
Macmillan, and Simon & Schuster. Would recommend any book published by these houses.
K-means Clustering
● There is some interpretability:
○ If a user falls into a high AvgBook.Rating cluster, we can recommend them only books that received high ratings, etc.
● Clustering can be a first step, but the recommendation system should be improved through other methodologies.
○ Collaborative filtering
○ Gather more feature data for each book, such as genre, themes, etc., and cluster again.
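For reference, the core of K-means (Lloyd's algorithm) fits in a short sketch. The two user features used here (average rating of the books a user reads, and how many ratings those books have) are hypothetical stand-ins loosely modelled on the AvgBook.Rating feature mentioned above. Taking the first k points as starting centroids is a simplification that keeps the example deterministic.

```python
import math

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# (average book rating, average number of ratings per book) per user; made-up values.
users = [(3.0, 50), (9.0, 5), (3.5, 60), (8.5, 8)]
labels, centroids = kmeans(users, k=2)
print(labels)     # [0, 1, 0, 1]
print(centroids)  # [(3.25, 55.0), (8.75, 6.5)]
```

In practice the features would usually be scaled first, since the larger-valued feature otherwise dominates the distance.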
Business Value
To recommend the most relevant books to users based on their interactions (ratings) with
other books in the past or based on other similar users’ interactions with books.
How Our Recommendation System Can Drive Business Value
● Help Goodreads provide an improved customer experience and gain a competitive advantage. This will
improve the customer retention rate and in turn reduce acquisition costs.
● Drive customer engagement with the Goodreads website through the new recommendation engine.
● Increase product awareness by helping all types of books reach new customers: the strategic value of Amazon
owning Goodreads.
● Improve the product design process of the Goodreads website. Recommendation systems can help Goodreads
make design decisions by surfacing the most relevant products to any given user.
Future Improvements
● The cold-start problem: Collaborative filtering systems rely on available interaction data from similar users. If you are
building a brand new recommendation system, you would have no user data to start with. You can use content-based filtering
first and then move on to the collaborative filtering approach.
● Input data may not always be accurate because all ratings are self-reported. User behavior is more important than ratings.
● A strong recommendation engine will be able to identify changes (or signs of impending changes) in customers’ preferences
and behavior, and constantly retrain itself in real time in order to serve relevant recommendations.
Future Improvements
Content Boosted Collaborative
Filtering for recommender systems:
CBCF is a type of hybrid
recommendation technique that uses
a combination of content-based
filtering and collaborative filtering.
Its main idea is to overcome the
sparsity problem that degrades the
performance of collaborative
filtering algorithms by using item
content to make the user-item
interaction matrix dense.
Future Improvements
Singular Value Decomposition: SVD is
a matrix factorization technique
that is usually used to reduce the
number of features of a data set by
reducing space dimensions from N to
K where K < N. The matrix
factorization is done on the
user-item ratings matrix. From a high
level, matrix factorization can be
thought of as finding 2 matrices
whose product approximates the original matrix.
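As a sketch of the idea in symbols (the notation is introduced here for illustration): let $R$ be the user-item ratings matrix with $m$ users and $n$ books. A rank-$K$ truncated SVD keeps only the $K$ largest singular values, and folding them into the two outer factors gives the two-matrix view described above.

```latex
R \;\approx\; U_K \,\Sigma_K\, V_K^{\top},
\qquad U_K \in \mathbb{R}^{m \times K},\;
\Sigma_K \in \mathbb{R}^{K \times K},\;
V_K \in \mathbb{R}^{n \times K}

P = U_K \Sigma_K^{1/2}, \quad Q = V_K \Sigma_K^{1/2}
\;\;\Longrightarrow\;\;
R \approx P\,Q^{\top},
\qquad
\hat{r}_{ui} = \sum_{f=1}^{K} P_{uf}\, Q_{if}
```

Here $\hat{r}_{ui}$ is the predicted rating of user $u$ for book $i$. A sparse ratings matrix has missing entries, so in practice the factors are typically fit only on the observed ratings, not by applying a full SVD directly.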
