1. Data Science Movie Recommendation System
Team members:
1. Vikram Parmar (32)
2. Priyanka Hanchate (21)
2. OUTLINE OF PROJECT
• Have you ever used an online streaming platform such as Netflix,
Amazon Prime, or Voot? After I watched a movie on one of them, the
platform soon started recommending different movies and TV shows
to me. I wondered how it could suggest content that appealed to
me. I then came across what is known as a recommendation system.
Such a system is capable of learning a user's watching patterns
and providing relevant suggestions.
3. IDEA OF PROJECT
• The main goal of this machine learning project is to build a
recommendation engine that recommends movies to users. This R
project is designed to help you understand how a recommendation
system works. We will develop an item-based collaborative filter.
By the end of this tutorial, you will have gained experience in
applying your R, data science, and machine learning skills to a
real-life project.
• Before moving ahead with this movie recommendation system project,
you need to know what a recommendation system is. Read below to
find the answer.
4. SOLUTION OF OUR PROJECT
• A recommendation system is a platform that provides its users with
content based on their preferences and likings. It generates
suggestions through a filtering process that takes information
about the user as input, in the form of browsing data. This
information reflects the user's prior usage of the product as well
as the ratings they assigned. A recommendation system is an
application of machine learning algorithms.
5. IMPLEMENTATION
• To build our recommendation system, we used the MovieLens
dataset, specifically its movies.csv and ratings.csv files. This
data consists of 105339 ratings applied over 10329 movies.
• Importing Essential Libraries
• In our Data Science project, we will make use of these four
packages – ‘recommenderlab’, ‘ggplot2’, ‘data.table’ and
‘reshape2’.
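The project itself is written in R, but the shape of the ratings data can be illustrated in any language. The following Python sketch is only an illustration, not the project's code. It assumes that ratings.csv has the columns userId, movieId, rating and timestamp, and it uses a few made-up rows instead of the real file. The name build_rating_matrix is hypothetical.

```python
import csv
import io

# Made-up rows in the assumed ratings.csv layout (not real MovieLens data).
SAMPLE_RATINGS = """userId,movieId,rating,timestamp
1,10,4.0,1217897793
1,20,3.5,1217895807
2,10,5.0,1217896246
2,30,2.0,1217896556
3,20,4.5,1217896523
"""


def build_rating_matrix(csv_file):
    """Return a sparse user-item matrix as {userId: {movieId: rating}}."""
    matrix = {}
    for row in csv.DictReader(csv_file):
        user = int(row["userId"])
        movie = int(row["movieId"])
        matrix.setdefault(user, {})[movie] = float(row["rating"])
    return matrix


ratings = build_rating_matrix(io.StringIO(SAMPLE_RATINGS))

# Summary counts, analogous to "105339 ratings applied over 10329 movies".
n_ratings = sum(len(movies) for movies in ratings.values())
n_movies = len({m for movies in ratings.values() for m in movies})
print(f"{n_ratings} ratings over {n_movies} movies")
```

To try this on the real data, the io.StringIO object could be replaced with an opened ratings.csv file.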
7. Collaborative Filtering
• This filtering strategy is based on the user's behavior, compared and
contrasted with the behavior of other users in the database. The history
of all users plays an important role in this algorithm. The main
difference between content-based filtering and collaborative filtering is
that in the latter, the interaction of all users with the items influences
the recommendation algorithm, while in content-based filtering only the
concerned user's data is taken into account.
• There are multiple ways to implement collaborative filtering, but the main
concept to grasp is that multiple users' data influences the outcome of
the recommendation. The model does not depend on only one user's data.
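One way to picture this is a small item-based variant, sketched below in Python for illustration only. It assumes cosine similarity between item rating vectors and a similarity-weighted average for the prediction, which is one common choice and not necessarily what the project's R code does. The ratings and the function names are made up.

```python
from math import sqrt

# Made-up ratings: user -> {movie: rating}. Missing entries are unrated.
RATINGS = {
    "u1": {"A": 5.0, "B": 4.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 1.0},
    "u3": {"A": 1.0, "C": 5.0},
    "u4": {"B": 2.0, "C": 4.0},
}


def item_vectors(ratings):
    """Transpose the data to movie -> {user: rating}."""
    items = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            items.setdefault(movie, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity of two sparse item vectors (unrated counts as 0)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(ratings, user, target):
    """Predict a rating as the similarity-weighted mean of the user's own ratings."""
    items = item_vectors(ratings)
    num = den = 0.0
    for movie, rating in ratings[user].items():
        sim = cosine(items[target], items[movie])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# u1 has not rated C; the estimate relies on how *other* users rated A, B and C.
prediction = predict(RATINGS, "u1", "C")
print(prediction)
```

The similarities are computed from every user's ratings, so the prediction for u1 changes whenever another user's ratings change, which is exactly the property that separates collaborative from content-based filtering.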
8. DISADVANTAGES
Cannot handle fresh items
The prediction of the model for a given (user, item) pair is the dot product of the corresponding embeddings.
So, if an item is not seen during training, the system can't create an embedding for it
and can't query the model with this item.
This issue is often called the cold-start problem. However, the following technique can address
the cold-start problem to some extent:
•Heuristics to generate embeddings of fresh items. If the system does not have interactions for a fresh item,
it can approximate the item's embedding by averaging the embeddings of items from the same category, from the
same uploader (in YouTube), and so on.
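A minimal Python sketch of both ideas, the dot-product score and the averaging heuristic, is given below for illustration. The embeddings, the category labels, and the function names are all hypothetical; a real system would learn the embeddings during training.

```python
# Hypothetical 2-dimensional embeddings, standing in for learned ones.
USER_EMB = {"alice": [1.0, 0.5]}
ITEM_EMB = {
    "movie_a": [0.8, 0.2],
    "movie_b": [0.6, 0.4],
    "movie_c": [-0.5, 0.9],
}
CATEGORY = {"movie_a": "comedy", "movie_b": "comedy", "movie_c": "drama"}


def score(user_vec, item_vec):
    """Model prediction for a (user, item) pair: the dot product of embeddings."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def cold_start_embedding(category):
    """Approximate a fresh item's embedding by averaging same-category items."""
    peers = [ITEM_EMB[m] for m, c in CATEGORY.items() if c == category]
    if not peers:
        raise ValueError(f"no trained items in category {category!r}")
    dim = len(peers[0])
    return [sum(vec[i] for vec in peers) / len(peers) for i in range(dim)]


# A fresh comedy has no embedding of its own, so borrow one from its peers.
fresh = cold_start_embedding("comedy")
fresh_score = score(USER_EMB["alice"], fresh)
print(fresh, fresh_score)
```

The averaged vector is only a stand-in until the item has collected enough interactions to be trained normally.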
Hard to include side features for query/item
Side features are any features beyond the query or item ID. For movie recommendations,
the side features might include country or age.
Including available side features improves the quality of the model. Although it may not be easy to include side
features in WALS, a generalization of WALS makes this possible.
To generalize WALS, augment the input matrix with features by defining a block matrix Ā, where:
•Block (0, 0) is the original feedback matrix A.
•Block (0, 1) is a multi-hot encoding of the user features.
•Block (1, 0) is a multi-hot encoding of the item features.
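The layout of the augmented block matrix can be sketched in Python as follows. This only shows how the blocks fit together, not WALS itself. The text does not say what goes in Block (1, 1), so filling it with zeros is an assumption made here. The function name and the sample data are hypothetical.

```python
def augment_feedback_matrix(feedback, user_feats, item_feats):
    """Assemble the block matrix from plain lists of lists.

    feedback:   n_users x n_items       -> Block (0, 0)
    user_feats: n_users x n_user_feats  -> Block (0, 1), multi-hot
    item_feats: n_items x n_item_feats  -> transposed into Block (1, 0), multi-hot
    Block (1, 1) is filled with zeros (an assumption, not from the text).
    """
    n_items = len(item_feats)
    n_user_feats = len(user_feats[0])
    n_item_feats = len(item_feats[0])

    # Top row of blocks: each user's feedback followed by that user's features.
    top = [list(fb) + list(uf) for fb, uf in zip(feedback, user_feats)]

    # Bottom row of blocks: one row per item feature, one column per item.
    item_feats_t = [
        [item_feats[j][k] for j in range(n_items)] for k in range(n_item_feats)
    ]
    bottom = [row + [0] * n_user_feats for row in item_feats_t]
    return top + bottom


feedback = [[1, 0, 1], [0, 1, 0]]        # 2 users x 3 items
user_feats = [[1, 0], [0, 1]]            # e.g. two hypothetical user features
item_feats = [[1, 0], [1, 1], [0, 1]]    # e.g. two hypothetical item features

augmented = augment_feedback_matrix(feedback, user_feats, item_feats)
for row in augmented:
    print(row)
```

Factorizing a matrix of this shape gives each side feature its own row or column, so it receives an embedding alongside the users and items.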
9. ADVANTAGES
• No domain knowledge necessary
• We don't need domain knowledge because the embeddings are
automatically learned.
• Serendipity
• The model can help users discover new interests. In isolation, the ML system
may not know the user is interested in a given item, but the model might still
recommend it because similar users are interested in that item.
• Great starting point
• To some extent, the system needs only the feedback matrix to train a matrix
factorization model. In particular, the system doesn't need contextual
features. In practice, this can be used as one of multiple candidate
generators.
13. CONCLUSION
• Recommendation systems provide content for us by
taking what other people recommend, as well as our own
selections, into account.
• Collaborative filtering is a widely used solution for this
problem, and it is the one we use in our project.