Movies recommendation system in R Studio, Machine learning
Movies Recommendation System
Submitted in partial fulfillment of the requirements
of the degree of
T. E. Computer Engineering
By
Suraj R. Maurya Roll No. 56 PID: 182262
Om V. Pise Roll No: 58 PID: 172074
Guide (s):
Mr. Rupesh Mishra
Asst. Professor
Department of Computer Engineering
St. Francis Institute of Technology
(Engineering College)
University of Mumbai
2019-2020
CERTIFICATE
This is to certify that the project entitled “Movies Recommendation System” is a bonafide
work of “Suraj Maurya (Roll No: 56), Om Pise (Roll No: 58)” submitted to the University of
Mumbai in partial fulfillment of the requirement for the award of the degree of T.E. in Computer
Engineering
(Mr. Rupesh Mishra)
Guide
(Dr. Kavita Sonawane)
Head of Department
Project Report Approval for T.E.
This project report entitled Movies Recommendation System by Mr. Suraj
Maurya, Mr. Om Pise is approved for the degree of T.E. in Computer
Engineering.
Examiners
1.---------------------------------------------
2.---------------------------------------------
Date:
Place: Mumbai
Declaration
We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any
idea/data/fact/source in our submission. We understand that any violation of the above will
be cause for disciplinary action by the Institute and can also evoke penal action from the
sources which have thus not been properly cited or from whom proper permission has not been
taken when needed.
-----------------------------------------
(Signature)
Suraj Maurya Roll no: 56
-----------------------------------------
(Signature)
Om Pise Roll no: 58
Date:
Abstract
A recommendation engine filters data using different algorithms and recommends the most
relevant items to users. It first captures the past behavior of a customer and, based on
that, recommends products which the user is likely to buy. If a completely new user visits
an e-commerce site, the site has no history for that user. So how does the site go about
recommending products in such a scenario? One possible solution is to recommend the
best-selling products, i.e. the products which are in high demand. Another possible solution
is to recommend the products which would bring the maximum profit to the business. Three
main approaches are used for recommender systems. The first is demographic filtering, which
offers generalized recommendations to every user based on movie popularity and/or genre. The
system recommends the same movies to users with similar demographic features. The basic idea
behind this approach is that movies that are more popular and critically acclaimed have a
higher probability of being liked by the average audience. Since each user is different,
this approach is considered too simple. The second is content-based filtering, where we try
to profile the user's interests using the information collected and recommend items based on
that profile. The third is collaborative filtering, where we try to group similar users
together and use information about the group to make recommendations to the user.
Contents
Chapter Title
1 INTRODUCTION
1.1 Project Description
1.2 Problem Formulation
1.3 Motivation
1.4 Proposed Solution
1.5 Scope of the project
2 REVIEW OF LITERATURE
3 SYSTEM ANALYSIS
3.1 Functional Requirements
3.2 Non Functional Requirements
3.3 Specific Requirements
3.4 Use-Case Diagrams and description
4 ANALYSIS MODELING
4.1 Activity Diagrams, Class Diagram
4.2 Functional Modeling
5 DESIGN
5.1 Architectural Design
5.2 User Interface Design
6 IMPLEMENTATION
6.1 Algorithms / Methods Used
6.2 Working of the project
7 CONCLUSIONS
References
Acknowledgement
List of Figures
Fig. No. Figure Caption
1. Use Case Diagram for Movies Recommendation system
2. Class Diagram for Movies Recommendation system
3. Activity Diagram for Movies Recommendation system
4. Context Level DFD for Movies Recommendation system
5. Level 0 DFD for Movies Recommendation system
6. Level 1 DFD for Movies Recommendation system
7. Architecture for Movies Recommendation system
8. Main Window (GUI)
List of Abbreviations
Sr. No. Abbreviation Expanded form
1. DFD Data Flow Diagram
2. CFS Collaborative filtering system
Chapter 1
Introduction
1.1 Description
1. A recommendation system is a type of information filtering system which attempts to
predict the preferences of a user and make suggestions based on these preferences.
There is a wide variety of applications for recommendation systems.
2. These have become increasingly popular over the last few years and are now utilized in
most online platforms that we use.
3. The content of such platforms varies from movies, music, books and video, to friends
and stories on social media platforms, to products on e-commerce websites, to people
on professional and dating websites, to search results returned on Google.
4. Often, these systems are able to collect information about a user's choices, and can use
this information to improve their suggestions in the future.
5. For example, if Amazon observes that a large number of customers who buy the latest
Apple MacBook also buy a USB-C to USB adapter, they can recommend the adapter
to a new user who has just added a MacBook to their cart.
1.2 Problem Formulation
The movie recommendation system will be built using artificial intelligence algorithms that
analyze a user's favorite genres and recommend movies according to their liking. The user
submits a query describing the kind of movies they like; the system analyzes it and then
recommends movies to the user. The task is to pick related content for users of online
service providers out of a collection of relevant and irrelevant items. Netflix, for example,
aims to recommend movies to users based on the content of items rather than other users'
opinions.
1.3 Motivation
A recommendation system also finds similarities between different products. For example, the
Netflix recommendation system recommends movies that are similar to the ones a user has
watched in the past. Furthermore, collaborative filtering provides recommendations based on
other users who have a similar viewing history or preferences. There are two types of
recommendation systems: content-based recommendation systems and collaborative filtering
recommendation systems. In this project on recommendation systems in R, we work on a
collaborative filtering recommendation system and, more specifically, an item-based
collaborative recommendation system.
1.4 Proposed Solution
The proposed movie recommendation system is based on the maximal clique method. We
propose using k-cliques, which are subgraphs in which all k vertices are fully
connected and which are a very effective way to build groups in social network
analysis. In the proposed approach, cosine similarity is used to measure similarities
between users. The proposed solution offers an improved k-clique method that performs
more efficiently than existing collaborative filtering and maximal clique methods. For
performance evaluation, we use the MovieLens data, which is widely used in movie
recommendation systems. To assess effectiveness, the MovieLens dataset is divided into
training and test data, as is common practice in artificial intelligence. We compare
collaborative filtering using k nearest neighbors, the maximal clique method, the
k-clique method, and the improved k-clique method to evaluate performance.
1.5 Scope of the Project
In the near future, the system will be installed on an Apache server and published on the
internet. Datasets will be updated continuously, and the system will make online rating
predictions for users whose habits change day by day. As a result, it can respond
sensitively to current user tastes. Web services in particular struggle to produce
recommendations of millions of items for millions of users. Time and computational
power can limit the performance of even the best hybrid systems. For larger datasets, we
can work on the scalability problems of recommendation systems. The prediction approach
can also be tried on different datasets to test how the system performs as it scales.
Chapter 2
Review of Literature
2.1 A STUDY ON CONTENT-BASED VIDEO RECOMMENDATION
Authors: Yan Li Hanjie Wang Hailong Liu Bo Chen - Tencent WeChat, China
Publication: IBM Research, Yorktown Heights, China
Approach:
The competition is challenging for three reasons: large variance in visual appearance,
insufficient training data, and serious data incompleteness. Meta-data feature: the
provided meta-data information includes actor/actress, director, description, and genre.
Considering the small amount of training data, in this paper we only take advantage of show
descriptions. For description representation, we first apply the Latent Dirichlet Allocation
(LDA) algorithm to generate a topic model from about 400 movie descriptions (we build this
corpus by crawling data by genre from IMDb, which is treated as the world's most
popular and authoritative source for movie, TV and celebrity content), and then compute the
topic distribution probability for each TV show.
2.2 Content-based recommender system for online stores using expert system
Authors: Bogdan Walek, Petra Spackova
Publication: University of Ostrava
Approach:
The main goal of the recommender system is to propose and deliver suitable content to the user.
One of the goals of the proposed recommendation system is to decrease the cold start effect.
The recommender system uses a collaborative filtering system for recommending suitable items
and an expert system for evaluating the popularity of items. The system also proposes an
algorithm for showing items from similar users after the first login to decrease the effect
of the cold start problem. The knowledge base of the proposed expert system contains three
input linguistic variables and one output linguistic variable. At the end of the paper, the
proposed system is experimentally verified.
2.3 A Content-based Movie Recommender System based on Temporal User Preferences
Authors: Bagher Rahimpour Cami, Hamid Hassanpour, Hoda Mashayekhi
Publication: Shahrood University of Technology Shahrood, Iran
Approach:
The user profile consists of user activities as userId, activity1, ..., activity-n, where each
activity-i indicates the content and access time of selected items, denoted as itemId,
itemDesc, accessDate. This model is user-centered and employs the profile of each user to
create a user model for individuals. In the movie domain, each rating record of the rating
matrix (movieId, movieDesc, rate, accessDate) corresponds to an activity. The temporal
preferences model is based on a Bayesian non-parametric framework and has three main
components: interest extraction, inference of preferences, and prediction. Interest
extraction analyzes the user profile to discover user interests. The model feeds the user
profile into a Distance Dependent Chinese Restaurant Process (DDCRP) [27] and performs
clustering. DDCRP is Bayesian non-parametric; thus, the clusters can grow whenever new data
is observed.
2.4 An Improved Content Based Collaborative Filtering Algorithm For Movie
Recommendations
Authors: Ashish Pal, Prateek Parhi and Manuj Aggarwal
Publication: ARSD College, University of Delhi, New Delhi, India
Approach:
Our proposed algorithm takes into consideration the tags and genres specified in the dataset,
and for the content-based prediction, we have applied a set matching comparator. This
comparator returns the number of common objects between two movies. The term object here
refers to tags and genres. For each particular movie, the tags and genres are merged into a
single set. This gives us bulky content for each movie, and the more content there is, the
better the predictions are. After getting the set of common objects, the weight of each set
for a movie is calculated. Once the weights are assigned to each set, they are used to
predict the ratings of the unrated movies from the rated movies which were previously
compared. In our methodology, the tags assigned to each movie by different users are first
collected into a single list. The genres for each movie are appended to the same list of
tags. This final list is referred to as the objects for a particular movie. The object set
for each active movie is compared with the object set of every other movie in the dataset,
and the matching objects are assigned to a set.
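The set matching comparator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm from the cited paper: the function names, the sample movies, and the choice of using the raw count of common objects as the weight in a weighted average are all hypothetical.

```python
def build_object_set(tags, genres):
    """Merge a movie's tags and genres into one lowercase set of 'objects'."""
    return {t.strip().lower() for t in tags} | {g.strip().lower() for g in genres}


def common_objects(objects_a, objects_b):
    """Set matching comparator: number of objects shared by two movies."""
    return len(objects_a & objects_b)


def predict_rating(target, rated, objects):
    """Weighted average of known ratings (assumed weighting scheme).

    `rated` maps movie title -> rating the user already gave.
    The weight of each rated movie is its number of common objects with
    the target. Returns None when nothing overlaps with the target.
    """
    weights = {m: common_objects(objects[target], objects[m]) for m in rated}
    total = sum(weights.values())
    if total == 0:
        return None
    return sum(weights[m] * rated[m] for m in rated) / total


# Made-up sample data for illustration only.
objects = {
    "Movie A": build_object_set(["space", "robots"], ["Sci-Fi", "Action"]),
    "Movie B": build_object_set(["space", "aliens"], ["Sci-Fi"]),
    "Movie C": build_object_set(["wedding"], ["Romance", "Comedy"]),
    "Movie D": build_object_set(["robots", "space"], ["Sci-Fi"]),
}
user_ratings = {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 1.0}

if __name__ == "__main__":
    print(predict_rating("Movie D", user_ratings, objects))
```

Here "Movie D" shares three objects with "Movie A", two with "Movie B" and none with "Movie C", so the predicted rating is pulled toward the ratings of the two science fiction titles.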
Chapter 3
System Analysis
3.1 Functional Requirements
Major functionalities associated with the user are:
● Enable the user to submit their preferred genres by typing into the input text box.
● The input box should accept English text in different encodings, such as UTF-8 and
Latin-encoded text.
Major functionalities associated with the system are as follows:
● Enable the system to use the keywords obtained after tokenization of a new input to find
the cluster it belongs to.
Interface Requirements:
● Field 1 accepts the preferences of a user.
● Field 2 recommends the movies to the user.
3.2 Non Functional Requirements
3.2.1 Performance
The computer running the software does not require a powerful CPU or GPU. It
requires a 64-bit operating system in order to run the program and load
packages such as flask, sklearn, etc.
3.2.2 Reliability
The system takes in the inputs without any error and predicts the expected
response accurately so that the users of the system get a response to their query.
3.2.3 Usability
The system is easy to handle and navigates in the most expected way with no
delays. The system interacts with the user in a very friendly manner making it
easier for the users to use the system.
3.3 Specific Requirements
3.3.1 User Interfaces
● Front-end Software: Flask, HTML, CSS, JavaScript, Web browser.
● Back-end Software: R Studio
3.3.2 Hardware Requirements
● CPU Type : Intel Core or above
● Clock speed : 1.0GHz
● Ram size : 1GB and above
● Hard Disk capacity : 100GB and above
● Working keyboard
3.3.3 Software Requirements
● Operating System : Windows
● Python 3.5 or above
3.3.4 Communication Interfaces
This system will run entirely on the user's local system.
3.4 Use-Case Diagrams and Description
Fig 1. Use Case Diagram for Movies Recommendation System.
Use Case Diagram:
A use case diagram is a dynamic or behavior diagram in UML. Use case diagrams model the
functionality of a system using actors and use cases. Use cases are a set of actions, services,
and functions that the system needs to perform. In this context, a “System” is something being
developed and operated, such as a website. The “Actors” are people or entities operating under
defined roles within the system.
Use Case Specifications
Use Case: Submit Query
Brief Description: This use case expects the user to submit a query or a keyword as input
that is compatible with the system.
Primary Actor: User
Use Case: View Response
Brief Description: This use case shows the bot response in a text format.
Primary Actor: User
Use Case: Query Processing
Brief Description: In this use case the user's input query is processed in order to give the
response.
Primary Actor: System
Main Flow:
1. It will tokenize the input into sentences, followed by word tokenization.
2. Each word in the given input is checked against a common set of punctuation marks and
stop words, and matches are removed.
Use Case: Generate Response
Brief Description: In this use case responses are generated on the basis of query analysis and
the AIML query.
Primary Actor: System
Use Case: Submit Feedback
Brief Description: This use case is used for taking feedback from the user in order to improve
the performance of the bot.
Primary Actor: User
Main Flow:
The feedback is stored in a text file and analyzed by the administrator.
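The main flow of the Query Processing use case (sentence tokenization, word tokenization, then removal of punctuation and stop words) can be sketched as follows. This is a minimal standard-library illustration, not the project's code: the stop-word list is a tiny made-up sample and the function names are hypothetical; a real system would more likely use a library tokenizer and a full stop-word list.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "i", "is", "like", "me", "of",
              "show", "some", "the", "to", "too", "with"}


def split_sentences(text):
    """Step 1a: split the query into sentences at ., ! or ? followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Step 1b: split a sentence into lowercase word tokens on whitespace."""
    return sentence.lower().split()


def extract_keywords(text):
    """Step 2: drop surrounding punctuation and stop words, keep the rest."""
    keywords = []
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            word = token.strip(string.punctuation)
            if word and word not in STOP_WORDS:
                keywords.append(word)
    return keywords


if __name__ == "__main__":
    print(extract_keywords("I like action movies. Show me some sci-fi, too!"))
```

Stripping punctuation only from the ends of each token keeps hyphenated genre names such as "sci-fi" intact.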
Chapter 4
Analysis Modeling
4.1 Class Diagram and Activity Diagram
Fig 2. Class Diagram for Movies Recommendation system.
Fig 3. Activity Diagram for Movies Recommendation system.
4.2 Functional Modeling
Data Flow Diagram
A data flow diagram (DFD) illustrates how data is processed by a system in terms of inputs and
outputs. As its name indicates, its focus is on the flow of information: where data comes from,
where it goes and how it gets stored.
Fig 4. Level 0 DFD for Movies recommendation system.
Context Level DFD:
This level is called the Context Level DFD. It is a basic overview of the whole system or
process being analyzed or modelled. Here the basic flow of the system is shown. The user
gives input which is stored by the system. Based on the input given, the system processes it
accordingly and gives the output to the user.
Fig 5. Level 1 DFD for Movies Recommendation system.
Level 1 DFD:
DFD Level 1 provides a more detailed breakout of pieces of the Context Level DFD. It basically
explains the system in more detail. The level 1 DFD is more detailed than level 0. This level
of DFD elaborates details which are abstracted in level 0. Here in DFD 1 we can see
login details and two databases consisting of the movie recommendation system data set and
the user data set.
Fig 6. Level 2 DFD for Movies recommendation system
Level 2 DFD:
A level 2 DFD is much more informative than its previous counterparts. Here the system is
further divided and is explained in much more detail so that it is very easy to understand the
whole system. We can go further to level 3 and level 4 DFDs, but they will be much more
complicated and make the system hard to understand and implement.
Chapter 5
Design
5.1 Architectural Design
To start with, we present an overall system diagram for recommendation systems in the
following figure. The main components of the architecture contain one or more machine learning
algorithms.
Fig 7. Architectural Design for Movies recommendation system
The simplest thing we can do with data is to store it for later offline processing, which leads to
part of the architecture for managing Offline jobs. However, computation can be done offline,
nearline, or online. Online computation can respond better to recent events and user interaction,
but has to respond to requests in real-time. This can limit the computational complexity of the
algorithms employed as well as the amount of data that can be processed. Offline computation
has fewer limitations on the amount of data and the computational complexity of the algorithms
since it runs in a batch manner with relaxed timing requirements. However, it can easily grow
stale between updates because the most recent data is not incorporated. One of the key issues in a
personalization architecture is how to combine and manage online and offline computation in a
seamless manner. Nearline computation is an intermediate compromise between these two
modes in which we can perform online-like computations, but do not require them to be served
in real-time. Model training is another form of computation that uses existing data to generate a
model that will later be used during the actual computation of results. Another part of the
architecture describes how the different kinds of events and data need to be handled by the Event
and Data Distribution system. A related issue is how to combine the different Signals and
Models that are needed across the offline, nearline, and online regimes. Finally, we also need to
figure out how to combine intermediate Recommendation Results in a way that makes sense for
the user. The rest of this section will detail these components of this architecture as well as their
interactions. In order to do so, we will break the general diagram into different sub-systems and
we will go into the details of each of them. As you read on, it is worth keeping in mind that our
whole infrastructure runs across the public Amazon Web Services cloud.
Online computation can respond quickly to events and use the most recent data. An example is to
assemble a gallery of action movies sorted for the member using the current context. Online
components are subject to an availability and response time Service Level Agreement (SLA)
that specifies the maximum latency of the process in responding to requests from client
applications while our member is waiting for recommendations to appear. This can make it
harder to fit complex and computationally costly algorithms in this approach. Also, a purely
online computation may fail to meet its SLA in some circumstances, so it is always important to
think of a fast fallback mechanism such as reverting to a precomputed result. Computing online
also means that the various data sources involved also need to be available online, which can
require additional infrastructure.
Nearline computation can be seen as a compromise between the
two previous modes. In this case, computation is performed exactly like in the online case.
However, we remove the requirement to serve results as soon as they are computed and can
instead store them, allowing it to be asynchronous. The nearline computation is done in response
to user events so that the system can be more responsive between requests. This opens the door
for potentially more complex processing to be done per event. An example is to update
recommendations to reflect that a movie has been watched immediately after a member begins to
watch it. Results can be stored in an intermediate caching or storage back-end. Nearline
computation is also a natural setting for applying incremental learning algorithms.
In any case, the choice of online/nearline/offline processing is not an either/or question. All
approaches can and should be combined. There are many ways to combine them. We already
mentioned the idea of using offline computation as a fallback. Another option is to precompute
part of a result with an offline process and leave the less costly or more context-sensitive parts of
the algorithms for online computation.
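The idea of using offline computation as a fallback for online computation can be sketched as below. This is an illustrative sketch only: the function name, the `PRECOMPUTED` table, the user id and the latency budget are hypothetical, and for simplicity the code measures the elapsed time after the online call returns, whereas a production service would enforce a real timeout and cancel the slow call.

```python
import time

# Output of a hypothetical offline batch job (made-up data).
PRECOMPUTED = {"user42": ["Movie A", "Movie B", "Movie C"]}


def recommend_with_fallback(user_id, online_ranker, budget_seconds=0.05):
    """Try the online ranker; fall back to precomputed results if it fails
    or exceeds the latency budget (a stand-in for the SLA)."""
    start = time.monotonic()
    try:
        result = online_ranker(user_id)
    except Exception:
        # Any online failure is treated as "no result" in this sketch.
        result = None
    elapsed = time.monotonic() - start
    if result is not None and elapsed <= budget_seconds:
        return result, "online"
    # Fast fallback: serve whatever the offline job stored for this user.
    return PRECOMPUTED.get(user_id, []), "offline-fallback"


if __name__ == "__main__":
    fast = lambda user_id: ["Movie D", "Movie A"]
    print(recommend_with_fallback("user42", fast, budget_seconds=1.0))
```

Returning the source label alongside the list makes it easy to monitor how often the fallback path is taken.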
Much of the computation we need to do when running personalization machine learning
algorithms can be done offline. This means that the jobs can be scheduled to be executed
periodically and their execution does not need to be synchronous with the request or presentation
of the results. There are two main kinds of tasks that fall in this category: model training and
batch computation of intermediate or final results. In the model training jobs, we collect relevant
existing data and apply a machine learning algorithm that produces a set of model parameters (which
we will henceforth refer to as the model). This model will usually be encoded and stored in a file
for later consumption. Although most of the models are trained offline in batch mode, we also
have some online learning techniques where incremental training is indeed performed online.
Batch computation of results is the offline computation process defined above in which we use
existing models and corresponding input data to compute results that will be used at a later time
either for subsequent online processing or direct presentation to the user.
Fig 8. Architecture for Movies recommendation System
5.2 User Interface Design
Fig 9. GUI for Movies recommendation System
Chapter 6
Implementation
6.1 Algorithms Used
User-based Collaborative Filtering Model
Now, I will use the user-based approach. According to this approach, given a new user, its
similar users are first identified. Then, the top-rated items rated by similar users are
recommended.
For each new user, these are the steps:
1. Measure how similar each user is to the new one. As in item-based collaborative
filtering (IBCF), popular similarity measures are correlation and cosine.
2. Identify the most similar users. The options are:
● Take account of the top k users (k-nearest neighbors)
● Take account of the users whose similarity is above a defined threshold
3. Rate the movies rated by the most similar users. The rating is the average rating among
similar users and the approaches are:
● Average rating
● Weighted average rating, using the similarities as weights
4. Pick the top-rated movies.
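The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's R implementation: it uses cosine similarity over sparse rating dictionaries, the top k users as neighbors, and the similarity-weighted average rating; the user names, movie titles and ratings are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts (missing rating = 0)."""
    dot = sum(u[m] * v[m] for m in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, new_user, k=2, top_n=3):
    """User-based collaborative filtering following steps 1 to 4."""
    target = ratings[new_user]
    # Step 1: similarity of every other user to the new one.
    sims = sorted(
        ((cosine(target, r), u) for u, r in ratings.items() if u != new_user),
        reverse=True,
    )
    # Step 2: keep the k most similar users (dropping zero similarity).
    neighbors = [(s, u) for s, u in sims[:k] if s > 0]
    # Step 3: similarity-weighted average rating of movies not yet seen.
    num, den = {}, {}
    for s, u in neighbors:
        for movie, rating in ratings[u].items():
            if movie not in target:
                num[movie] = num.get(movie, 0.0) + s * rating
                den[movie] = den.get(movie, 0.0) + s
    predicted = {m: num[m] / den[m] for m in num}
    # Step 4: pick the top-rated movies (ties broken by title).
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]


# Made-up ratings for illustration only.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4},
    "bob": {"Movie A": 5, "Movie B": 4, "Movie C": 5, "Movie D": 2},
    "carol": {"Movie A": 4, "Movie B": 5, "Movie C": 4, "Movie D": 1},
    "dave": {"Movie D": 5, "Movie E": 4},
}

if __name__ == "__main__":
    print(recommend(ratings, "alice"))
```

Swapping the top-k rule in step 2 for a similarity threshold only changes how `neighbors` is built; the rest of the sketch stays the same.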
In content-based filtering, items are recommended based on comparisons between item profile
and user profile. A user profile is content that is found to be relevant to the user in the form of
keywords (or features). A user profile might be seen as a set of assigned keywords (terms,
features) collected by the algorithm from items found relevant (or interesting) by the user. A set of
keywords (or features) of an item is the item profile. For example, consider a scenario in which a
person goes to a pastry shop to buy his favorite cake ‘X’. Unfortunately, cake ‘X’ has been sold out,
so the shopkeeper recommends that the person buy cake ‘Y’, which is made up
of ingredients similar to cake ‘X’. This is an instance of content-based filtering.
Fig. 10 Content Based Filtering
We will be using the cosine similarity to calculate a numeric quantity that denotes the
similarity between two movies. We use the cosine similarity score since it is independent of
magnitude and is relatively easy and fast to calculate. Mathematically, it is defined as
follows:
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$
where x and y are the feature vectors of the two movies being compared.
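The cosine similarity score described above can be sketched in a few lines of Python. This is an illustrative example only: the function name `cosine_similarity`, the four-word vocabulary, and the keyword counts are assumptions, not data from the project.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an all-zero vector shares no features with anything
    return dot / (norm_x * norm_y)


# Hypothetical keyword counts over the vocabulary [hero, city, dream, heist].
movie_a = [2, 1, 0, 0]
movie_b = [4, 2, 0, 0]  # same proportions as movie_a, twice the counts
movie_c = [0, 0, 3, 1]  # no keywords in common with movie_a

sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)
```

Here `sim_ab` is 1 up to floating-point rounding, because doubling every count does not change the direction of the vector; this is what "independent of magnitude" means in practice. `sim_ac` is 0, because the two movies share no keywords.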
We are now in a good position to define our recommendation function. These are the
steps we'll follow:
● Get the index of the movie given its title.
● Get the list of cosine similarity scores for that particular movie with all movies.
Convert it into a list of tuples where the first element is its position and the second is the
similarity score.
● Sort the aforementioned list of tuples based on the similarity scores; that is, the second
element.
● Get the top 10 elements of this list. Ignore the first element as it refers to self (the
movie most similar to a particular movie is the movie itself).
● Return the titles corresponding to the indices of the top elements.
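The steps above can be sketched in Python as follows. This is a minimal sketch, not the project's implementation: the function name `get_recommendations`, the placeholder titles, and the hand-written similarity matrix are assumptions for illustration. The sketch drops the query movie by its index rather than by position, which gives the same result as ignoring the first element whenever the movie ranks first against itself.

```python
def get_recommendations(title, titles, cosine_sim, top_n=10):
    """Return up to top_n titles most similar to `title`.

    titles     -- list of movie titles; position i matches row i of cosine_sim
    cosine_sim -- square matrix (list of lists) of pairwise similarity scores
    """
    # Get the index of the movie given its title.
    idx = titles.index(title)
    # List of (position, similarity score) tuples for that movie against all movies.
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort by the second element of each tuple, highest score first.
    sim_scores.sort(key=lambda pair: pair[1], reverse=True)
    # Drop the movie itself, then keep the top_n remaining entries.
    sim_scores = [pair for pair in sim_scores if pair[0] != idx][:top_n]
    # Return the titles corresponding to the remaining indices.
    return [titles[i] for i, _ in sim_scores]


# Placeholder titles and made-up similarity scores, for illustration only.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
cosine_sim = [
    [1.0, 0.8, 0.3, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.3, 0.2, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
similar_to_a = get_recommendations("Movie A", titles, cosine_sim, top_n=2)
```

With these toy scores, `similar_to_a` is `["Movie B", "Movie C"]`. `titles.index` raises `ValueError` for an unknown title, so a real front end would need to handle that case.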
While our system has done a decent job of finding movies with similar plot descriptions, the
quality of recommendations is limited. "The Dark Knight Rises" returns all Batman
movies, while it is more likely that people who liked that movie would
enjoy other Christopher Nolan movies. This is something that cannot be captured by the
present system.
6.2 Working of the project
CODE SNIPPETS
Fig 11. Front end code for Movies recommendation System
Fig 12. Backend code for Movies recommendation System
Fig 13. Backend code for Movies recommendation System
Chapter 7
Conclusion
In our project, a collaborative filtering algorithm is used to predict a user's movie rating. The
MovieLens dataset, which has 10 million ratings, is selected for our project and divided into
a training set and a test set. The RMSE method is used for algorithm evaluation. According to
the evaluation results, our movie recommender system has good prediction performance.
A hybrid approach combining content-based filtering and collaborative filtering is taken to
implement the system. This approach overcomes the drawbacks of each individual algorithm and
improves the performance of the system. Techniques like clustering, similarity and
classification are used to get better recommendations, thus reducing MAE and increasing
precision and accuracy. In future, we can work on a hybrid recommender using clustering and
similarity for better performance. Our approach can be further extended to other domains to
recommend songs, videos, venues, news, books, tourism and e-commerce sites, etc.
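For reference, the two error measures named above, RMSE and MAE, can be computed as in the following Python sketch. The function names and the four example ratings are assumptions for illustration; they are not results from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up test-set ratings and predictions, for illustration only.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 3]

rmse_value = rmse(actual, predicted)  # sqrt(2.25 / 4) = 0.75
mae_value = mae(actual, predicted)    # 2.5 / 4 = 0.625
```

Both measures are in the same units as the ratings, and lower values mean better predictions. RMSE penalizes large individual errors more heavily than MAE.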
Acknowledgements
We take the opportunity to thank all those people who have helped and guided us through this
project and made this experience worthwhile for us. We wish to sincerely thank our reverend
Bro. Jose Thuruthiyil and principal Dr. Sincy George for giving us this opportunity to
make a project in the Third Year of Engineering. We would also like to thank the HOD of the
Computer department, Dr. Kavita Sonawane, and all teaching and non-teaching staff for their
immense support and cooperation.
Last but not least, we would like to thank Mr. Rupesh Mishra for guiding us throughout
the project and encouraging us to explore this domain.