2. Agenda
Data and data preprocessing
Data visualization
No-model approach for new users
Content-based filtering using correlation
Collaborative filtering using user attributes
Q&A
3. Data and Data Preprocessing
Data Source Link: https://grouplens.org/datasets/movielens/100k/
Reading files: Rating file, Item file, User file
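A minimal sketch of reading the three files with pandas. The separators, encoding and column order follow the MovieLens 100k README (u.data is tab-separated; u.item and u.user are pipe-separated). The column names such as user_id and movie_id, the helper function names, and the tiny in-memory samples are assumptions made for illustration, not taken from the slides.

```python
import io
import pandas as pd

# Column names are illustrative; the field order follows the dataset README.
RATING_COLS = ["user_id", "movie_id", "rating", "timestamp"]
USER_COLS = ["user_id", "age", "gender", "occupation", "zip_code"]
GENRE_COLS = [
    "unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]
ITEM_COLS = ["movie_id", "title", "release_date",
             "video_release_date", "imdb_url"] + GENRE_COLS


def read_ratings(src):
    """Rating file (u.data): tab-separated, no header row."""
    return pd.read_csv(src, sep="\t", names=RATING_COLS)


def read_users(src):
    """User file (u.user): pipe-separated, no header row."""
    return pd.read_csv(src, sep="|", names=USER_COLS)


def read_items(src):
    """Item file (u.item): pipe-separated, Latin-1 encoded, no header row."""
    return pd.read_csv(src, sep="|", names=ITEM_COLS, encoding="latin-1")


# Made-up sample rows in the same layout, so the sketch runs without the files.
rating_text = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
user_text = "1|24|M|technician|85711\n"
item_fields = ["1", "Sample Movie (1995)", "01-Jan-1995", "", "placeholder-url"]
item_text = "|".join(item_fields + ["0"] * len(GENRE_COLS)) + "\n"

ratings = read_ratings(io.StringIO(rating_text))
users = read_users(io.StringIO(user_text))
items = read_items(io.StringIO(item_text))
```

With the real dataset unpacked, the same helpers would be called with file paths, for example read_ratings("ml-100k/u.data").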
original data and files
4. Data and Data Preprocessing
Users information
Rating information
Movie information
Merge the datasets on user_id and movie_id
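The merge can be sketched as two joins: ratings to users on user_id, then to movies on movie_id. The three small frames below are made-up toy data, and the extra column names are assumptions.

```python
import pandas as pd

# Toy frames standing in for the rating, user and movie tables.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 20, 10],
    "rating": [5, 3, 4],
})
users = pd.DataFrame({
    "user_id": [1, 2],
    "age": [24, 53],
    "occupation": ["technician", "other"],
})
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A", "Movie B"],
})

# Left joins keep every rating row even if a user or movie record is missing.
merged = (
    ratings
    .merge(users, on="user_id", how="left")
    .merge(movies, on="movie_id", how="left")
)
```

Each row of merged is one rating, enriched with the rater's attributes and the movie's title.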
5. Data and Data Preprocessing
Drop blank column
Genres of movie
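A small sketch of both preprocessing steps on made-up data. It assumes the blank column is one that is empty for every row (called video_release_date here) and that genres are stored as 0/1 flag columns; the slides do not name the column, so treat both as assumptions.

```python
import numpy as np
import pandas as pd

# Toy movie table: one entirely empty column and two genre flag columns.
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A", "Movie B"],
    "video_release_date": [np.nan, np.nan],
    "Action": [1, 0],
    "Comedy": [1, 1],
})

# Drop every column whose values are all missing.
movies = movies.dropna(axis=1, how="all")

# Collapse the 0/1 genre flags into one readable label per movie.
genre_cols = ["Action", "Comedy"]
movies["genres"] = movies[genre_cols].apply(
    lambda row: "|".join(g for g in genre_cols if row[g] == 1),
    axis=1,
)
```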
17. No Model Recommender System
Step 1: Loading the dataset into Python
Step 2: Merging the datasets into one
Step 3: Calculating count of ratings and average of ratings
Step 4: Sorting the data based on count and average of ratings
Step 5: Deciding the cutoff value for count
Step 6: Recommending movies
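Steps 3 to 6 can be sketched on an already merged table as follows. The data is made up, and the cutoff MIN_COUNT is a hypothetical value; the slides do not state which cutoff was chosen.

```python
import pandas as pd

# Toy merged table: one row per rating.
merged = pd.DataFrame({
    "title": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 4, 4, 3, 3, 5],
})

# Step 3: count and average of ratings per movie.
stats = merged.groupby("title")["rating"].agg(
    rating_count="count", rating_mean="mean"
)

# Step 4: sort by count, then by average.
stats = stats.sort_values(["rating_count", "rating_mean"], ascending=False)

# Step 5: hypothetical cutoff on the number of ratings.
MIN_COUNT = 2

# Step 6: recommend the best-rated movies among those with enough ratings.
recommended = stats[stats["rating_count"] >= MIN_COUNT].sort_values(
    "rating_mean", ascending=False
)
top_titles = recommended.head(10).index.tolist()
```

Movie C has a perfect average but only one rating, so the cutoff keeps it out of the list. The same recommendation is given to every new user, since no per-user model is involved.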
18. Content-based Recommender System
● Focus on attributes of item
● Recommendations based on item similarity
Movies seen by user
Similar Movies
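One way to read "using correlation" is to correlate the rating columns of a user-by-movie table and recommend the movies most correlated with one the user has seen. This is an assumption about the approach, sketched on made-up data; similar_movies and min_common are hypothetical names.

```python
import pandas as pd

# Toy ratings: four users, three movies.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "title": ["A", "B", "C"] * 4,
    "rating": [5, 5, 1, 4, 4, 2, 2, 1, 5, 1, 2, 4],
})

# Rows are users, columns are movies, cells are ratings (NaN if unrated).
user_movie = ratings.pivot_table(index="user_id", columns="title", values="rating")


def similar_movies(title, min_common=3):
    """Rank other movies by Pearson correlation with `title`."""
    target = user_movie[title]
    corr = user_movie.corrwith(target)
    # Number of users who rated both the target movie and each other movie.
    common = user_movie[target.notna()].count()
    result = pd.DataFrame({"correlation": corr, "common_ratings": common})
    result = result[result["common_ratings"] >= min_common].drop(index=title)
    return result.sort_values("correlation", ascending=False)


similar_to_a = similar_movies("A")
```

The min_common filter guards against high correlations that rest on only a handful of shared raters.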
19. What is the catch?
● Explore data to see best rated movies
● Most popular movies:
23. Approaches for old users
• Preference for the movie type
• Same Occupation
• Similar Age
Movies seen by both users
Seen by her, recommend to him!
27. Basic Structure of the Code
Input dataset
For loop over User_id:
For loop to find preference type:
For loop to rate movies of different movie types:
For loop to rate movies of different occupations:
For loop to rate movies of different ages (age between “age-5” and “age+5”):
Output to CSV
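The outline above might look roughly like the following sketch. It is a simplified reading, not the original code: the three scoring loops are folded into one filter (same occupation, age within 5 years, user's preferred genre), and the data, column names and the recommend_for helper are all assumptions.

```python
import pandas as pd

# Toy merged table: one row per rating, with user and movie attributes.
data = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3, 3],
    "age":        [25, 25, 27, 27, 50, 50],
    "occupation": ["engineer"] * 4 + ["artist"] * 2,
    "movie_id":   [10, 11, 10, 12, 13, 11],
    "genre":      ["Action", "Comedy", "Action", "Action", "Drama", "Comedy"],
    "rating":     [5, 2, 4, 5, 5, 4],
})


def recommend_for(user_id, data, age_window=5, top_n=5):
    """Score unseen movies using ratings from similar users."""
    mine = data[data["user_id"] == user_id]
    others = data[data["user_id"] != user_id]
    age = mine["age"].iloc[0]
    occupation = mine["occupation"].iloc[0]

    # Preference type: the genre this user rates highest on average.
    favourite = mine.groupby("genre")["rating"].mean().idxmax()

    # Similar users: same occupation, age between age-5 and age+5.
    peers = others[
        (others["occupation"] == occupation)
        & others["age"].between(age - age_window, age + age_window)
    ]

    # Movies of the preferred type that the user has not rated yet.
    unseen = peers[
        (peers["genre"] == favourite)
        & ~peers["movie_id"].isin(mine["movie_id"])
    ]
    scores = unseen.groupby("movie_id")["rating"].mean()
    return scores.sort_values(ascending=False).head(top_n)


rows = []
for user_id in data["user_id"].unique():
    for movie_id, score in recommend_for(user_id, data).items():
        rows.append({"user_id": user_id, "movie_id": movie_id, "score": score})

recommendations = pd.DataFrame(rows, columns=["user_id", "movie_id", "score"])

# Output to CSV; pass a file path instead to write to disk.
csv_text = recommendations.to_csv(index=False)
```

In this toy data only user 1 gets a recommendation: user 2 shares the occupation and age band and gave a high rating to an Action movie that user 1 has not seen.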
We grouped by movie here, so a movie can top the list when only one or two people saw it and gave it a 5-star rating. Hence we next look at the movies with the most ratings.
Most movies have only a small number of ratings, which makes sense because each person rates only a few movies. Most people watch only the big blockbusters, so those are the movies with a lot of ratings.
The distribution peaks at whole numbers because most people rate in whole numbers. Otherwise, movies are spread almost evenly, with a hump around 3 to 3.5. Movies rated 1 must be the horrible ones.