4. This is the story of Annie, a new Movieflix user.
She has just finished watching Shrek and is thinking:
"Why does Movieflix keep recommending animation movies? Just because I watched Shrek doesn't mean I like only animation. It would be nice to watch something else… umm… maybe a period movie."
Can Movieflix read her mind?
5. Input movie: List of movies Annie has watched so far (minimum 1).
Identify community: Identify the communities that contain those movies. Select the community with the largest number of matching movies.
Mark key players: Identify the central, key-player and neighbor nodes in the community.
List movies from community: List the prominent movies from each genre that the community has watched.
Recommend: To Annie, movies from other genres preferred by the community.
Probably – community detection
Most content-based recommender systems suffer from the “cold start” and “overspecialisation” problems. One way to deal with these problems is to divide users into communities, identify the community of a new user, and then recommend movies that other viewers in the same community have watched.
7. Raw Data – Two files – Movies and Ratings
Ratings.csv
Movies.csv
Source: Extract from the movie review site IMDb (publicly available). Movies are mostly from the last decade.
8. Data profile
Movies attracting the most reviews: All-time classics like Forrest Gump and The Shawshank Redemption have a large number of reviews.
Prominent genres: The dominant genres in the whole mix are Drama, Comedy, Thriller and Action.
Distribution of review counts: The number of reviews per movie follows an exponential distribution.
9. Graph Generation – Nodes, Edges and their meaning
01 Node: Each movie is a node. Only movies with between 30 and 180 reviews were considered.
02 Edge: Two movies are connected if a user reviewed both movies positively, giving each a score of more than 3 on a scale of 1-5.
03 Edge weight: Number of common positive reviews. Only weights above a particular threshold were considered.
04 Nature of graph: Undirected and weighted.
Example: The Mask (Comedy) and The Lion King (Animation), 100 common positive reviews.
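The node, edge and weight rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's actual code: the `(user_id, movie_id, score)` tuple layout, the function name `build_movie_graph` and the parameter `weight_threshold` are assumptions, and the slide does not state the threshold value that was used.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_movie_graph(ratings, min_reviews=30, max_reviews=180,
                      min_score=3, weight_threshold=0):
    """Build an undirected, weighted co-review graph.

    ratings: iterable of (user_id, movie_id, score) tuples (assumed layout).
    Returns {(movie_a, movie_b): weight} with movie_a < movie_b.
    """
    ratings = list(ratings)

    # Node rule: keep movies whose review count lies in [min_reviews, max_reviews].
    review_counts = Counter(movie for _, movie, _ in ratings)
    kept = {m for m, n in review_counts.items() if min_reviews <= n <= max_reviews}

    # Collect, per user, the kept movies that the user scored above min_score.
    liked = defaultdict(set)
    for user, movie, score in ratings:
        if movie in kept and score > min_score:
            liked[user].add(movie)

    # Edge rule: every user who liked both movies adds 1 to the edge weight.
    weights = Counter()
    for movies in liked.values():
        for a, b in combinations(sorted(movies), 2):
            weights[(a, b)] += 1

    # Weight rule: keep only edges whose weight exceeds the threshold.
    return {edge: w for edge, w in weights.items() if w > weight_threshold}


# Toy data (made up); the review-count band is lowered so the tiny sample survives.
sample = [
    ("u1", "The Mask", 5), ("u1", "The Lion King", 4),
    ("u2", "The Mask", 4), ("u2", "The Lion King", 5),
    ("u3", "The Mask", 2), ("u3", "The Lion King", 5),
]
graph = build_movie_graph(sample, min_reviews=1, max_reviews=10)
```

Storing each edge once under a sorted pair keeps the graph undirected without needing a graph library.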
10. Raw Graph – Nothing but a hairball
Edge thickness denotes weight
Node size denotes degree
Average degree = 7
Strongly connected
Diameter = 5
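For reference, the two summary statistics on this slide can be computed as sketched below. The helper names are illustrative, the diameter is measured in hops (edge weights are ignored), and the graph is assumed to be connected; the deck does not say which tool produced its figures.

```python
from collections import defaultdict, deque


def adjacency(edges):
    """Turn an iterable of (a, b) pairs into an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def eccentricity(adj, source):
    """Longest shortest-path length (in hops) from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def diameter(adj):
    """Diameter of a connected graph: the largest eccentricity over all nodes."""
    return max(eccentricity(adj, node) for node in adj)


# Toy path graph a-b-c-d (made up for illustration).
path = adjacency([("a", "b"), ("b", "c"), ("c", "d")])
```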
12. First things first – Identify the type of network
The degree distribution of the nodes indicates that the graph is similar to a scale-free network.
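A scale-free network has a degree distribution that roughly follows a power law, which appears as an approximately straight line on a log-log plot. The sketch below is a rough check only, not a rigorous statistical test; the function names are illustrative and the sample degrees are synthetic.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Map each degree k to the number of nodes with that degree."""
    return Counter(degrees)


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree).

    Needs at least two distinct positive degrees.
    """
    points = [(math.log(k), math.log(n)) for k, n in histogram.items() if k > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Synthetic degrees whose counts fall off as k**-2.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8]
slope = loglog_slope(degree_histogram(degrees))
```

A clearly negative, roughly constant slope is consistent with a heavy-tailed distribution, but it does not by itself prove that the network is scale-free.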
13. Let’s get to work – Detect Communities
The chart below compares the output of various community detection algorithms. The algorithms are compared in terms of the modularity of the resulting communities and the efficiency of execution (CPU time).
Modularity-based methods such as leading eigenvector and Louvain worked well.
Top modularity score: 0.52
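Modularity, the score used for this comparison, can be computed for any given partition. A minimal sketch follows, using the standard form Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the total edge weight, L_c the weight of edges inside c, and d_c the summed weighted degree of c. The input layout is an assumption and self-loops are not handled.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Weighted modularity Q of a partition.

    edges: {(a, b): weight} for an undirected graph, each edge listed once.
    community_of: {node: community label}.
    """
    total_weight = sum(edges.values())   # m
    internal = defaultdict(float)        # L_c: edge weight inside each community
    strength = defaultdict(float)        # d_c: summed weighted degree per community

    for (a, b), w in edges.items():
        strength[community_of[a]] += w
        strength[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w

    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Toy graph (made up): two triangles joined by a single bridge edge.
toy_edges = {
    (1, 2): 1, (2, 3): 1, (1, 3): 1,
    (4, 5): 1, (5, 6): 1, (4, 6): 1,
    (3, 4): 1,
}
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q_split = modularity(toy_edges, split)
```

Putting every node into a single community gives Q = 0, which is why a score such as 0.52 indicates real community structure.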
15. Let’s see what the viewer groups are like
Drama King
Nerd Clan
Adventure Club
16. Insight 1: Genre is not a representation of the viewer’s mindset
Drama King
Nerd Clan
Adventure Club
Each community has movies from a variety of genres. This demonstrates that categorising a new user based on genre could be misleading.
17. Insight 1: Consider for example the movie Shrek
The genre suggests it should be a kids' movie.
However, the PG rating suggests otherwise: PG means parental guidance is suggested, as some material may not be suitable for children.
It is actually more popular among young adults.
The movie review database tells us the true story.
18. Insight 2: Effect of all-time hit movies - Expectation
01 Several reviews: All-time hit movies get a high number of reviews.
02 Several shared reviews with other nodes: As a result, these movies will share reviewers with a large number of other movies.
03 High degree: Thus they will have a higher degree.
04 Central nodes, key players: These high-degree nodes will be central to a community and will uniquely characterize it.
19. Insight 2: Effect of all-time hit movies - Reality
1. All-time hits only served to make community detection difficult.
2. The network of all-time hits had the following characteristics:
a. All nodes had high degree
b. Poor modularity: 0.08
c. Well-interconnected web with poor centrality distinction
3. When these nodes were integrated into the overall network, its modularity dropped significantly. These movies were ultimately removed from the data mix.
Possible explanation of this behaviour: almost everyone has watched and liked the all-time hit movies, regardless of their movie preferences. As a result, data from these movies provides no information on a viewer’s choice.
28. Adventure Club – Central Movie
List of all the movies Gladiator shares fans with.
Genres range from thriller to animation to action.
29. Adventure Club – Key Players
Notice how American History turns out to be a key player even though its degree is very low.
30. Adventure Club – American History
American History shares fans only with Lord of the Rings and Memento.
Lord of the Rings in turn shares fans with all prominent movies of the group.
Similarly, Memento too is quite representative of the community.
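One way to see why a low-degree movie can still represent its community is to count what it reaches within two hops. The slides do not define how key players were computed, so the sketch below is only an intuition aid, not the deck's measure. The function `two_hop_reach` and the toy adjacency are assumptions; only the American History links to Lord of the Rings and Memento come from the slide.

```python
def two_hop_reach(adj, node):
    """Nodes reachable from `node` in at most two hops, excluding itself."""
    reach = set(adj[node])
    for nbr in adj[node]:
        reach |= adj[nbr]
    reach.discard(node)
    return reach


# Toy adjacency (partly made up): American History has only two neighbors.
club = {
    "American History": {"Lord of the Rings", "Memento"},
    "Lord of the Rings": {"American History", "Memento", "Gladiator", "Shrek"},
    "Memento": {"American History", "Lord of the Rings"},
    "Gladiator": {"Lord of the Rings", "Shrek"},
    "Shrek": {"Lord of the Rings", "Gladiator"},
}
reach = two_hop_reach(club, "American History")
```

In this toy graph American History has degree 2 yet reaches every other movie within two hops, because one of its neighbors is a hub.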
32. Back to Annie’s problem
Input movie: Annie has just watched 1 movie, “Shrek”.
Identify community: Identify the community that contains the movie “Shrek” (community Adventure Club).
Mark key players: Mark the key-player and central movies of “Adventure Club”.
List movies from community: If “Shrek” is a key-player movie of the community, jackpot! Choose all key players (across genres). Otherwise, choose movies that are neighbors of “Shrek”.
Recommend: To Annie, recommend the chosen movies.
Recommended movies:
Gladiator (Period drama)
Lord of the Rings (Fantasy)
Pirates of the Caribbean (Fantasy)
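The steps on this slide can be sketched as a single function. This is an illustration under stated assumptions rather than the deck's implementation: the function name, the three lookup structures, the toy data and the choice to restrict neighbors to the same community are all hypothetical.

```python
def recommend(watched, community_of, key_players, adj):
    """Pick movies for a user from the community of the movies they watched.

    watched: list of movie titles (at least one).
    community_of: {movie: community label}.
    key_players: {community label: set of key-player movies}.
    adj: {movie: set of neighboring movies}.
    """
    # Identify the community that contains the most watched movies.
    votes = {}
    for movie in watched:
        label = community_of.get(movie)
        if label is not None:
            votes[label] = votes.get(label, 0) + 1
    if not votes:
        return []
    community = max(votes, key=votes.get)

    seen = set(watched)
    players = key_players.get(community, set())
    if seen & players:
        # A watched movie is a key player: offer all key players.
        candidates = set(players)
    else:
        # Otherwise offer neighbors of the watched movies (assumed: same community only).
        candidates = set()
        for movie in watched:
            for nbr in adj.get(movie, ()):
                if community_of.get(nbr) == community:
                    candidates.add(nbr)
    return sorted(candidates - seen)


# Toy data (made up) mirroring the slide's example.
community_of = {
    "Shrek": "Adventure Club",
    "Gladiator": "Adventure Club",
    "Lord of the Rings": "Adventure Club",
    "Pirates of the Caribbean": "Adventure Club",
    "Some Drama": "Drama King",  # hypothetical title
}
key_players = {"Adventure Club": {"Shrek", "Gladiator", "Lord of the Rings",
                                  "Pirates of the Caribbean"}}
adj = {"Shrek": {"Gladiator", "Some Drama"}}
picks = recommend(["Shrek"], community_of, key_players, adj)
```

Because the candidates come from the community rather than from the genre of the watched movie, the result can cross genres, which is what Annie asked for.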
33. Future Work
1. Create overlapping communities
2. Each node should have an associated probability of belonging to each community
3. Such a design accommodates hopping users better. If a user's preferences change over time, they can easily follow a chain of movies into another community
4. Add more movies to the network to broaden its scope