3. Introduction
Film and television are an integral part of culture and one
way that people understand and interact with it
However, studies that require watching and annotating
video are time-consuming and expensive to run at scale
This paper explores cast lists from media databases to
examine how roles for different genders have evolved over
time
4. Dataset and the Methodology
Internet Movie Database (IMDb)
IMDb is the world's most popular and
authoritative source for movie, TV and
celebrity content.
5. Dataset and the Methodology
The dataset is available from IMDb; the analysis
examines its cast lists
Each cast list contains performer names,
images and character names
The analysis exploits the following three factors from the data:
Release date
Gender
Role
6. Dataset and the Methodology
We downloaded the plain text data files actors.list.gz and actresses.list.gz
We applied several cleaning phases:
Filter out roles marked n/a or that refer to the performer themselves (e.g. himself or herself)
Remove any text in parentheses
Split multi-role characters
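The cleaning phases listed here could be sketched roughly as follows. This is only an illustration, not the pipeline used for the paper: the function name clean_role, the "/" separator for multi-role characters, the lowercasing step and the exact set of self-referencing labels are all assumptions.

```python
import re

# Assumption: labels treated as "the performer playing themselves".
SELF_ROLES = {"himself", "herself", "themselves"}
# Assumption: multi-role characters are separated by "/".
ROLE_SEPARATOR = "/"


def clean_role(raw_role):
    """Return the cleaned role names found in one raw cast-list entry."""
    # Remove any text in parentheses, e.g. "Doctor (voice)" -> "Doctor".
    no_parens = re.sub(r"\s*\([^)]*\)", "", raw_role).strip()
    # Drop n/a before splitting, because "n/a" itself contains the separator.
    if no_parens.lower() == "n/a":
        return []
    roles = []
    for part in no_parens.split(ROLE_SEPARATOR):
        # Normalise whitespace and case (lowercasing is an assumption).
        role = " ".join(part.split()).lower()
        if not role or role in SELF_ROLES:
            continue
        roles.append(role)
    return roles
```

For example, under these assumptions an entry such as "Nurse / Receptionist (uncredited)" yields two records, one per role, while "Himself" and "N/A" yield none.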
p(F|r, y) is the proportion of records with role r in year y that were played by a
performer from the actresses list
p(M|r, y) = 1 - p(F|r, y)
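As a minimal sketch, p(F|r, y) can be estimated as a relative frequency, so that p(M|r, y) = 1 - p(F|r, y) holds. The record layout (role, year, gender) and the function name female_share are assumptions made for illustration, with gender "F" standing for a record taken from the actresses list.

```python
from collections import Counter


def female_share(records):
    """Estimate p(F|r, y) for every (role, year) pair.

    records is an iterable of (role, year, gender) tuples, where gender
    is "F" for the actresses list and "M" for the actors list
    (a hypothetical layout, not the IMDb file format).
    """
    totals = Counter()
    females = Counter()
    for role, year, gender in records:
        totals[(role, year)] += 1
        if gender == "F":
            females[(role, year)] += 1
    # Share of records for (role, year) that come from the actresses list.
    return {key: females[key] / totals[key] for key in totals}


# Toy records for illustration only, not real IMDb data.
toy_records = [
    ("doctor", 1990, "F"),
    ("doctor", 1990, "M"),
    ("doctor", 1990, "M"),
    ("nurse", 1990, "F"),
]
p_f = female_share(toy_records)
p_m = {key: 1 - value for key, value in p_f.items()}
```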
7. Role
The dataset shows which roles are popular in onscreen media and how
this has changed over time
While the analysis shows the enduring popularity of hosted screen
entertainment, this can obscure some of the roles that emerge over
time
8. Role
For the same period, which roles are new, i.e. did not
appear in the top 50 roles of the previous period?
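One way to answer this question is to compare the top-n role lists of two consecutive periods. The sketch below assumes each period is given as a flat list of cleaned role names, one per cast record; the function names top_roles and new_top_roles are hypothetical.

```python
from collections import Counter


def top_roles(roles, n=50):
    """Return the n most frequent role names, most frequent first."""
    return [role for role, _ in Counter(roles).most_common(n)]


def new_top_roles(previous_roles, current_roles, n=50):
    """Roles in the current top n that were not in the previous top n."""
    previous_top = set(top_roles(previous_roles, n))
    return [role for role in top_roles(current_roles, n)
            if role not in previous_top]


# Toy data for illustration only, not real IMDb counts.
previous_period = ["host"] * 3 + ["doctor"] * 2 + ["nurse"]
current_period = ["host"] * 3 + ["hacker"] * 2 + ["doctor"]
emerging = new_top_roles(previous_period, current_period, n=2)
```

With n=2 on the toy data, the only role that enters the top list without having been in the previous one is "hacker"; the slide's setting corresponds to n=50.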
9. Gender
One of the most valuable characteristics of the dataset is that
each performer has gender information.
Aggregating by role allows us to consider
biases of the gender of onscreen roles
10. Gender
The author also analyzes the gender distribution of common
roles to characterise how gender relates to roles at a high
level
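A high-level view of this kind could be computed by pooling records across all years and reporting the gender split for the most common roles. This is a sketch under assumptions: the (role, gender) record layout, the function name gender_split_for_common_roles and the default cutoff n=10 are illustrative and not taken from the paper.

```python
from collections import Counter


def gender_split_for_common_roles(records, n=10):
    """Return (role, p_female, p_male) for the n most common roles.

    records is an iterable of (role, gender) tuples, with gender "F" or
    "M" (a hypothetical layout), pooled over all years.
    """
    records = list(records)  # the records are scanned twice below
    totals = Counter(role for role, _ in records)
    females = Counter(role for role, gender in records if gender == "F")
    result = []
    for role, total in totals.most_common(n):
        p_female = females[role] / total
        result.append((role, p_female, 1 - p_female))
    return result


# Toy records for illustration only, not real IMDb data.
toy = ([("nurse", "F")] * 3 + [("nurse", "M")]
       + [("doctor", "M")] * 2 + [("doctor", "F")]
       + [("pilot", "M")])
split = gender_split_for_common_roles(toy, n=2)
```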
11. Gender
The author then analyzes the dataset and shows strong biases in the
representation of executive roles, looking at key roles in these areas:
IT, Doctor, Corporate, Law, Politics, Science, Religion, Engineering
12. Gender
The analyses to this point have only
referenced IMDb data
It is also interesting to compare onscreen roles with their
real-world counterparts
13. Conclusions
This paper presents methodologies for mining information
about gender in onscreen media from cast lists.
The IMDb data release does not report country information
directly, so it would have to be inferred