Rezapour R, Diesner J (2017) Classification and Detection of Micro-Level Impact of Issue-Focused Films based on Reviews. Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017), Portland, OR. http://dl.acm.org/citation.cfm?id=2998201
1. Classification and Detection of Micro-Level
Impact of Issue-Focused Films
based on Reviews
Rezvaneh (Shadi) Rezapour
Jana Diesner
The 20th ACM Conference on
Computer-Supported Cooperative Work and Social Computing (CSCW)
February 25 – March 1, 2017
2. Introduction
• What is Social Impact?
• Any cognitive, behavioral, emotional or cultural change in a person or a group
of people (Latané, 1981).
• The source, strength and immediacy of impact can lead to different outcomes
in different social agents (Latané, 1981).
3. Different Aspects of Impact
• Macro
• Society
• Legislation
• Meso
• Corporate
• Institutions
• Micro
• Individuals
5. Data:
Data Collection
• Types of Review
• Extrinsically motivated
• Experts and critics
• Intrinsically motivated
• Volunteers, laymen
• Data Collection
• From Amazon (with their permission)
• Collected 2,290 reviews on eight selected documentary films:
“Fed Up,” “This Changes Everything,” “Pray the Devil Back to Hell,” “Through a Lens Darkly,”
“Pandora’s Promise,” “Solar Mamas,” “The House I Live in,” and “Pay to Play.”
6. Data:
Codebook Development and Annotation Schema
• Reviewed prior work from media studies and psychology
• Randomly selected a small sample of reviews from corpus for close reading
• Defined six initial impact types: “change in cognition,” “change in attitude,” “change
in emotion,” “change in behavior,” “personal opinion,” and “impersonal report”
• Conducted a three-step procedure for codebook development
• Created codebook and annotated 50 reviews
• Closely studied annotations and discussed weaknesses and shortcomings of
codebook with annotators
• Refined codebook
• Iterated through these steps (4 times) until we felt confident that the labels
comprehensively covered different types of impact reflected in the data
and in theory
7. Data:
Labeling
• Labeled reviews on sentence level
• Trained two annotators
• Annotated around 900 reviews
• Cross annotated 10% of the reviews
• Designed three checkpoints to get feedback from annotators, resolve
any issues, and verify that annotators maintained a solid understanding
of the task and codebook
• Inter-coder reliability:
• Primary stage: 45%
• After resolving misunderstandings and confusions: 97%
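The reliability figures above are reported as percentages; a minimal sketch, assuming plain percent agreement (the exact statistic is not stated on the slide), could look like this (the example labels are made up):

```python
def percent_agreement(labels_a, labels_b):
    """Share of sentences where both annotators assigned the same label.

    Assumption: the slide's percentages are raw agreement, not a
    chance-corrected statistic such as Cohen's kappa.
    """
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical sentence-level labels from two annotators
a = ["attitude", "emotion", "opinion", "report"]
b = ["attitude", "cognition", "opinion", "report"]
print(percent_agreement(a, b))  # 0.75
```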
9. Dealing with Imbalanced Data
• Oversampling for classes with few instances
• Synthetic Minority Over-Sampling Technique (SMOTE)
• Synthetic instances are created by interpolating between a sample and its k nearest neighbors
• Oversampling rates between 100% and 500% were applied, using k = 5 nearest neighbors
• Undersampling for large classes
• Random undersampling with a ratio of 9:1 to reduce the size of large
classes
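The SMOTE step above can be sketched directly; this is a minimal illustration of the interpolation idea only (the toy minority samples and the `smote` helper are made up for the example, not the paper's implementation):

```python
import numpy as np

def smote(minority, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: for each synthetic point, pick a random
    minority sample, choose one of its k nearest minority neighbors,
    and interpolate at a random position on the segment between them."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # distances from x to every minority sample
        d = np.linalg.norm(minority - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the sample itself
        z = minority[rng.choice(neighbors)]
        gap = rng.random()                  # random point on the segment
        synthetic.append(x + gap * (z - x))
    return np.array(synthetic)

# Toy minority class of 6 points; 200% oversampling adds 12 synthetic ones
minority = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.8]])
new = smote(minority, n_new=12, k=5, rng=0)
print(new.shape)  # (12, 2)
```

Because each synthetic point is an interpolation between two existing minority samples, it always stays inside the minority class's convex hull.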
10. Classification
• Classifiers:
• Support Vector Machine (SVM)
• Random Forest (RF)
• Naïve Bayes (NB)
• 10-fold cross validation
• Attribute Selection:
• Information Gain
InfoGain(Class, Attribute) = H(Class) − H(Class | Attribute)
• Standard metrics for prediction
• precision, recall, and F-score with β = 1
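The information-gain formula above can be computed directly from empirical label counts; a minimal sketch (the example labels and feature values are made up):

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum p * log2(p) over the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(classes, attribute):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    n = len(classes)
    # group class labels by attribute value
    groups = {}
    for c, a in zip(classes, attribute):
        groups.setdefault(a, []).append(c)
    # H(Class | Attribute): entropy within each group, weighted by group size
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - cond

classes   = ["impact", "impact", "opinion", "opinion"]
attribute = [1, 1, 0, 0]  # a perfectly informative binary feature
print(info_gain(classes, attribute))  # 1.0
```

A feature that perfectly separates the classes yields the full class entropy (here 1 bit); an uninformative feature yields 0.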
11. Result:
Classification
Features                               | SVM (P / R / F1)   | Random Forest (P / R / F1) | Naïve Bayes (P / R / F1)
Lexical: Unigram (baseline)            | 53.3 / 46.4 / 47.3 | 63.8 / 61.0 / 61.3         | 50.9 / 49.2 / 49.3
Lexical: Unigram + Bigram              | 57.4 / 51.2 / 52.5 | 67.4 / 64.7 / 65.0         | 55.2 / 53.1 / 53.1
Lexical: Unigram + Bigram + Trigram    | 57.3 / 51.5 / 52.7 | 67.7 / 65.2 / 65.3         | 56.1 / 54.4 / 54.3
Lexical + Psychological                | 71.0 / 70.6 / 70.6 | 80.2 / 79.2 / 79.5         | 55.2 / 52.8 / 52.5
Lexical + Linguistic                   | 72.7 / 72.5 / 72.5 | 81.4 / 80.8 / 81.1         | 64.4 / 64.1 / 63.0
Lexical + Psychological + Linguistic   | 73.0 / 73.1 / 73.0 | 80.5 / 79.9 / 80.2         | 58.6 / 56.9 / 56.4
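The classifier comparison above can be sketched with scikit-learn; this is a hedged illustration only, where `X` and `y` are a synthetic stand-in for the paper's feature matrix and sentence labels, so the scores will not match the table:

```python
# Assumption: scikit-learn is available; the feature matrix here is
# randomly generated, not the paper's lexical/psychological/linguistic features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic 3-class stand-in for the labeled review sentences
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)

for name, clf in [("SVM", SVC()),
                  ("Random Forest", RandomForestClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB())]:
    # 10-fold cross-validation, as on the classification slide
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
    print(f"{name}: macro-F1 = {scores.mean():.3f}")
```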
12. Result:
Class Distribution
• 20% of sentences feature emotional, cognitive, and behavioral impact
(change or reaffirmation)
• 6% do not contain any impact types
• 51% contain general opinions and 20% provide summaries
• Ratio of “intention to change” is higher than “change in behavior”
• Suggests that people are more prone to plan to change their course of action
or way of thinking than actually implement these changes.
13. Conclusion
• We found that:
• Reviews do contain different types of impact.
• Information products can change people’s conceptions of an issue,
and can be associated with changes in attitudes toward societal
problems.
• Why is it important?
• Shows the potential impact of documentary films and highlights the
importance of assessing impact (important for sponsoring organizations and
filmmakers).
14. Conclusion
• This study is useful for documentary sponsors and producers because
it is helpful for gaining a more detailed and comprehensive
understanding of citizen engagement with issue-focused films.
• It broadens our understanding about individuals’ interactions with
online communities, political movements, their surrounding society,
and their everyday life.
• The impact codebook can be used in different areas of study to
capture micro-level impact
• The codebook brings a new level of analysis to current research in
the area of review mining.
15. • Acknowledgment:
This work was supported by the Ford Foundation, grant 0155-0370, and by a faculty fellowship from the National Center
for Supercomputing Applications (NCSA) at UIUC. We are grateful to Amazon for giving us permission to collect reviews
from their website. We thank Professor Corina Roxana Girju from the Linguistics department at UIUC for her helpful
insights and advice in developing the codebook. We also thank Ming Jiang, Sandra Franco, Harathi Korrapati, and Julian
Chin from the iSchool at Illinois for their help with this paper.
• Related Papers:
• http://context.ischool.illinois.edu/publications.php
• Contact:
Rezvaneh Rezapour (rezapou2@Illinois.edu), PhD student
Jana Diesner (jdiesner@Illinois.edu), Assistant Professor
School of Information Sciences
University of Illinois at Urbana-Champaign