Biases that emerge in Social Media Research. Talk presented at the NoBias EU project. Inspired by Olteanu et al., "Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries" (2019).
Biases in Social Media Research (NoBias EU project)
NoBias onboarding week March 2021
Biases in Social Media Research
Presenting: Miriam Fernandez
@miriam_fs
fernandezmiriam
@miriamfs
Before we start…
• 1. This is an online talk…
– Hope you took the necessary precautions!
– PJs are allowed and highly recommended :)
• 2. It is an overview of biases and problems in social media research
– If you were expecting something very complex, this may not be for you :)
• 3. I hate talking alone for long periods of time
– So get ready for questions and discussions at any point!
Internet Users: 59.5% of the World’s Population
Source: https://datareportal.com/reports/digital-2021-global-overview-report
Social Media Users: 53.6% of the Global Population
Source: https://datareportal.com/reports/digital-2021-global-overview-report
After TV, users concentrate most of their internet time on Social Media
Source: https://datareportal.com/reports/digital-2021-global-overview-report
The World’s Most-Used Social Platforms
Source: https://datareportal.com/reports/digital-2021-global-overview-report
Yes. If you are not on TikTok (like me), you are too old!
Image from: https://www.telegraph.co.uk/women/life/cant-one-bewildered-tiktok/
AI & Social Media
AI for Social Media
Social Media for AI
AI for Social Media
• Recommender Systems /
Personalisation Systems
– Suggest Information
– Suggest User Connections
– Suggest Events
– Suggest Products
• Search/Ranking Systems
– Provide Information
– Personalise Information
• NLP Systems
– ‘Understand’ text
– Extract knowledge
• Image Processing Systems
Social Media for AI
• Understand phenomena at scale
– Business/brand monitoring
– Political reactions
– Marketing
• Decision Making
– Policy making
– Employability
• Address societal challenges
– Misinformation
– Hate
– Radicalisation
– Disaster management
– Child grooming
– Climate Change
Recommender Systems (RS) are affected by popularity and homogeneity biases.
A. Bellogín, P. Castells, I. Cantador, Statistical biases in information retrieval metrics for recommender systems, Information Retrieval Journal 20 (6) (2017) 606–634.
D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an analysis of recommendation biases and possible countermeasures, User Modeling and User-Adapted Interaction 25 (5) (2015) 427–491.
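To make popularity bias concrete, here is a minimal sketch (all item names and interaction counts are invented) of how a naive popularity-based recommender concentrates recommendations on a few head items, regardless of the user:

```python
from collections import Counter

# Toy interaction log with a long-tail popularity distribution
# (made-up items and counts, purely for illustration).
interactions = (["item_a"] * 50 + ["item_b"] * 30 + ["item_c"] * 10 +
                ["item_d"] * 5 + ["item_e"] * 3 + ["item_f"] * 2)
catalogue = sorted(set(interactions))

def most_popular(k):
    """A naive popularity recommender: every user gets the same top-k list."""
    counts = Counter(interactions)
    return [item for item, _ in counts.most_common(k)]

recs = most_popular(2)                      # identical for all users
coverage = len(set(recs)) / len(catalogue)  # share of catalogue ever shown

print(recs)      # ['item_a', 'item_b']
print(coverage)  # only 2 of 6 items are ever recommended
```

The same-list-for-everyone behaviour is the homogeneity side of the problem; the tiny catalogue coverage is the popularity side.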
Search/Ranking Biases
Search biases may influence:
+ The local businesses that are found
+ The products that are bought
+ The candidates that are hired
+ The events that are attended
+ Dating / affective success
+ …
Castillo, Carlos. "Fairness and transparency in ranking." ACM SIGIR Forum. Vol. 52. No. 2. New
York, NY, USA: ACM, 2019
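One way to see why ranking matters so much: user attention decays sharply with position. A small sketch, assuming a standard DCG-style position discount of 1/log2(rank + 1) as the exposure model (an assumption for illustration, not a measurement of any real search engine):

```python
import math

# Hypothetical ranked result list (invented names).
results = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_e"]

# Exposure modelled with the DCG-style discount 1 / log2(rank + 1).
exposure = [1 / math.log2(rank + 1) for rank in range(1, len(results) + 1)]
total = sum(exposure)
share_top1 = exposure[0] / total  # fraction of attention the first result gets

print(share_top1)  # roughly a third of all exposure goes to rank 1
```

Under this model, whichever local business the ranker puts first captures a disproportionate share of attention, so small biases in the ranking translate into large differences in outcomes.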
Personalisation and Filtering
• What do social media algorithms show, and to whom?
– Lack of transparency and accountability
Targeting Societal Challenges by Analysing Social Media Data
(Diagram: social phenomena)
Many studies seem to assume that social media data, the methods used for its analysis, and the AI applications created on top of it, are adequate, with little or no scrutiny.
Olteanu, Alexandra, et al. "Social data: Biases, methodological pitfalls, and ethical
boundaries." Frontiers in Big Data 2 (2019): 13.
Population Biases
• Differences in demographics or other user characteristics between
a population of users represented in a dataset or platform and a
target population.
• E.g., can we really use social media to inform Policy Making? To
whom are we listening?
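The population-bias question can be made quantitative by comparing who is in the data with who is in the target population. A minimal sketch with entirely made-up age distributions for a platform sample versus a policy maker's constituency:

```python
# Hypothetical age distributions (invented numbers): the users a platform
# sample represents vs. the constituency a policy maker actually serves.
platform_sample = {"18-29": 0.55, "30-49": 0.35, "50+": 0.10}
target_population = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Representation gap per group: positive = over-represented in the data.
gap = {g: round(platform_sample[g] - target_population[g], 2)
       for g in target_population}

print(gap)  # {'18-29': 0.35, '30-49': 0.0, '50+': -0.35}
```

Here the over-50s are heavily under-represented, so any "voice of the constituency" read off this data would mostly be the voice of the young.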
Population Biases
Can we investigate misinformation spreading among senior citizens by looking at TikTok data?
https://shorensteincenter.org/information-disorder-framework-for-research-and-policymaking/
As a Policy Maker, can I really understand issues affecting my constituency if I don't have geo-located data?
Behavioural Biases
• Differences in user behaviour across platforms or contexts, or across users represented in different datasets.
• Participatory Budgeting Platforms: if users from certain ideologies / socio-economic backgrounds are more active in making and voting on proposals -> urban inequality.
Ivan Cantador, Maria E. Cortes-Cediel, Miriam Fernandez.
Exploiting Open Data to analyze discussion and controversy
in online citizen participation. Information Processing and
Management 2020.
Content Production Biases
• Behavioral biases that are expressed as lexical, syntactic, semantic, and
structural differences in the contents generated by users.
• English vs. other languages: more research / tools are produced for the English language -> unequal opportunity, particularly for users of underrepresented languages. We also know and understand less about the needs of those populations.
Linking Biases
• Behavioral biases that are expressed as differences in the
attributes of networks obtained from user connections,
interactions or activity.
Is this network really representative of the general US population? Careful with homophily!
Temporal Biases
• Differences in populations or behaviours over time.
• Classifiers or models developed today may be biased to the entities (persons, organisations, geographical locations, …) discussed today, and be ineffective to categorise / classify / filter / predict tomorrow's content.
E.g., misinformation detection systems trained with data from 2020.
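A deliberately naive sketch of this failure mode: a keyword "classifier" whose vocabulary is frozen at training time (the terms and post texts below are invented for illustration) flags yesterday's narratives but is blind to today's.

```python
# Vocabulary collected from hypothetical 2020 misinformation narratives;
# it never gets updated after training.
train_2020 = {"5g", "hydroxychloroquine", "plandemic"}

def flags_misinformation(text, vocabulary):
    """Flag a post if it mentions any known-narrative keyword."""
    return any(term in text.lower() for term in vocabulary)

post_2020 = "New 5G conspiracy video going viral"
post_2021 = "Microchips in the vaccine, share before it's deleted!"

print(flags_misinformation(post_2020, train_2020))  # True
print(flags_misinformation(post_2021, train_2020))  # False: stale vocabulary
```

Real models fail less crudely than a keyword list, but the underlying temporal bias is the same: the feature distribution the model learned drifts away from the content it must classify.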
Data Source
• "Garbage in, garbage out": a system that receives the wrong data often draws the wrong conclusions (e.g., misrepresentation of the population).
• E.g., studying user behaviour (e.g., citizen participation, misinformation spreading across users, etc.) without filtering out accounts from organisations, media outlets, bots, etc.
Psst, I'm not a human!
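A small sketch of why this filtering matters (all account records and activity numbers are made up; in practice bot/organisation labels would come from heuristics or a detection service, not ground truth):

```python
# Hypothetical account records from a social media sample.
accounts = [
    {"user": "alice",       "type": "person",       "posts_per_day": 4},
    {"user": "bob",         "type": "person",       "posts_per_day": 2},
    {"user": "newsbot",     "type": "organisation", "posts_per_day": 300},
    {"user": "amplifier01", "type": "bot",          "posts_per_day": 800},
]

def mean_activity(records):
    return sum(a["posts_per_day"] for a in records) / len(records)

humans = [a for a in accounts if a["type"] == "person"]

print(mean_activity(accounts))  # 276.5 -- dominated by non-human accounts
print(mean_activity(humans))    # 3.0   -- the human behaviour we wanted
```

Any "average user behaviour" statistic computed on the unfiltered sample describes bots and outlets, not citizens.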
Data Collection / Verification
• Biases introduced due to the selection of data sources, or by the
way in which data from these sources are acquired and prepared
• Fernandez, Miriam and Alani, Harith (2019). Artificial Intelligence and Online Extremism: Challenges and
Opportunities. In: McDaniel, John L.M. and Pease, Ken eds. Predictive Policing and Artificial Intelligence.
Taylor & Francis. http://oro.open.ac.uk/69799/1/Fernandez_Alani_final_pdf.pdf
Data Processing: definitions
• Biases introduced by data processing operations such as cleaning,
enrichment, annotation, and aggregation.
• The same messages are judged very differently in different parts of the world.
• This highlights differences and potential bias.
(Chart: annotation outcomes — Not-Radical / Radical / Tie — by annotator region: NA North America, SA South America, ME Middle East, AS Asia, EU Europe, AF Africa)
Data Processing: disagreements
Mensio, Martino and Alani, Harith (2019). News Source Credibility in the Eyes of Different
Assessors. In: Conference for Truth and Trust Online, 4-5 Oct 2019, London, UK, (In Press)
http://oro.open.ac.uk/62771/1/TTO2019_credibility.pdf
Data Processing: data gaps
• Not accounting for data gaps / data imbalances.
• To produce systems that are equally effective at protecting different groups against online hate, we need to account for differences in how such hate is manifested across groups.
Farrell Tracie, Fernandez Miriam, Novotny Jakub and Alani Harith (2019). Exploring
Misogyny across the Manosphere in Reddit. 10th ACM Conference on Web Science
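A toy sketch of the data-gap problem (all "slur" tokens are placeholders, and the lexicon approach is a deliberate simplification): a detector whose lexicon was built only from abuse targeting one group silently fails to protect another.

```python
# Lexicon collected only from data about abuse aimed at group A
# (placeholder tokens, not real terms).
lexicon = {"slur_a1", "slur_a2"}

# Labelled test posts: (text, targeted group, is_hateful).
test_posts = [
    ("you are a slur_a1",            "group_a", True),
    ("typical slur_a2 behaviour",    "group_a", True),
    ("go back, slur_b1",             "group_b", True),
    ("nobody wants a slur_b2 here",  "group_b", True),
]

def recall_for(group):
    """Fraction of hateful posts aimed at `group` that the lexicon catches."""
    relevant = [text for text, g, label in test_posts if g == group and label]
    hits = sum(any(t in text for t in lexicon) for text in relevant)
    return hits / len(relevant)

print(recall_for("group_a"))  # 1.0
print(recall_for("group_b"))  # 0.0 -- this group is left unprotected
```

An aggregate recall of 0.5 would hide the fact that protection is perfect for one group and non-existent for the other, which is exactly why per-group evaluation matters.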
Data Analyses / Usage
• Studying social phenomena without a control group, or with a
“wrong” control group
Radicalisation Detection Algorithm:
– Radical user: uses radicalisation terminology
– "General user": talks about cats and other things
– But researchers, media agencies, journalists, political figures, and religious non-radical individuals also use radicalisation terminology, so they get classified as Radical -> false positives
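The failure mode above can be sketched in a few lines (terms and post texts are invented; real systems use learned models rather than a keyword list, but they inherit the same weakness when the control group is too easy):

```python
# A classifier trained to separate "radical terminology" from "cat talk"
# degenerates into terminology spotting.
radical_terms = {"jihad", "caliphate", "kuffar"}

def predicted_radical(text):
    return any(term in text.lower() for term in radical_terms)

radical_post    = "Join the caliphate, brothers"
cat_post        = "My cat knocked my coffee over again"
journalist_post = "Our report examines how the caliphate narrative spreads"

print(predicted_radical(radical_post))     # True  (true positive)
print(predicted_radical(cat_post))         # False (true negative)
print(predicted_radical(journalist_post))  # True  (false positive!)
```

Against the cat-talk control group the classifier looks perfect; against a realistic control group of journalists and researchers it produces false positives.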
Data Analyses / Usage
• Lack of robustness against over-time changes
Pipeline for detecting pro-ISIS stances using semantic sub-graph mining-based feature extraction:
Tweets -> Conceptual Semantics Extraction (via DBpedia) -> Semantic Graph Representation -> Frequent Semantic Subgraph Mining -> Classifier Training
Extracts and uses the semantic interdependencies and relations between entities, concepts and semantic relations, e.g., (ISIS, Jihadist Group), (Syria, Country), (Military Intervention Against ISIL, place, Syria).
Saif, Hassan, et al. "On the role of semantics for detecting pro-isis stances on social media."
Data Analyses / Usage
Lack of robustness against new
types of events
Burel, Grégoire, et al. "On semantics
and deep learning for event detection
in crisis situations." (2017).
Data Analysis/Usage
Signals used to detect misinformation:
– Network & propagation patterns
– Information source
– Content: text / images / videos
– Context: lists of misleading sites, specific features (hashtags, mentions)
Each gives a partial view of the problem / available data.
Fernandez, Miriam, and Harith
Alani. "Online misinformation:
Challenges and future
directions." The Web Conference
2018. 2018.
Data Analysis / Usage
• Bias towards certain research fields/methodologies
Social science perspective:
– Historical/contextual approaches
– Rich description of communities
– Qualitative attempts to characterise the phenomena
– Exacerbating factors, both social and technological
– Impacts on society and culture
– Small number of researchers / data
– Mostly qualitative
– Observational studies
Computational perspective:
– Automatic detection and categorisation
– Preference for certain platforms
– Less attention to sociology / psychology models and domain knowledge
– Bias to time snapshots
Roots of Radicalisation & Radicalisation Influence
Micro (individual) roots + Meso (group) roots + Macro (global) roots = Radicalisation Influence
Fernandez, Miriam, Moizzah Asif, and Harith
Alani. "Understanding the roots of
radicalisation on twitter." Proceedings of the
10th ACM Conference on Web Science. 2018.
Olson, L. N., Daggs, J. L., Ellevold, B. L.
and Rogers, T. K. K. (2007),
Entrapping the Innocent: Toward a
Theory of Child Sexual Predators’
Luring Communication.
Communication Theory, 17
Child Grooming
(Flowchart: classifying conversations into grooming stages — grooming, trust development, physical approach, other)
Cano, Amparo et al "Detecting child grooming behaviour patterns on social media.” 2014
Data Analysis/Usage
• Bias towards the obtained results (classification performance is not always enough, particularly when humans are involved!)
Simply presenting people with corrective information is likely to fail to change their salient beliefs and opinions, or may even reinforce them.
Combatting misinformation with facts:
– Provide an explanation rather than a simple refutation
– Expose the user to related but disconfirming stories
– Reveal the demographic similarity of the opposing group
– Expose users to "small doses" of misinformation
– Early detection of malicious accounts
– Use ranking and selection strategies based on corrective information
Data Presentation/Explanation of Results
• Bias towards expert users
Evaluation and Interpretation
• The choice of metrics shapes a research study
– Even if a metric indicates good overall performance on a classification task, it is
hard to know what that implies, as errors may be concentrated in one
particular class or group of classes
• False positives and false negatives should not always weigh the same!
• Negative results are often overlooked
• Big problems with data sharing and reproducibility
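Both points above can be shown with a few lines of toy arithmetic (the counts and the 10:1 cost ratio are invented for illustration): overall accuracy looks excellent while the minority class is entirely missed, and weighting errors by their cost tells a different story than counting them.

```python
# 100 toy predictions on an imbalanced task: 95 negatives, all predicted
# correctly; 5 positives (the cases we care about), all missed.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
recall_positive = sum(p == 1 for _, p in positives) / len(positives)

print(accuracy)         # 0.95 -- looks great
print(recall_positive)  # 0.0  -- every positive case was missed

# Cost-weighted errors: assume a false negative (missed harmful content)
# costs 10 and a false positive costs 1.
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
weighted_cost = 10 * fn + 1 * fp
print(weighted_cost)    # 50 -- all of it from the "rare" class
```

The choice of metric, and of the relative cost of each error type, is itself a research decision that shapes what the study concludes.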
Biases in Social Media Research
Miriam Fernandez
Knowledge Media Institute
Open University, UK
@miriam_fs
@miriamfs
Credit to all these fantastic people!