Biases that emerge in Social Media Research. Talk presented at the NoBias EU project. Inspired by Olteanu et al., "Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries" (2019).
Biases in Social Media Research (NoBias EU project)
NoBias onboarding week March 2021
Biases in Social Media Research
Presenting: Miriam Fernandez
@miriam_fs
fernandezmiriam
@miriamfs
Before we start…
• 1. This is an online talk…
– Hope you took the necessary precautions!
– PJs are allowed and highly recommended :)
• 2. It is an overview of biases and problems in social media research
– If you were expecting something very complex, this may not be for you :)
• 3. I hate talking alone for long periods of time
– So get ready for questions and discussions at any point!
Internet Users: 59.5% of the World’s Population
Source: https://datareportal.com/reports/digital-2021-global-overview-report
Social Media Users: 53.6% of the Global Population
Source: https://datareportal.com/reports/digital-2021-global-overview-report
After TV, users concentrate most of their internet time on Social Media
Source: https://datareportal.com/reports/digital-2021-global-overview-report
The World’s Most-Used Social Platforms
Source: https://datareportal.com/reports/digital-2021-global-overview-report
Yes. If you are not on TikTok (like me), you are too old!
Image from: https://www.telegraph.co.uk/women/life/cant-one-bewildered-tiktok/
AI & Social Media
AI for Social Media
Social Media for AI
AI for Social Media
• Recommender Systems /
Personalisation Systems
– Suggest Information
– Suggest User Connections
– Suggest Events
– Suggest Products
• Search/Ranking Systems
– Provide Information
– Personalise Information
• NLP Systems
– ‘Understand’ text
– Extract knowledge
• Image Processing Systems
Social Media for AI
• Understand phenomena at scale
– Business/brand monitoring
– Political reactions
– Marketing
• Decision Making
– Policy making
– Employability
• Address societal challenges
– Misinformation
– Hate
– Radicalisation
– Disaster management
– Child grooming
– Climate Change
Recommender Systems (RS) are affected by popularity and homogeneity biases.
A. Bellogín, P. Castells, I. Cantador, Statistical biases in information retrieval metrics for recommender systems, Information Retrieval Journal 20 (6) (2017) 606–634.
D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an analysis of recommendation biases and possible countermeasures, User Modeling and User-Adapted Interaction 25 (5) (2015) 427–491.
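To make popularity bias concrete, here is a minimal sketch (all item names and interaction counts are invented) of how a naive popularity-based recommender concentrates recommendations on a few head items, regardless of the user:

```python
from collections import Counter

# Toy interaction log with a long-tail popularity distribution
# (made-up items and counts, purely for illustration).
interactions = (["item_a"] * 50 + ["item_b"] * 30 + ["item_c"] * 10 +
                ["item_d"] * 5 + ["item_e"] * 3 + ["item_f"] * 2)
catalogue = sorted(set(interactions))

def most_popular(k):
    """A naive popularity recommender: every user gets the same top-k list."""
    counts = Counter(interactions)
    return [item for item, _ in counts.most_common(k)]

recs = most_popular(2)                      # identical for all users
coverage = len(set(recs)) / len(catalogue)  # share of catalogue ever shown

print(recs)      # ['item_a', 'item_b']
print(coverage)  # only 2 of 6 items are ever recommended
```

The same-list-for-everyone behaviour is the homogeneity side of the problem; the tiny catalogue coverage is the popularity side.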
Search/Ranking Biases
Search biases may influence:
+ The local businesses that are found
+ The products that are bought
+ The candidates that are hired
+ The events that are attended
+ Dating / affective success
+ …
Castillo, Carlos. "Fairness and transparency in ranking." ACM SIGIR Forum. Vol. 52. No. 2. New
York, NY, USA: ACM, 2019
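One way to see why ranking matters so much: user attention decays sharply with position. A small sketch, assuming a standard DCG-style position discount of 1/log2(rank + 1) as the exposure model (an assumption for illustration, not a measurement of any real search engine):

```python
import math

# Hypothetical ranked result list (invented names).
results = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_e"]

# Exposure modelled with the DCG-style discount 1 / log2(rank + 1).
exposure = [1 / math.log2(rank + 1) for rank in range(1, len(results) + 1)]
total = sum(exposure)
share_top1 = exposure[0] / total  # fraction of attention the first result gets

print(share_top1)  # roughly a third of all exposure goes to rank 1
```

Under this model, whichever local business the ranker puts first captures a disproportionate share of attention, so small biases in the ranking translate into large differences in outcomes.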
Personalisation and Filtering
• What do social media algorithms show, and to whom?
– Lack of transparency and accountability
Targeting Societal Challenges by Analysing Social Media Data
(Diagram: social phenomena)
Many studies seem to assume that social media data, the methods used for its analysis, and the AI applications created on top of it, are adequate, with little or no scrutiny.
Olteanu, Alexandra, et al. "Social data: Biases, methodological pitfalls, and ethical
boundaries." Frontiers in Big Data 2 (2019): 13.
Population Biases
• Differences in demographics or other user characteristics between
a population of users represented in a dataset or platform and a
target population.
• E.g., can we really use social media to inform Policy Making? To
whom are we listening?
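The population-bias question can be made quantitative by comparing who is in the data with who is in the target population. A minimal sketch with entirely made-up age distributions for a platform sample versus a policy maker's constituency:

```python
# Hypothetical age distributions (invented numbers): the users a platform
# sample represents vs. the constituency a policy maker actually serves.
platform_sample = {"18-29": 0.55, "30-49": 0.35, "50+": 0.10}
target_population = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Representation gap per group: positive = over-represented in the data.
gap = {g: round(platform_sample[g] - target_population[g], 2)
       for g in target_population}

print(gap)  # {'18-29': 0.35, '30-49': 0.0, '50+': -0.35}
```

Here the over-50s are heavily under-represented, so any "voice of the constituency" read off this data would mostly be the voice of the young.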
Population Biases
Can we investigate misinformation spreading among senior citizens by looking at TikTok data?
https://shorensteincenter.org/information-disorder-framework-for-research-and-policymaking/
As a Policy Maker, can I really understand issues affecting my constituency if I don't have geo-located data?
Behavioural Biases
• Differences in user behaviour across platforms or contexts, or across users represented in different datasets.
• Participatory Budgeting Platforms: if users from certain ideologies / socio-economic backgrounds are more active in making and voting on proposals -> urban inequality.
Ivan Cantador, Maria E. Cortes-Cediel, Miriam Fernandez.
Exploiting Open Data to analyze discussion and controversy
in online citizen participation. Information Processing and
Management 2020.
Content Production Biases
• Behavioral biases that are expressed as lexical, syntactic, semantic, and
structural differences in the contents generated by users.
• English vs. other languages: more research / tools are produced for the English language -> unequal opportunity, particularly for users of underrepresented languages. We also know and understand less about the needs of those populations.
Linking Biases
• Behavioral biases that are expressed as differences in the
attributes of networks obtained from user connections,
interactions or activity.
Is this network really representative of the general US population? Careful with homophily!
Temporal Biases
• Differences in populations or behaviours over time.
• Classifiers or models developed today may be biased to the entities (persons, organisations, geographical locations, …) discussed today, and be ineffective to categorise / classify / filter / predict tomorrow's content.
E.g., misinformation detection systems trained with data from 2020.
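A deliberately naive sketch of this failure mode: a keyword "classifier" whose vocabulary is frozen at training time (the terms and post texts below are invented for illustration) flags yesterday's narratives but is blind to today's.

```python
# Vocabulary collected from hypothetical 2020 misinformation narratives;
# it never gets updated after training.
train_2020 = {"5g", "hydroxychloroquine", "plandemic"}

def flags_misinformation(text, vocabulary):
    """Flag a post if it mentions any known-narrative keyword."""
    return any(term in text.lower() for term in vocabulary)

post_2020 = "New 5G conspiracy video going viral"
post_2021 = "Microchips in the vaccine, share before it's deleted!"

print(flags_misinformation(post_2020, train_2020))  # True
print(flags_misinformation(post_2021, train_2020))  # False: stale vocabulary
```

Real models fail less crudely than a keyword list, but the underlying temporal bias is the same: the feature distribution the model learned drifts away from the content it must classify.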
Data Source
• "Garbage in, garbage out": a system that receives the wrong data often draws the wrong conclusions (e.g., misrepresentation of the population).
• E.g., studying user behaviour (e.g., citizen participation, misinformation spreading across users, etc.) without filtering out accounts from organisations, media outlets, bots, etc.
Psst, I'm not a human!
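A small sketch of why this filtering matters (all account records and activity numbers are made up; in practice bot/organisation labels would come from heuristics or a detection service, not ground truth):

```python
# Hypothetical account records from a social media sample.
accounts = [
    {"user": "alice",       "type": "person",       "posts_per_day": 4},
    {"user": "bob",         "type": "person",       "posts_per_day": 2},
    {"user": "newsbot",     "type": "organisation", "posts_per_day": 300},
    {"user": "amplifier01", "type": "bot",          "posts_per_day": 800},
]

def mean_activity(records):
    return sum(a["posts_per_day"] for a in records) / len(records)

humans = [a for a in accounts if a["type"] == "person"]

print(mean_activity(accounts))  # 276.5 -- dominated by non-human accounts
print(mean_activity(humans))    # 3.0   -- the human behaviour we wanted
```

Any "average user behaviour" statistic computed on the unfiltered sample describes bots and outlets, not citizens.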
Data Collection / Verification
• Biases introduced due to the selection of data sources, or by the
way in which data from these sources are acquired and prepared
• Fernandez, Miriam and Alani, Harith (2019). Artificial Intelligence and Online Extremism: Challenges and
Opportunities. In: McDaniel, John L.M. and Pease, Ken eds. Predictive Policing and Artificial Intelligence.
Taylor & Francis. http://oro.open.ac.uk/69799/1/Fernandez_Alani_final_pdf.pdf
Data Processing: definitions
• Biases introduced by data processing operations such as cleaning,
enrichment, annotation, and aggregation.
• The same messages are judged very differently in different parts of the world.
• This highlights differences and potential bias.
(Chart: annotation outcomes — Not-Radical / Radical / Tie — by annotator region: NA North America, SA South America, ME Middle East, AS Asia, EU Europe, AF Africa)
Data Processing: disagreements
Mensio, Martino and Alani, Harith (2019). News Source Credibility in the Eyes of Different
Assessors. In: Conference for Truth and Trust Online, 4-5 Oct 2019, London, UK, (In Press)
http://oro.open.ac.uk/62771/1/TTO2019_credibility.pdf
Data Processing: data gaps
• Not accounting for data gaps / data imbalances.
• To produce systems that are equally effective at protecting different groups against online hate, we need to account for differences in how such hate is manifested across groups.
Farrell Tracie, Fernandez Miriam, Novotny Jakub and Alani Harith (2019). Exploring
Misogyny across the Manosphere in Reddit. 10th ACM Conference on Web Science
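A toy sketch of the data-gap problem (all "slur" tokens are placeholders, and the lexicon approach is a deliberate simplification): a detector whose lexicon was built only from abuse targeting one group silently fails to protect another.

```python
# Lexicon collected only from data about abuse aimed at group A
# (placeholder tokens, not real terms).
lexicon = {"slur_a1", "slur_a2"}

# Labelled test posts: (text, targeted group, is_hateful).
test_posts = [
    ("you are a slur_a1",            "group_a", True),
    ("typical slur_a2 behaviour",    "group_a", True),
    ("go back, slur_b1",             "group_b", True),
    ("nobody wants a slur_b2 here",  "group_b", True),
]

def recall_for(group):
    """Fraction of hateful posts aimed at `group` that the lexicon catches."""
    relevant = [text for text, g, label in test_posts if g == group and label]
    hits = sum(any(t in text for t in lexicon) for text in relevant)
    return hits / len(relevant)

print(recall_for("group_a"))  # 1.0
print(recall_for("group_b"))  # 0.0 -- this group is left unprotected
```

An aggregate recall of 0.5 would hide the fact that protection is perfect for one group and non-existent for the other, which is exactly why per-group evaluation matters.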
Data Analyses / Usage
• Studying social phenomena without a control group, or with a
“wrong” control group
Radicalisation Detection Algorithm:
– Radical user: uses radicalisation terminology
– "General user": talks about cats and other things
– But researchers, media agencies, journalists, political figures, and religious non-radical individuals also use radicalisation terminology, so they get classified as Radical -> false positives
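The failure mode above can be sketched in a few lines (terms and post texts are invented; real systems use learned models rather than a keyword list, but they inherit the same weakness when the control group is too easy):

```python
# A classifier trained to separate "radical terminology" from "cat talk"
# degenerates into terminology spotting.
radical_terms = {"jihad", "caliphate", "kuffar"}

def predicted_radical(text):
    return any(term in text.lower() for term in radical_terms)

radical_post    = "Join the caliphate, brothers"
cat_post        = "My cat knocked my coffee over again"
journalist_post = "Our report examines how the caliphate narrative spreads"

print(predicted_radical(radical_post))     # True  (true positive)
print(predicted_radical(cat_post))         # False (true negative)
print(predicted_radical(journalist_post))  # True  (false positive!)
```

Against the cat-talk control group the classifier looks perfect; against a realistic control group of journalists and researchers it produces false positives.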
Data Analyses / Usage
• Lack of robustness against over-time changes
Pipeline for detecting pro-ISIS stances using semantic sub-graph mining-based feature extraction:
Tweets -> Conceptual Semantics Extraction (via DBpedia) -> Semantic Graph Representation -> Frequent Semantic Subgraph Mining -> Classifier Training
Extracts and uses the semantic interdependencies and relations between entities, concepts and semantic relations, e.g., (ISIS, Jihadist Group), (Syria, Country), (Military Intervention Against ISIL, place, Syria).
Saif, Hassan, et al. "On the role of semantics for detecting pro-isis stances on social media."
Data Analyses / Usage
Lack of robustness against new
types of events
Burel, Grégoire, et al. "On semantics
and deep learning for event detection
in crisis situations." (2017).
Data Analysis/Usage
Signals used to detect misinformation:
– Network & propagation patterns
– Information source
– Content: text / images / videos
– Context: lists of misleading sites, specific features (hashtags, mentions)
Each gives a partial view of the problem / available data.
Fernandez, Miriam, and Harith
Alani. "Online misinformation:
Challenges and future
directions." The Web Conference
2018. 2018.
Data Analysis / Usage
• Bias towards certain research fields/methodologies
Social science perspective:
– Historical/contextual approaches
– Rich description of communities
– Qualitative attempts to characterise the phenomena
– Exacerbating factors, both social and technological
– Impacts on society and culture
– Small number of researchers / data
– Mostly qualitative
– Observational studies
Computational perspective:
– Automatic detection and categorisation
– Preference for certain platforms
– Less attention to sociology / psychology models and domain knowledge
– Bias to time snapshots
Roots of Radicalisation & Radicalisation Influence
Micro (individual) roots + Meso (group) roots + Macro (global) roots = Radicalisation Influence
Fernandez, Miriam, Moizzah Asif, and Harith
Alani. "Understanding the roots of
radicalisation on twitter." Proceedings of the
10th ACM Conference on Web Science. 2018.
Olson, L. N., Daggs, J. L., Ellevold, B. L.
and Rogers, T. K. K. (2007),
Entrapping the Innocent: Toward a
Theory of Child Sexual Predators’
Luring Communication.
Communication Theory, 17
Child Grooming
(Flowchart: classifying conversations into grooming stages — grooming, trust development, physical approach, other)
Cano, Amparo et al "Detecting child grooming behaviour patterns on social media.” 2014
Data Analysis/Usage
• Bias towards the obtained results (classification performance is not always enough, particularly when humans are involved!)
Simply presenting people with corrective information is likely to fail to change their salient beliefs and opinions, or may even reinforce them.
Combatting misinformation with facts:
– Provide an explanation rather than a simple refutation
– Expose the user to related but disconfirming stories
– Reveal the demographic similarity of the opposing group
– Expose users to "small doses" of misinformation
– Early detection of malicious accounts
– Use ranking and selection strategies based on corrective information
Data Presentation/Explanation of Results
• Bias towards expert users
Evaluation and Interpretation
• The choice of metrics shapes a research study
– Even if a metric indicates good overall performance on a classification task, it is
hard to know what that implies, as errors may be concentrated in one
particular class or group of classes
• False positives and false negatives should not always weigh the same!
• Negative results are often overlooked
• Big problems with data sharing and reproducibility
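Both points above can be shown with a few lines of toy arithmetic (the counts and the 10:1 cost ratio are invented for illustration): overall accuracy looks excellent while the minority class is entirely missed, and weighting errors by their cost tells a different story than counting them.

```python
# 100 toy predictions on an imbalanced task: 95 negatives, all predicted
# correctly; 5 positives (the cases we care about), all missed.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
recall_positive = sum(p == 1 for _, p in positives) / len(positives)

print(accuracy)         # 0.95 -- looks great
print(recall_positive)  # 0.0  -- every positive case was missed

# Cost-weighted errors: assume a false negative (missed harmful content)
# costs 10 and a false positive costs 1.
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
weighted_cost = 10 * fn + 1 * fp
print(weighted_cost)    # 50 -- all of it from the "rare" class
```

The choice of metric, and of the relative cost of each error type, is itself a research decision that shapes what the study concludes.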
Biases in Social Media Research
Miriam Fernandez
Knowledge Media Institute
Open University, UK
@miriam_fs
@miriamfs
Credit to all these fantastic people!