This instalment of the (un)data Seminar Series on Outrageous Questions discusses studying political extremism and monitoring global development through social media analysis. Dr. Weber will explain the methods and analysis behind three case studies of political analysis and global development. First, how one can use Twitter to understand the antecedents of ISIS support by building a classifier that, in retrospect, predicts whether a Twitter user will oppose or support ISIS. The features that are predictive (or not) of ISIS support help to understand potential motivations. Second, a methodology for monitoring political polarization, showing how increases in this polarization measure tended to precede outbreaks of unrest in Egypt. Third, how publicly accessible advertising data from Facebook can be repurposed to monitor migration and track gender gaps in internet access around the globe. The overarching goal is to illustrate how, despite challenges around lack of representativeness, social media can provide useful signals to study current affairs.
Not-so-obvious social media analysis to study current affairs
1. Not-So-Obvious Social Media Analysis to Study Current Affairs
Hosted by DPPA-DPO as part of the “(un)data Seminar Series on Outrageous Questions”
@ingmarweber
May 6, 2020
2. QATAR COMPUTING RESEARCH INSTITUTE
Main Question of this Talk
Can social media provide useful signals to study current affairs?
Yes, but ...
“Proof by example” through four case studies:
1: Understanding ISIS Support
2: Tracking political tension in Egypt
3: Monitoring international migration
4: Tracking digital gender gaps
3. #FailedRevolutions: Using Twitter to Study the Antecedents of ISIS Support
By Walid Magdy, Kareem Darwish, Ingmar Weber
Published in First Monday, 2016
01
4. Twitter as a Time Machine to Look Into the Past
[Figure: a user's tweet timeline with example tweets (“Down with the tyrants!”, “#IslamicState”, “something else”, “#IslamicState again”), split at the user's first reference to ISIS into a pre-ISIS period and a post-ISIS period.]
What were ISIS supporters tweeting in the pre-ISIS period?
How does it differ from what ISIS opponents were tweeting?
5. How to Tell Current ISIS Support from Opposition?
#داعش ... #ISIS ... #الدولةالإسلامية ... #IslamicState
In 93% of tweets, using the long form (“Islamic State”) indicates support for ISIS.
In 77% of tweets, using the short form (“ISIS”) indicates opposition to ISIS.
6. Data Collection
Final data set
57K active Twitter accounts with 10+ ISIS-related tweets
(123 million tweets in total)
• 46K opposing ISIS (#ISIS)
• 11K supporting ISIS (#IslamicState)
Build classifiers to tell the two groups apart using (i) only
the pre-ISIS tweets and (ii) only the post-ISIS tweets.
Predicts #ISIS vs. #IslamicState with ~90% accuracy
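The grouping of accounts by name form can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `label_account`, the two hashtag lists, and the simple majority rule are assumptions; the slide only states that accounts have 10+ ISIS-related tweets and are grouped by #ISIS vs. #IslamicState usage.

```python
from collections import Counter

# Hypothetical name variants for illustration; the study also covers Arabic forms.
LONG_FORMS = ("#islamicstate",)
SHORT_FORMS = ("#isis",)


def label_account(tweets, min_related=10):
    """Return 'support', 'oppose', or None for one account's tweets.

    Assumption: an account is labelled by the name form it uses in the
    majority of its ISIS-related tweets. A tweet containing both forms
    is counted as a long-form tweet.
    """
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        if any(tag in lowered for tag in LONG_FORMS):
            counts["long"] += 1
        elif any(tag in lowered for tag in SHORT_FORMS):
            counts["short"] += 1
    related = counts["long"] + counts["short"]
    if related < min_related:
        return None  # too few ISIS-related tweets to label this account
    if counts["long"] > counts["short"]:
        return "support"
    if counts["short"] > counts["long"]:
        return "oppose"
    return None  # exact tie: leave unlabelled
```

The labels produced this way would then serve as the target for classifiers trained on pre-ISIS or post-ISIS tweets only.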
7. Discriminative Hashtags
Most discriminative tokens differentiating between ISIS supporters (left) and ISIS
opponents (right) for the pre-ISIS period (top) and the post-ISIS period (bottom)
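One common way to rank tokens that separate two groups is a smoothed log probability ratio. The sketch below shows that idea only; the slides do not say which ranking the authors used, and the function name `discriminative_tokens` and the smoothing constant `alpha` are assumptions.

```python
import math
from collections import Counter


def discriminative_tokens(group_a_docs, group_b_docs, top_k=5, alpha=1.0):
    """Rank tokens by smoothed log ratio of their frequency in A vs. B.

    Each doc is a list of tokens (e.g. the hashtags used by one account).
    Returns (tokens most typical of A, tokens most typical of B).
    """
    counts_a = Counter(tok for doc in group_a_docs for tok in doc)
    counts_b = Counter(tok for doc in group_b_docs for tok in doc)
    vocab = set(counts_a) | set(counts_b)
    # Add-alpha smoothing so tokens unseen in one group do not divide by zero.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {}
    for tok in vocab:
        p_a = (counts_a[tok] + alpha) / total_a
        p_b = (counts_b[tok] + alpha) / total_b
        scores[tok] = math.log(p_a / p_b)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], ranked[-top_k:][::-1]
```

Running it separately on pre-ISIS and post-ISIS tweets would give one pair of lists per period, mirroring the top and bottom halves of the figure.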
8. Closer Look at “Predictive” (in Retrospect) Hashtags
Pro-ISIS (before):
#انتخبوا_العرص (#electThePimp)
#اعتصام_بريده (#BuraidaProtest)
#libya (#Libya)
#مرسي (#Morsy)
#ال_سعود (#FamilyOfSaud)
#الربيع_العراقي (#IraqiSpring)
#feb17 (#Feb17 -- launch of Libyan revolution against Ghaddafi)
#الشعب_يقول_كلمته (#thePeopleSayTheirWord)
#جبهه_النصره (#AnnusraFront)
#حسم (#Hasm -- an Egyptian anti-government armed group)
#مسيره_كرامه_وطن (#nationalDignityMarch)
#kwu89 (#Kuwait)
#رابعه_العدويه (#Rabae -- site of anti-coup protest)
→ Most top discriminating hashtags refer to revolutions and opposition to dictatorial regimes in different Arab countries

Anti-ISIS (before):
#معرض_ابوظبي_للصيد_والفروسيه (#AbuDhabiHuntingAndEquestrianExpo)
#تحيا_مصر (#longLiveEgypt -- slogan of pro-coup camp)
#الحفاظ_علي_مواردنا_سلوك_وطني (#preservingOurResourcesIsAPatrioticBehavior)
#غزه (#Gaza)
#غزه_تحت_القصف (#GazaUnderShelling)
→ More general, with some density from UAE, supporters of military coup in Egypt, and general support for Gaza
9. Secular vs. Islamist polarization in Egypt on Twitter
By Ingmar Weber, Kiran Garimella, Alaa Batayneh
Published in ASONAM, 2013
02
11. Use Retweets to Identify Likely Supporters
Get ~7,000 users who retweeted a seed user at least once
Label users fractionally according to retweeted camp
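Fractional labelling can be sketched as below. This is an illustration under stated assumptions rather than the paper's exact procedure: the function name `fractional_label` is hypothetical, and counting every retweet (rather than every distinct seed user retweeted) is a choice made for this sketch.

```python
def fractional_label(retweeted_seeds, seed_camp):
    """Return the share of a user's seed retweets that went to each camp.

    retweeted_seeds: list of seed account names, one entry per retweet.
    seed_camp: dict mapping seed account name -> camp label.
    """
    tally = {}
    for seed in retweeted_seeds:
        camp = seed_camp.get(seed)
        if camp is None:
            continue  # retweet of a non-seed account; ignored here
        tally[camp] = tally.get(camp, 0) + 1
    total = sum(tally.values())
    if total == 0:
        return {}  # user never retweeted a seed user
    return {camp: n / total for camp, n in tally.items()}
```

For example, a user with three retweets of secular seed users and one of an Islamist seed user would be labelled 0.75 secular and 0.25 Islamist.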
12. Retweeting = Endorsement?
Asked two judges to label 100 users
Judges labeled 38% as “unknown” – not easy!
For the non-“unknown” labels …
77% agreement of inferred label with judges
80% inter-judge agreement
Noisy at individual level
Strong signal at aggregate level
More cross-ideological retweets than for the US
15. From Users to URLs
aljazeera.net (Arabic): .61 Islamist
aljazeera.com (English): .63 secular
16. A Hashtag Barometer?
For a given week, average the “extremism” score across all hashtags
Leaning .948: extremism = 2*|0.5 - 0.948| = .896
facebook, leaning .537 (secular): extremism = 2*|0.5 - 0.537| = .074
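The two worked values on this slide follow from the score 2*|0.5 - p|, where p is a hashtag's leaning toward one camp. A minimal sketch of the weekly average is given below; the function names are hypothetical, and the plain unweighted mean is an assumption, since the slide does not say whether hashtags are weighted by usage.

```python
def extremism(leaning):
    """Map a leaning in [0, 1] to an extremism score in [0, 1].

    A leaning of 0.5 (evenly split between camps) gives 0;
    a leaning of 0 or 1 (used by one camp only) gives 1.
    """
    return 2 * abs(0.5 - leaning)


def weekly_barometer(hashtag_leanings):
    """Unweighted mean extremism over one week's hashtags.

    hashtag_leanings: dict mapping hashtag -> leaning in [0, 1].
    Returns None for a week without hashtags.
    """
    if not hashtag_leanings:
        return None
    scores = [extremism(p) for p in hashtag_leanings.values()]
    return sum(scores) / len(scores)
```

With the slide's numbers, `extremism(0.948)` is approximately 0.896 and `extremism(0.537)` is approximately 0.074.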
a - Assailants with rocks and firebombs gather outside Ministry of Defence to call for an end to military rule.
b - Demonstrations and clashes break out after President Morsi grants himself increased power to protect the nation.
c,d - Continuing protests after the November 22nd declaration.
e - Demonstrations in Tahrir square, Port Said and all across the country.
f,g - Demonstrations at Tahrir square.
17. Monitoring International Migration
With Emilio Zagheni, Kiran Garimella, and many others
See publication list at https://ingmarweber.de/publications/ for details
03
22. Bias Reduction via Model-Fitting
Mean out-of-sample absolute percentage error 37%,
down from 56% without origin-age bias correction
Adjusted R^2 = .70
Does not use GDP, language, internet penetration, …
z = age-gender group
i = country of birth
j = US state of residence
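The error metric quoted on this slide, mean absolute percentage error, can be computed as sketched below. The function name is hypothetical and skipping zero-valued ground truth is a choice made for this sketch; "out-of-sample" means the predictions being scored come from data the model was not fitted on.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, expressed as a percentage.

    Pairs whose actual value is 0 are skipped because the ratio is
    undefined for them (an assumption of this sketch).
    """
    errors = [
        abs(p - a) / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no non-zero actual values to compare against")
    return 100 * sum(errors) / len(errors)
```

For instance, predicting 150 and 100 where the reference counts are 100 and 200 gives an error of 50%.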
24. Validation w/ (Few) Available Data
Registro Administrativo de Migrantes Venezolanos (RAMV) - Jun, 2018
Facebook - Jun, 2018
Kendall's τ = .71 (n=31)
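Kendall's τ measures how well two rankings agree, here the regional ranking from the registry versus the one from Facebook estimates. The sketch below computes the tau-b variant, which adjusts for ties; which variant the slide reports is not stated, so that choice and the function name `kendall_tau_b` are assumptions.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied on the first variable
        if dy == 0:
            ties_y += 1  # pair tied on the second variable
        if dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the variables order the pair oppositely
    n_pairs = len(xs) * (len(xs) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom
```

The pairwise loop is quadratic in the number of regions, which is unproblematic for n = 31.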
25. Previously Unavailable Estimates
[Map panels: Brazil, Peru and Ecuador; source: Facebook, Feb 2019]
28. Case Study: Syrian Refugees in Lebanon
For 6 governorates and 770 cities
720k FB users “lives abroad” + AR
3.5M FB users overall
Only 157 cities with > 1000 FB users
Strong gender bias
29. OS-Type Predictive of Poverty?
Predicting: % below poverty line
Model variable: CV performance (LOOCV), R^2
% iOS device users: 0.895
% high-end phone (iPhone/Galaxy) users: 0.678
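Leave-one-out cross-validation (LOOCV) fits the model on all regions but one, predicts the held-out region, and repeats this for every region. The sketch below assumes a single-predictor linear regression and computes R^2 as 1 - SS_res / SS_tot on the held-out predictions; both are assumptions, as the slide does not specify the model form or the exact R^2 definition.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    Raises ZeroDivisionError if all x values are identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_r2(xs, ys):
    """R^2 of leave-one-out predictions against the observed values."""
    preds = []
    for i in range(len(xs)):
        # Fit on every point except i, then predict point i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        preds.append(a + b * xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Here `xs` would hold a predictor such as the percentage of iOS device users per region and `ys` the percentage below the poverty line, both as plain lists.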
30. Using Facebook Ad Data to Track the Global Digital Gender Gap
By Masoomali Fatehkia, Ridhi Kashyap, Ingmar Weber
Published in World Development, 2018
04
37. Data on Politics from Twitter
+ Relatively easy to collect through open APIs
+ Both real-time and historic data are available
+ Supports individual-level longitudinal studies
+ Population opinion shifts across time can be meaningful
- Bots and coordinated action are a risk (less so in 2013/2015)
- Selection bias and “silent majority”
- Demographic information needs to be inferred
- Public in the sense that conversations on a bus are public
- Dealing with multi-lingual social media is messy
38. Advertising Audience Estimates
+ Facebook, LinkedIn, Twitter, Snapchat, Google, ...
+ Real-time estimates
+ Uses anonymous and aggregate data
+ Gender, age, location, country of origin, ...
- Black box on how attributes are inferred
- Needs modeling for bias correction
- Usage patterns change over time
- No historic data available
- Risk of misuse and population harm
39. Existing DPPA-QCRI Collaboration
DPPA E-Analytics & Innovation Course
QCRI, Dec 10-12, 2019
Diplomatic Pulse (under development)
Search engine over diplomatic statements
40. Thanks!
iweber@hbku.edu.qa
@ingmarweber
https://ingmarweber.de/publications/
https://www.slideshare.net/IngmarWeber