Webinar hosted by the Georgetown Global Human Development Program. A recording of the session is available at https://georgetown.zoom.us/rec/play/vMElIrir-mg3HYWWtwSDVP4rW461Jqis2iAZ8qUEyErnBiQGNFqlb7tEM-ofDv5GgHLYljjfYoBR0852?continueMode=true&_x_zm_rtaid=8UtYz48MSICS36P_gsSTDg.1588250263912.a1e6098b19d99104d94a3a1063c22f70&_x_zm_rhtaid=503
Original Eventbrite listing: https://www.eventbrite.com/e/monitoring-migration-using-social-media-data-an-introduction-tickets-102687830064
6. Data Collection
• Used the Twitter streaming API to filter for geo-tagged tweets from OECD countries
• Pick 3,000 users per country, get their tweets
• Estimate out-migration and oversample countries where migration is rare
• Get data for ~500K users
• Activity thresholding: 3+ tweets in four-month windows, May 2011 -> April 2013
• Left with ~15K users -> Small!
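The activity threshold can be sketched as a simple filter. This is a minimal illustration, not the original pipeline: it assumes tweets arrive as (user_id, date) pairs, that the study period is tiled by six back-to-back four-month windows (May 2011 to April 2013), and that a user must reach 3+ tweets in every window. The names `window_index` and `active_users` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Assumed layout: six back-to-back four-month windows, May 2011 .. April 2013.
START_YEAR, START_MONTH = 2011, 5
N_WINDOWS = 6
MIN_TWEETS = 3  # "3+ tweets" per window


def window_index(d: date) -> Optional[int]:
    """Return the 0-based four-month window containing d, or None if outside the study period."""
    months = (d.year - START_YEAR) * 12 + (d.month - START_MONTH)
    if months < 0:
        return None
    idx = months // 4
    return idx if idx < N_WINDOWS else None


def active_users(tweets):
    """Keep users with at least MIN_TWEETS tweets in every window (assumption: 'every').

    tweets: iterable of (user_id, date) pairs.
    """
    counts = defaultdict(lambda: [0] * N_WINDOWS)
    for user, d in tweets:
        idx = window_index(d)
        if idx is not None:
            counts[user][idx] += 1
    return {user for user, c in counts.items() if all(n >= MIN_TWEETS for n in c)}
```

Requiring activity in every window is what makes the filter so aggressive: one quiet four-month stretch removes a user, which is consistent with the drop from ~500K to ~15K users.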
9. Results
(Soft) Validation: Ireland's out-migration rate grew by 2.2% from 2011 to 2012, more than most countries (Irish Central Statistics Office)
Mexico also sees a reduction in out-migration (Pew Research Center)
10. Another Data Collection
• Same seed user set as before
• Collected more recent tweets
• Dropped conditions on coverage
• ~62k users with at least some tweets in the US
12. The Duration-Interval Interplay
Plot of estimated migration rate as a function of interval and duration length. Rates were estimated with July 1, 2012 fixed as the starting point.
14. Beyond Origin-Destination Migration Analysis
• I’m a German citizen living in Qatar. So did I migrate from Germany to Qatar?
• Yes, according to Qatari border control.
• But: Germany (78->99), United Kingdom (99->03), Germany (03->07), Switzerland (07->09), Spain (09->12), Qatar (12->now)
• Use the “places lived” field on Google+ profiles
• In 2012, no “currently” indicator, just a set of places
• Get tuples of co-lived countries
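Because “places lived” is an unordered set, the natural unit is an unordered tuple of countries rather than an origin-destination pair. A minimal sketch, assuming each profile has already been mapped to a set of country names; `colived_tuples` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def colived_tuples(profiles, k):
    """Count k-tuples of countries that appear together in users' 'places lived' sets.

    profiles: iterable of sets of country names, one set per user (no ordering,
    since the field had no "currently" indicator).
    """
    counts = Counter()
    for places in profiles:
        # Sort so that (A, B) and (B, A) map to the same key.
        for combo in combinations(sorted(places), k):
            counts[combo] += 1
    return counts
```

For the example above, the set {Germany, United Kingdom, Switzerland, Spain, Qatar} contributes one count to each of its 10 pairs and 10 triples; the repeated stay in Germany collapses into a single set element.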
16. Expected Cluster Frequencies
• Lots of migrant flows on (A,B), (A,C) and (B,C) => expect lots on (A,B,C)
• “Expect” = rank clusters according to:
• min(freqAB, freqAC, freqBC) * mean(freqAB, freqAC, freqBC)
• Best-performing ranking approximation (Kendall's τ = .565, Spearman's ρ = .754)
• Look at outliers and try to explain them
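The ranking score combines the weakest link (min) with the overall strength (mean) of the three pairwise flows. A minimal sketch of that scoring, assuming pair frequencies are stored in a dict keyed by alphabetically sorted country pairs; a missing pair is treated as frequency 0, which is an assumption. Function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def expected_score(pair_freq, triple):
    """min(freqAB, freqAC, freqBC) * mean(freqAB, freqAC, freqBC) for triple (A, B, C)."""
    freqs = [pair_freq.get(tuple(sorted(p)), 0) for p in combinations(triple, 2)]
    return min(freqs) * mean(freqs)


def rank_by_expectation(pair_freq, triples):
    """Return triples sorted from highest to lowest expected frequency."""
    return sorted(triples, key=lambda t: expected_score(pair_freq, t), reverse=True)
```

The min term means a triple with one weak leg cannot score highly, however strong the other two flows are.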
17. Outlier Frequencies
• Look at “expected rank - actual rank”
• Middle 20%: “close to expected”
• Top 20%: “higher than expected”
• Bottom 20%: “lower than expected”
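A sketch of the three-class labeling, assuming rank 1 means the most frequent cluster, so a large positive “expected rank - actual rank” marks a cluster that is more frequent than expected. Clusters outside the three 20% bands are left unlabeled. `outlier_classes` and the `frac` parameter are hypothetical.

```python
def outlier_classes(expected_rank, actual_rank, frac=0.2):
    """Label clusters by d = expected rank - actual rank (rank 1 = most frequent).

    Top frac of d -> "higher than expected", the central frac -> "close to expected",
    bottom frac -> "lower than expected"; all other clusters get no label.
    """
    diffs = {c: expected_rank[c] - actual_rank[c] for c in expected_rank}
    ordered = sorted(diffs, key=diffs.get, reverse=True)  # largest d first
    n = len(ordered)
    k = int(n * frac)
    labels = {}
    for c in ordered[:k]:
        labels[c] = "higher than expected"
    start = (n - k) // 2  # central band of width k
    for c in ordered[start:start + k]:
        labels[c] = "close to expected"
    for c in ordered[n - k:]:
        labels[c] = "lower than expected"
    return labels
```

Dropping the two intermediate bands keeps the three classes well separated, which suits a subsequent 3-class feature analysis.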
18. Feature Analysis
More than expected:
• (Spain, France, Italy)
• (UAE, India, Singapore)
Less than expected:
• (Brazil, Mexico, USA)
• (Canada, China, UK)
Most discriminative features for the 3-class distinction
24. Bias Reduction via Model-Fitting
Mean out-of-sample absolute percentage error: 37%, down from 56% without origin-age bias correction
Adjusted R^2 = .70
Does not use GDP, language, internet penetration, …
z = age-gender group
i = country of birth
j = US state of residence
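The two reported fit statistics follow standard definitions, sketched below. The model itself (indexed by z, i, j) is not reproduced here, and the function names are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

For “out-of-sample” MAPE, `predicted` would come from a model fitted without the held-out observations; how the data were split is not stated on the slide.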
25. Venezuelan Exodus: Motivation
• Large outpouring of migrants and refugees
– Mostly into Colombia and neighboring countries
• Lack of reliable survey data on the crisis
– Irregular migrants
– Fear of persecution
• Goal: Help improve humanitarian response
– Detect temporal trends with low latency
– Insights into spatial distribution
– Insights into socio-economic status
27. Validation w/ (Few) Available Data
Registro Administrativo de Migrantes Venezolanos (RAMV), June 2018
Facebook, June 2018
Kendall's τ = .71 (n = 31)
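Kendall's τ compares the two rankings (RAMV counts vs. Facebook estimates) by counting concordant and discordant pairs of units. A minimal O(n²) sketch of the tau-b variant, which adjusts for ties; the slide does not say which variant was used, and for real analyses a library routine such as `scipy.stats.kendalltau` is the safer choice.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between paired sequences x and y (not all values tied)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: counts toward neither
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    p_q = concordant + discordant
    return (concordant - discordant) / sqrt((p_q + ties_x) * (p_q + ties_y))
```

Here x would hold the RAMV counts and y the Facebook estimates for the same 31 units. Only the ordering matters, so the level bias of Facebook data does not affect τ.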
32. Syrian Refugees in Lebanon
For 6 governorates and 770 cities
720k FB users with “lives abroad” + Arabic (AR)
3.5M FB users overall
Only 157 cities with > 1000 FB users
Strong gender bias
33. OS-Type Predictive of Poverty?
Predicting: % below poverty line
Model variable                            CV performance (LOOCV R^2)
% iOS device users                        0.895
% high-end phone (iPhone/Galaxy) users    0.678
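A sketch of leave-one-out cross-validation for a one-variable linear model, e.g. % iOS users predicting % below the poverty line per region. It assumes CV R^2 is defined as 1 - (sum of squared held-out errors) / (total sum of squares); the slide does not give the exact definition or model form, and `fit_line` and `loocv_r2` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loocv_r2(xs, ys):
    """Refit without each point in turn, predict it, and score all held-out predictions."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    my = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))  # held-out squared errors
    tss = sum((y - my) ** 2 for y in ys)
    return 1 - press / tss
```

With few regions, LOOCV uses nearly all data for each fit while still scoring every prediction out of sample.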
34. Do Refugees Share German Interests?
What interests to consider? Everybody likes “Music” and “Technology”.
How to interpret the score? High/low compared to European migrants?
Germans in DEU, FB interests: Football (90%), Max Planck (70%), Sauerkraut (40%), …
Arabs in MENA, FB interests: Quran (80%), Ibn Al-Haytham (60%), Falafel (60%), …
Arabs in DEU, FB interests: ?
35. Obtaining an Assimilation Score
Migrant group                  Assimilation score
Austrian migrants              .900
Spanish migrants               .864
French migrants                .803
Turkish-speaking migrants      .746
Arabic-speaking migrants       .643
A: Women, non-uni, 45-64       .461
A: Men, uni, 18-24             .677
• Experimental methodology: take with a ton, not just a grain of salt
• Needs to be validated externally
• Goals include finding “bridging” interests/patterns
• Importantly: should people assimilate?
37. Studied in X, Lives in Y
• Compile a list of all universities in European countries
• Query the number of LinkedIn users who studied in country X and now live in country Y
• Disaggregate by gender, age, industry, …
42. Advertising Audience Estimates
+ Facebook, LinkedIn, Twitter, Snapchat, Google, ...
+ Real-time estimates
+ Uses anonymous and aggregate data
+ Gender, age, location, country of origin, …
- Black box: unclear how attributes are inferred
- Needs modeling for bias correction
- Usage patterns change over time
- No historic data available
- Risk of misuse
43. Selected Ongoing Work
• Using Twitter to Predict Social Network Integration for International Migrants
– w/ Elise Wang Sonne at UN University and others
• Studying Inter-Generational Integration of Hispanics in the US Through FB Surveys
– w/ Andre Grow at MPI Demographics and others