Presentation at the Annual Meeting of the Population Association of America (PAA 2019) in the session “Using Social Media in Population Research” (http://paa2019.populationassociation.org/sessions/128). See https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0211350 for the full paper.
Correlated Impulses: Using Facebook Interests to Improve Predictions of Crime Rates in Urban Areas
1. Correlated Impulses
Using Facebook Interests to Improve
Predictions of Crime Rates in Urban Areas
Masoomali Fatehkia, Dan O’Brien, Ingmar Weber
@ingmarweber
PLOS ONE 14(2): e0211350, February 2019
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0211350
8. Advertising Audience Estimates
+ Global reach with over 2 billion users
+ FB, LinkedIn, Google, Snapchat, IG, ...
+ Real-time estimates
+ Uses anonymous and aggregate data
+ Gender, age, location, country of origin, …
+ Non-traditional attributes such as interests
+ Accessible through APIs
9. Advertising Audience Estimates
- Black box on how attributes are inferred
- Usage patterns change over time
- Black box changes over time
- Only includes people who are online
- Somewhat coarse (minimum reported audience was 20, now 1,000)
- Possibility to locate vulnerable populations
11. Background for Exploration
• Ecological theories for community-level variation (social
norms, economic inequalities, …)
• Individual level characteristics (self-control, risk taking, …)
• Could the spatial distribution of individual-level processes
explain the variation in crime between neighborhoods?
• Can we pick up the traits of perpetrators?
• Can we pick up the traits of victims?
Exploratory study
12. Data Sources
• 9 large cities (out of 25) with “usable” crime data
• Three data sources: (i) Facebook data, (ii) ACS
2015 data, and (iii) crime incident data
• ZIP codes ≈ ZCTA, except parks/office buildings
– ZCTA codes > 10k population in ACS 2015 (65x)
– FB population < 1.5 × ACS population (41x)
• Resulted in 432 ZIP codes
• Bias in which ZIP codes remain (bigger/denser)
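The two ZIP code filters above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, field names, and all numbers are made up, and only the two thresholds come from the slide.

```python
# Hypothetical per-ZCTA records; field names and values are invented for illustration.
zctas = [
    {"zcta": "A", "acs_pop": 25_000, "fb_pop": 21_000},
    {"zcta": "B", "acs_pop": 4_000, "fb_pop": 3_500},    # too small in ACS
    {"zcta": "C", "acs_pop": 12_000, "fb_pop": 30_000},  # FB audience far above residents
    {"zcta": "D", "acs_pop": 18_000, "fb_pop": 26_000},
]

MIN_ACS_POP = 10_000  # from the slide: ACS 2015 population > 10k
MAX_FB_RATIO = 1.5    # from the slide: FB population < 1.5 x ACS population


def keep_zcta(rec):
    """Return True if the ZCTA passes both population filters."""
    big_enough = rec["acs_pop"] > MIN_ACS_POP
    plausible_fb = rec["fb_pop"] < MAX_FB_RATIO * rec["acs_pop"]
    return big_enough and plausible_fb


kept = [rec["zcta"] for rec in zctas if keep_zcta(rec)]
print(kept)
```

The ratio filter is presumably aimed at areas where the Facebook audience reflects visitors or workers rather than residents, which matches the "parks/office buildings" caveat above.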
13. Facebook Interests Data
• Online dating and relationship status
– m/f in different relationship statuses
– Planned: “open relationship” – too sparse
• Online gaming
– Action games, card games, FPS, …
• Music genres
– Hip hop, blues, electronic, country, …
• Movie genres
– Action, horror, comedy, …
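The slides do not spell out how audience estimates become model features. One plausible approach, assumed here purely for illustration, is to express each interest as a share of the ZIP code's total Facebook audience. The function name, interest keys, and numbers below are hypothetical.

```python
def interest_share(interest_audience, total_audience):
    """Fraction of a ZIP code's Facebook audience matching an interest."""
    if total_audience <= 0:
        raise ValueError("total audience must be positive")
    return interest_audience / total_audience


# Hypothetical audience estimates for one ZIP code (aged 18+, all genders).
total = 40_000
audiences = {"hip_hop": 14_000, "country": 6_000, "fps_games": 3_000}

# Assumption: one share-based feature per interest.
features = {name: interest_share(n, total) for name, n in audiences.items()}
print(features)
```

Shares rather than raw counts keep ZIP codes of different sizes comparable.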
14. ACS 2015 Demographic Data
• Age
– % of population aged 15-19
– % of population aged 18-24
– Median age
• Race
– % of population one race White
– % of population one race Black or African-American
• Income
– % of households on food stamp benefits
– Median family income
– % households with income > 150K
– % households with income <= 25K
• Education
– % of population 18-24 with bachelors or associates degree
– % of population 18-24 with less than high school degree
– % of population 25+ with less than high school degree
– % of population 25+ with bachelors or higher degree
15. Crime Rate Data
• Use geo-coded incident data
• Aggregate by ZIP codes
• Standardize cities’ reporting to the National
Incident-Based Reporting System (NIBRS) categories
• Compute rates per 100k (using ACS’15 pop)
• Reported crime rates, not crime itself
• Policing strategies
https://www.gimletmedia.com/reply-all/127-the-crime-machine-part-i
https://www.gimletmedia.com/reply-all/128-the-crime-machine-part-ii
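The aggregation and rate computation described on this slide can be sketched as below. The incident records, ZIP labels, and population figures are invented; only the "per 100k, using ACS population" convention is taken from the slide.

```python
from collections import Counter

# Hypothetical geocoded incidents, already assigned to ZIP codes.
incidents = [
    {"zip": "Z1", "type": "assault"},
    {"zip": "Z1", "type": "assault"},
    {"zip": "Z1", "type": "burglary"},
    {"zip": "Z1", "type": "assault"},
    {"zip": "Z2", "type": "robbery"},
]
acs_pop = {"Z1": 20_000, "Z2": 50_000}  # hypothetical ACS 2015 populations

# Count incidents per (ZIP code, crime type).
counts = Counter((inc["zip"], inc["type"]) for inc in incidents)


def rate_per_100k(count, population):
    """Reported incidents per 100,000 residents."""
    return 100_000 * count / population


rates = {key: rate_per_100k(n, acs_pop[key[0]]) for key, n in counts.items()}
print(rates)
```

As the slide notes, these are rates of reported crime, so they also reflect reporting and policing practices.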
17. Predictive Performance Across Age/Gender
Performance is mean absolute error (MAE) for the crime rate per 100k population
All aged 18+ always gave best CV performance
All other experiments done using only this (broad) subset
Using regularized linear regression (LASSO) with FB-only
features to identify which age/gender group is most predictive
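A minimal sketch of the comparison described on this slide: fit a LASSO model per candidate feature group and compare cross-validated MAE. It uses a plain coordinate-descent LASSO written from scratch and synthetic data built so that "group_a" is informative. The group names, fold scheme, and penalty value are assumptions, not the authors' settings.

```python
import random


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha (the LASSO proximal step)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - b0 - Xb||^2 + alpha*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    resid = [yi - y_mean for yi in y]  # residuals with all coefficients at zero
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a constant feature carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * coef[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / z
            delta = new - coef[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta
            coef[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


def predict(model, X):
    intercept, coef = model
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in X]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cv_mae(X, y, alpha, k=5):
    """Average held-out MAE over k interleaved folds."""
    n = len(X)
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in test_idx]
        model = fit_lasso([X[i] for i in train_idx], [y[i] for i in train_idx], alpha)
        held_out = sorted(test_idx)
        preds = predict(model, [X[i] for i in held_out])
        fold_scores.append(mae([y[i] for i in held_out], preds))
    return sum(fold_scores) / k


# Synthetic data: group_a contains the feature that drives the crime rate.
rng = random.Random(0)
signal = [rng.random() for _ in range(60)]
crime_rate = [200 + 300 * s + rng.gauss(0, 10) for s in signal]
groups = {
    "group_a": [[s, rng.random()] for s in signal],
    "group_b": [[rng.random(), rng.random()] for _ in signal],  # pure noise
}
scores = {name: cv_mae(X, crime_rate, alpha=1.0) for name, X in groups.items()}
best_group = min(scores, key=scores.get)
print(best_group, scores)
```

In practice a library implementation with a tuned penalty would be the natural choice; the hand-written solver only keeps the sketch dependency-free.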
18. Factor Analysis
• Too many features are selected (up to 23),
even with LASSO
• Group features into “factors” for sparser, more
interpretable model
• Do a factor analysis for each of (i) relationship,
(ii) music, (iii) movie, and (iv) gaming features
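To illustrate the idea of collapsing correlated interest features into one score, the sketch below extracts the leading principal component of the correlation matrix by power iteration. This is a simpler stand-in, not a factor analysis proper and not the authors' procedure. The three "genre" columns are invented values.

```python
import math


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def correlation_matrix(z_cols):
    """Correlation matrix of already standardized columns."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_cols] for ci in z_cols]


def leading_component(corr, n_iter=100):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    p = len(corr)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


# Hypothetical interest shares (in %) for six ZIP codes.
genre_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genre_b = [2.1, 3.9, 6.0, 8.0, 10.1, 11.9]   # moves together with genre_a
genre_c = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]   # unrelated to the other two

z_cols = [standardize(c) for c in (genre_a, genre_b, genre_c)]
loadings = leading_component(correlation_matrix(z_cols))

# One combined score per ZIP code replaces the correlated columns.
scores = [sum(w * col[i] for w, col in zip(loadings, z_cols)) for i in range(6)]
print(loadings, scores)
```

The two correlated columns load almost equally on the component while the unrelated one loads near zero, which is the kind of grouping that yields a sparser, more interpretable model.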
20. Model Performance
Adjusted R^2 (Adj. R^2) and marginal gain over city dummies (Gain):
           Demographics only    Facebook only       Demogr. + Facebook
           Adj. R^2   Gain      Adj. R^2   Gain     Adj. R^2   Gain
Assault    .639       .488      .604       .437     .656       .511
Burglary   .562       .083      .601       .163     .598       .157
Robbery    .558       .411      .528       .371     .581       .441
“Facebook only” not bad
“Demogr. + Facebook” is best
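For reference, adjusted R^2 penalizes plain R^2 for the number of predictors. The sketch below uses the standard textbook formula. The sample size of 432 comes from the ZIP code count earlier in the slides, while the raw R^2 and the predictor count are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: raw R^2 of 0.20 with 8 predictors on 432 ZIP codes.
example = adjusted_r2(0.20, n_obs=432, n_predictors=8)
print(round(example, 4))
```

With many observations and few predictors the adjustment is small; it matters more as features are added.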
23. Discussion
• Based on “18+ all”: FB’s predictive power lies
less in the behaviors of particular individuals
and more in the overall behavioral ecology
• Hip hop: the only factor that remained in all
models. Indicates culture of crime or of
increased policing?
• “… correcting for demographics”: or just
unmodeled variation (re interaction terms)?
• What else?
Rock, Rap, or Reggaeton? Assessing Mexican Immigrants’
Cultural Assimilation Using Facebook Data
Today, Session 168, 1:00-2:30 PM, Brazos/206
24. See you in Doha for SocInfo’19!
Speakers include:
Francesco Billari, Emre Kiciman, Katy Börner, Yelena Mejova, Luca
Maria Aiello, Aniko Hannak, and Giovanni Luca Ciampaglia
Submit papers/abstracts by April 15, 2019
Submit tutorials/workshops by April 30, 2019