||
Cristina Kadar, Raquel Rosés, Irena Pletikosa
PhD candidate, Information Management, D-MTEC, ETH Zurich
www.im.ethz.ch/people/ckadar
@tweeting_cris
# spatial/social data science, # urban computing, # information systems
Measuring ambient population from location-based social networks (LBSN) to describe urban crime
||
Social disorganization theory: ecological attributes of the neighborhood
15.09.2017
Shaw, Sampson, ...: Social disorganization theory
Resident Population
Census data: static & obsolete
||
Eyes on the street and criminogenic places: population dynamics in the neighborhood
Jane Jacobs: Eyes on the street
Brantingham & Brantingham: Crime attractors & crime generators
Ambient Population
LBSN data: location, time, context
||
Previous work in the Computational Social Science and Data Mining communities using human dynamics data for crime modelling
● Simple correlations between crime rates and some diversity metrics from mobile phone data [Traunmueller et al., SocInfo ’14]
● Using machine learning techniques to predict short-term crime using features crafted from:
○ Twitter data [Gerber, Decision Support Systems ’14]
○ mobile phone data [Bogomolov et al., ICMI ’14]
○ POI and taxi data [Wang et al., KDD ’16]
This work:
● Fits spatial econometrics models that are multivariate yet interpretable
● Compares & contrasts the (statistically significant!) contributions of the resident and ambient populations
● Uses census and Foursquare data
||
Long-term crime in New York City is analyzed at neighborhood level
Total crime counts in N = 2167 neighborhoods (neighborhood = census tract), 2011 to 2015
||
Craft suitable counterparts of the established resident population metrics when using LBSN proxies for the ambient population!

             Resident Population                             Ambient Population
Control      area                                            area
Density      population                                      venues/check-ins
Diversity    racial-ethnic, income-based                     activity-based
Risk         stable population; vacant & rented households   activity hot spots
Theory       Social Disorganization Theory (Sampson, …)      Eyes on the street (Jacobs); Crime attractors & generators (Brantingham^2)
||
The distribution of crime across New York City exhibits spatial auto-correlation, so we opt for a spatial econometrics model
Global Moran’s I = 0.56 (***)
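To make the Moran's I statistic concrete, here is a minimal sketch in plain Python. It assumes a regular toy grid with binary queen-contiguity weights; the grid, the values, and the function names `queen_weights` and `morans_i` are illustrative and not taken from the study, which works on irregular census tracts.

```python
from itertools import product


def queen_weights(n_rows, n_cols):
    """Binary queen-contiguity neighbour lists for a regular grid (toy stand-in for tracts)."""
    neighbours = {}
    for r, c in product(range(n_rows), range(n_cols)):
        cell = r * n_cols + c
        neighbours[cell] = [
            (r + dr) * n_cols + (c + dc)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < n_rows
            and 0 <= c + dc < n_cols
        ]
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with weights w_ij = 1 for neighbours and 0 otherwise."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbours.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical 4x4 grids: one spatially clustered, one alternating.
w = queen_weights(4, 4)
clustered = [9, 9, 1, 1] * 4
checkerboard = [9 if (r + c) % 2 == 0 else 1 for r in range(4) for c in range(4)]

print(round(morans_i(clustered, w), 3))     # positive: similar values sit next to each other
print(round(morans_i(checkerboard, w), 3))  # negative: edge-sharing neighbours differ
```

A clearly positive value, as reported for the crime counts, indicates that neighbouring areas tend to have similar levels, which is what motivates a model with a spatial term.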
Spatial Lag Model
C - crime counts in a census tract
A - census tract’s area
W - spatial weight matrix (Queen)
RP - variables describing the resident population
AP - variables describing the ambient population
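The model equation itself is not preserved in this transcript. One form that would be consistent with the symbols listed here, and with the log-transformed dependent variable mentioned on the results slides, is sketched below. The coefficient names (the spatial lag coefficient \rho, the constant \beta_0, the coefficient vectors \beta_{RP} and \beta_{AP}) and the error term \varepsilon are assumptions introduced for illustration, not notation from the slides.

```latex
\ln C = \rho\, W \ln C + \beta_0 + \beta_A\, A + RP\, \beta_{RP} + AP\, \beta_{AP} + \varepsilon
```

Under this reading, \rho is the quantity reported as "Spatial lag" in the results tables, and A, RP and AP enter after Box-Cox transformation and standardization.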
||
Correlations of neighborhood variables and crime
Pearson correlation coefficient with the log-transformed crime counts (independent variables Box-Cox transformed & standardized)

Group                             Variable                                   Correlation
Control                           Area                                       -0.12 (***)
Population Density & Diversity    Population count                           +0.50 (***)
                                  Racial-ethnic equitability index           +0.14 (***)
                                  Income equitability index                  -0.10 (***)
Neighborhood Stability            Fraction vacant households                 +0.13 (***)
                                  Fraction rented households                 +0.55 (***)
                                  Fraction stable population                 -0.12 (***)
Venues Density & Diversity        Venues                                     +0.59 (***)
                                  Venues equitability index                  +0.25 (***)
Check-ins                         Check-ins vs. population local quotient    +0.18 (***)
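The preprocessing named in the caption can be sketched as follows. This is a minimal illustration, assuming a fixed Box-Cox parameter (the slides do not say how it was chosen) and made-up values for five tracts; `box_cox`, `standardize` and `pearson` are hypothetical helper names.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform for positive x; lam is fixed here rather than estimated."""
    return [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in x]


def standardize(x):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five census tracts (not data from the study).
venues = [3, 10, 25, 60, 200]
crime_counts = [12, 30, 45, 90, 400]

x = standardize(box_cox(venues, lam=0.5))   # independent variable
y = [math.log(c) for c in crime_counts]     # log-transformed crime counts
r = pearson(x, y)
print(f"r = {r:+.2f}")
```

Standardizing does not change the correlation itself; it matters later, when regression coefficients of different variables are compared on a common scale.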
||
Spatial regression results
Spatial Lag Model results (independent variables Box-Cox transformed and standardized, dependent variable log-transformed)

Group                             Variable                                   Coefficient
Control                           Area                                       +0.02 (***)
Population Density & Diversity    Population count                           +0.09 (***)
                                  Racial-ethnic equitability index           +0.01
                                  Income equitability index                  -0.07 (***)
Neighborhood Stability            Fraction vacant households                 +0.04 (***)
                                  Fraction rented households                 +0.18 (***)
                                  Fraction stable population                 +0.05 (***)
Venues Density & Diversity        Venues                                     +0.46 (***)
                                  Venues equitability index                  -0.04 (***)
Check-ins                         Check-ins vs. population local quotient    -0.19 (***)
                                  Constant                                   +3.82 (***)
                                  Spatial lag                                +0.31 (***)

Model            Pseudo R2
Census only      0.44
LBSN only        0.47
Census + LBSN    0.56
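As a rough illustration of what the "Spatial lag" term does, the sketch below computes a row-standardized spatial lag (the average of the neighbours' log crime) and plugs it into a one-feature version of the model. The four-tract neighbour structure and all input values are hypothetical; the three coefficients are borrowed from the table purely to have plausible magnitudes, and the remaining variables are left out for brevity.

```python
import math

# Hypothetical neighbour lists for four tracts in a row: 0 - 1 - 2 - 3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def spatial_lag(y, neighbours):
    """Row-standardized W @ y: the mean of y over each tract's neighbours."""
    return [sum(y[j] for j in neighbours[i]) / len(neighbours[i]) for i in range(len(y))]


log_crime = [math.log(c) for c in (20, 55, 150, 400)]  # hypothetical crime counts
venues_std = [-1.2, -0.4, 0.3, 1.3]                    # hypothetical standardized feature

# Illustrative coefficients (constant, spatial lag, venues) taken from the table.
constant, rho, beta_venues = 3.82, 0.31, 0.46

lag = spatial_lag(log_crime, neighbours)
fitted = [constant + rho * l + beta_venues * v for l, v in zip(lag, venues_std)]

for i, (obs, fit) in enumerate(zip(log_crime, fitted)):
    print(f"tract {i}: observed {obs:.2f}, lag {lag[i]:.2f}, fitted {fit:.2f}")
```

Because each tract's fitted value depends on its neighbours' crime levels, a change in one tract also feeds back through the lag term, so the coefficients are not simple marginal effects.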
||
In general, the sign and relative size of the coefficients stay similar across the different crime types, with some notable exceptions
The model performs best for grand larcenies, and worst for robberies
[Figure: coefficients by crime type; legend: negative coef., positive coef.]
||
Limitations and other things to think about
● NOT causality: observational study, not a controlled experiment!
● Problem definition: the dependent variable is aggregated over a long time period, so the insights are actionable for urban planning rather than for police patrolling
○ Mitigation: construct a spatio-temporal prediction model for short-term crime description/prediction
● Generalization: the study analyzes data from only one city
○ Mitigations: test on completely new data, compare and contrast different cities and countries
● Bias in the data: Foursquare data exhibit geographical and social biases
○ Mitigation: for now, NYC is the most active city on Foursquare; when moving to new geographies, the bias needs to be quantified
||
Take-aways
● The novel factors are significantly related to the long-term crime levels in an area
● The novel factors and the geographical influence improve the baseline models based on census factors
● Support for Jacobs’ Eyes on the Street theory and Brantingham^2 criminogenic places theory
||
Thank you! Questions or remarks?
www.im.ethz.ch/people/ckadar
@tweeting_cris
||
Keep the variable count low: rely heavily on aggregate metrics!
(counts, equitability indexes, fractions, local quotients)
Equitability indexes
○ Intuition: lower values indicate the relative abundance of a given activity, while higher values indicate equi-probability of all activities (professional, food, resident, commuting, etc.)!
Local quotients
○ Intuition: neighborhoods with LQ >> 1 can be regarded as (digital) hot spots!
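The two intuitions above can be illustrated with a short sketch. The slides do not give the formulas, so this assumes a Shannon-entropy-based equitability (entropy divided by its maximum) and the usual location-quotient ratio of a local share to a city-wide share; the function names and all numbers are hypothetical.

```python
import math


def equitability(counts):
    """Shannon entropy of the category shares divided by its maximum, ln(k); needs k > 1."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(counts))


def local_quotient(checkins, population, total_checkins, total_population):
    """Check-ins per resident in one tract relative to the city-wide ratio."""
    return (checkins / population) / (total_checkins / total_population)


# Hypothetical venue counts per activity category in two tracts.
mixed_use = [10, 10, 10, 10]   # every activity equally represented
nightlife_hub = [37, 1, 1, 1]  # one activity dominates

print(equitability(mixed_use))      # close to 1: all activities equally likely
print(equitability(nightlife_hub))  # well below 1: one activity is abundant

# A tract with 5% of the city's check-ins but only 1% of its residents.
print(local_quotient(500, 1_000, 10_000, 100_000))  # about 5, i.e. a (digital) hot spot
```

Both metrics compress many raw counts into a single number per neighborhood, which is what keeps the variable count low.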
||
Spatial regression results: census only
Spatial Lag Model results (independent variables Box-Cox transformed and standardized, dependent variable log-transformed)

Group                             Variable                                   Coefficient
Control                           Area                                       +0.10 (***)
Population Density & Diversity    Population count                           +0.24 (***)
                                  Racial-ethnic equitability index           +0.03 (***)
                                  Income equitability index                  -0.02
Neighborhood Stability            Fraction vacant households                 +0.09 (***)
                                  Fraction rented households                 +0.20 (***)
                                  Fraction stable population                 -0.02
                                  Constant                                   +2.50 (***)
                                  Spatial lag                                +0.55 (***)

Model            Pseudo R2
Census only      0.44
||
Spatial regression results: all
Spatial Lag Model results (independent variables Box-Cox transformed and standardized, dependent variable log-transformed)

Group                             Variable                                   Coefficient
Control                           Area                                       +0.02 (***)
Population Density & Diversity    Population count                           +0.09 (***)
                                  Racial-ethnic equitability index           +0.01
                                  Income equitability index                  -0.07 (***)
Neighborhood Stability            Fraction vacant households                 +0.04 (***)
                                  Fraction rented households                 +0.18 (***)
                                  Fraction stable population                 +0.05 (***)
Venues Density & Diversity        Venues                                     +0.46 (***)
                                  Venues equitability index                  -0.04 (***)
Check-ins                         Check-ins vs. population local quotient    -0.19 (***)
                                  Constant                                   +3.82 (***)
                                  Spatial lag                                +0.31 (***)

Model            Pseudo R2
Census + LBSN    0.56
||
Spatial regression results: LBSN only
Spatial Lag Model results (independent variables Box-Cox transformed and standardized, dependent variable log-transformed)

Group                             Variable                                   Coefficient
Control                           Area                                       -0.05 (***)
Venues Density & Diversity        Venues                                     +0.63 (***)
                                  Venues equitability index                  0.00
Check-ins                         Check-ins vs. population local quotient    -0.33 (***)
                                  Constant                                   +3.79 (***)
                                  Spatial lag                                +0.32 (***)

Model            Pseudo R2
LBSN only        0.47
