In 2009, the City of Chicago led the nation in homicides, surpassing New York despite
having only one third of its population. Desperate to improve, the Chicago Police Department
(CPD) turned to funds available through the National Institute of Justice and secured two
million dollars to pursue experimental methods for integrating technology. In the years
since, CPD has fully embraced algorithmic analysis as a means of informing policing
decisions. Chief among these methods are predictive capabilities that suggest likely crime
hot spots or potential criminals. The problem is nontrivial because crime rates differ
significantly among neighborhoods of the city, and the distribution of the crime rate also
depends on the crime type.
Our team is interested in predicting crime rates in various locations in Chicago, as well as
optimizing the distribution of the police force for future crime prevention.
Crime Geometry Prediction and Police Force Optimization in Chicago
Tian Lan, Arjun Sanghvi, Yaxiong (Jason) Cai
tianlan@g.harvard.edu, asanghvi@g.harvard.edu, yaxiongcai@g.harvard.edu
AM 207: Stochastic Optimization • Spring 2015
Poster sections: Motivation, Methods and Framework, Data Exploration, Bayesian Model Analysis, Optimal Allocation of Police Force, Conclusions
Methods and Framework (flowchart):
1.  Crime data from the Chicago Police Department's reporting system from 2001 to 2015
2.  Select subset: narcotics data
3.  Define Bayesian Model
4.  Fit Parameters (Parameter Fitting): MCMC, Nelder-Mead
5.  Optimize Allocation of Police: Expectation Maximization for Gaussian Mixture Model, K-Means
•  Crime type, time, and location are publicly available for 5.5 million crimes in
the City of Chicago from 2001 to the present (the dataset is continuously updated)
•  Out of 33 different crime types, narcotics crimes exhibited particularly
interesting clustering characteristics
•  Note that the shape of the distributions looks approximately like a mixture of
Gaussians
•  We refer to the locations of the modes as northwest (NW) and southeast (SE)
•  Given limited computational resources, we randomly selected 5000 samples of
narcotics crimes in 2013 for this analysis
We propose a Bayesian model for formalizing the probability distribution of narcotics crimes.
•  We tried two methods to fit the model parameters:
1.  Markov Chain Monte Carlo (MCMC)
2.  Nelder-Mead Optimization
1.  Markov Chain Monte Carlo (MCMC)
•  Metropolis algorithm to sample 11 parameters from the posterior
distribution
•  Proposal function: normal distribution with tuned step sizes
•  Component-wise update
•  Burn-in of 200 samples and a thinning factor of 15
•  Assessment of convergence
•  Calculated each parameter as the mean of its trace
•  Parameters were then used to draw 5000 samples from the posterior distribution
•  Comparison of the distribution of true longitude and latitude, the distributions from the
posterior using the initial parameter values, and the distributions from the posterior
using the parameters found by MCMC
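The sampling scheme listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `metropolis_componentwise`, the toy target `toy_log_post`, and the step sizes are assumptions, and "burn-in of 200 samples" is read here as 200 full sweeps over the parameters. The proposal is a symmetric normal perturbation applied to one component at a time, so the acceptance test only needs the difference of log posteriors.

```python
import math
import random


def metropolis_componentwise(log_post, theta0, step_sizes, n_keep,
                             burn_in=200, thin=15, seed=0):
    """Component-wise Metropolis sampler with symmetric normal proposals.

    Returns n_keep parameter vectors, kept every `thin` sweeps after
    discarding the first `burn_in` sweeps.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    trace = []
    for sweep in range(burn_in + n_keep * thin):
        for j in range(len(theta)):
            # Perturb only component j; the other components stay fixed.
            proposal = list(theta)
            proposal[j] += rng.gauss(0.0, step_sizes[j])
            candidate = log_post(proposal)
            # Accept with probability min(1, exp(candidate - current)).
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
        if sweep >= burn_in and (sweep - burn_in) % thin == 0:
            trace.append(list(theta))
    return trace


def toy_log_post(theta):
    # Hypothetical stand-in target: independent unit normals centred at (1, -2).
    return -0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


trace = metropolis_componentwise(toy_log_post, [0.0, 0.0], [1.0, 1.0], n_keep=2000)
# Point estimate per parameter: the mean of its trace, as in the list above.
estimate = [sum(column) / len(column) for column in zip(*trace)]
```

For the poster's model, `log_post` would take the 11 mixture parameters instead of the two toy parameters, and each step size would be tuned separately.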
2. Nelder-Mead Optimization
•  Explored optimization techniques for maximization of the posterior
•  Settled on the Nelder-Mead method because it was stable on the given posterior
•  The comparison between true longitude/latitude distributions, the
distributions with initial parameters, and the distributions with parameters
after optimization:
•  Samples mimic the characteristics of the data
•  Outperformed the initial starting point
•  Results are slightly more consistent with the data than those from MCMC
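To make the optimization route concrete, here is a small self-contained sketch of the Nelder-Mead simplex method with the standard reflection, expansion, contraction, and shrink steps. It is not the authors' implementation (a library routine such as SciPy's `minimize` with `method="Nelder-Mead"` would be the usual choice); the function names and the toy objective are assumptions. Because the method minimizes, maximizing a posterior means minimizing the negative log posterior.

```python
def nelder_mead(f, x0, step=0.5, max_iter=2000, tol=1e-10):
    """Minimise f starting from x0 with the Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    def blend(a, b, t):
        # Point a + t * (b - a).
        return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = blend(centroid, worst, -1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = blend(centroid, worst, -2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Outside contraction if the reflection beat the worst vertex,
            # inside contraction otherwise.
            t = -0.5 if f_r < values[-1] else 0.5
            contracted = blend(centroid, worst, t)
            f_c = f(contracted)
            if f_c < min(f_r, values[-1]):
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [blend(best, v, 0.5) for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


def toy_negative_log_post(theta):
    # Hypothetical stand-in objective with its minimum at (1, -2).
    return 0.5 * ((theta[0] - 1.0) ** 2 + 4.0 * (theta[1] + 2.0) ** 2)


best_theta, best_value = nelder_mead(toy_negative_log_post, [0.0, 0.0])
```

The method needs no gradients, which may be part of why it behaved stably on a posterior with constrained covariance parameters.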
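Prior (inverse of covariance to induce a preference for more concentrated clusters):
$$\text{Prior} = \frac{1}{\Sigma_{NW}\,\Sigma_{SE}}$$
Likelihood (GMM as informed from data):
$$\text{Likelihood} = \prod_{i=1}^{N} \Big[ w\,\mathcal{N}(x_i \mid \mu_{NW}, \Sigma_{NW}) + (1-w)\,\mathcal{N}(x_i \mid \mu_{SE}, \Sigma_{SE}) \Big]$$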
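Drawing locations from a two-component Gaussian mixture of this form, as is done when comparing simulated and true longitude/latitude distributions, might look like the sketch below. The function names and all parameter values are illustrative placeholders, not the fitted values from the poster. Each draw first picks the NW component with probability w, then transforms two standard normal variates with a hand-written 2x2 Cholesky factor of that component's covariance.

```python
import math
import random


def cholesky_2x2(cov):
    """Entries (l11, l21, l22) of the lower-triangular factor of a 2x2 SPD matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def sample_mixture(n, w, mu_nw, cov_nw, mu_se, cov_se, seed=0):
    """Draw n points from w*N(mu_nw, cov_nw) + (1-w)*N(mu_se, cov_se)."""
    rng = random.Random(seed)
    means = (mu_nw, mu_se)
    factors = (cholesky_2x2(cov_nw), cholesky_2x2(cov_se))
    points = []
    for _ in range(n):
        k = 0 if rng.random() < w else 1  # 0 = NW component, 1 = SE component
        l11, l21, l22 = factors[k]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((means[k][0] + l11 * z1,
                       means[k][1] + l21 * z1 + l22 * z2))
    return points


# Placeholder parameters chosen only to make the two modes easy to tell apart.
samples = sample_mixture(
    5000, 0.6,
    (0.0, 0.0), [[1.0, 0.5], [0.5, 2.0]],
    (10.0, -10.0), [[2.0, -0.3], [-0.3, 1.0]],
)
share_first_mode = sum(1 for x, _ in samples if x < 5.0) / len(samples)
```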
Visualizations of the sampling process are shown here for the parameter “Longitude West Mean”:
•  Parameterized a Bayesian model of narcotics crimes in Chicago by using a Gaussian
mixture assumption
•  Both the Metropolis algorithm and Nelder-Mead method successfully converged
and generated samples that captured key characteristics of the data set
•  Implemented two clustering algorithms to identify the optimal distribution of police
stations and police force allocation across stations
•  Average distance from stations to crimes is minimized under K-Means
•  Potential improvement over current police station locations
•  Practical problem: where should police stations be located?
•  How should the police force be allocated across stations?
•  Approach: clustering
1.  Expectation maximization of a Gaussian mixture model
2.  Hard K-Means
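A hard K-Means pass over crime coordinates can be sketched as below. This is an illustration under stated assumptions rather than the poster's procedure: coordinates are treated as plain Euclidean points, the helper names are hypothetical, and allocating officers in proportion to the number of crimes in each cluster is one simple rule that the poster does not confirm. Cluster centers play the role of candidate station locations.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: (point[0] - centers[c][0]) ** 2
               + (point[1] - centers[c][1]) ** 2)


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]


def allocation_shares(labels, k):
    """Assumed rule: share of the force proportional to crimes per cluster."""
    return [labels.count(c) / len(labels) for c in range(k)]


# Synthetic demo data: two well-separated blobs of 100 points each.
demo_rng = random.Random(1)
demo_points = [(demo_rng.gauss(0.0, 1.0), demo_rng.gauss(0.0, 1.0)) for _ in range(100)]
demo_points += [(demo_rng.gauss(10.0, 1.0), demo_rng.gauss(10.0, 1.0)) for _ in range(100)]
centers, labels = kmeans(demo_points, 2)
shares = allocation_shares(labels, 2)
```

The Gaussian mixture alternative replaces the hard assignment with per-point responsibilities, so each crime contributes fractionally to every station.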
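$$\text{Log Posterior} \propto \sum_{i=1}^{N} \ln\Big( w\,\mathcal{N}(x_i \mid \mu_{NW}, \Sigma_{NW}) + (1-w)\,\mathcal{N}(x_i \mid \mu_{SE}, \Sigma_{SE}) \Big) - \ln(\Sigma_{NW}) - \ln(\Sigma_{SE})$$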
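A direct evaluation of this log posterior might look like the sketch below. Two points are assumptions rather than facts from the poster: the covariance terms in the penalty are read as determinants, so that the logarithm is of a scalar, and invalid parameters (w outside (0, 1) or a covariance that is not positive definite) are given a log posterior of minus infinity. The function names are hypothetical.

```python
import math


def log_gauss_2d(x, mu, cov):
    """Log density of a bivariate normal with mean mu and 2x2 covariance cov."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_posterior(points, w, mu_nw, cov_nw, mu_se, cov_se):
    """Unnormalised log posterior of the two-component mixture model."""
    det_nw = cov_nw[0][0] * cov_nw[1][1] - cov_nw[0][1] ** 2
    det_se = cov_se[0][0] * cov_se[1][1] - cov_se[0][1] ** 2
    valid = (0.0 < w < 1.0 and cov_nw[0][0] > 0.0 and cov_se[0][0] > 0.0
             and det_nw > 0.0 and det_se > 0.0)
    if not valid:
        return float("-inf")
    # Prior term, with ln(Sigma) taken as the log determinant (assumption).
    total = -math.log(det_nw) - math.log(det_se)
    for x in points:
        term_nw = math.log(w) + log_gauss_2d(x, mu_nw, cov_nw)
        term_se = math.log(1.0 - w) + log_gauss_2d(x, mu_se, cov_se)
        # Log-sum-exp keeps the mixture sum stable for points far from both modes.
        top = max(term_nw, term_se)
        total += top + math.log(math.exp(term_nw - top) + math.exp(term_se - top))
    return total


identity = [[1.0, 0.0], [0.0, 1.0]]
check = log_posterior([(0.0, 0.0)], 0.5, (0.0, 0.0), identity, (0.0, 0.0), identity)
```

With one weight, two 2-D means, and two symmetric 2x2 covariances, this parameterization has 1 + 4 + 6 = 11 free values, which matches the 11 parameters sampled by the Metropolis algorithm.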
Comparison of methods: average distance to crime (miles)

                                        K-Means   GMM     Current Locations
Train: sampled data, Test: 2013 data    1.311     1.656   1.449
Train: 2013 data, Test: 2014 data       0.966     1.173   1.38
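One way to compute an "average distance to crime" figure of this kind is sketched below. The poster does not say how its distances were measured, so both the great-circle (haversine) distance and the nearest-station rule are assumptions, as are the function names and the demo coordinates.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_distance_to_nearest(crimes, stations):
    """Average over crimes of the distance to the closest station, in miles."""
    total = 0.0
    for lat, lon in crimes:
        total += min(haversine_miles(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)
    return total / len(crimes)


# Made-up coordinates, used only to exercise the functions.
demo_crimes = [(41.0, -87.0), (42.0, -87.0)]
demo_stations = [(41.0, -87.0)]
demo_average = mean_distance_to_nearest(demo_crimes, demo_stations)
```

Evaluating the same function with K-Means centers, GMM means, and the current station coordinates on a held-out year would give one comparable number per method.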
K-Means Optimized Decision Boundary
School of Engineering and Applied Sciences • Institute for Applied Computational Science
