SlideShare a Scribd company logo
Crime Analysis and Predictions from historical Crime and Transportation Data∗
Valerii Klymchuk†
and Candacey Primo‡
Saint Peter’s University
2641 John F. Kennedy Boulevard, Jersey City, NJ 07306
(Dated: January 1, 2016)
Law enforcement agencies across the globe are employing a range of predictive policing methods.
Smart, effective, and proactive policing is clearly preferable to simply reacting to criminal acts.
This paper proposes a novel approach to model, visualize, and predict crime risk levels for con-
tinuous geographic domain based on demographic, historical and transportation data. The main
contribution of this approach lays in using passenger counts from major transportation hubs to
tackle a crime prediction problem.
While previous research efforts have used either background historical knowledge or offender’s
profiling, our findings show that aggregated data captured from Public Transit Infrastructure, in
combination with basic demographic information, and historical crime records can be used to predict
crime.
In our experimental results with real crime data from Bogota we obtained a set of statistics that
describe seasonality in crime patterns, which can accompany predictive machine learning models
assessing the risks of crime. Moreover, this work provides a discussion on implementation, design
and final findings, suggesting a prototype for cloud based crime analytics dashboard.
I. INTRODUCTION
The ability to predict the locations of future crime
events can serve as a valuable source of knowledge for
law enforcement, both from tactical and strategic per-
spectives. Predictive mapping holds a big promise for
identification of areas in which to focus interventions,
but it also may improve the way those interventions are
implemented, if one can examine both distributions of
past crimes and predictions of future concentrations[1].
The objective of this study was to develop a prototype
to illustrate the benefits of easy to use and, web-delivered
digital solution for visualization, analysis and exploration
of criminal activity data in space and time. A new pro-
totype was created as a web-based analytic dashboard to
illustrate the potential benefits of new interactive tools
for studying crime and transportation data provided by
city of Bogota.
This work primarily makes an attempt to investi-
gate the predictive power of people dynamics derived
from transportation data in combination with histori-
cal crime records within a place-centric and data driven
paradigm. While retrospective mapping efforts are use-
ful, the true promise of crime mapping for police lies in
its ability to identify early warning signs across time and
space, and inform a proactive problem solving and crime
prevention[1].
The current paper suggests a design for an interactive
prototype of Tableau analytic dashboard panel hosted on
a cloud computer server, equipped with an interactive
∗ A footnote to the article title
† Also at Computer and Information Sciences Department, Saint
Peter’s University.
‡ cprimo@mail.saintpeters.edu
map and a set of filtering controls, which support linear
and composite view, interactive temporal legends, and a
set of informative interactive map layers.
We believe that that proposed solution can assist
crime analysts it identifying new insights about spatio-
temporal crime patterns within urban city environment.
Preliminary finding can accompany further predictive
modeling efforts in crime prediction.
II. PROBLEM UNDERSTANDING AND
RELATED WORK
The use of statistical and geo-spatial analyses to fore-
cast crime levels has been around for decades. GIS out-
puts can be used to visually identify concentrations and
patterns and to communicate those findings. GIS has
been widely used to produce maps depicting crime ”hot
spots” as well as to conduct spatial analyses that sug-
gest relationships between crime and characteristics of
the social and physical environments in which crime con-
centrations occur[1].
In recent years, however, there has been a surge of in-
terest in analytical tools that draw on very large data sets
to make predictions in support of crime prevention[2].
Police agencies often use computer analysis of informa-
tion about past crimes, the local environment, and other
intelligence to ”predict” and prevent crime. Predictive
methods allow police to work more proactively with lim-
ited amount of resources. Improved situational awareness
at the tactical and strategic levels results, for example,
in more efficient and effective policing, capable to pre-
vent criminal activity by repeat offenders against repeat
victims.
Conventional approaches often apply human judge-
ment and start with mapping crime locations and de-
2
termining where crimes are concentrated (”hot spots”).
Such methods identify individuals at high risk of offend-
ing in the future and relate to assessing individuals risk
[2]. The most common method of ”forecasting” crime
in police departments is simply to assume that the hot
spots of yesterday are the hotspots of tomorrow
Meanwhile our approach relies rather on a new tech-
nique that adds up the number of various risk factors to
create an overall risk score. The objective of such data
driven technique is to forecast places and times with an
increased risk of crime in order to develop effective strate-
gies for police to prevent crime more proactively or make
investigation efforts more effective.
A. Why Crime is Predictable. Criminological
Justification for Predictive Policing
A notion that crime is predictable is supported by
major theories of criminal behavior, such as routine ac-
tivity theory, rational choice theory, and crime pattern
theory[2]. These theories can be consolidated into a
blended theory:
a. Criminals and victims follow common life pat-
terns; overlaps in those patterns indicate an increased
likelihood of crime.
b. Geographic and temporal features influence the
where and when of those patterns.
c. As they move within those patterns, criminals
make ”rational” decisions about whether to commit
crimes, taking into account such factors as the area, the
target’s suitability, and the risk of getting caught.
The blended theory best fits ”stranger offenses”,
such as robberies, burglaries, and thefts. It is less ap-
plicable to vice and relationship violence, both of which
involve human connections that both extend beyond
limited geographic boundaries and lead to decisions that
do not fit into traditional ”criminal rational choice”
frameworks[2].
It is more manageable for police agencies to allocate
resources to places that are most attractive to motivated
offenders and to where crime is most likely to occur given
certain characteristics of the environment.
B. Place centric crime paradigm
Studies suggest that individuals are at greater risk
to commit crime at the presence of criminogenic
opportunities[3]. Places and environments have charac-
teristics that encourage or discourage the occurrence of
crime and, therefore, raise or lower criminogenic risk.
In 2008 criminologist David Weinberg proposed switch
to switch from popular people-centric to place-centric
paradigm [4].
In contrast to previous research on the spatial distri-
bution of crime, in which explanations were sought in the
Figure 1. Subway stations depicted atop of crime intensity
base map of Bogota. Darker colors depict higher total crime
counts based on 2007 and 2013 data.
social, demographic, and economic characteristics of the
resident population, this study is mainly concerned with
examining block-to-block variations in land use.
One related study shows that the size of the resi-
dent population, the racial and ethnic composition of the
block, and the poverty level in the block group do have
additional effects beyond those of land use, but also sug-
gest that the effects of crime attractors, generators and
offender anchor points are general and apply indepen-
dently of whether they are located in residential blocks
and independently of the specific characteristics of the
people who live in the block [5]. Other spatially corre-
Figure 2. Homicide heat-map with hot-spots of predomi-
nantly concentrated in southern and some northern parts of
Bogota city.
3
Figure 3. Bus stops and police stations in parts of Bogota
city atop of street network base map.
lated factors also drive the distribution of crime incidents
at block level. The settings and times for which some fac-
tors become most relevant should also be considered by
predictive model.
The risk of crime at places that have criminogenic
attributes is sought to be higher than at other places
because these locations attract motivated offenders (are
more likely to concentrate them in close proximity) and
create a backcloth for certain events to occur. Attrac-
tors are those specific features that attract offenders to
places in which they commit crime. Generators can be
seen as greater opportunities for crime that burgeon from
increased volume of interaction occurring at these areas,
resulting in greater risk and likelihood of crime.
Various potentially relevant crime attractors and crime
generators may include, for example, bus stops, parking
places, supermarkets, warehouses, bookstores, clothing
stores, and pharmacies, as well such characteristics as
signs of physical deterioration and the lack of collective
efficacy [6]. Another study outlines the following key fac-
tors related to urban residential burglary[7]: social dis-
organization, proximity to pawn shops, proximity to bus
stops, time of the day, day of the week, proximity to
police stations, fire stations and hospitals.
Crime analytic dashboard was developed to identify
and communicate environmental attractors of crime. The
dashboard and an interactive map can be used to antic-
ipate places most suitable for illegal behavior, identify
where new crime incidents can emerge and/or cluster.
All environmental factors that determine specific vulner-
abilities are weighted by the model according to their
relative spatial influence on the outcome event, and final
map articulates the vulnerability in relative risk values
as for any given geographic location across the terrain.
1. Units of analysis
A common thread among opportunity theorists is that
the unit of analysis for ”opportunity” is a place, and that
the dynamic nature of that place constitutes opportuni-
ties for crime. The way in which features of a landscape
affect places throughout the landscape is referred to as
the concept of spatial influence. Effective predictive mod-
els need to incorporate small areas as units of analysis,
which more accurately reflect ”micro-places”, such as city
blocks, street segments, and intersections.
Not all census blocks in Bogota are on a perfect grid.
Some streets are diagonals and blocks along these streets
are triangles. Furthermore, census blocks are not only
bounded by streets but also by railways and natural bar-
riers such as rivers and lakes and the mountains. There-
fore using more detailed geographic features than census
block boundaries could provide a deeper understanding
of how spatial patterns of crime are influenced by the ge-
ography and land use. Also a face block, the two sides
of a street between two junctions, is a natural spatial
unit, not only because it is smaller than a block but also
because it is characterized by inter-visibility [8].
To model a continuous crime opportunity surface in a
geographic information system, we use a grid of equally
sized cells that comprise entire jurisdiction. Each cell rep-
resents a micro place throughout the landscape, while the
cell size is selected as a function of street segment/block
length, so that environmental backcloth is constructed
in a manner that reflects unique landscape of each ju-
risdiction and its street network on which people travel
and to which crimes are geo-coded. Half the mean block
length guided the choice of cell size in a micro cellular
grid, which is more consistent with expectations about
the continuous nature of criminogenic risk[3].
Figure 4. Spatial view of the city divided into city blocks,
coded with color.
4
2. Target variable
Choice of variables is critical to the success of the
model and must be informed by theory. Theory-based
modeling enables to identify which factors influence crime
target selection, and thus inform crime prevention efforts.
Risk of crime in this study is seen as a function of vul-
nerability within the context of other factors that carry
different weights relative one to another, therefore in the
absence of motivated offenders, the existent risk of crime
can be relatively low.
Analyzing risk terrains along with event-dependent as-
sessments is useful for generating a more complete un-
derstanding of crime problems. The study of spatial con-
centrations of crime relies upon evidence-based methods,
and do not rely on the physical or social characteristics
of individuals who enter and leave these locations, only
on the geographic factors that make these places risky.
Crime risk can be modeled as a dichotomous variable,
which is rarely or never zero. The risk posed by each
criminogenic feature is located at one or more places on a
landscape; their confluence at the same place contributes
to a risk value that, when raised a set amount, increases
the likelihood of crime.
Risk terrain modeling produces a metric for place-
based opportunity, as measured by the spatial influences
of place-based risk factors to be used for measuring vul-
nerable places. Opportunity can be measured in degrees
and changes over space and time as our perceptions about
the environments evolve; as new crimes occur; as police
intervene; or as motivated offenders and suitable targets
travel. Hotspots can be used as a proxy measure of places
where the dynamic interactions of underlying crimino-
genic factors exist or persist over time[3]. The risk values
serve as a measure of the clustering of the risk factors,
and forecast whether crime will occur or possibly cluster.
III. DATASETS
For our study we exploit datasets provided by the city
of Bogota, Colombia. Throughout the project we have
working access to the following data:
1. Crime dataset contains historical crime records
for the city of Bogota. Among other fields this dataset
contains geographic coordinates (X, Y ) for over 100k of
crimes commited between January, 2007 - March, 2013
and include such fields, as: types of crime, street address
for each of the crime scenes, along with ObjectId (ZIP -
like Code, presumably - police district, or city block) in
Bogota. Each ObjectId has its own Coordinates specified
as longitude and latitude.
2. Transportation dataset in .cvs format contains co-
ordinates for subway stations in Bogota, together with
counts of passengers entering each station every 15 min-
utes over the span of 2-3 years.
3. Bus Transit dataset lists dates, bus route numbers
with corresponding Destination (names of final stops) to
which each bus is heading, with counts of passengers.
4. Publicly accessible datasets in .csv format with co-
ordinates for Bus stops, Police Stations, Schools, Hospi-
tals, Churches in Bogota, etc.
5. Demographic dataset for the city of Bogota in .shp
format. A set of Bogota borough profiles related to home-
lessness, households, residential property sales, housing
market and societal well-being. Borough profiles datasets
contain area codes, area names, metrics about the pop-
ulation of a particular geographic area, such as statis-
tics about the population, households, demographics, mi-
grant population, ethnicity, employment, earnings, house
prices, indices of multiple deprivation, etc.
A. Data understanding
Rank ordered statistics is the most well known way of
looking at data. Figure 5 shows a frequency summaries
for 8 types of crime in descending order based on number
of counts. We see personal theft and personal injury as
being predominant types of crime in the dataset, with
over 55 and 36 thousand incidents accordingly between
years 2007 and 2013. Those are followed by burglar-
ies, commercial theft, vehicle and motorcycle theft. Also
homicide corresponds to over 5 thousand records in our
data. Type of crime is recorded as categorical variable
Delito, with 8 distinct levels, corresponding to each dis-
tinct type of crime.
Parallel coordinate plot in figure 7 represents seasonal
variability of crime counts for each year. In this chart
years are color-coded and each line represents yearly pat-
tern in total number of cases per month. Chart in figure7
demonstrates an evident deviation from regular pattern
in the years 2008 and 2012. Those two years compared
to others had most uncommon pattern due to some fac-
Figure 5. Types of crime, ordered in decreasing order based
on total counts over the period of observation.
5
Figure 6. Seasonal variability of crime counts.
tors, which could have influenced crime counts in speci-
fied time-frame. Considering, that presidential elections
in Colombia follow 4 year cycle, it might be safe to as-
sume that those patterns could have been influenced par-
tially or at large by the election campaigns taking place.
Time series chart in figure 6 provides a view of crime
counts variability over time. Stacked columns in this
chart represent monthly totals split and color-coded by
crime type. This chart exhibits a significant dip in crime
during first and second quarters of 2008, followed by spike
in third quarter; crime counts also dipped even faster be-
tween second and third quarters of 2012, and risen again
in forth quarter. Such facts indicate presence of a long
term pattern within a span of 4 years of data. Notice,
Figure 7. Yearly seasonalities for different types of crime.
Each line represents a year of data.
that this could support the hypothesis about correlation
with 4 year election cycles, mentioned before.
1. Data preparation
Following preprocessing steps were performed on data:
1. Referencing all geo-tagged data to the cells.
2. The features: for each of the variables related to the
cell the mathematical functions were computed to charac-
terize the distributions and properties of such variables,
e.g. mean, median, standard deviation, min and max
values, and Shannon entropy.
3. Bogota demographic profile feature has to be aggre-
gated to each of the cells.
Correlation measures the numerical relationship of one
variable to another and are good for identifying variables
which are related to each other. Pairs of highly corre-
lated variables indicate redundancy in the data which
should be considered, since redundant variables dont pro-
vide new information to help with prediction, and some
algorithms can be harmed numerically by variables that
are highly correlated one to another. For high correlation
values (greater than 0.95) a rule of thumb was applied:
variables were considered identical, and treated as redun-
dant variables by the model.
The final results can be very sensitive to the input
data. Thus, if there are seasonal effects in the data, such
sensitivity can lead to spurious patterns or trends, which
can make results difficult to reproduce. However, corre-
lations are linear measures, which are affected severely
by outliers and skewness. Visualization is a good idea
to verify that high correlation is real. Using parallel line
chart we can see immediately which pairs of variables are
highly correlated all in one plot.
Chart in figure 8 displays weekly patterns for different
6
types of crime in the dataset. Each line represents a to-
tal number of cases for each type of crime, based on the
day of the week. Notice, how pattern for personal theft
is different from pattern of personal injury, for example.
While personal theft counts clime up on weekdays, per-
sonal injury counts decrease; and spike on weekends for
personal injuries corresponds to decline in personal theft.
Such observations can indicate that weekly work/home
cycles largely affect individual’s vulnerability for some
types of crime.
2. Outliers and skew
Positive skew in variables results in mean values no
longer representing a typical value for the variable. Home
values, for example are reported using the median rather
than mean; because the mean gets inflated by the large
values to the right of the distribution[9]. For positively
skewed data log scale can be effective in changing the
data to see the data resolution and pattern better. Once
log transform is applied, data will be in log units instead
of original units. The first indication of positive skew is
when the mean is larger than the median. An additional
indicator is when the mode is less than the median, which
is also less than the mean.
Unsupervised learning technique - clustering was used
to identify multidimensional outlier. Clusters with rela-
tively few values in them are indicative of unusual groups
of data, and therefore may represent multidimensional
outliers. Dummy variables were created by the model to
indicate whether the interaction is an outlier, as a mem-
bership in a cluster of multidimensional outliers.
The assumption is that the outliers can distort the
model so much that they harm more than help. Therefore
we must remove outliers for linear regression, k-nearest
Figure 8. Weekly patterns of crime vary for different type of
crime, and should be considered by the predictive model.
neighbor, K-Means clustering, and PCA. It is also pos-
sible to separate the outliers and create separate models
just for outliers by relaxing the definition of an outlier
from 3 standard deviations from the mean down to 2
standard deviations from the mean. In some cases we
want to leave the outliers in the data without modifi-
cation purposely allowing model to be biased. Decision
trees, for example, are not affected by outliers[9].
Good features can make the patterns between inputs
and the target variable far more transparent. In this
work we generated a separate column dummy variable to
indicate that the value is an outlier. We also transform
the outliers so that they are no longer outliers by binning
the data; and convert the numeric variables to categorical
to mitigate the numeric affect on the algorithms.
Finally, new additional features were added to the data
as new variables created from one or more existing vari-
ables already in the data. Significant improvement in
model performance metrics can be achieved by intro-
ducing refined second order features, to allow capturing
inter-temporal dependencies of the problem and to make
the feature space more compact.
IV. METHODOLOGY
Analyzing risk terrains along with event-dependent as-
sessments is useful for generating a more complete under-
standing of crime. Empirical research demonstrates that
including a measure of environmental risk yields a better
model of future crime locations compared to predictions
made with past crime incidents alone[3]. Assuming that
every new crime incident is a potential instigator for near
repeats, priority can be given to new crimes that occur
at high risk places with other high risk places in close
proximity.
Defining vulnerable places, therefore, is a function of
the combined spatial influence of criminogenic features
throughout a landscape that contribute to crime by at-
tracting and concentrating illegal behavior. Extra spatial
factors, such as knowledge about potential victims or of-
fenders can also be considered. The steps in this work
were taken along the following road map:
1. Linking all information to the corresponding cells.
2. Data-set are randomly split into training (80% of
data) and testing (20% of data) sets. Each dimension of
the feature vector was normalized.
3. Pearson correlation analysis yields a large subset of
features with strong mutual correlations and a subset of
uncorrelated features.
4. Pipeline variable selection approach, based on fea-
ture ranking and feature subset selection is performed
using only data from the training set. Top variables-
predictors of crime are being identified by our model,
sorted by the mean reduction in accuracy.
Given the high skewness of the distribution and based
on previous research and ground-truth data about counts
7
Figure 9. Monthlly patterns for different types of crime reveal the nature of change in the total counts.
of crimes in each given sell, we split the criminal dataset
with respect to it’s median into these seven classes:
a) ”class −3”: median − 3σ ≤ N ≤ median − 2σ;
b) ”class −2”: median − 2σ ≤ N ≤ median − σ;
c) ”class −1”: median − σ ≤ N ≤ median;
d) ”class +1”: median ≤ N ≤ median + σ;
e) ”class +2”: median + σ ≤ N ≤ median + 2σ;
f) ”class +3”: median + 2σ ≤ N ≤ median + 3σ;
g) ”class 0 ”: median + 3σ ≤ N ≤ median − 3σ.
Algorithms automate most steps of risk terrain model-
ing. Clustering groups data records into similar groups,
and classification techniques assign data records to cate-
gories (commonly referred to as classes). The algorithm
tests a variety of spatial influences for every risk factor
input to identify the most grounded spatial associations
with known crime incident locations and considers only
the most appropriate risk factors with their spatial influ-
ences to produce risk terrain surface.
The exposure to personal crime in an area normally
includes both the resident and the transient population
(Andresen 2006), those who live in the area and those
who visit the area. Note that the transient population
size is not measured directly but is presumed to be incor-
porated in the effect of the other variables[10]. In other
words, crime generators generate crime by drawing an
enlarged transient population into the block. Shannon
entropy reflects the predictable structure of a place in
terms of the types of people present over the day[11].
Clustering algorithms are commonly used as part of
data exploration in order to find commonalities across
crimes [2]. For example, clustering was applied on a data
set of burglaries to find those exhibiting similar tactics.
These can be evidence of a serial burglar, or could suggest
interventions against some of the clusters. As a specific
example, Adderley and Musgrove had clustered perpetra-
tors of serious sexual assaults using self-organizing maps
[12]. SOM cluster observations with similar traits into
related groups to identify crimes committed by the same
individual because of such similarities.
The information could be fed into the hot spot mod-
els to identify the perpetrators next target, or it could be
used for geographic profiling methods to identify the per-
petrators home base. Unsupervised learning techniques
can project high dimensional data onto two dimensions.
Multidimensional scaling (MDS) algorithms, such as the
Sammon Projection, try to retain the relative distance of
data points in the 2D projection as found in higher di-
mensional space could be another option. A simpler way
is to use color, size, and shape.
The metric for feature ranking can be the mean de-
crease in the Gini coefficient of inequality. The feature
with maximum mean decrease in Gini coefficient is ex-
pected to have the maximum influence in minimizing the
out-of-the-bag error. Minimizing the out-of-the-bag er-
ror results in maximizing common performance metrics
used to evaluate models.
When there is considerable overdispersion in the data
(alpha value of 1.24), the negative binomial models might
fit the data better than a Poisson model. Therefore the-
Figure 10. Personal injury patterns by month and day of the
week sgow, that Sunday is the likeliest day for this crime.
8
oretical arguments were put forward to use (negative bi-
nomial) regression models with lagged independent vari-
ables rather than spatial error or spatial lag regression
models [5].
V. RESULTS AND EVALUATION
Place-based risk assessment with risk terrain modeling
permits real-time evaluation of the propensity for a new
crime to become an instigator for near repeats.
Some activities in adjacent blocks are correlated (i.e.,
spatial autocorrelation in the independent variable) and
are the effects of activities in adjacent focal block accord-
ing to the model’s interpretation. Thus, for example, the
presence of a gas station in the block increases robberies
in the block, but so does the presence of a gas station
in an adjacent block. Thus, the land uses that attract
crime have similar crime-attracting repercussions on the
adjacent blocks.
Some recent studies support the hypotheses that crime
generators, crime attractors, and offender anchor points
increase the number of robberies. For example, adding
a liquor store to the block may increase the expected
number of robberies by 67 percent, and the blocks with
a gas station have more than 4 times as many robberies
as similar blocks without a gas station.
Heatmap in figure 12 depicts clustering of personal
theft along south-east to central-north direction, which
is reminiscent with subway line path in figure 1. Another
cluster of personal injuries lays in south-west part of Bo-
gota, where other carcinogenic features might be strong.
Notice, that density of overlapping records is plotted in
figure1 to see darker colors for more common patterns
and lighter for less common ones.
Figure 11. Seasonality of commercial theft counts by month
and day of the week Sunday is the least likely day for such.
Figure 12. Interactive map view depicting concentraton of
near repeat incidents in central part of Bogota.
VI. DISCUSSION
In urban crime analytics domain, an important issue is
the development of collaborative crime information sys-
tems that integrate historical crime data, geographical
data with dynamics of people flow. Such systems can be
of great interest for many applications related to moni-
toring and analysis of crime dynamics, where people flow
as well as network properties are changing in fast and
almost continuous mode. This work aims to introduce a
framework consisting of crime analytics dashboard, GIS
and several specialized applications to support predictive
policing effort in real time with lack of resources.
All predictive techniques depend on data. Both the
volume and the quality of these data will determine
the usefulness of any approach. The saying garbage in,
garbage out strongly applies to these analyses. Efforts
should be made to ensure that data are accurate and
complete, though some techniques are less sensitive to
small errors than others [2].
Ideally GIS software should have a data creation ETL-
tool and/or wizard that simplifies data creation steps.
Crime analysts work in tactical environments where they
work with data in real time leaving little room for cum-
bersome multi-step processes [13]. Occasionally analysts
need to create new, unique data layers. Comprehen-
sive GIS software needs to include geoprocessing tools
for extracting and clipping features from larger collec-
tions, merging map layers, editing spatial and attribute
features of map layers.
It is recommended that the most efficient and useful
way to make use of RTM in investigating the risk of ur-
ban residential burglary based on the surrounding en-
vironmental context is by conducting an analysis at a
citylevel [7]. It is believed that at this level of spatial
analysis, the identification of crime correlates and tem-
9
poral features will produce information that is not only
insightful but also practical.
VII. IMPLICATIONS AND LIMITATIONS
Hotspot maps, for instance, show the concentration of
crime but offer little in the sense of context. By articulat-
ing the environmental context of crime incident locations,
risk terrain modeling helps to identify and prioritize spe-
cific areas and features of the landscape to be addressed
by a targeted intervention.
Strategies to address crime problems, therefore, must
incorporate both the spatial and temporal patterns of re-
cent known crime incidents and the environmental risks
of micro places if they are to yield the most efficient
and actionable information for police resource allocation.
Looking ahead, it could be useful to move beyond pre-
dictions towards offering explicit decision support for re-
source allocation and other decisions. To be useful in law
enforcement, predictive policing methods must be a part
of a comprehensive crime prevention strategy.
The speed at which analytic dashboard operates de-
pends heavily on the processing capabilities of the com-
puter on which it is installed. Therefore, it is highly
recommended that you install and operate this software
on a cloud computer or server with abundant memory.
Finally, the most important measure of a crime fore-
casting is whether it aids in crime control and prevention,
and how the outputs of models translate into practice [1].
In thinking about these needs, it is important that the
value of predictive policing tools is in their ability to pro-
vide situational awareness of risks and the information
needed to act on those risks and prevent crime [2].
VIII. CONCLUSION
In conclusion many analysts and investigators have a
professional interest in how these approaches improve
their work and make it more useful; police chiefs are
eager to find new techniques to reduce crime without
adding to their workforce; and the private sector sees
potential funding from research grants, consulting, and
software development. Policing methods, including pre-
dictive analyses, could be useful for peacekeeping appli-
cations as well.
Large agencies with large volumes of incident and in-
telligence data that need to be analyzed and shared will
want to consider more advanced and, therefore, more ef-
fective systems. It is helpful to think of these as enter-
prise information technology systems [2]. Such systems
make sense of large datasets to provide situational aware-
ness to decision makers and to the public. These systems
help agencies understand the where, when, of crime and
identify the specific problems that drive crime in order
to take action against them. The predictive policing sys-
tems provide the shared situational awareness needed to
make decisions about where and how to take action. De-
signing intervention programs that take these attributes
into account, in combination with solid predictive ana-
lytics, can go a long way ensuring that predicted crime
risks do not become real crimes.
[1] Elizabeth R. Groff and Nancy G. La Vigne. Forecasting
the future of predictive crime mapping. Crime Preven-
tion Studies, 13(1):29–57, 2002.
[2] Walter L. Perry, Brian McInnis, Carter C. Price, Su-
san C. Smith, and John S. Hollywood. Predictive Polic-
ing. The Role of Crime Forecasting in Law Enforcement
Operations. RAND Corporation, 2013.
[3] Leslie W. Kennedy, Joel M. Caplan, and Eric L. Piza. A
Primer on the Spatial Dynamics of Crime Emergence and
Persistence. Rutgers Center on Public Security, Newark,
NJ, 2012.
[4] Samuel Kirson Weinberg. Theories of Criminality and
Problems of Prediction. The Journal of Criminal Law,
Criminology, and Police Science, 45(4):412–424, 1954.
[5] Richard Block Wim Bernasco. Robberies in chicago: A
block-level analysis of the influence of crime generators,
crime attractors, and offender anchor points. Journal of
Research in Crime and Delinquency, 48(1):33–57, 2011.
[6] Peter K. B. St. Jean. Pockets of Crime: Broken Win-
dows, Collective Efficacy, and the Criminal Point of
View. University Of Chicago Press, Chicago, IL, 2007.
[7] William D. Moreto. Risk Factors of Urban Residen-
tial Burglary. [Research Brief Series Dedicated to Shared
Knowledge]. RTM Insights, 4(4):1–3, 2010.
[8] William R. Smith, Sharon Glave Frazee, and Elisabeth L.
Davison. Furthering the Integration of Routine Activ-
ity and Social Disorganization Theories: Small Units of
Analysis and the Study of Street Robbery as a Diffusion
Process. Criminology, 38(2):489–524, 2000.
[9] Dean Abbott. Applied Predictive Analytics. Principles
and Techniques for the Professional Data Analyst. John
Wiley Sons, Inc., Indianapolis, IN, 2014.
[10] David Weisburd, Wim Bernasco, and Gerben J. N. Bru-
insma. Units of analysis in geographic criminology: His-
torical development, critical issues, and open questions.
Putting Crime in Its Place: Units of Analysis in Geo-
graphic Criminology, pages 3–34, 2009.
[11] Andrey Bogomolov. Once Upon a Crime: Towards Crime
Prediction from Demographics and Mobile Data. The
Journal of Criminal Law, Criminology, and Police Sci-
ence, 1409(1):2983, 2014.
[12] Richard Adderley and Peter B. Musgrove. Data min-
ing case study: Modeling the behavior of offenders who
commit serious sexual assaults. Proceedings of the 7th
ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining by Association for Comput-
ing Machinery, 2001.
[13] International Association of Crime Analysts. GIS Soft-
10
ware Requirements for Crime Analysis (White Paper
2012-01). Standards, Methods, Technology (SMT) Com-
mittee, Overland Park, KS, 2012.

More Related Content

What's hot

EllisPredictivePolicing
EllisPredictivePolicingEllisPredictivePolicing
EllisPredictivePolicingDennis Ellis
 
Technical Seminar
Technical SeminarTechnical Seminar
Technical Seminar
nikita kapil
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
Monisha100
 
Propose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisPropose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysis
IOSR Journals
 
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsFundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Osokop
 
PredPol: How Predictive Policing Works
PredPol: How Predictive Policing WorksPredPol: How Predictive Policing Works
PredPol: How Predictive Policing Works
PredPol, Inc
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
theijes
 
Using Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternUsing Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime Pattern
Zakaria Zubi
 
RESEARCH PROPOSAL 2016 @ FINAL VERSION
RESEARCH PROPOSAL 2016 @ FINAL VERSIONRESEARCH PROPOSAL 2016 @ FINAL VERSION
RESEARCH PROPOSAL 2016 @ FINAL VERSIONGabriel Jr Matthew
 
Predictive policing computational thinking show and tell
Predictive policing computational thinking show and tellPredictive policing computational thinking show and tell
Predictive policing computational thinking show and tell
Archit Sharma
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
eSAT Journals
 
Database and Analytics Programming - Project report
Database and Analytics Programming - Project reportDatabase and Analytics Programming - Project report
Database and Analytics Programming - Project report
sarthakkhare3
 
Predictive Policing on Gun Violence Using Open Data
Predictive Policing on Gun Violence Using Open DataPredictive Policing on Gun Violence Using Open Data
Predictive Policing on Gun Violence Using Open Data
PredPol, Inc
 
Analytics anecdotes
Analytics anecdotesAnalytics anecdotes
Analytics anecdotes
Ramkumar Ravichandran
 
Crime prediction-using-data-mining
Crime prediction-using-data-miningCrime prediction-using-data-mining
Crime prediction-using-data-mining
mohammed albash
 
Towards simulating criminal offender movement based on insights from human dy...
Towards simulating criminal offender movement based on insights from human dy...Towards simulating criminal offender movement based on insights from human dy...
Towards simulating criminal offender movement based on insights from human dy...
Raquel Rosés
 
Crime
CrimeCrime
Artificial Intelligence Models for Crime Prediction in Urban Spaces
Artificial Intelligence Models for Crime Prediction in Urban SpacesArtificial Intelligence Models for Crime Prediction in Urban Spaces
Artificial Intelligence Models for Crime Prediction in Urban Spaces
mlaij
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern Detection
APNIC
 

What's hot (19)

EllisPredictivePolicing
EllisPredictivePolicingEllisPredictivePolicing
EllisPredictivePolicing
 
Technical Seminar
Technical SeminarTechnical Seminar
Technical Seminar
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
 
Propose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisPropose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysis
 
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsFundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis Concepts
 
PredPol: How Predictive Policing Works
PredPol: How Predictive Policing WorksPredPol: How Predictive Policing Works
PredPol: How Predictive Policing Works
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
Using Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternUsing Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime Pattern
 
RESEARCH PROPOSAL 2016 @ FINAL VERSION
RESEARCH PROPOSAL 2016 @ FINAL VERSIONRESEARCH PROPOSAL 2016 @ FINAL VERSION
RESEARCH PROPOSAL 2016 @ FINAL VERSION
 
Predictive policing computational thinking show and tell
Predictive policing computational thinking show and tellPredictive policing computational thinking show and tell
Predictive policing computational thinking show and tell
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
 
Database and Analytics Programming - Project report
Database and Analytics Programming - Project reportDatabase and Analytics Programming - Project report
Database and Analytics Programming - Project report
 
Predictive Policing on Gun Violence Using Open Data
Predictive Policing on Gun Violence Using Open DataPredictive Policing on Gun Violence Using Open Data
Predictive Policing on Gun Violence Using Open Data
 
Analytics anecdotes
Analytics anecdotesAnalytics anecdotes
Analytics anecdotes
 
Crime prediction-using-data-mining
Crime prediction-using-data-miningCrime prediction-using-data-mining
Crime prediction-using-data-mining
 
Towards simulating criminal offender movement based on insights from human dy...
Towards simulating criminal offender movement based on insights from human dy...Towards simulating criminal offender movement based on insights from human dy...
Towards simulating criminal offender movement based on insights from human dy...
 
Crime
CrimeCrime
Crime
 
Artificial Intelligence Models for Crime Prediction in Urban Spaces
Artificial Intelligence Models for Crime Prediction in Urban SpacesArtificial Intelligence Models for Crime Prediction in Urban Spaces
Artificial Intelligence Models for Crime Prediction in Urban Spaces
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern Detection
 

Similar to Crime Analysis based on Historical and Transportation Data

ACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdfACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdf
KiranKumar757501
 
Chicago Crime Analysis
Chicago Crime AnalysisChicago Crime Analysis
Chicago Crime Analysis
Tom Donoghue
 
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
ijaia
 
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
gerogepatton
 
GIS based Decision Support System for Crime Mapping, Analysis and identify H...
GIS based Decision Support System for Crime Mapping, Analysis  and identify H...GIS based Decision Support System for Crime Mapping, Analysis  and identify H...
GIS based Decision Support System for Crime Mapping, Analysis and identify H...
IJMER
 
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
AIRCC Publishing Corporation
 
crime reports by address
crime reports by addresscrime reports by address
crime reports by address
Geonomics
 
Write a One-Page Reaction Paper on the concept of why you believe it.docx
Write a One-Page Reaction Paper on the concept of why you believe it.docxWrite a One-Page Reaction Paper on the concept of why you believe it.docx
Write a One-Page Reaction Paper on the concept of why you believe it.docx
MelvinaLeepercy
 
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET Journal
 
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docxJournal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
priestmanmable
 
250 words agree or disagreeWhile I have mixed opinions about p.docx
250 words agree or disagreeWhile I have mixed opinions about p.docx250 words agree or disagreeWhile I have mixed opinions about p.docx
250 words agree or disagreeWhile I have mixed opinions about p.docx
vickeryr87
 
GEOSPATIAL DATA SOURCES
GEOSPATIAL DATA SOURCESGEOSPATIAL DATA SOURCES
GEOSPATIAL DATA SOURCES
Expert Writing Help
 
Investigating crimes using text mining and network analysis
Investigating crimes using text mining and network analysisInvestigating crimes using text mining and network analysis
Investigating crimes using text mining and network analysis
ZhongLI28
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringReuben George
 
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACESARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
mlaij
 
Analytics-Based Crime Prediction
Analytics-Based Crime PredictionAnalytics-Based Crime Prediction
Analytics-Based Crime Prediction
Prodapt Solutions
 
Predective policingachtergrond
Predective policingachtergrondPredective policingachtergrond
Predective policingachtergrondFrank Smilda
 
Application of GIS in Criminology and Defence Intelligence
Application of GIS in Criminology and Defence IntelligenceApplication of GIS in Criminology and Defence Intelligence
Application of GIS in Criminology and Defence Intelligence
Sabaragamuwa University of Sri Lanka
 
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
drennanmicah
 
FINALProject Report for public
FINALProject Report for publicFINALProject Report for public
FINALProject Report for publicHina Pandya
 

Similar to Crime Analysis based on Historical and Transportation Data (20)

ACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdfACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdf
 
Chicago Crime Analysis
Chicago Crime AnalysisChicago Crime Analysis
Chicago Crime Analysis
 
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
 
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
 
GIS based Decision Support System for Crime Mapping, Analysis and identify H...
GIS based Decision Support System for Crime Mapping, Analysis  and identify H...GIS based Decision Support System for Crime Mapping, Analysis  and identify H...
GIS based Decision Support System for Crime Mapping, Analysis and identify H...
 
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
Inflation-Crime Nexus: A Predictive Analysis of Crime Rate Using Inflationary...
 
crime reports by address
crime reports by addresscrime reports by address
crime reports by address
 
Write a One-Page Reaction Paper on the concept of why you believe it.docx
Write a One-Page Reaction Paper on the concept of why you believe it.docxWrite a One-Page Reaction Paper on the concept of why you believe it.docx
Write a One-Page Reaction Paper on the concept of why you believe it.docx
 
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data Analytics
 
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docxJournal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
 
250 words agree or disagreeWhile I have mixed opinions about p.docx
250 words agree or disagreeWhile I have mixed opinions about p.docx250 words agree or disagreeWhile I have mixed opinions about p.docx
250 words agree or disagreeWhile I have mixed opinions about p.docx
 
GEOSPATIAL DATA SOURCES
GEOSPATIAL DATA SOURCESGEOSPATIAL DATA SOURCES
GEOSPATIAL DATA SOURCES
 
Investigating crimes using text mining and network analysis
Investigating crimes using text mining and network analysisInvestigating crimes using text mining and network analysis
Investigating crimes using text mining and network analysis
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means Clustering
 
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACESARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
ARTIFICIAL INTELLIGENCE MODELS FOR CRIME PREDICTION IN URBAN SPACES
 
Analytics-Based Crime Prediction
Analytics-Based Crime PredictionAnalytics-Based Crime Prediction
Analytics-Based Crime Prediction
 
Predective policingachtergrond
Predective policingachtergrondPredective policingachtergrond
Predective policingachtergrond
 
Application of GIS in Criminology and Defence Intelligence
Application of GIS in Criminology and Defence IntelligenceApplication of GIS in Criminology and Defence Intelligence
Application of GIS in Criminology and Defence Intelligence
 
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
150 words agree or disagreeFuture Criminal Intelligence Enviro.docx
 
FINALProject Report for public
FINALProject Report for publicFINALProject Report for public
FINALProject Report for public
 

More from Valerii Klymchuk

Sample presentation slides template
Sample presentation slides templateSample presentation slides template
Sample presentation slides template
Valerii Klymchuk
 
Toronto Capstone
Toronto CapstoneToronto Capstone
Toronto Capstone
Valerii Klymchuk
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
Valerii Klymchuk
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
Valerii Klymchuk
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
Valerii Klymchuk
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
Valerii Klymchuk
 
03 Data Representation
03 Data Representation03 Data Representation
03 Data Representation
Valerii Klymchuk
 
05 Scalar Visualization
05 Scalar Visualization05 Scalar Visualization
05 Scalar Visualization
Valerii Klymchuk
 
06 Vector Visualization
06 Vector Visualization06 Vector Visualization
06 Vector Visualization
Valerii Klymchuk
 
07 Tensor Visualization
07 Tensor Visualization07 Tensor Visualization
07 Tensor Visualization
Valerii Klymchuk
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support Project
Valerii Klymchuk
 
Data Warehouse Project
Data Warehouse ProjectData Warehouse Project
Data Warehouse Project
Valerii Klymchuk
 
Database Project
Database ProjectDatabase Project
Database Project
Valerii Klymchuk
 

More from Valerii Klymchuk (14)

Sample presentation slides template
Sample presentation slides templateSample presentation slides template
Sample presentation slides template
 
Toronto Capstone
Toronto CapstoneToronto Capstone
Toronto Capstone
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
03 Data Representation
03 Data Representation03 Data Representation
03 Data Representation
 
05 Scalar Visualization
05 Scalar Visualization05 Scalar Visualization
05 Scalar Visualization
 
06 Vector Visualization
06 Vector Visualization06 Vector Visualization
06 Vector Visualization
 
07 Tensor Visualization
07 Tensor Visualization07 Tensor Visualization
07 Tensor Visualization
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support Project
 
Data Warehouse Project
Data Warehouse ProjectData Warehouse Project
Data Warehouse Project
 
Database Project
Database ProjectDatabase Project
Database Project
 

Recently uploaded

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 

Recently uploaded (20)

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 

Crime Analysis based on Historical and Transportation Data

  • 1. Crime Analysis and Predictions from historical Crime and Transportation Data∗ Valerii Klymchuk† and Candacey Primo‡ Saint Peter’s University 2641 John F. Kennedy Boulevard, Jersey City, NJ 07306 (Dated: January 1, 2016) Law enforcement agencies across the globe are employing a range of predictive policing methods. Smart, effective, and proactive policing is clearly preferable to simply reacting to criminal acts. This paper proposes a novel approach to model, visualize, and predict crime risk levels for con- tinuous geographic domain based on demographic, historical and transportation data. The main contribution of this approach lays in using passenger counts from major transportation hubs to tackle a crime prediction problem. While previous research efforts have used either background historical knowledge or offender’s profiling, our findings show that aggregated data captured from Public Transit Infrastructure, in combination with basic demographic information, and historical crime records can be used to predict crime. In our experimental results with real crime data from Bogota we obtained a set of statistics that describe seasonality in crime patterns, which can accompany predictive machine learning models assessing the risks of crime. Moreover, this work provides a discussion on implementation, design and final findings, suggesting a prototype for cloud based crime analytics dashboard. I. INTRODUCTION The ability to predict the locations of future crime events can serve as a valuable source of knowledge for law enforcement, both from tactical and strategic per- spectives. Predictive mapping holds a big promise for identification of areas in which to focus interventions, but it also may improve the way those interventions are implemented, if one can examine both distributions of past crimes and predictions of future concentrations[1]. The objective of this study was to develop a prototype to illustrate the benefits of easy to use and, web-delivered digital solution for visualization, analysis and exploration of criminal activity data in space and time. A new pro- totype was created as a web-based analytic dashboard to illustrate the potential benefits of new interactive tools for studying crime and transportation data provided by city of Bogota. This work primarily makes an attempt to investi- gate the predictive power of people dynamics derived from transportation data in combination with histori- cal crime records within a place-centric and data driven paradigm. While retrospective mapping efforts are use- ful, the true promise of crime mapping for police lies in its ability to identify early warning signs across time and space, and inform a proactive problem solving and crime prevention[1]. The current paper suggests a design for an interactive prototype of Tableau analytic dashboard panel hosted on a cloud computer server, equipped with an interactive ∗ A footnote to the article title † Also at Computer and Information Sciences Department, Saint Peter’s University. ‡ cprimo@mail.saintpeters.edu map and a set of filtering controls, which support linear and composite view, interactive temporal legends, and a set of informative interactive map layers. We believe that that proposed solution can assist crime analysts it identifying new insights about spatio- temporal crime patterns within urban city environment. Preliminary finding can accompany further predictive modeling efforts in crime prediction. II. PROBLEM UNDERSTANDING AND RELATED WORK The use of statistical and geo-spatial analyses to fore- cast crime levels has been around for decades. GIS out- puts can be used to visually identify concentrations and patterns and to communicate those findings. GIS has been widely used to produce maps depicting crime ”hot spots” as well as to conduct spatial analyses that sug- gest relationships between crime and characteristics of the social and physical environments in which crime con- centrations occur[1]. In recent years, however, there has been a surge of in- terest in analytical tools that draw on very large data sets to make predictions in support of crime prevention[2]. Police agencies often use computer analysis of informa- tion about past crimes, the local environment, and other intelligence to ”predict” and prevent crime. Predictive methods allow police to work more proactively with lim- ited amount of resources. Improved situational awareness at the tactical and strategic levels results, for example, in more efficient and effective policing, capable to pre- vent criminal activity by repeat offenders against repeat victims. Conventional approaches often apply human judge- ment and start with mapping crime locations and de-
  • 2. 2 termining where crimes are concentrated (”hot spots”). Such methods identify individuals at high risk of offend- ing in the future and relate to assessing individuals risk [2]. The most common method of ”forecasting” crime in police departments is simply to assume that the hot spots of yesterday are the hotspots of tomorrow Meanwhile our approach relies rather on a new tech- nique that adds up the number of various risk factors to create an overall risk score. The objective of such data driven technique is to forecast places and times with an increased risk of crime in order to develop effective strate- gies for police to prevent crime more proactively or make investigation efforts more effective. A. Why Crime is Predictable. Criminological Justification for Predictive Policing A notion that crime is predictable is supported by major theories of criminal behavior, such as routine ac- tivity theory, rational choice theory, and crime pattern theory[2]. These theories can be consolidated into a blended theory: a. Criminals and victims follow common life pat- terns; overlaps in those patterns indicate an increased likelihood of crime. b. Geographic and temporal features influence the where and when of those patterns. c. As they move within those patterns, criminals make ”rational” decisions about whether to commit crimes, taking into account such factors as the area, the target’s suitability, and the risk of getting caught. The blended theory best fits ”stranger offenses”, such as robberies, burglaries, and thefts. It is less ap- plicable to vice and relationship violence, both of which involve human connections that both extend beyond limited geographic boundaries and lead to decisions that do not fit into traditional ”criminal rational choice” frameworks[2]. It is more manageable for police agencies to allocate resources to places that are most attractive to motivated offenders and to where crime is most likely to occur given certain characteristics of the environment. B. Place centric crime paradigm Studies suggest that individuals are at greater risk to commit crime at the presence of criminogenic opportunities[3]. Places and environments have charac- teristics that encourage or discourage the occurrence of crime and, therefore, raise or lower criminogenic risk. In 2008 criminologist David Weinberg proposed switch to switch from popular people-centric to place-centric paradigm [4]. In contrast to previous research on the spatial distri- bution of crime, in which explanations were sought in the Figure 1. Subway stations depicted atop of crime intensity base map of Bogota. Darker colors depict higher total crime counts based on 2007 and 2013 data. social, demographic, and economic characteristics of the resident population, this study is mainly concerned with examining block-to-block variations in land use. One related study shows that the size of the resi- dent population, the racial and ethnic composition of the block, and the poverty level in the block group do have additional effects beyond those of land use, but also sug- gest that the effects of crime attractors, generators and offender anchor points are general and apply indepen- dently of whether they are located in residential blocks and independently of the specific characteristics of the people who live in the block [5]. Other spatially corre- Figure 2. Homicide heat-map with hot-spots of predomi- nantly concentrated in southern and some northern parts of Bogota city.
  • 3. 3 Figure 3. Bus stops and police stations in parts of Bogota city atop of street network base map. lated factors also drive the distribution of crime incidents at block level. The settings and times for which some fac- tors become most relevant should also be considered by predictive model. The risk of crime at places that have criminogenic attributes is sought to be higher than at other places because these locations attract motivated offenders (are more likely to concentrate them in close proximity) and create a backcloth for certain events to occur. Attrac- tors are those specific features that attract offenders to places in which they commit crime. Generators can be seen as greater opportunities for crime that burgeon from increased volume of interaction occurring at these areas, resulting in greater risk and likelihood of crime. Various potentially relevant crime attractors and crime generators may include, for example, bus stops, parking places, supermarkets, warehouses, bookstores, clothing stores, and pharmacies, as well such characteristics as signs of physical deterioration and the lack of collective efficacy [6]. Another study outlines the following key fac- tors related to urban residential burglary[7]: social dis- organization, proximity to pawn shops, proximity to bus stops, time of the day, day of the week, proximity to police stations, fire stations and hospitals. Crime analytic dashboard was developed to identify and communicate environmental attractors of crime. The dashboard and an interactive map can be used to antic- ipate places most suitable for illegal behavior, identify where new crime incidents can emerge and/or cluster. All environmental factors that determine specific vulner- abilities are weighted by the model according to their relative spatial influence on the outcome event, and final map articulates the vulnerability in relative risk values as for any given geographic location across the terrain. 1. Units of analysis A common thread among opportunity theorists is that the unit of analysis for ”opportunity” is a place, and that the dynamic nature of that place constitutes opportuni- ties for crime. The way in which features of a landscape affect places throughout the landscape is referred to as the concept of spatial influence. Effective predictive mod- els need to incorporate small areas as units of analysis, which more accurately reflect ”micro-places”, such as city blocks, street segments, and intersections. Not all census blocks in Bogota are on a perfect grid. Some streets are diagonals and blocks along these streets are triangles. Furthermore, census blocks are not only bounded by streets but also by railways and natural bar- riers such as rivers and lakes and the mountains. There- fore using more detailed geographic features than census block boundaries could provide a deeper understanding of how spatial patterns of crime are influenced by the ge- ography and land use. Also a face block, the two sides of a street between two junctions, is a natural spatial unit, not only because it is smaller than a block but also because it is characterized by inter-visibility [8]. To model a continuous crime opportunity surface in a geographic information system, we use a grid of equally sized cells that comprise entire jurisdiction. Each cell rep- resents a micro place throughout the landscape, while the cell size is selected as a function of street segment/block length, so that environmental backcloth is constructed in a manner that reflects unique landscape of each ju- risdiction and its street network on which people travel and to which crimes are geo-coded. Half the mean block length guided the choice of cell size in a micro cellular grid, which is more consistent with expectations about the continuous nature of criminogenic risk[3]. Figure 4. Spatial view of the city divided into city blocks, coded with color.
  • 4. 4 2. Target variable Choice of variables is critical to the success of the model and must be informed by theory. Theory-based modeling enables to identify which factors influence crime target selection, and thus inform crime prevention efforts. Risk of crime in this study is seen as a function of vul- nerability within the context of other factors that carry different weights relative one to another, therefore in the absence of motivated offenders, the existent risk of crime can be relatively low. Analyzing risk terrains along with event-dependent as- sessments is useful for generating a more complete un- derstanding of crime problems. The study of spatial con- centrations of crime relies upon evidence-based methods, and do not rely on the physical or social characteristics of individuals who enter and leave these locations, only on the geographic factors that make these places risky. Crime risk can be modeled as a dichotomous variable, which is rarely or never zero. The risk posed by each criminogenic feature is located at one or more places on a landscape; their confluence at the same place contributes to a risk value that, when raised a set amount, increases the likelihood of crime. Risk terrain modeling produces a metric for place- based opportunity, as measured by the spatial influences of place-based risk factors to be used for measuring vul- nerable places. Opportunity can be measured in degrees and changes over space and time as our perceptions about the environments evolve; as new crimes occur; as police intervene; or as motivated offenders and suitable targets travel. Hotspots can be used as a proxy measure of places where the dynamic interactions of underlying crimino- genic factors exist or persist over time[3]. The risk values serve as a measure of the clustering of the risk factors, and forecast whether crime will occur or possibly cluster. III. DATASETS For our study we exploit datasets provided by the city of Bogota, Colombia. Throughout the project we have working access to the following data: 1. Crime dataset contains historical crime records for the city of Bogota. Among other fields this dataset contains geographic coordinates (X, Y ) for over 100k of crimes commited between January, 2007 - March, 2013 and include such fields, as: types of crime, street address for each of the crime scenes, along with ObjectId (ZIP - like Code, presumably - police district, or city block) in Bogota. Each ObjectId has its own Coordinates specified as longitude and latitude. 2. Transportation dataset in .cvs format contains co- ordinates for subway stations in Bogota, together with counts of passengers entering each station every 15 min- utes over the span of 2-3 years. 3. Bus Transit dataset lists dates, bus route numbers with corresponding Destination (names of final stops) to which each bus is heading, with counts of passengers. 4. Publicly accessible datasets in .csv format with co- ordinates for Bus stops, Police Stations, Schools, Hospi- tals, Churches in Bogota, etc. 5. Demographic dataset for the city of Bogota in .shp format. A set of Bogota borough profiles related to home- lessness, households, residential property sales, housing market and societal well-being. Borough profiles datasets contain area codes, area names, metrics about the pop- ulation of a particular geographic area, such as statis- tics about the population, households, demographics, mi- grant population, ethnicity, employment, earnings, house prices, indices of multiple deprivation, etc. A. Data understanding Rank ordered statistics is the most well known way of looking at data. Figure 5 shows a frequency summaries for 8 types of crime in descending order based on number of counts. We see personal theft and personal injury as being predominant types of crime in the dataset, with over 55 and 36 thousand incidents accordingly between years 2007 and 2013. Those are followed by burglar- ies, commercial theft, vehicle and motorcycle theft. Also homicide corresponds to over 5 thousand records in our data. Type of crime is recorded as categorical variable Delito, with 8 distinct levels, corresponding to each dis- tinct type of crime. Parallel coordinate plot in figure 7 represents seasonal variability of crime counts for each year. In this chart years are color-coded and each line represents yearly pat- tern in total number of cases per month. Chart in figure7 demonstrates an evident deviation from regular pattern in the years 2008 and 2012. Those two years compared to others had most uncommon pattern due to some fac- Figure 5. Types of crime, ordered in decreasing order based on total counts over the period of observation.
  • 5. 5 Figure 6. Seasonal variability of crime counts. tors, which could have influenced crime counts in speci- fied time-frame. Considering, that presidential elections in Colombia follow 4 year cycle, it might be safe to as- sume that those patterns could have been influenced par- tially or at large by the election campaigns taking place. Time series chart in figure 6 provides a view of crime counts variability over time. Stacked columns in this chart represent monthly totals split and color-coded by crime type. This chart exhibits a significant dip in crime during first and second quarters of 2008, followed by spike in third quarter; crime counts also dipped even faster be- tween second and third quarters of 2012, and risen again in forth quarter. Such facts indicate presence of a long term pattern within a span of 4 years of data. Notice, Figure 7. Yearly seasonalities for different types of crime. Each line represents a year of data. that this could support the hypothesis about correlation with 4 year election cycles, mentioned before. 1. Data preparation Following preprocessing steps were performed on data: 1. Referencing all geo-tagged data to the cells. 2. The features: for each of the variables related to the cell the mathematical functions were computed to charac- terize the distributions and properties of such variables, e.g. mean, median, standard deviation, min and max values, and Shannon entropy. 3. Bogota demographic profile feature has to be aggre- gated to each of the cells. Correlation measures the numerical relationship of one variable to another and are good for identifying variables which are related to each other. Pairs of highly corre- lated variables indicate redundancy in the data which should be considered, since redundant variables dont pro- vide new information to help with prediction, and some algorithms can be harmed numerically by variables that are highly correlated one to another. For high correlation values (greater than 0.95) a rule of thumb was applied: variables were considered identical, and treated as redun- dant variables by the model. The final results can be very sensitive to the input data. Thus, if there are seasonal effects in the data, such sensitivity can lead to spurious patterns or trends, which can make results difficult to reproduce. However, corre- lations are linear measures, which are affected severely by outliers and skewness. Visualization is a good idea to verify that high correlation is real. Using parallel line chart we can see immediately which pairs of variables are highly correlated all in one plot. Chart in figure 8 displays weekly patterns for different
  • 6. 6 types of crime in the dataset. Each line represents a to- tal number of cases for each type of crime, based on the day of the week. Notice, how pattern for personal theft is different from pattern of personal injury, for example. While personal theft counts clime up on weekdays, per- sonal injury counts decrease; and spike on weekends for personal injuries corresponds to decline in personal theft. Such observations can indicate that weekly work/home cycles largely affect individual’s vulnerability for some types of crime. 2. Outliers and skew Positive skew in variables results in mean values no longer representing a typical value for the variable. Home values, for example are reported using the median rather than mean; because the mean gets inflated by the large values to the right of the distribution[9]. For positively skewed data log scale can be effective in changing the data to see the data resolution and pattern better. Once log transform is applied, data will be in log units instead of original units. The first indication of positive skew is when the mean is larger than the median. An additional indicator is when the mode is less than the median, which is also less than the mean. Unsupervised learning technique - clustering was used to identify multidimensional outlier. Clusters with rela- tively few values in them are indicative of unusual groups of data, and therefore may represent multidimensional outliers. Dummy variables were created by the model to indicate whether the interaction is an outlier, as a mem- bership in a cluster of multidimensional outliers. The assumption is that the outliers can distort the model so much that they harm more than help. Therefore we must remove outliers for linear regression, k-nearest Figure 8. Weekly patterns of crime vary for different type of crime, and should be considered by the predictive model. neighbor, K-Means clustering, and PCA. It is also pos- sible to separate the outliers and create separate models just for outliers by relaxing the definition of an outlier from 3 standard deviations from the mean down to 2 standard deviations from the mean. In some cases we want to leave the outliers in the data without modifi- cation purposely allowing model to be biased. Decision trees, for example, are not affected by outliers[9]. Good features can make the patterns between inputs and the target variable far more transparent. In this work we generated a separate column dummy variable to indicate that the value is an outlier. We also transform the outliers so that they are no longer outliers by binning the data; and convert the numeric variables to categorical to mitigate the numeric affect on the algorithms. Finally, new additional features were added to the data as new variables created from one or more existing vari- ables already in the data. Significant improvement in model performance metrics can be achieved by intro- ducing refined second order features, to allow capturing inter-temporal dependencies of the problem and to make the feature space more compact. IV. METHODOLOGY Analyzing risk terrains along with event-dependent as- sessments is useful for generating a more complete under- standing of crime. Empirical research demonstrates that including a measure of environmental risk yields a better model of future crime locations compared to predictions made with past crime incidents alone[3]. Assuming that every new crime incident is a potential instigator for near repeats, priority can be given to new crimes that occur at high risk places with other high risk places in close proximity. Defining vulnerable places, therefore, is a function of the combined spatial influence of criminogenic features throughout a landscape that contribute to crime by at- tracting and concentrating illegal behavior. Extra spatial factors, such as knowledge about potential victims or of- fenders can also be considered. The steps in this work were taken along the following road map: 1. Linking all information to the corresponding cells. 2. Data-set are randomly split into training (80% of data) and testing (20% of data) sets. Each dimension of the feature vector was normalized. 3. Pearson correlation analysis yields a large subset of features with strong mutual correlations and a subset of uncorrelated features. 4. Pipeline variable selection approach, based on fea- ture ranking and feature subset selection is performed using only data from the training set. Top variables- predictors of crime are being identified by our model, sorted by the mean reduction in accuracy. Given the high skewness of the distribution and based on previous research and ground-truth data about counts
  • 7. 7 Figure 9. Monthlly patterns for different types of crime reveal the nature of change in the total counts. of crimes in each given sell, we split the criminal dataset with respect to it’s median into these seven classes: a) ”class −3”: median − 3σ ≤ N ≤ median − 2σ; b) ”class −2”: median − 2σ ≤ N ≤ median − σ; c) ”class −1”: median − σ ≤ N ≤ median; d) ”class +1”: median ≤ N ≤ median + σ; e) ”class +2”: median + σ ≤ N ≤ median + 2σ; f) ”class +3”: median + 2σ ≤ N ≤ median + 3σ; g) ”class 0 ”: median + 3σ ≤ N ≤ median − 3σ. Algorithms automate most steps of risk terrain model- ing. Clustering groups data records into similar groups, and classification techniques assign data records to cate- gories (commonly referred to as classes). The algorithm tests a variety of spatial influences for every risk factor input to identify the most grounded spatial associations with known crime incident locations and considers only the most appropriate risk factors with their spatial influ- ences to produce risk terrain surface. The exposure to personal crime in an area normally includes both the resident and the transient population (Andresen 2006), those who live in the area and those who visit the area. Note that the transient population size is not measured directly but is presumed to be incor- porated in the effect of the other variables[10]. In other words, crime generators generate crime by drawing an enlarged transient population into the block. Shannon entropy reflects the predictable structure of a place in terms of the types of people present over the day[11]. Clustering algorithms are commonly used as part of data exploration in order to find commonalities across crimes [2]. For example, clustering was applied on a data set of burglaries to find those exhibiting similar tactics. These can be evidence of a serial burglar, or could suggest interventions against some of the clusters. As a specific example, Adderley and Musgrove had clustered perpetra- tors of serious sexual assaults using self-organizing maps [12]. SOM cluster observations with similar traits into related groups to identify crimes committed by the same individual because of such similarities. The information could be fed into the hot spot mod- els to identify the perpetrators next target, or it could be used for geographic profiling methods to identify the per- petrators home base. Unsupervised learning techniques can project high dimensional data onto two dimensions. Multidimensional scaling (MDS) algorithms, such as the Sammon Projection, try to retain the relative distance of data points in the 2D projection as found in higher di- mensional space could be another option. A simpler way is to use color, size, and shape. The metric for feature ranking can be the mean de- crease in the Gini coefficient of inequality. The feature with maximum mean decrease in Gini coefficient is ex- pected to have the maximum influence in minimizing the out-of-the-bag error. Minimizing the out-of-the-bag er- ror results in maximizing common performance metrics used to evaluate models. When there is considerable overdispersion in the data (alpha value of 1.24), the negative binomial models might fit the data better than a Poisson model. Therefore the- Figure 10. Personal injury patterns by month and day of the week sgow, that Sunday is the likeliest day for this crime.
  • 8. 8 oretical arguments were put forward to use (negative bi- nomial) regression models with lagged independent vari- ables rather than spatial error or spatial lag regression models [5]. V. RESULTS AND EVALUATION Place-based risk assessment with risk terrain modeling permits real-time evaluation of the propensity for a new crime to become an instigator for near repeats. Some activities in adjacent blocks are correlated (i.e., spatial autocorrelation in the independent variable) and are the effects of activities in adjacent focal block accord- ing to the model’s interpretation. Thus, for example, the presence of a gas station in the block increases robberies in the block, but so does the presence of a gas station in an adjacent block. Thus, the land uses that attract crime have similar crime-attracting repercussions on the adjacent blocks. Some recent studies support the hypotheses that crime generators, crime attractors, and offender anchor points increase the number of robberies. For example, adding a liquor store to the block may increase the expected number of robberies by 67 percent, and the blocks with a gas station have more than 4 times as many robberies as similar blocks without a gas station. Heatmap in figure 12 depicts clustering of personal theft along south-east to central-north direction, which is reminiscent with subway line path in figure 1. Another cluster of personal injuries lays in south-west part of Bo- gota, where other carcinogenic features might be strong. Notice, that density of overlapping records is plotted in figure1 to see darker colors for more common patterns and lighter for less common ones. Figure 11. Seasonality of commercial theft counts by month and day of the week Sunday is the least likely day for such. Figure 12. Interactive map view depicting concentraton of near repeat incidents in central part of Bogota. VI. DISCUSSION In urban crime analytics domain, an important issue is the development of collaborative crime information sys- tems that integrate historical crime data, geographical data with dynamics of people flow. Such systems can be of great interest for many applications related to moni- toring and analysis of crime dynamics, where people flow as well as network properties are changing in fast and almost continuous mode. This work aims to introduce a framework consisting of crime analytics dashboard, GIS and several specialized applications to support predictive policing effort in real time with lack of resources. All predictive techniques depend on data. Both the volume and the quality of these data will determine the usefulness of any approach. The saying garbage in, garbage out strongly applies to these analyses. Efforts should be made to ensure that data are accurate and complete, though some techniques are less sensitive to small errors than others [2]. Ideally GIS software should have a data creation ETL- tool and/or wizard that simplifies data creation steps. Crime analysts work in tactical environments where they work with data in real time leaving little room for cum- bersome multi-step processes [13]. Occasionally analysts need to create new, unique data layers. Comprehen- sive GIS software needs to include geoprocessing tools for extracting and clipping features from larger collec- tions, merging map layers, editing spatial and attribute features of map layers. It is recommended that the most efficient and useful way to make use of RTM in investigating the risk of ur- ban residential burglary based on the surrounding en- vironmental context is by conducting an analysis at a citylevel [7]. It is believed that at this level of spatial analysis, the identification of crime correlates and tem-
  • 9. 9 poral features will produce information that is not only insightful but also practical. VII. IMPLICATIONS AND LIMITATIONS Hotspot maps, for instance, show the concentration of crime but offer little in the sense of context. By articulat- ing the environmental context of crime incident locations, risk terrain modeling helps to identify and prioritize spe- cific areas and features of the landscape to be addressed by a targeted intervention. Strategies to address crime problems, therefore, must incorporate both the spatial and temporal patterns of re- cent known crime incidents and the environmental risks of micro places if they are to yield the most efficient and actionable information for police resource allocation. Looking ahead, it could be useful to move beyond pre- dictions towards offering explicit decision support for re- source allocation and other decisions. To be useful in law enforcement, predictive policing methods must be a part of a comprehensive crime prevention strategy. The speed at which analytic dashboard operates de- pends heavily on the processing capabilities of the com- puter on which it is installed. Therefore, it is highly recommended that you install and operate this software on a cloud computer or server with abundant memory. Finally, the most important measure of a crime fore- casting is whether it aids in crime control and prevention, and how the outputs of models translate into practice [1]. In thinking about these needs, it is important that the value of predictive policing tools is in their ability to pro- vide situational awareness of risks and the information needed to act on those risks and prevent crime [2]. VIII. CONCLUSION In conclusion many analysts and investigators have a professional interest in how these approaches improve their work and make it more useful; police chiefs are eager to find new techniques to reduce crime without adding to their workforce; and the private sector sees potential funding from research grants, consulting, and software development. Policing methods, including pre- dictive analyses, could be useful for peacekeeping appli- cations as well. Large agencies with large volumes of incident and in- telligence data that need to be analyzed and shared will want to consider more advanced and, therefore, more ef- fective systems. It is helpful to think of these as enter- prise information technology systems [2]. Such systems make sense of large datasets to provide situational aware- ness to decision makers and to the public. These systems help agencies understand the where, when, of crime and identify the specific problems that drive crime in order to take action against them. The predictive policing sys- tems provide the shared situational awareness needed to make decisions about where and how to take action. De- signing intervention programs that take these attributes into account, in combination with solid predictive ana- lytics, can go a long way ensuring that predicted crime risks do not become real crimes. [1] Elizabeth R. Groff and Nancy G. La Vigne. Forecasting the future of predictive crime mapping. Crime Preven- tion Studies, 13(1):29–57, 2002. [2] Walter L. Perry, Brian McInnis, Carter C. Price, Su- san C. Smith, and John S. Hollywood. Predictive Polic- ing. The Role of Crime Forecasting in Law Enforcement Operations. RAND Corporation, 2013. [3] Leslie W. Kennedy, Joel M. Caplan, and Eric L. Piza. A Primer on the Spatial Dynamics of Crime Emergence and Persistence. Rutgers Center on Public Security, Newark, NJ, 2012. [4] Samuel Kirson Weinberg. Theories of Criminality and Problems of Prediction. The Journal of Criminal Law, Criminology, and Police Science, 45(4):412–424, 1954. [5] Richard Block Wim Bernasco. Robberies in chicago: A block-level analysis of the influence of crime generators, crime attractors, and offender anchor points. Journal of Research in Crime and Delinquency, 48(1):33–57, 2011. [6] Peter K. B. St. Jean. Pockets of Crime: Broken Win- dows, Collective Efficacy, and the Criminal Point of View. University Of Chicago Press, Chicago, IL, 2007. [7] William D. Moreto. Risk Factors of Urban Residen- tial Burglary. [Research Brief Series Dedicated to Shared Knowledge]. RTM Insights, 4(4):1–3, 2010. [8] William R. Smith, Sharon Glave Frazee, and Elisabeth L. Davison. Furthering the Integration of Routine Activ- ity and Social Disorganization Theories: Small Units of Analysis and the Study of Street Robbery as a Diffusion Process. Criminology, 38(2):489–524, 2000. [9] Dean Abbott. Applied Predictive Analytics. Principles and Techniques for the Professional Data Analyst. John Wiley Sons, Inc., Indianapolis, IN, 2014. [10] David Weisburd, Wim Bernasco, and Gerben J. N. Bru- insma. Units of analysis in geographic criminology: His- torical development, critical issues, and open questions. Putting Crime in Its Place: Units of Analysis in Geo- graphic Criminology, pages 3–34, 2009. [11] Andrey Bogomolov. Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data. The Journal of Criminal Law, Criminology, and Police Sci- ence, 1409(1):2983, 2014. [12] Richard Adderley and Peter B. Musgrove. Data min- ing case study: Modeling the behavior of offenders who commit serious sexual assaults. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining by Association for Comput- ing Machinery, 2001. [13] International Association of Crime Analysts. GIS Soft-
  • 10. 10 ware Requirements for Crime Analysis (White Paper 2012-01). Standards, Methods, Technology (SMT) Com- mittee, Overland Park, KS, 2012.