This document summarizes research analyzing crime data from Atlanta and Georgia Tech. It discusses using patrol analysis and identifying hot spots to optimize police patrol routes. Time series analysis of crime data revealed seasonal patterns, with some crime types peaking in September. Hot spot analysis identified concentrated areas of crime in Atlanta using statistical tests, with the nearest neighbor index method most accurately representing hot spots. In conclusion, optimizing patrol routes based on crime patterns and hot spots could lower crime rates and improve police efficiency.
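The nearest neighbor index mentioned above can be sketched in a few lines. This is the standard Clark-Evans ratio (observed mean nearest-neighbor distance over the 0.5/sqrt(n/A) expected under spatial randomness); the point grid and study area below are invented for illustration, not the study's data:

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index: observed mean nearest-neighbor
    distance divided by the distance expected under complete spatial
    randomness. NNI < 1 suggests clustering (a hot-spot pattern)."""
    n = len(points)
    # observed mean distance from each point to its nearest neighbor
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(math.hypot(x1 - x2, y1 - y2)
                     for j, (x2, y2) in enumerate(points) if j != i)
    observed = total / n
    # expected mean nearest-neighbor distance for n random points in the area
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# hypothetical: nine tightly clustered incidents in a 100 x 100 study area
cluster = [(50 + dx, 50 + dy) for dx in range(3) for dy in range(3)]
```

For this clustered pattern the index comes out far below 1, which is how the statistical test flags a hot spot.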
This report details applying the PPAC (Police Patrol Allocation and Coverage) model to crime data from the Georgia Tech Police Department from 2011-2014. The report cleans the data, formulates the PPAC optimization model to maximize police coverage of areas, particularly high crime "hot spots." The model generates optimal patrol zones. Future work involves re-zoning patrol areas based on the results and incorporating a time component to optimize patrol zones over time.
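The PPAC formulation itself is not reproduced in this summary, but a coverage-maximizing objective of this kind can be illustrated with a greedy sketch. The zone names, hot-spot weights, and reach sets below are hypothetical:

```python
def allocate_patrols(hotspot_weights, coverage, k):
    """Greedy sketch of a patrol-coverage objective: choose k patrol
    locations so the total crime weight of covered hot spots is maximized.
    coverage[loc] is the set of hot spots reachable from location loc."""
    covered, chosen = set(), []
    for _ in range(k):
        # pick the location adding the most uncovered crime weight
        best = max(coverage, key=lambda loc: sum(
            hotspot_weights[h] for h in coverage[loc] - covered))
        chosen.append(best)
        covered |= coverage[best]
    return chosen, sum(hotspot_weights[h] for h in covered)

# hypothetical hot spots weighted by crime counts, and three candidate zones
weights = {"A": 10, "B": 7, "C": 3, "D": 1}
reach = {"z1": {"A", "C"}, "z2": {"B"}, "z3": {"C", "D"}}
zones, total = allocate_patrols(weights, reach, 2)
```

A real PPAC model would solve this as an integer program rather than greedily, but the objective (weighted coverage of hot spots subject to a patrol budget) is the same shape.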
This document proposes reallocating the Georgia Tech Police Department's (GTPD) patrol zones based on analysis of past crime patterns to make the campus safer. The research team analyzed four years of crime data using clustering algorithms and time series analysis. They found crime clusters and relationships between certain crime types, and were able to predict future crime locations and types. The current four patrol zones are inefficient, as Zone 2 has noticeably more crimes than the others. The team aims to strategically define new zones that incorporate these findings, suggest more reasonable patrol routes, and make crime occurrences more uniform across zones. Their goal is to increase patrol efficiency and intercept more criminals to improve campus safety. The plan is to clean the data, analyze crime patterns, predict future crimes, and optimize the patrol zones.
Predictive policing uses statistical analysis and data to predict criminal activity and identify crime patterns in order to prevent future crimes and solve past cases. It relies on the idea that criminals tend to operate within a "comfort zone" and commit similar crimes in similar locations. Predictive policing involves collecting large data sets, analyzing the data to identify crime hot spots or individuals at risk of offending, intervening through police operations, and assessing the results to continue refining predictions. While predictive policing shows promise, its effectiveness depends on proper implementation and action based on predictions, and it has certain limitations in predicting some types of crimes.
PredPol: How Predictive Policing Works (PredPol, Inc.)
PredPol’s cloud-based predictive policing software enables law enforcement agencies to better prevent crime in their communities by generating predictions on the places and times that future crimes are most likely to occur.
PredPol’s technology has been helping law enforcement agencies to dramatically reduce crime in jurisdictions of all types and sizes, across the U.S. and overseas. Over the past year, Atlanta and Los Angeles have reduced specific crimes in targeted areas at rates ranging from nearly 20% to over 40%. Smaller jurisdictions, such as Norcross, Georgia, have seen nearly a 30% reduction in burglaries and robberies; in Alhambra, California, car burglaries have dropped 20% since the software technology was deployed.
Using advanced mathematics and computer learning, PredPol’s algorithms predict many types of crime, including property crimes, drug incidents, gang activity, and gun violence as well as traffic accidents.
Only three pieces of data are used to make predictions – type of crime, place of crime, and time of crime. No personal data is utilized in making these predictions.
Crime analysts and command staff using PredPol are 100% more effective than they are with traditional hotspot mapping at predicting where and when crimes are likely to occur. That means police have twice as many opportunities to deter and reduce crime.
Predictive Analysis of Crime Forecasting (Frank Smilda)
This document discusses various methods for predictive crime mapping, beginning with simply using past crime "hot spots" as a predictor of future hot spots. While this approach has limited accuracy over short periods, past hot spots can predict up to 90% of future crime over longer periods like a year. The document then reviews more sophisticated predictive modeling methods and the role of geographic information systems in developing spatial models to predict crime.
Predictive Policing Computational Thinking Show and Tell (Archit Sharma)
Predictive policing uses advanced data analysis and technology to predict where and when crimes are likely to occur based on patterns in historical crime data. The predictions are used to more efficiently deploy law enforcement resources to targeted areas in an effort to prevent crimes before they happen. Predictive policing algorithms analyze large datasets on past crimes, including details like type of crime, location, and timing, to identify patterns and assign probabilities of future criminal activity to specific regions.
Predictive Policing on Gun Violence Using Open Data (PredPol, Inc.)
This presentation is an abstract of a 2013 whitepaper published by PredPol.
PredPol delivers the same predictive accuracy for gun violence using unique mathematical methods. A study of Chicago data shows that PredPol successfully predicts 50% of gun homicides by flagging in real-time only 10.3% of city locations. Knowing where and when gun homicides are most likely to occur empowers law enforcement to use their knowledge, skills and experience to disrupt gun crime before it happens.
The study uses open government data from Chicago and predictive crime analysis.
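The 50%-of-homicides-from-10.3%-of-locations result can be read as a hit-rate-per-area measure. One common way to express it is the Predictive Accuracy Index (PAI), sketched here using the figures quoted above:

```python
def predictive_accuracy_index(hits_in_flagged, total_hits,
                              flagged_area, total_area):
    """Predictive Accuracy Index: share of crimes captured divided by
    share of area flagged. Random flagging scores PAI = 1; higher is
    better."""
    hit_rate = hits_in_flagged / total_hits
    area_rate = flagged_area / total_area
    return hit_rate / area_rate

# 50% of gun homicides captured while flagging 10.3% of city locations
pai = predictive_accuracy_index(50, 100, 10.3, 100)
```

Capturing half of all incidents from roughly a tenth of the locations works out to a PAI near 4.9, i.e. almost five times better than chance.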
This document summarizes a crime analysis project conducted by a university team. The team analyzed multiple data sets to build models predicting crime rates based on factors like population, weather, daylight hours, and economic indicators. They created binary, numeric, and crime ratio models and found the crime ratio model was most statistically significant. The team's analyses found crime rates generally increased with daylight savings time and increased slightly with higher temperatures. Their best model could predict monthly crime totals by city for most crime types except rare crimes like homicide and sexual assault.
Using Data Mining Techniques to Analyze Crime Pattern (Zakaria Zubi)
Our proposed model will be able to extract crime patterns by using association rule mining and clustering to classify crime records on the basis of the values of crime attributes.
According to studies conducted at the University of California, crime in an area follows a pattern similar to earthquake aftershocks: the earthquake itself is difficult to predict, but the aftershocks that follow it are quite predictable. The same is true of crimes occurring within a geographical area.
This document discusses using data mining techniques like clustering to detect crime patterns from crime data. It proposes using a k-means clustering algorithm with attribute weighting to group similar crimes. Testing on real crime data from a sheriff's office, it was able to identify crime patterns that detectives could validate matched actual crime sprees. The method provides an automated way to detect patterns and help detectives solve crimes faster by focusing on clustered groups of related incidents.
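A minimal version of k-means with attribute weighting might look like the following. The records, weights, and iteration count are illustrative, not the paper's actual data or parameters:

```python
import math
import random

def weighted_kmeans(records, weights, k, iters=20, seed=0):
    """Sketch of k-means with per-attribute weighting: attributes that
    matter more for linking crimes (e.g. modus operandi) get larger
    weights in the distance, so clusters align with likely sprees.
    records: equal-length numeric attribute vectors."""
    random.seed(seed)
    centers = random.sample(records, k)

    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(weights, a, b)))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in records:
            groups[min(range(k), key=lambda i: dist(r, centers[i]))].append(r)
        # recompute each center as the mean of its group (keep old if empty)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# two obvious groups in the first attribute; the second attribute is noise,
# and the low weight on it keeps it from disturbing the clustering
data = [[1, 50], [2, 40], [1.5, 60], [10, 55], [11, 45], [10.5, 50]]
centers, groups = weighted_kmeans(data, weights=[1.0, 0.01], k=2)
```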
Crime Pattern Detection using K-Means Clustering (Reuben George)
Crime pattern detection uses data mining techniques like clustering to analyze crime data and identify patterns. This involves plotting past crimes geographically, clustering similar crimes to detect sprees, and analyzing the results to draw conclusions. It helps improve crime solving by learning from history and preempting future crimes. The method augments detectives' work but has limitations, such as its reliance on data quality. Overall, crime pattern detection aids operational efficiency and enhances resolution rates by optimizing resource deployment based on observed crime trends.
GIS aids crime analysis by identifying patterns and trends, supporting intelligence-led policing strategies, and integrating diverse data sources. It enhances crime analysis by highlighting suspicious incidents, supporting cross-jurisdictional pattern analysis, and educating the public. GIS provides tools to capture crime series, forecast crime, and optimize resource allocation to reduce crime and disorder.
This document discusses the application of geographic information systems (GIS) in criminology and defense intelligence. It provides examples of how GIS has been used to map crime rates and identify spatial patterns in criminal behavior. GIS allows crime analysis to identify crime hotspots, support investigative leads, and help allocate law enforcement resources more efficiently. The document also outlines how GIS aids tactical crime analysis and criminal investigations through geographic profiling. Finally, it notes that GIS is increasingly important for military applications by helping commanders understand terrain influences on operations.
This document summarizes a study that used data mining techniques to predict crime using real-world crime datasets from Denver and Los Angeles. The goals were to identify crime hotspots and predict future crime types based on location, time, and other attributes. The models tested included the Apriori algorithm to identify frequent crime patterns, a naïve Bayesian classifier to predict crime type based on location/time features, and a decision tree classifier. Key results identified crime hotspots and showed the Bayesian classifier achieved prediction accuracies of 51-54% while the decision tree was more complex and achieved lower accuracy.
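A naïve Bayesian classifier of the kind described can be sketched with categorical features and Laplace-style smoothing. The neighborhoods, time bands, and crime labels below are invented, and the smoothing denominator is a simplification:

```python
from collections import Counter, defaultdict

def train_nb(rows):
    """Tiny naive Bayes sketch for crime-type prediction from
    categorical features such as (neighborhood, time band)."""
    priors = Counter(label for *_, label in rows)
    counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    for *feats, label in rows:
        for i, v in enumerate(feats):
            counts[(i, label)][v] += 1
    return priors, counts

def predict_nb(priors, counts, feats):
    total = sum(priors.values())
    best, best_p = None, -1.0
    for label, c in priors.items():
        p = c / total
        for i, v in enumerate(feats):
            seen = counts[(i, label)]
            # add-one smoothing; vocabulary approximated by seen values + 1
            p *= (seen[v] + 1) / (sum(seen.values()) + len(seen) + 1)
        if p > best_p:
            best, best_p = label, p
    return best

data = [("downtown", "night", "robbery"),
        ("downtown", "night", "robbery"),
        ("campus", "day", "larceny"),
        ("campus", "day", "larceny"),
        ("campus", "night", "larceny")]
priors, counts = train_nb(data)
```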
This document outlines a project to analyze crime and census data in London. It describes a multi-phase approach including: 1) loading and visualizing crime data, 2) adding census data to the model and performing clustering and regression analysis, and 3) using the results to inform data mining. Key analysis techniques include k-means clustering of census variables to categorize areas, linear regression of census factors on crime types, and decision tree analysis using both crime and census data. The goal is to understand how socioeconomic factors relate to crime levels and types in different parts of London.
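The regression step (fitting crime counts against a single census factor) reduces to ordinary least squares. The factor values and crime counts here are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of crime counts (ys) against one
    census factor (xs), e.g. unemployment rate per area.
    Returns slope and intercept of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# hypothetical areas: census factor value vs. recorded crime count
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The sign and size of the slope are what the London study would read off per crime type to judge how strongly a socioeconomic factor tracks crime levels.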
This document discusses various spatial analysis techniques for predicting the next location of offenses in a crime series, including standard deviation rectangles and ellipses, convex hull polygons, correlated walk analysis, and analyzing distance between hits, target locations, and journey to crime data. It provides examples of analyses of past crime series where these techniques successfully predicted over 50% of next hits. The document advocates combining multiple analytical methods and data sources to refine location predictions.
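The standard deviation rectangle is the simplest of these search-area techniques: the mean center of the series plus or minus one standard deviation on each axis. A sketch, with an invented four-incident series:

```python
import statistics

def std_dev_rectangle(points):
    """Standard deviation rectangle: the mean center of a crime series
    extended by one (population) standard deviation per axis, a classic
    first-cut search area for the next offense in the series."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    cx, cy = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return (cx - sx, cy - sy), (cx + sx, cy + sy)

# hypothetical incident coordinates for one series
series = [(0, 0), (2, 0), (0, 2), (2, 2)]
lo, hi = std_dev_rectangle(series)
```

Ellipses, convex hulls, and correlated walk analysis refine this rectangle; the document's point is that combining several such estimates narrows the prediction.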
A Comparative Study of Data Mining Methods to Analyzing Libyan National Crime... (Zakaria Zubi)
Our proposed model will be able to extract crime patterns by using association rule mining and clustering to classify crime records on the basis of the values of crime attributes.
The document discusses a proposed system for detecting ranking fraud for mobile apps. It begins by describing existing ranking fraud and some current detection systems. It then outlines the proposed system which first identifies "leading sessions" in an app's historical ranking data that indicate periods of popularity. It then detects fraud by analyzing ranking, rating, and review behaviors during these sessions using statistical tests. Finally, the proposed system aggregates all the evidence to evaluate sessions for fraud and was tested on real app store data.
- The document proposes a machine learning project using the Chicago Crime dataset to build a web application providing insights into crime patterns.
- It will include geospatial analysis and visualizations of crime hotspots and trends over time using ArcGIS maps, as well as statistical analysis and prediction of future crimes.
- The project involves preprocessing the large dataset, performing feature engineering, dividing Chicago into crime clusters, and building prediction models for each cluster to be deployed via REST API and integrated into the web application. Tools include Python, Docker, Azure ML, ArcGIS, and Java for the frontend.
Crime Rate Analysis Using k-NN in Python
Crime Analysis & Prediction System analyzes crime data to detect crime hotspots and predict future crimes.
It collects data from various sources: crime data from OpenData sites, US census data, social media, and traffic and weather data.
It leverages Microsoft's Azure cloud and on-premise technologies for back-end processing, with desktop-based visualization tools.
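A bare-bones k-NN classifier in the spirit of the title above might look like this; the incident coordinates and crime labels are invented:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Minimal k-nearest-neighbors sketch: classify a location (x, y)
    by the majority crime label among its k closest historical
    incidents. train: list of (x, y, label) tuples."""
    nearest = sorted(train, key=lambda r: math.hypot(r[0] - query[0],
                                                     r[1] - query[1]))[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]

# hypothetical incident history: two spatially separated crime types
history = [(0, 0, "theft"), (0, 1, "theft"), (1, 0, "theft"),
           (9, 9, "assault"), (9, 8, "assault"), (8, 9, "assault")]
```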
The Pennsylvania State Police chose ABM's Prophecy incident mapping and predictive analysis tool as part of its Records Management System solution in 2001. The implementation of Prophecy began in May 2003 and was completed within 60 days. During the first eight months of use, fatalities decreased by 7.4% and criminal offenses decreased by 1.3%. Prophecy has given the PSP improved analytical capabilities to more efficiently target resources.
Machine Learning Approaches for Crime Pattern Detection (APNIC)
This document discusses machine learning approaches for predicting crime patterns. It begins by stating the large number of violent crimes in the US and explaining that predicting crimes can help avoid them and ensure better resource allocation. It then discusses existing crime prediction systems like PredPol and the general crime prediction process of data gathering, classification/clustering, and prediction. It provides various methods for data gathering, like crime records, social media, IoT devices, and newspapers. It also discusses clustering algorithms like k-means that can be used. Finally, it notes that PredPol has achieved a 22.7% reduction in crimes in one area, but that combining additional techniques like machine learning, big data analysis, and image processing could further improve crime prediction.
Propose Data Mining AR-GA Model to Advance Crime Analysis (IOSR Journals)
This document proposes a data mining model to advance crime analysis using association rule (AR) and genetic algorithm (GA). The model has three correlated dimensions: a crime dataset, criminal dataset, and geo-crime dataset. AR will be applied to each dataset separately to extract patterns, then GA will be used to mix the resulting ARs and exploit relationships across the three dimensions. This is intended to help detect universal crime patterns and speed up the crime solving process. The model was applied to real crime data from a sheriff's office and validated. Privacy-preserving techniques are also suggested to hide sensitive rules from appearing in the results.
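The association-rule half of the AR-GA model rests on support and confidence thresholds. A stripped-down, single-item-antecedent version (the transactions below are invented) could look like:

```python
from itertools import combinations

def association_rules(transactions, min_support=0.5, min_conf=0.8):
    """Bare-bones association-rule sketch: finds rules X -> Y between
    single items whose support (co-occurrence rate) and confidence
    (co-occurrence given X) clear the given thresholds."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    rules = []
    for x, y in combinations(sorted(items), 2):
        for a, b in ((x, y), (y, x)):
            both = sum(1 for t in transactions if a in t and b in t) / n
            supp_a = sum(1 for t in transactions if a in t) / n
            if both >= min_support and supp_a and both / supp_a >= min_conf:
                rules.append((a, b))
    return rules

# hypothetical crime-attribute transactions
txns = [{"burglary", "night"}, {"burglary", "night"},
        {"burglary", "night"}, {"robbery", "night"}]
```

The full model mines such rules per dataset and then uses a genetic algorithm to combine them across the three dimensions; this sketch covers only the rule-extraction step.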
Database and Analytics Programming - Project Report (sarthakkhare3)
The document summarizes research conducted on crime data from New York City in 2018. The researchers collected data on complaints, arrests, court summons, and prison admissions. They analyzed relationships between these datasets and performed visualizations. Key findings include: the number of complaints exceeded arrests and further declined on the paths to court summons and prison admissions; the top crimes differed between complaints/arrests and court summons/prison admissions; males and those aged 25-44 committed most crimes; Bronx and Manhattan had higher crime rates per capita than other boroughs. The research was limited to one year, and additional analysis could provide more insights into factors affecting proportions and more accurate crime prediction.
This document analyzes crime data from Georgia Tech and Atlanta. It summarizes time series analyses of GT crime data from 2010-2014 which found higher crime, especially larceny, in September when students return for fall semester. Clustering analysis of Atlanta crime data from 2012-2013 identified hot spots of crime and partitioned the city into 6 crime clusters. The analysis also found relationships between aggravated assault, auto theft, burglary, and robbery crimes, with assault often peaking before other crimes seasonally. Future research directions are proposed.
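The seasonal check described above, counting incidents per month across years to find the peak, is a few lines with a Counter; the (year, month) incident log below is invented:

```python
from collections import Counter

def peak_month(incidents):
    """Count incidents per calendar month across all years and return
    the month number with the most crime, the simple seasonal summary
    behind the September finding above."""
    by_month = Counter(month for _, month in incidents)
    return by_month.most_common(1)[0][0]

# hypothetical (year, month) log; September (9) spikes at fall return
log = [(2012, 9), (2012, 9), (2012, 10), (2013, 9), (2013, 2), (2013, 9)]
```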
Crime Data Analysis and Prediction for city of Los Angeles (Heta Parekh)
This document analyzes crime data from Los Angeles from 2010-2020 to identify trends, predict future crime rates, and make recommendations to law enforcement. Key findings include:
- Crime rates have generally declined over the past decade but dropped significantly in 2020 due to the pandemic.
- Robbery, burglary, and vandalism are the most common crimes.
- Areas with lower median household incomes tend to have higher crime rates.
- Females are consistently the most impacted victims of crime over the past 10 years.
- Southwest LA and other areas have been identified as "hot spots" for criminal activity.
Predictive analysis indicates crime rates will continue increasing post-lockdown in
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx (audeleypearl)
Mr. Friend is a crime analyst with the Santa Cruz, California, Police Department.
Predictive Policing: Using Technology to Reduce Crime
By Zach Friend, M.P.P.
4/9/2013
Nationwide law enforcement agencies face the problem
of doing more with less. Departments slash budgets
and implement furloughs, while management struggles
to meet the public safety needs of the community. The
Santa Cruz, California, Police Department handles the
same issues with increasing property crimes and
service calls and diminishing staff. Unable to hire more
officers, the department searched for a nontraditional
solution.
In late 2010 researchers published a paper that the
department believed might hold the answer. They
proposed that it was possible to predict certain crimes,
much like scientists forecast earthquake aftershocks.
An “aftercrime” often follows an initial crime. The time and location of previous criminal activity helps to
determine future offenses. These researchers developed an algorithm (mathematical procedure) that
calculates future crime locations.1
Equalizing Resources
The Santa Cruz Police Department has 94 sworn officers and serves a population of 60,000. A
university, amusement park, and beach push the seasonal population to 150,000. Department personnel
contacted a Santa Clara University professor to apply the algorithm, hoping that leveraging technology
would improve their efforts. The police chief indicated that the department could not hire more officers.
He felt that the program could allocate dwindling resources more efficiently.
Santa Cruz police envisioned deploying officers by shift to the most targeted locations in the city. The
predictive policing model helped to alert officers to targeted locations in real time, a significant
improvement over traditional tactics.
Making it Work
The algorithm is a culmination of anthropological and criminological behavior research. It uses complex
mathematics to estimate crime and predict future hot spots. Researchers based these studies on
Student #1 I have chosen to write about the history of data anal.docx (johniemcm5zt)
Student #1
I have chosen to write about the history of data analysis for the Los Angeles Police Department. While I currently reside in Colorado Springs, Colorado and work as a deputy sheriff in Denver, Colorado I grew up in the greater Los Angeles area and I know that they should have a large amount of data to draw from.
Currently the Los Angeles Police Department uses COMPSTAT to compile their data. They have a unit, known as the COMPSTAT unit, whose sole job is to compile crime statistics and analyze the data (Los Angeles Police Department, 2016). COMPSTAT is short for computer statistics. COMPSTAT was developed by Police Commissioner William Bratton in 1994 for use by the New York Police Department. According to the University of Maryland, by the year 2000 over a third of police agencies with over 100 officers were utilizing some sort of COMPSTAT-like program (University of Maryland, 2015). In 2002 William Bratton became the Chief of Police for the Los Angeles Police Department and brought with him the concept of COMPSTAT. During the first six years of his tenure Los Angeles saw a steady decrease in the city's crime rates, thanks in large part to COMPSTAT policing.
Mean, mode and median play a large part in analyzing criminal data. The mean is the average. For example, if in neighborhood C there were 14 robberies committed on Monday between 1 and 3 AM, 17 robberies on Tuesday during the same time period, and 9 on Wednesday during the same time period, the mean would be 13.3 robberies per night for those 3 nights. Knowing this is high for the city, the data could be used to justify extra police presence in neighborhood C. An example of the mode would be if, in the same neighborhood in the same week, there were 17 robberies on both Friday and Saturday, 12 on Thursday, and 11 on Sunday. The mode would be 17, and it would also be a reason to add extra police presence in the neighborhood until a significant decrease was seen in the number of robberies taking place. Finally we come to the median: simply line the numbers up for the week and take the number that falls in the middle. In the case of the robberies occurring in neighborhood C, that number would be 14. All of this data can be combined to show watch commanders and captains the areas where they should be focusing their officers' time. If a neighborhood has seen only one or two robberies during the week, it is definitely not in as much need of a heavy police presence as neighborhood C is.
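The three statistics above can be checked with Python's standard library; the robbery counts are the hypothetical figures from the paragraph:

```python
import statistics

# Mean of the Monday-Wednesday robbery counts (14, 17, 9)
mean = statistics.mean([14, 17, 9])                      # ~13.3 per night
# Mode of the Thursday-Sunday example (17, 17, 12, 11)
mode = statistics.mode([17, 17, 12, 11])                 # 17
# Median of the full week of counts for neighborhood C
median = statistics.median([14, 17, 9, 12, 17, 17, 11])  # 14
```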
Student #2
Beginning in the mid-1990s, police in New York began to run statistical analysis of the city's crime reports, arrests, and other police activity, known as COMPSTAT. Since this analysis began, law enforcement agencies have implemented their own data-driven approaches to tracking and adapting to crime trends. The LAPD is both heavily armed and thoroughly computerized; the Real-Time Analysis and Critical Response Division is its central processor.
This paper focuses on finding spatial and temporal criminal hotspots. It analyses two different real-world crime datasets, for Denver, CO and Los Angeles, CA, and provides a comparison between the two datasets through a statistical analysis supported by several graphs. It then explains how the Apriori algorithm was applied to produce interesting frequent patterns for criminal hotspots. In addition, the paper shows how a Decision Tree classifier and a Naïve Bayesian classifier were used to predict potential crime types. To further analyse the crime datasets, the paper introduces a study combining the findings from the Denver crime dataset with its demographic information in order to capture the factors that might affect the safety of neighborhoods. The results of this solution could be used to raise people's awareness of dangerous locations and to help agencies predict future crimes in a specific location within a particular time.
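The frequent-pattern step can be illustrated with brute-force support counting over small transaction sets. This is only a sketch of what Apriori computes (true Apriori prunes candidate itemsets level by level); `frequent_itemsets` and the sample incident records are hypothetical, not the paper's code:

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting: tally every itemset up to max_size
    and keep those whose support meets min_support. Apriori reaches the
    same answer more efficiently by pruning candidates level by level."""
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), size):
                counts[combo] += 1
        for itemset, c in counts.items():
            if c / n >= min_support:
                result[itemset] = c / n
    return result

# Hypothetical incident records: {hour band, crime type, district}
incidents = [
    {"night", "theft", "downtown"},
    {"night", "theft", "downtown"},
    {"day", "assault", "downtown"},
    {"night", "theft", "suburb"},
]
patterns = frequent_itemsets(incidents, min_support=0.5)
```

Here the pair ("night", "theft") appears in three of four incidents, so it survives the 0.5 support threshold and would be reported as a candidate hotspot pattern.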
This document discusses using machine learning techniques like clustering and decision trees to analyze crime data from Chicago between 2014-2016. It aims to identify crime hot spots and patterns to help police allocate resources more efficiently. The document applies k-means clustering to crime data grouped by location and type, identifying a "vice" cluster with crimes like prostitution and drugs in two adjacent wards. It suggests police could use temporal and hourly crime patterns from the analysis to optimize staff scheduling and deployment. The document also discusses using decision trees and k-nearest neighbors algorithms on the crime data supplemented with temperature and unemployment data to further explore crime patterns.
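A minimal sketch of the k-means step described above, assuming 2-D points such as (latitude, longitude) pairs; the function and sample data are illustrative, not the document's actual pipeline:

```python
def kmeans(points, k, iters=50):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its cluster. Initialized with the
    first k points for determinism (a real run would use random or
    k-means++ seeding)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                        (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        centroids = [(sum(p[0] for p in c) / len(c),
                      sum(p[1] for p in c) / len(c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two hypothetical incident blobs; k-means recovers one centroid per blob.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
```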
IRJET- Detecting Criminal Method using Data Mining (IRJET Journal)
The document discusses using data mining techniques like clustering algorithms to help detect criminal methods and patterns from crime data. It proposes applying a weighted k-means clustering approach to group similar crimes together based on important attributes. This would help identify potential criminal patterns and present them to detectives in a geospatial plot, highlighting crime clusters. The results were checked against court case outcomes and some criminal patterns were confirmed. The authors conclude the method helps detectives by organizing large crime datasets but requires close collaboration and domain knowledge to effectively map real crime data for mining.
Abstract : Crime prediction is a topic of significant research across the fields of criminology, data mining, city planning, law enforcement, and political science. Crime patterns exist on a spatial level; these patterns can be grouped geographically by physical location, and analyzed contextually based on the region
in which crime occurs. This paper proposes a mechanism to parameterize street-level crime, localize crime hotspots, identify correlations between spatiotemporal crime patterns and social trends, and analyze the resulting data for the purposes of knowledge discovery and anomaly detection. The subject of this study is the county of Merseyside in the United Kingdom, over a span of 21 months beginning in December 2010 (monthly) through August 2012. Several types of crime are analyzed in this dataset, including Burglary and Antisocial Behavior. Through this analysis, several interesting findings are drawn about crime in Merseyside, including: hotspots with steadily increasing crime levels, hotspots with unstable crime levels, synchronous changes in crime trends throughout Merseyside as a whole, individual months in which certain hotspots behaved anomalously, and a strong correlation between crime hotspot locations and borough/postal code locations. We believe that this type of statistical and correlative analysis of crime patterns will help law enforcement agencies predict criminal activity, allocate resources, and promote community awareness to reduce overall crime rates.
Journal of Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx (priestmanmable)
Journal of Criminal Justice, Vol. 7, pp. 217-241 (1979).
Pergamon Press. Printed in U.S.A.
Copyright © 1979 Pergamon Press Ltd
INFORMATION, APPREHENSION, AND DETERRENCE:
EXPLORING THE LIMITS OF POLICE PRODUCTIVITY
WESLEY G. SKOGAN
Department of Political Science and Center for Urban Affairs
Northwestern University
Evanston, Illinois 60201
GEORGE E. ANTUNES
Department of Political Science
University of Houston
Houston, Texas 77004
and
Workshop in Political Theory and Policy Analysis
Indiana University
Bloomington, Indiana 47401
ABSTRACT
The capacity of police departments to solve crimes and
apprehend offenders is low for many types of crime, particularly
crimes of profit. This article reviews a variety of studies
of police apprehension and hypothesizes that an important
determinant of the ability of the police to apprehend criminals
is information. The complete absence of information for
many types of crime places fairly clear upper bounds on the
ability of the police to effect solutions.
To discover whether these boundaries are high or low we
analyzed data from the 1973 National Crime Panel about the
types and amount of information potentially available to police
through victim reports and patrol activities. The evidence
suggests that if the police rely on information made readily
available to them, they will never do much better than they
are doing now. On the other hand, there appears to be more
information available to bystanders and passing patrols than
currently is being used, which suggests that surveillance
strategies and improved police methods for eliciting, recording,
and analyzing information supplied by victims and witnesses
might increase the probability of solving crimes and
making arrests. In light of this we review a few possibly helpful
innovations suggested in the literature on police productivity
and procedure.
Some characteristics of the crime itself, or of events surrounding the crime, that are
beyond the control of investigators, determine whether it will be cleared in most
instances. (Greenwood et al., 1975: 65)
There is no feasible way to solve most crimes except by securing the cooperation of
citizens to link a person to the crime. (Reiss, 1971: 105)
INTRODUCTION
A recent spate of studies of crime and the deterrent effectiveness of the criminal
justice system has raised anew a question as old as Bentham: Does raising the cost of
criminal activity significantly reduce the level of crime in a community? In these studies,
the cost of criminal activity has been conceptualized in two ways: as the loss of time and
opportunity attendant to apprehension (measured by the certainty of arrest or punishment),
and as the stigma, discomfort, and loss of opportunity that come with conviction
by the courts (measured by the severity of punishment). Indicators of the di ...
The document discusses various mechanisms of accountability for police departments, including internal mechanisms like COMPSTAT and external mechanisms like civilian review boards and courts. COMPSTAT is an internal accountability method that uses data analysis of crime statistics to hold police commanders accountable for addressing problems in their jurisdictions. It promotes focused problem-solving in police meetings. External accountability comes from civilian oversight like review boards and courts establishing police misconduct standards.
The document discusses the challenges facing law enforcement in addressing cybercrime. It summarizes the results of a survey of 185 analysts from UK law enforcement organizations. The analysts believe that the amount of time they spend on cybercrime will triple in the next three years, but only 30% believe they currently have the necessary skills and tools to address cybercrime effectively. The document calls cybercrime a "tipping point" and makes recommendations for how law enforcement can improve its ability to investigate cybercrime through collaborative approaches, new digital tools and training, and focusing on intelligence to enable operational outcomes.
IRJET- Crime Analysis using Data Mining and Data Analytics (IRJET Journal)
This document discusses using data mining and analytics techniques to analyze crime data and predict crime rates. It proposes using linear regression on crime data from the Indian government to predict future crime occurrences and identify high-risk regions. The system would analyze factors like crime type, offender age, month, and year to build a regression model. This model could then predict crime rates and indicate whether a region is high or low risk for criminal activity. Graphs and tables would visualize the predictions to help law enforcement allocate resources. The goal is to help reduce crime and increase public safety by identifying patterns in historical crime data.
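The linear-regression idea above can be sketched with an ordinary least-squares fit of crime counts against a single predictor such as year; `fit_line` and the yearly figures are hypothetical, not the proposed system's model:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x, e.g. yearly crime
    totals against year, to extrapolate future occurrences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

years = [2015, 2016, 2017, 2018]   # hypothetical yearly totals
counts = [100, 90, 80, 70]
a, b = fit_line(years, counts)
# extrapolated count for 2019: a + b * 2019 == 60.0
```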
Crime Analysis based on Historical and Transportation Data (Valerii Klymchuk)
Contains experimental results based on real crime data from an urban city. Our set of statistics reveals seasonality in crime patterns and accompanies predictive machine learning models that assess the risks of crime. Moreover, this work provides a discussion of the implementation and design of a prototype cloud-based crime analytics dashboard.
This document discusses problem-oriented policing and the SARA model. SARA is an acronym that stands for scanning, analysis, response, and assessment, which are the four steps used to identify, analyze, and select problems. The document examines a problem-oriented policing guide about liquor store robberies. It describes factors that can contribute to liquor store robberies, such as cash transactions and lone employees. The guide offers suggestions for responses like improving lighting and visibility to address the problem.
The document discusses the increasing role of technology in law enforcement. It describes how predictive policing uses data analysis to predict crime hotspots, and how camera surveillance has helped reduce crime in some areas. While technology provides benefits like helping solve crimes, it also raises challenges regarding privacy, data storage and security, and costs. As technology advances, the debate around its use in policing will continue between those who emphasize its benefits and those concerned about privacy issues.
This document proposes a method for clustering related Chinese text cases using a combination of Fuzzy K-Means (FKM) clustering and Canopy clustering algorithms. It first discusses factors that can indicate related cases, such as location, time, and evidence. It then describes representing Chinese text cases as vectors using word segmentation and dimensionality reduction. FKM clustering is applied to group related cases but has limitations. Canopy clustering is then used to estimate the optimal number of clusters before applying FKM, overcoming those limitations and improving efficiency. While this method addresses the challenges of clustering Chinese cases, the results may not be very accurate and the method could be improved.
Analysis on crimes in Atlanta
Undergraduate Research Team
Georgia Institute of Technology
ISYE 4699
December 7 2014
1 Abstract
In this report, we present the conclusions we reached using the Georgia
Tech and Atlanta crime data. The areas of study we were interested in
were patrol analysis, hot spots, and correlations among crimes. Using patrol
analysis, we studied how current patrol routes could be changed. Improvements
would result in shorter police response times, lower crime
rates, less economic waste, and more. A hot spot is an area with
concentrated crime. We developed an algorithm for spotting hot spots in
Atlanta. This program needs improvement, but once we upgrade it to locate
hot spots more accurately, we will be able to compare hot spots across types
of crimes and find overlaps among them. Finally, we studied how one
crime led to other crimes, focusing on the relationship between
auto theft and burglary.
2 Patrol Analysis
After we received the data from the Georgia Tech Police Department, we
filtered out unnecessary variables to conduct our preliminary research. We
were left with the 18 most useful variables, and our research involved using
this information to come up with an improved solution. The variables are
listed below for reference:
• OCANumber
• IncidentFromDate
• IncidentFromTime
• IncidentToDate
• IncidentToTime
• OffenseCode
• Offense Description
• CaseStatus
• CaseDisposition
• LocationCode
• PatrolZone
• Location
• Landmark
• LocationStreetNumber
• LocationDirectional
• LocationStreet
• LocationLatitude
• LocationLongitude
• CreatedSource
2.1 Overview
A total of 11578 crimes were recorded at Georgia Tech and in nearby regions.
By looking at the distribution over time of day, day, and month, we observed
that crimes occurred most often at around 1 am, and the frequency gradually
dropped until 6 am, when crime was least likely to occur. The crime
count fluctuated smoothly between 400 and 700 crimes per hour from 8 am to
11 pm. April and September, which fall within the Spring and Fall semesters,
had the highest numbers of crimes during the year. Offense codes 2700, 3657,
and 2751 topped the list of crimes. Approximately three-fourths of cases
were closed or cleared, and even the remaining cases were mostly inactive.
From the analysis of the four patrol zones defined by the Georgia Tech police,
Zone 2 was found to be the most dangerous: the number of crimes there
was almost twice that of any other zone. Detailed crime type analysis
is presented later.
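The hour-of-day distribution described above can be reproduced with a short tally. `crimes_by_hour` is a hypothetical helper and assumes incident times arrive as 'HH:MM' strings (the real IncidentFromTime format may differ):

```python
from collections import Counter

def crimes_by_hour(incident_times):
    """Histogram of incidents by hour of day, assuming each time is an
    'HH:MM' string (an assumption about the IncidentFromTime field)."""
    return Counter(int(t.split(":")[0]) for t in incident_times)

times = ["01:15", "01:40", "06:05", "13:30"]   # hypothetical values
counts = crimes_by_hour(times)
```

Plotting such a histogram over all 11578 records is what reveals the 1 am peak and the 6 am trough.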
2.2 Urban Police Patrol Model
Initially, the police department measured patrol efficiency using patrol
time alone, and even that calculation contained many errors. Patrol time was
computed as the difference between officers' total work time and the time
they spent on other duties, such as answering radio calls, directing
traffic, or taking meal breaks. This calculation gave inaccurate efficiency values
since many police officers patrolled during the periods when crimes
were most likely to happen, leaving too few officers
on patrol at other times. The goal was to allocate the right number of
patrol officers across all periods.
To fix these problems, in the 1960s, Dr. Richard Larson developed a systematic
approach to studying police patrol efficiency. He cooperated with the NYC
police department to develop the first version of the Urban Police Patrol
Model. He used the same 18 variables we listed above and came up with the
most "accurate" model. The key question of Dr. Larson's model was as follows:
given the pattern of crimes and a limited amount of preventive patrol, how
should the effort be allocated along the streets to achieve the highest
efficiency? We were influenced by his study and decided to adopt his model to
analyze the Atlanta crime data and optimize the patrol resource allocation.
Figure 1: Patrol Time = Total Time − Time for other duties
2.3 Further Questions
Dr. Larson's model gave a rough approximation of the behavior of police
preventive patrol. Qualitatively, Koopman's method suggested that the
patrol effort should grow as the logarithm of the crime density increased,
and further advised that areas with a low likelihood of crime should not be
patrolled at all. Refinement of this model was required before it could be
implemented by the police, and a few more questions were asked by Dr.
Larson:
1. To what extent is an optimal patrol coverage function realizable?
2. How closely does a unit have to approach the optimal coverage in order
to achieve a satisfactory result? Or, equivalently, what is the sensitivity
of the solution near the optimum?
3. To what extent is the crime distribution modified by patrol strategies?
4. How should each crime type be evaluated to reflect its relative
seriousness?
5. What is the conditional probability that a crime will be detected given
its pattern?
We believe that these questions contain valuable insights and will continue
our research to answer questions in these areas.
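The Koopman-style allocation mentioned above, where effort grows with the logarithm of crime density and low-density areas get no patrol, can be sketched as a water-filling optimization. This is an illustrative reconstruction under stated assumptions, not the report's or Larson's actual code; `koopman_allocation` and the sample densities are hypothetical:

```python
import math

def koopman_allocation(densities, total_effort, iters=60):
    """Water-filling sketch of Koopman-style search allocation: maximize
    sum(p * (1 - exp(-x))) subject to sum(x) = total_effort. The optimum
    has the form x_i = max(0, log(p_i / lam)), so effort grows with the
    log of crime density and low-density areas receive no patrol."""
    lo, hi = 1e-12, max(densities)
    effort = []
    for _ in range(iters):          # bisect on the multiplier lam
        lam = (lo + hi) / 2
        effort = [max(0.0, math.log(p / lam)) for p in densities]
        if sum(effort) > total_effort:
            lo = lam                # too much total effort: raise the bar
        else:
            hi = lam
    return effort

# Hypothetical relative crime densities for four areas
effort = koopman_allocation([8, 4, 2, 1], total_effort=math.log(64))
```

With this budget the multiplier converges to 1, so the three densest areas get effort log(8), log(4), and log(2), while the lowest-density area is not patrolled at all, matching the qualitative advice above.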
3 Georgia Tech Crime data
One interesting phenomenon was that while the crime rate of Atlanta kept
decreasing at a rate of 5.324%, the crime rate of Georgia Tech fluctuated
over the years. As the graphs suggest, the crime rate was higher in 2011
compared to 2010, then decreased in 2012 and reached its peak in
2013. We were unsure whether the crime rates actually changed, or whether a
change in the benchmark for identifying crimes at Georgia Tech was the
reason: for example, some crimes in 2010 were categorized differently in
later years.
Figure 2: Georgia Tech crime fluctuates (number of crimes per year, 2010-2014)
Figure 3: Atlanta crime decreases (number of crimes per year, 2010-2014)
We also observed the crime patterns using time series. In order to find the
relationship between Atlanta crimes and Georgia Tech crimes, we compared
the annual data and realized that the crime count patterns did not have a
recognizable similarity. In fact, there were some notable differences in
their patterns.
Figure 4: Monthly number of crimes for Georgia Tech and Atlanta
While Georgia Tech had fewer crimes in the summer, Atlanta
had even more. It was clear that Georgia Tech's summer crime rate was
low because most students left the campus for vacation. However, we did
not have an easy explanation for why the Atlanta data showed an increase in
summer. Even when we averaged the rates across years and plotted the graphs
for GT and Atlanta together, we could see that Georgia Tech was more
dangerous during the semesters, while the opposite held for Atlanta. It was
also interesting that the crime rate at Georgia Tech generally decreased
between late August and December. We tried to come up with a few
reasons for this phenomenon. First, most freshmen arrived at the end of
August every year; they lacked a sense of safety and were thus
much more vulnerable to crime. Second, September was the pledge month
for fraternities and sororities. Students were asked to do risky things and
were at risk of being targeted, especially when they were drunk or walked
outside late at night.
Figure 5: Average number of crimes for Georgia Tech and Atlanta
3.1 Geographical Relationship
We analyzed crimes geographically using offense codes and patrol
zones. This was easily done by making a pivot table and examining the
results. It showed the overall trend of the data and gave insights into
which other techniques to apply to achieve even better results.
Our objective was to prove or disprove that there exists a clear relationship
between the GTPD patrol zones (Zones 1-4) and the offense codes used by
the NCIC. Furthermore, if this process proved to be effective, we could
apply the same procedure to analyze the Atlanta crime data.
We used all of the GTPD data from 2010 to 2014. To exclude unnecessary
information, we took account of only two variables: Patrol Zone and Offense
Code. For every crime, both its offense code and its location were given, so
we had enough data for analysis. We programmed Excel to give the output
in the following way: Z1 = [22 : 24, 23 : 325, 29 : 84, . . . ]
• The first two numbers represent the first two digits of the offense code
• The number after the colon is the count of such incidents
• For example, there were 24 crimes coded "22"
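The per-zone tally in that output format can be reproduced with a short script. `zone_code_counts` and the sample records are hypothetical (the actual workflow used Excel):

```python
from collections import defaultdict

def zone_code_counts(incidents):
    """Tally incidents per patrol zone by the first two digits of the
    NCIC offense code, mirroring the Z1 = [22 : 24, ...] output format.
    `incidents` is a list of (patrol_zone, offense_code) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for zone, code in incidents:
        counts[zone][str(code)[:2]] += 1
    return {zone: dict(c) for zone, c in counts.items()}

# Hypothetical sample records (zone, offense code)
sample = [("Z1", "2202"), ("Z1", "2204"), ("Z1", "2301"), ("Z2", "2299")]
tallies = zone_code_counts(sample)
# tallies == {"Z1": {"22": 2, "23": 1}, "Z2": {"22": 1}}
```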
We played around with the NCIC code list before we proceeded with the
test.
• There were many different types of offense on the offense code list, but
we could categorize them nicely based on their first two numbers
• We excluded some offenses from the data because they were student
conduct violations, public order crimes, juvenile cases, invalid entries, or
trivial to the overall data
Using our manipulated data set, we generated a pivot table.
Figure 6: The pivot table (Location versus type of crimes)
Based on the table, we found that Zone 2 had the largest number
of crimes. In particular, Zone 2 had the most assaults, burglaries,
property damage incidents, and stolen vehicles compared to the other zones.
In conclusion, our approach could have worked better with more data. We will
apply this method to the Atlanta data later, since we believe there will be
enough data. However, we concluded that we could not infer more information
about the relationship between location and type of crime at Georgia Tech.
Crime Type       Most frequent (# of crimes)   2nd most frequent (# of crimes)
Assault          Zone 2 (94)                   Zone 3 (24)
Burglary         Zone 2 (79)                   Zone 4 (32)
Damage Property  Zone 2 (171)                  Zone 1 (84)
Stolen Vehicle   Zone 2 (43)                   Zone 1 (27)
Based on this approach, we could conclude that Zone 2 was the most dangerous
zone. It was difficult to find a relationship between types of crimes and
patrol zones because Zone 2 had so many more crimes than the other zones;
there was not enough information about crimes in the other zones. There are
several explanations for this lack of data. First, the Georgia Tech
campus was considered safe and did not have many crimes to record. Second,
many of the recorded crimes were minor, and after we filtered them out,
we were left with only a little data. Last, there were not enough variables
to take into account; there could have been more significant factors
contributing to the result.
3.2 Questions
We came up with some questions that needed to be answered in
order to continue our research. We will list them here:
1. How are the 4 zones divided? Can we have a detailed description
of where each zone is?
2. There are 4 zones within Georgia Tech, and there are 2 more zones: off
campus and SAV. What does SAV mean?
3. Many incidents counted as "minor" crimes. Are they really
insignificant enough to be excluded from our research, or should we
give them more attention?
4 Atlanta Crime data
4.1 Time series and Seasonality
A time study of the criminal data was helpful in revealing crime patterns
over time. With the 2011-2014 crime data, we grouped the entries by date
(occur_date) and crime type (UC2 Literal), returned the count of each
crime type on every reported date, and performed a time series analysis.
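The grouping step can be sketched with the standard library (the rows below are made-up examples in the shape of the occur_date and UC2 Literal columns):

```python
from collections import Counter

# Hypothetical rows mimicking the occur_date / UC2 Literal columns.
rows = [
    ("2013-09-02", "LARCENY"),
    ("2013-09-02", "LARCENY"),
    ("2013-09-02", "AGG_ASSULT"),
    ("2013-09-03", "LARCENY"),
]

counts = Counter(rows)  # one count per (date, crime type) pair
print(counts[("2013-09-02", "LARCENY")])  # 2
```

Each (date, crime type) pair maps to its daily count, which is exactly the series the time series analysis consumes.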
In the time series plot of total crimes per day, we could observe a rough
seasonal pattern. We were unsure whether this seasonal pattern appeared
in all crime types or only in a few that dominated the total crime
rate. To figure this out, we decomposed the data into different crime types
and performed a time series analysis on each. Two crime types
returned notably interesting patterns: aggravated assault (AGG_ASSULT) and
larceny (LARCENY). Therefore, we decided to investigate these types of
crimes further. Below are their time series plots.
Moreover, using the additive single exponential smoothing method, we were
able to smooth the data and produce cleaner diagrams:
The smoothed plots showed the trend of the crime data. The
frequencies of both aggravated assault and larceny tended to peak around
September, and they slowly dropped to a minimum in March.
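Single exponential smoothing of this kind can be sketched in a few lines (alpha here is a hypothetical smoothing constant; the report does not state the value actually used):

```python
def exp_smooth(series, alpha=0.3):
    """Single exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    smoothed = [series[0]]          # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exp_smooth([10, 20, 30], 0.5))  # [10, 15.0, 22.5]
```

Each smoothed value is a weighted average of the new observation and the previous smoothed value, which is what flattens the day-to-day noise in the crime counts.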
Using the Holt-Winters' method, we were able to apply weights to the data
points, and we came up with an applicable model for the current data.
The diagram below shows our result of applying the Holt-Winters' method
to the larceny data. Red points represented the smoothed data points of our
model.
With the smoothed model, we were able to make predictions about future
data points. We applied this method to the larceny data and predicted
100 more data points with a 95% prediction interval. The residual plots
are shown below.
In the following diagram, blue points represented the actual data; red
points showed the smoothed data points, with lower weight on older data
and higher weight on more recent data; green points gave the prediction for
the next 100 data points; and purple points were the upper and lower bounds
of the 95% prediction interval around the green points. The residual
analysis plot of this method was as follows. The p-value of the
Anderson-Darling test was 0.009; strictly, a value this far below the usual
0.05 threshold suggests the residuals deviated somewhat from the normality
assumption. The residuals-versus-fits plot showed that the residuals were
randomly distributed, which supported our identical variance assumption.
The Holt-Winters' method had a mean absolute percentage error (MAPE) of
14.4879, a mean absolute deviation (MAD) of 6.1235, and a mean squared
deviation (MSD) of 60.0624. These errors were lower than those of the
single exponential method, which indicated that the Holt-Winters' method
was the more appropriate choice in this time study.
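The three accuracy measures quoted above can be computed as follows (these are the standard Minitab-style definitions; the actual and fitted values in the usage line are illustrative, not the report's data):

```python
def accuracy_measures(actual, fitted):
    """MAPE (in percent), MAD, and MSD of one-step fitted values."""
    n = len(actual)
    mape = 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, fitted))
    mad = sum(abs(a - f) for a, f in zip(actual, fitted)) / n
    msd = sum((a - f) ** 2 for a, f in zip(actual, fitted)) / n
    return mape, mad, msd

mape, mad, msd = accuracy_measures([10, 20], [8, 25])  # ~ (22.5, 3.5, 14.5)
```

Lower values of all three measures indicate a better fit, which is the basis for preferring Holt-Winters' over single exponential smoothing here.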
4.2 Hot spots
We began our data analysis by checking whether there were areas of
concentrated crime in Atlanta. In order to locate these areas, called "hot
spots," we used four basic statistical tests: mean center, standard deviation,
standard deviation ellipses, and the test for clustering. The mean center
gave us the mean longitude and latitude of crimes, the standard deviation
showed how deviated the crimes were with respect to the mean center, and
the standard deviation ellipses visually showed which crimes were one stan-
dard deviation away from the mean center. Most importantly, the test for
clustering gave information on the closeness of crime locations.
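The first two of these tests are straightforward to sketch (planar approximation over longitude/latitude coordinates, matching the simplification acknowledged later in this section):

```python
import math

def mean_center(points):
    """Mean longitude/latitude of the crime locations."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from the mean center."""
    cx, cy = mean_center(points)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    )
```

The standard distance plays the role of the "standard deviation" test: it summarizes how spread out the crimes are around the mean center, and an ellipse scaled by it gives the standard deviation ellipse.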
The mean center we found was near the Fulton County Juvenile Court.
We figured that the mean center by itself did not give much information about
hot spots: it was not necessarily true that crimes near the mean center
occurred with a high probability. However, it was useful as a point of
comparison, since we could check where other crimes occurred relative to it.
The results from the standard deviation and the standard deviation ellipses
were also vague. The standard deviation ellipses did not map the concentrated
areas of crime: some areas of an ellipse had frequent crimes, while
other areas within the same ellipse did not have many. On the other
hand, the values obtained from the test for clustering were relative, and thus
comparable. Therefore, we concluded that the test for clustering gave
the most accurate representation of hot spots among the four tests.
To test for clustering, we used the nearest neighbor index method. Simply
put, we generated random crime spots in Atlanta and compared how close the
actual crime spots were to one another against how close the random spots
were. The ratio of the mean nearest-neighbor distance among the observed data
to that among the random data is called the Nearest Neighbor Index (NNI).
The smaller the NNI, the more clustered the data: an NNI below 1 indicates
clustering, and we considered the data clearly clustered when the NNI was
close to 0.5 or below.
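A minimal sketch of the NNI computation, assuming planar distances and a rectangular bounding box for the random points (both simplifications are discussed later in this section):

```python
import math
import random

def mean_nn_distance(pts):
    """Average distance from each point to its nearest neighbor."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ) / len(pts)

def nearest_neighbor_index(observed, bbox, rng=random):
    """NNI = mean NN distance of the observed points divided by that of
    uniformly random points in the same bounding box. Values well below 1
    suggest clustering; distances are planar (no Haversine correction)."""
    (xmin, ymin), (xmax, ymax) = bbox
    rand_pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                for _ in observed]
    return mean_nn_distance(observed) / mean_nn_distance(rand_pts)

# Toy usage: a tight cluster inside a large box yields an NNI far below 1.
rng = random.Random(1)
cluster = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01), (0.005, 0.005)]
nni = nearest_neighbor_index(cluster, ((0.0, 0.0), (100.0, 100.0)), rng)
print(nni < 1.0)  # True
```

Averaging this ratio over several random draws, as the report does, reduces the variance introduced by the random reference points.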
The NNI for all crimes in Atlanta was 0.543, which showed a definite spatial
clustering of crimes. Then we found the NNI for each type of crime. To
minimize error, we calculated each NNI several times and took the average.
Table 1 shows the NNI for each type of crime. Since every NNI was less than
1, all crime types were clustered to some degree. Note that robbery was the
most clustered and murder the least. Except for murder and rape, all other
crimes' NNIs were below 0.5, which implied that the hot spots were worth
investigating. One reason robbery and theft had the lowest NNIs was their
relatively frequent occurrence: the data showed that these types of crimes
appeared more often than the others, so it was natural that there were hot
spots where victims were more vulnerable to robbery and theft. On the
contrary, since rape and murder took place less frequently than other crimes,
it was not surprising to observe more scattered data.
Type of crime NNI
Total 0.543
Assault 0.416
Burglary 0.448
Murder/Homicide 0.823
Rape 0.694
Robbery 0.258
Theft 0.371
Vehicle 0.414
Table 1: NNI for different types of crimes
Some notable hot spot regions for all types of crimes included the
areas along 10th Street NW and along Peachtree Street SW. Although not
many crimes occurred inside schools, many crimes were reported near
colleges, including Georgia Tech, Georgia State University, Clark Atlanta
University, and Spelman College. Since we were specifically interested in the
relationship between robbery and auto theft, we compared the hot spots of
auto theft to the hot spots of robbery and observed some overlaps. We have
yet to conduct a statistical test on the correlation between the two crime
types, but this seemed like a notable topic to study, and we decided to do
more research to figure out whether stolen cars were used to commit other
crimes.
Some errors in the analysis came from the crime types not having the same
amounts of data: the crime type with the most data will most likely produce
an accurate NNI, while the type with the least data will not. Another error
appeared when generating random crime spots on the map. We had difficulty
setting an exact boundary and instead generated random points inside a
rectangle that approximately resembled the border of Atlanta. Furthermore,
we assumed that the Earth was a two-dimensional plane and used the planar
distance formula, which was inappropriate. Our results would have been
improved with the help of the Haversine formula:
d = 2r · arcsin( sqrt( sin²((φ₂ − φ₁)/2) + cos(φ₁) cos(φ₂) sin²((λ₂ − λ₁)/2) ) )
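A direct implementation of the Haversine distance (inputs in degrees; r is the Earth's radius in kilometers):

```python
import math

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees;
    r is the Earth's radius in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

Substituting this for the planar distance in the nearest-neighbor computation would remove the flat-Earth error, at the cost of a few extra trigonometric calls per pair.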
Even though there were many ways to compute a more accurate NNI, the
currently calculated NNI is sufficient for comparing the clustering of one
crime type against another. However, we could develop a better algorithm to
compute the NNI, as the current algorithm performs many unnecessary
computations. For instance, it calculated the distances between all pairs of
points and compared all the values, when we could have selected only a few
candidate points to compare. We would improve our algorithm by incorporating
the Voronoi diagram and Fortune's algorithm to reduce the computation time.
This would allow us to analyze more data in less time, and we would also be
able to process higher-dimensional data more efficiently.
As shown in Figure 9, the Voronoi diagram is a plot with points divided
up by half-planes. The subspaces are divided such that each subspace contains
exactly one point, and the imaginary line segment connecting two neighboring
points is perpendicular to their shared border. Since the points are now
spatially sorted, this diagram can answer a nearest-neighbor query
intelligently in O(log n) time. The problem is that naively generating the
half-planes takes a long time, O(n² log n), which slows down the process.
Luckily, Fortune's algorithm can construct the diagram faster, in O(n log n).
Therefore, combining the two, we end up with a total computation time of
O(n log n).
(a) Voronoi diagram step 1 (b) Voronoi diagram step 2 (c) Fortune's algorithm
Figure 9: Construction of a Voronoi diagram and Fortune's algorithm
In addition to those improvements, we can also filter out avoidable
calculations by identifying unstable queries. An unstable query arbitrarily
sets a border around each point so that the algorithm can determine which
points to include in its computation. Along with the integration of the
algorithms stated above, this improvement will further reduce the computation
time. Additionally, the algorithm can be used to find which points are
located near a given point.
Finally, we will perform more statistical tests on the data set. Our focus
will be to reduce errors and computation time, as well as to locate zones
that need more attention from officers. Once we have the algorithm, we will
be able to suggest new patrol routes that minimize the arrival time at a
crime site, or the optimal number of officers in each patrol zone. Then, by
comparing with the optimized solution, we can check how efficient the current
resource allocation is.
4.3 Auto Theft
When we checked the hot spots, we noticed that the hot spots for auto
theft and for robbery overlapped considerably. We were interested in this
observation and decided to test the relationship between auto theft and
robbery. We then realized that the criminals' primary goal in auto theft was
not to commit robberies, but rather to sell the cars. If they did not sell a
car right away, however, they used it to commit other crimes, including
joyriding (driving around freely), drug dealing, or robbing.
One way we tried to find the correlation between auto thefts and other
crimes was by tracking a stolen car and checking whether it was recorded
again as a suspect's car. The most obvious way to do so was by comparing
license plates. However, there was not enough information; many times,
witnesses could not remember the license plate numbers. Instead, we compared
other attributes of the stolen vehicles and suspect vehicles. There was too
much information, so we filtered out the less important fields and ended up
with 60 variables. We reconstructed two data sets using them and started our
research.
One file contained all the necessary information about auto thefts, such as
the offense code and the date of the crime. Unfortunately, one fourth of the
crime records did not contain any information about the stolen car; more
consistent and complete documentation would have been helpful. The other file
included the information about suspect vehicles: here, we listed any vehicle
that was used to commit any type of crime. This data set also had an
insufficient amount of data, but we wrote code to make the best use of the
two files.
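The matching logic between the two files can be sketched as follows (the records and field names are illustrative stand-ins, not the actual 60 variables):

```python
# Hypothetical records: the real matching used up to 60 attributes; here we
# match on just year, maker, and color, and require the suspect-vehicle
# sighting to come after the theft.
stolen = [
    {"year": 1999, "maker": "Ford", "color": "White", "stolen_hour": 100},
    {"year": 2004, "maker": "Dodge", "color": "Black", "stolen_hour": 300},
]
suspect = [
    {"year": 1999, "maker": "Ford", "color": "White", "seen_hour": 104},
    {"year": 2011, "maker": "Toyota", "color": "Silver", "seen_hour": 50},
]

matches = [
    (s, v)
    for s in stolen
    for v in suspect
    if (s["year"], s["maker"], s["color"]) == (v["year"], v["maker"], v["color"])
    and v["seen_hour"] >= s["stolen_hour"]
]
print(len(matches))  # 1
```

Matching on attribute tuples rather than license plates is weaker evidence of identity, which is why the report treats these as candidate matches rather than confirmed ones.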
While examining the criminals' habits, we came up with more questions, such
as the time delay between when a car was stolen and when it was used in a
robbery, which car types were most vulnerable to theft, and how the stolen
cars were used. In the following paragraphs, we provide an analysis of the
police crime data along with these derived questions.
The easiest way to categorize cars was by their colors and makers, so we
made a color-versus-maker pivot table. The most noticeable information we
got from the pivot table was that Dodge, Chevrolet, and Ford were the most
popular targets, and white, black, and silver were the most vulnerable
colors. In particular, older models from the 1990s were targeted frequently.
These results were fairly intuitive, as those cars had weaker security
systems and criminals did not want to get noticed by robbing fancy cars.
However, to our surprise, the thieves showed more interest in luxury cars
than we had expected. We found that the main reason for stealing those cars,
despite the difficulty of doing so, was that they could be sold for high
prices.
Figure 10: The popularities of car types and their years (bar chart of the
number of stolen cars for Dodge, Chevrolet, Ford, and Honda, broken down by
color: white, black, and silver)
It was obvious that criminals targeted old, common cars for easy theft
and expensive cars for a high return. What was interesting, however, was that
criminals tended to take cars that were less valuable than the cars they used
to commit the robbery; in other words, they used newer cars to steal older
cars. This could be interpreted in two ways: they wanted small, easy money,
or they needed a fresh car to commit a new crime. We needed to know what
they did with the stolen cars. To do so, we found out how much time criminals
spent before committing a crime with their stolen cars. Out of 5270 auto
theft offenses and 4237 suspect vehicle cases, we found 48 exact matches. In
these 48 cases, the average time before a stolen car was spotted at another
crime scene was about 4 hours, excluding a few cars that reappeared several
days later. In particular, among the 48 cases, two cars were used to commit
multiple crimes within a short time period. Since the cars were used in
crimes only a few hours after they were stolen and were then sold, we could
infer that criminals stole cars to make their crimes less traceable and to
earn some quick cash.
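The delay statistic can be sketched like this (the delay values are hypothetical; the report's figure of about 4 hours came from the 48 matched cases, after excluding multi-day reappearances):

```python
# Hypothetical delays (hours) between theft and reappearance at a crime scene.
delays_hours = [3, 5, 2, 6, 120]   # 120 h: a car that reappeared days later

same_day = [d for d in delays_hours if d < 24]   # drop multi-day reappearances
average_delay = sum(same_day) / len(same_day)
print(average_delay)  # 4.0
```

The 24-hour cutoff is an assumed threshold for "reappeared several days later"; the report does not state the exact rule it used.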
Suspect vehicle year   Suspect vehicle maker   Stolen vehicle year   Stolen vehicle maker
2002                   Chevrolet               1999                  Ford
2010                   Dodge                   2004                  Dodge
2011                   Toyota                  1996                  Honda
2001                   Ford                    1996                  Honda
2008                   Nissan                  1984                  Oldsmobile

Table 2: Examples of suspects stealing less valuable cars
4.4 Questions and Goals
Our goals for next semester will be as follows.
1. Upgrade the model used for Georgia Tech crimes so that it can be applied
to the Atlanta crimes.
2. Develop a better algorithm for locating the hot spots.
3. Find geographic matches and correlations among crimes.
4. Suggest an optimized way of allocating resources.
5. Recognize crime patterns.
To continue with our research, we needed more information about the crimes.
We will list some questions that are preventing us from advancing.
1. Atlanta is divided into 5 zones, but the Excel data gives the place
of each crime by latitude and longitude, not by zone. Given the coordinates
of a place, is there a way of telling which zone that place is in?
2. Have the crime criteria changed over the past few years? In other words,
is there a crime that used to be considered a "type A" crime but is now
"type B"?
Now that we are familiar with the data and have gained insights into crime
in Atlanta, we are certain that our progress will speed up. There was a
limitation on the amount of data; however, we learned to make use of small
data sets to come up with noteworthy conclusions. We hope to establish a
generalized algorithm that can be used in many cities.
4.5 Reference
Pictures of Voronoi Diagrams: https://www.youtube.com/watch?v=7eCrHAv6sYY