- Project Title: Chicago crime analysis
- Course name: Principles and Practice in Data Mining
- Semester: Autumn 2016
- Professor: Yuran SEO
- Sungkyunkwan University
- Department: Philosophy
- Name: Jangyoung Seo
- Contact: laiha10@naver.com
3. 2. Data preparation
I removed variables that are not needed for the analysis
(e.g. ID, Block, IUCR, Beat, FBI Code)
I recoded some values to make the analysis easier
(e.g. TRUE to 1, FALSE to 0)
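The slides do not show the code for this step. A minimal sketch of the two operations, written in Python rather than the R used in the project; the column names follow the slide, the sample rows are invented for illustration, and the helper `prepare` is hypothetical:

```python
# Toy records standing in for the Chicago crime data (values are invented).
records = [
    {"ID": 1, "Block": "001XX N STATE ST", "IUCR": "0486", "Beat": "0111",
     "FBI Code": "08B", "Primary.Type": "BATTERY",
     "Arrest": "TRUE", "Domestic": "FALSE"},
    {"ID": 2, "Block": "002XX W LAKE ST", "IUCR": "0820", "Beat": "0122",
     "FBI Code": "06", "Primary.Type": "THEFT",
     "Arrest": "FALSE", "Domestic": "FALSE"},
]

DROP = {"ID", "Block", "IUCR", "Beat", "FBI Code"}  # variables removed in the slide
FLAGS = {"TRUE": 1, "FALSE": 0}                     # recoding described in the slide


def prepare(row):
    """Drop the unused columns, then recode TRUE/FALSE strings to 1/0."""
    kept = {key: value for key, value in row.items() if key not in DROP}
    return {key: FLAGS.get(value, value) for key, value in kept.items()}


prepared = [prepare(row) for row in records]
print(prepared[0])
```

Values other than TRUE and FALSE pass through unchanged, so the crime type column keeps its labels.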
4. 1. Proportion of Domestic crime
Domestic crimes make up only 15% of all crimes
X axis = domestic or not
Y axis = proportion
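The proportion plotted here is a count per category divided by the total. A small Python sketch of that calculation, assuming the 0/1 recoding from the data preparation step; the flags below are invented and do not reproduce the 15% figure:

```python
from collections import Counter

# Invented 0/1 flags (1 = domestic crime); not the real data.
domestic = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

counts = Counter(domestic)
total = len(domestic)

# Share of each category, in the same order as the X axis (0, then 1).
proportions = {flag: counts[flag] / total for flag in sorted(counts)}
print(proportions)
```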
5. 2. Crime occurrence – community
X axis = Community of Chicago
Y axis = Crime occurrence
6. 3. Crime occurrence – Primary.Type
X axis = crime type (labels omitted because they are too long to fit on the axis)
Y axis = crime occurrence
7. 4. Crime type visualization
Used package :
“ggplot2”
X axis = Frequency
Y axis = crime type
8. 5. Crime description word cloud
Used packages :
“wordcloud”
“KoNLP”
“tm”
Crime Description has too many
distinct values to chart,
so I turned it into a word cloud.
A word cloud is a useful tool
for text mining.
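A word cloud sizes each word by how often it occurs, so the underlying step is a word frequency count. A rough Python sketch of that counting step only; the descriptions are invented examples, and the simple letters-only tokenizer is an assumption that stands in for the R text processing used in the project:

```python
import re
from collections import Counter

# Invented description strings in the style of the dataset.
descriptions = [
    "SIMPLE",
    "DOMESTIC BATTERY SIMPLE",
    "$500 AND UNDER",
    "TO VEHICLE",
    "OVER $500",
]

words = []
for text in descriptions:
    # Keep alphabetic tokens only, lower-cased; numbers and symbols are dropped.
    words.extend(re.findall(r"[A-Za-z]+", text.lower()))

freq = Counter(words)
print(freq.most_common(3))
```

The resulting frequency table is what a word cloud function would take as input, with larger counts drawn in larger type.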
9. 6. Location wordcloud
I used the same word cloud method,
without the extractNoun function
This shows the crime locations
Most prominent words:
Street, Residence, Sidewalk
10. 7. Time series
X axis = Crime date
Y axis = occurrence
The number of crimes was lower in November than in August
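A time series like this comes from counting records per time period. A minimal Python sketch that counts per month; the timestamps are invented, and the `MM/DD/YYYY hh:mm:ss AM/PM` format is an assumption about how the date column is stored:

```python
from collections import Counter
from datetime import datetime

# Invented timestamps; the format string below is an assumption.
dates = [
    "08/03/2016 11:15:00 PM",
    "08/21/2016 02:00:00 AM",
    "11/07/2016 09:30:00 AM",
]
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Parse each timestamp and bucket it by year and month.
per_month = Counter(
    datetime.strptime(stamp, DATE_FORMAT).strftime("%Y-%m") for stamp in dates
)

for month in sorted(per_month):
    print(month, per_month[month])
```

Bucketing by day instead of month only requires changing the output format to `"%Y-%m-%d"`.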
11. 8. Map Visualization
Used packages :
“ggmap”
“ggplot2”
This is a map of Chicago
Each red point is a crime
13. 9. Crime Type – Arrest Proportion
X axis = proportion of arrests
Y axis = crime type
The arrest proportion differs greatly
between crime types
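The arrest proportion for a crime type is the number of arrests divided by the number of crimes of that type. A small Python sketch of the grouping, assuming Arrest has been recoded to 0/1; the rows are invented and say nothing about the real proportions:

```python
from collections import defaultdict

# Invented (crime type, arrest flag) pairs; 1 = arrest made.
rows = [
    ("NARCOTICS", 1), ("NARCOTICS", 1),
    ("THEFT", 0), ("THEFT", 0), ("THEFT", 1),
    ("BATTERY", 0),
]

totals = defaultdict(int)   # crimes per type
arrests = defaultdict(int)  # arrests per type
for crime_type, arrested in rows:
    totals[crime_type] += 1
    arrests[crime_type] += arrested

arrest_rate = {crime_type: arrests[crime_type] / totals[crime_type]
               for crime_type in totals}

# Print from highest to lowest arrest proportion.
for crime_type, rate in sorted(arrest_rate.items(), key=lambda item: -item[1]):
    print(f"{crime_type}: {rate:.2f}")
```

Replacing the crime type with the district number gives the per-district version of the same calculation.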
14. 10. District Arrest proportion
X axis = arrest proportion
Y axis = district (1 to 31)
15. 11. Chisq-Test
The chi-squared test is the only method I can use,
because all of the variables in my dataset are
categorical
The results show that
Arrest and crime type are dependent
District and crime type are dependent
Arrest and District are dependent
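For reference, the chi-squared test of independence compares the observed counts in a contingency table with the counts expected if the two variables were independent. A minimal Python sketch of the statistic and its degrees of freedom; the 2x2 table is invented, the function name `chi_squared` is hypothetical, and no continuity correction is applied (R's `chisq.test` applies one by default to 2x2 tables, so its statistic would differ for this example):

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows = two crime types, columns = arrest / no arrest.
table = [[30, 10],
         [20, 40]]

statistic, dof = chi_squared(table)
print(round(statistic, 3), dof)
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom is evidence against independence; turning it into a p-value needs that distribution's tail probability, which this sketch leaves out.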
16. 12. Verification
I ran the chi-squared test once
more on a second dataset,
the May 2015 crime dataset,
and reached the same conclusions.