Database and Analytics Programming
Sarthak Khare
School of Computing
National College of Ireland
Dublin, Ireland
Student ID: x18180485
Jayanta Behera
School of Computing
National College of Ireland
Dublin, Ireland
Student ID: x18188834
Darshana Gowda
School of Computing
National College of Ireland
Dublin, Ireland
Student ID: x18188842
Samruddhi Kanhere
School of Computing
National College of Ireland
Dublin, Ireland
Student ID: x18190634
Abstract—Crimes threaten social peace and also create panic
in society. It is not only the responsibility of law
enforcement agencies to maintain law and order but also of
civilians to remain vigilant and report any unlawful activities in
their vicinity. In order to find a relationship between complaints
lodged at the stations, the number of arrests and court summons
and the prison admissions in the city of New York, we have
performed analysis on data for the year 2018. We have created
visualizations based on features that were common to all the 4
datasets. Overall, complaints were the most numerous, followed
by arrests, while court summons and prison admissions were
considerably lower.
Analysis based on Age and Gender for all the 5 boroughs of New
York showed that there was a greater number of males as opposed
to females at every stage and that most of the alleged criminals
fell under the 25-44 age category. A comparative analysis of
crimes per capita across the boroughs revealed that the rate
was highest in the Bronx, followed by Manhattan, Brooklyn,
Staten Island and Queens.
Index Terms—Crime, New York, database, visualizations
I. INTRODUCTION
Regulating crime rates and ensuring appropriate justice
is essential not only for the victims but also for
society as a whole. If justice is to prevail and criminals are
to be punished, crimes need to be reported as complaints,
and law enforcement must act on those complaints
to bring justice to the victims. As a part of this
project, our objective is to gain insights from the patterns of
the crimes that are reported in New York City. We
will further explore the complaints, arrests, court summons and
prison admissions data and perform a comparative analysis
to understand the relationship between them. We will also
investigate the crime rates for the 5 boroughs in New York
City based on features such as gender and age group. Analysis
of crimes is essential in helping law enforcement agencies
take effective measures for the prevention and reduction of
crimes. It will allow them to gain a better perspective and
boost pre-emptive actions such as increased patrolling and
surveillance which would help reduce criminal activities. The
choice of data resonates with the objective of comparing and
individually analyzing the crime rate, the arrests and court
summons for the crimes as well as the incarcerations at a
given borough. The research question that we aim to answer is:
Is the number of prison admissions commensurate with the
number of complaints registered for various crimes committed
across the administrative districts of New York City?
II. RELATED WORK
Several studies have examined crime data, and
visualizations have been created to find patterns and trends in
crimes based on location, time and type of crime,
which help in predicting whether a crime could happen in
a certain location or at a certain time of day or week. This has
aided law enforcement personnel in being more vigilant and
taking preventive measures to reduce the number of crimes. We
have studied a variety of such research and attempted to
find studies that are similar or related to our objectives.
An analysis has been performed in which clustering techniques
are applied to the Stop, Question and Frisk dataset. The analysis and
prediction done as a part of this research helped in identifying
locations which require more police patrolling [1].
Visualizations have been used by Bayoumi et al. to identify
the most common location, day of week and time of day for
different categories of crime. As per their analysis, crimes
against people occur more frequently at night as compared to
any other time of the day. Crimes against properties take place
late in the morning or early in the afternoon. These insights
were provided to the law enforcement personnel which could
help them take quick decisions [2].
Using big data analytics, Feng et al. uncovered
facts and patterns in the criminal data of three major cities
in the United States. Their aim was to help police departments
understand crime better, knowledge which
can be used for crime detection and for undertaking preventive
measures [3]. Clustering techniques and association rules
are also used in [4] for investigating crime data and finding
means to prevent crime. Formal concept analysis was
performed on crime data for different geographical locations.
Crimes were split into different categories based on common
attributes. This helped build a more defined model for crime
analysis based on geographical distribution [5].
Text analytics has also been used to perform crime analysis
by Ku, Nguyen and Leroy [6]. Using natural language processing,
they developed an efficient decision control system for
less-trained security personnel that provided an efficient
way to investigate crime with better accuracy.
Analysis was performed using Geo-Spatial data from the
year 2003 to 2015 for San Francisco [7] and it was observed
that the western coast of San Francisco was far safer as
compared to the east coast. As per the analysis of crime over a
period, it was observed that crime occurred mostly during the
weekend. The authors identified the three most unsafe regions
of the city using Hotspot technology.
In order to deal with the crime rates, Shah et al. [8]
proposed a framework which would take crime related data
and transform it into visual reports. They used graphical
representations to summarize their findings. Live heatmaps
of locations with a high density of crimes were created, and
clustering algorithms were implemented on geographical locations
to identify patterns in the crimes committed. These assisted
in taking proactive measures for the reduction of crimes.
Many studies have also applied
predictive analysis to forecast future occurrences
of crimes [9]. Based on the history of criminal incidents,
Sivanagaleela and Rajesh performed clustering of criminal activities.
They have generated a pattern to identify the crime areas based
on achieved data prior to occurrence, which would eventually
help reduce incidents related to crime.
All these works address crime detection, emphasizing the
importance of big data analysis and data mining methods,
and carry out crime prediction.
However, a comparative analysis of the legal stages has
not been carried out as such. From our comparison of
the proportions of complaints and arrests against court summons
and imprisonment, we can see that the proportions vary by
a considerable amount. These results should help law
enforcement and judicial authorities to look back and identify
whether the reasons behind this need to be addressed.
III. METHODOLOGY
To achieve the objectives of the project, a series of steps
has been followed. A diagrammatic representation of the
process flow can be seen in Fig. 1.
Fig. 1. Process Flow diagram
A. Data Collection
The first step in this process is gathering appropriate data.
The data for New York City for the year 2018 has been
extracted for analysis. An outline of the 4 related datasets
used for the project is as follows:
• Complaints: This dataset has all the criminal complaints
lodged by victims and witnesses in New York City. It has
about 450k records and 35 features.
• Arrests: This dataset contains information on all the
arrests that took place in the selected year of interest.
It has about 250k rows and 18 attributes.
• Court Summons: This dataset includes information on all
the criminal summons that were issued. It has about 89k
rows and 16 features.
• Prison Admissions: This dataset contains information about
all the prison admissions. It has a
little above 19k rows and 9 attributes.
All four datasets are extracted programmatically in JSON format
using open APIs. The first three datasets are obtained
from NYC Open Data (https://data.cityofnewyork.us/),
whereas the fourth dataset is from the data.gov website. In
addition, the population data of New York for the year 2018 is
obtained by web scraping.
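The programmatic extraction can be sketched as a paged download loop. This is a minimal sketch, not the project's actual code: it assumes the portal accepts SoQL-style `$limit`, `$offset` and `$where` query parameters, and the resource id `xxxx-xxxx`, the page size and the helper names are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical resource URL: "xxxx-xxxx" is a placeholder, not a real dataset id.
BASE_URL = "https://data.cityofnewyork.us/resource/xxxx-xxxx.json"


def page_url(base_url, limit, offset, where=None):
    """Build the URL for one page of results (assumed SoQL-style paging)."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where
    # Keep "$" unescaped so the parameter names stay readable.
    return base_url + "?" + urlencode(params, safe="$")


def http_get_json(url):
    """Download one page and decode it as JSON (a list of records)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(base_url, where=None, page_size=50000, get_json=http_get_json):
    """Request successive pages until one comes back shorter than page_size."""
    records, offset = [], 0
    while True:
        page = get_json(page_url(base_url, page_size, offset, where))
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size
```

Passing the downloader in as `get_json` keeps the paging logic independent of the network, so it can be exercised with a stub before being pointed at a live endpoint.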
B. Unstructured Data Storage
As the collected data is in JSON format, a MongoDB database
has been used for its storage. The data gathered has been
split and pushed into MongoDB in the form of documents.
MongoDB is a document-oriented database whose documents
closely mirror JSON structures, which makes it well suited
to storing such data.
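Splitting the records and pushing them as documents could look like the sketch below. It is an illustration under assumptions: the batch size is arbitrary, and `collection` stands for any object with an `insert_many` method, such as a pymongo collection; the database and collection names mentioned in the comment are hypothetical.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def store_documents(collection, records, batch_size=1000):
    """Insert the records batch by batch and return how many were sent.

    `collection` is any object exposing insert_many(list_of_dicts), for
    example a pymongo collection such as MongoClient()["crime"]["complaints"]
    (hypothetical database and collection names).
    """
    total = 0
    for chunk in batches(records, batch_size):
        collection.insert_many(chunk)
        total += len(chunk)
    return total
```

Sending the data in batches keeps each request small, which matters for the larger datasets such as the complaints data.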
C. Data Preprocessing
This is the most important step of the end-to-end process.
In this step, the records have been fetched from MongoDB
and converted to a pandas dataframe for cleaning. All the
preprocessing and transformation has been done using pandas
dataframes.
• Feature Selection: All columns that are not required for
the analysis are dropped. Columns such as Gender,
Borough (Administrative District) and Age Group have
been selected for the analysis.
• Feature Calculation: In the Prison Admissions dataset,
the age data is continuous in nature, whereas the
other three datasets contain categorical age data. A new
column has been added to the dataset to capture the age
in the form of categorical values that match the other
3 datasets. A Borough column has also been introduced and
populated with the boroughs corresponding to the county
data present in the dataset. The complaints, arrests and
court summons datasets have a date column, which has
been used to calculate the day and month.
• Missing Data and NA values: For features
containing a higher proportion of missing values, data has
been imputed based on the distribution of the feature (the
normal distribution, assessed from distribution plots) to
avoid loss of essential data. Rows have been dropped for
features containing a smaller proportion of NA or missing values.
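The feature calculation and imputation steps above can be illustrated with a small pure-Python sketch (the project itself used pandas). Several details are assumptions rather than the authors' confirmed choices: the age bands, the timestamp format, and the imputation rule, which here samples from the observed category frequencies as one possible reading of "imputed based on the distribution". The county-to-borough mapping follows the standard correspondence between New York State counties and the five boroughs.

```python
import random
from bisect import bisect_right
from collections import Counter
from datetime import datetime

# Assumed age bands, chosen to match the categorical datasets.
AGE_UPPER_BOUNDS = [18, 25, 45, 65]  # exclusive upper bound of each band
AGE_LABELS = ["<18", "18-24", "25-44", "45-64", "65+"]

# New York State county -> New York City borough.
COUNTY_TO_BOROUGH = {
    "NEW YORK": "MANHATTAN",
    "KINGS": "BROOKLYN",
    "QUEENS": "QUEENS",
    "BRONX": "BRONX",
    "RICHMOND": "STATEN ISLAND",
}


def age_group(age):
    """Map a continuous age to its categorical band."""
    return AGE_LABELS[bisect_right(AGE_UPPER_BOUNDS, age)]


def day_and_month(date_text, fmt="%Y-%m-%dT%H:%M:%S.%f"):
    """Return (weekday name, month number); the format is an assumption."""
    parsed = datetime.strptime(date_text, fmt)
    return parsed.strftime("%A"), parsed.month


def impute_from_observed(values, rng):
    """Fill None entries by sampling from the observed value frequencies."""
    counts = Counter(v for v in values if v is not None)
    categories, weights = list(counts), list(counts.values())
    return [v if v is not None else rng.choices(categories, weights)[0]
            for v in values]


print(age_group(30), COUNTY_TO_BOROUGH["KINGS"])
print(impute_from_observed(["M", None, "F", "M"], random.Random(0)))
```

Sampling in proportion to the observed frequencies keeps the imputed column's distribution close to the original one, which is the stated reason for imputing rather than dropping rows.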
D. Structured Data Storage
In this step, the pre-processed data in the dataframe has
been converted to CSV. To store this clean data, which is
in a structured format, a PostgreSQL database has been used.
PostgreSQL is an open-source relational database well
suited to storing structured data.
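Loading the cleaned CSV into a relational table and querying it back can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the table name, column names and rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned rows; the column names are illustrative only.
CLEAN_CSV = """borough,gender,age_group
BRONX,M,25-44
BRONX,F,18-24
QUEENS,M,25-44
"""


def load_csv(connection, table, csv_text):
    """Create `table` from the CSV header and bulk-insert the rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    connection.commit()


connection = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
load_csv(connection, "arrests", CLEAN_CSV)
rows = connection.execute(
    "SELECT borough, COUNT(*) FROM arrests GROUP BY borough ORDER BY borough"
).fetchall()
print(rows)  # [('BRONX', 2), ('QUEENS', 1)]
```

With PostgreSQL the same idea applies, although a bulk load would more typically go through the database's `COPY` command than through row-by-row inserts.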
E. Visualizations and Analysis
In this step, the data has been extracted from the PostgreSQL
database into a pandas dataframe. This data is used for
further analysis and visualizations. Various visualizations are
created such that they answer the proposed objectives and
research question. All the steps are carried out using the Python
programming language. Python, being an open-source and
easy-to-use language, provides a variety of packages for analyzing
and visualizing data. To create the visualizations, Python
packages such as Matplotlib, Seaborn and Altair have been
used. Seaborn is built on top of Matplotlib. Altair is
another user-friendly Python package used for visualizations.
The process has been programmed to accommodate user input:
the year of interest is taken as an input
from the user, and the code has been written to accommodate any
data with the same structure. GitHub has been used by the entire
team as a version control tool for sharing and maintaining
the code, data and visualization results throughout the
project.
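Taking the year as a user input could be handled as in the sketch below. The `--year` flag and the function names are hypothetical; the project may well have prompted for the year in a different way.

```python
import argparse
from datetime import date


def parse_year(argv=None):
    """Read the year of interest from the command line (default 2018)."""
    parser = argparse.ArgumentParser(
        description="Run the crime analysis for one calendar year.")
    parser.add_argument("--year", type=int, default=2018,
                        help="calendar year to analyse")
    return parser.parse_args(argv).year


def year_bounds(year):
    """Half-open date range [start, end) covering the whole calendar year."""
    return date(year, 1, 1), date(year + 1, 1, 1)


def in_year(day, year):
    """True if `day` falls inside the requested calendar year."""
    start, end = year_bounds(year)
    return start <= day < end


year = parse_year(["--year", "2018"])
print(year_bounds(year))
```

Deriving every date filter from the single `year` value is what lets the same pipeline run unchanged on another year's data with the same structure.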
IV. RESULTS
This section covers the visualizations and the results
obtained from the analysis described above.
From Fig. 2, we can observe that the number of complaints
received by the New York Police Department is the highest,
followed by the number of arrests made by the department.
However, the numbers of court summons and incarcerations
are significantly lower than the other two.
Fig. 2. Monthly Crime Count
As the area chart in Fig. 2 gives just an overall trend, we
have plotted the individual trends for all the 4 datasets for
a more detailed analysis. The line chart, as seen in Fig. 3,
helps us deduce the trend for the year 2018. Here, we can see
that there is an overall decrease in crime as the year progresses
for all the 4 categories. However, complaints are at their
highest during the months of May to August and then decline
towards the end of the year.
Fig. 3. Monthly Crime Count - Individual Analysis
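The monthly counts behind charts of this kind can be computed by grouping each dataset's records by month. The following is a minimal sketch with hypothetical sample dates, not the project's data.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Return a 12-element list with the number of records in each month."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


# Hypothetical sample records for two of the four datasets.
sample = {
    "complaints": [date(2018, 1, 5), date(2018, 1, 20), date(2018, 6, 2)],
    "arrests": [date(2018, 1, 7), date(2018, 12, 30)],
}
series = {name: monthly_counts(dates) for name, dates in sample.items()}
print(series["complaints"])
```

Each resulting list can be passed directly to a plotting library as one line or one area of the chart.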
As observed in Fig. 4, the top 10 crimes for complaints and
arrests are very similar: ‘Petit Larceny’ is at the top in
complaints and takes the 3rd spot in arrests, and ‘Assault
3’ also appears in the top 3 in both categories. However,
if we look at the court summons and prison categories, we can
see that the top 10 crimes are very dissimilar to those for
complaints and arrests. Court summons are dominated by crimes like
‘Motor vehicle Safety Regulations’ and ‘Marijuana Possessions’,
while incarcerations are mostly for violent categories of
crime such as ‘Possession of Weapons’ and ‘Robbery’.
Fig. 4. Top 10 Crimes
Fig. 5 gives the total count in each of the categories
for the different boroughs of NYC. Here we can see that Brooklyn
has the highest number of complaints and arrests, whereas
Manhattan leads in court summons and prison admissions.
Staten Island appears to be the safest of all the boroughs,
having the lowest counts in each of the categories.
Fig. 5. Count by Borough
The above analysis does not give an accurate picture of
the proportions, as the populations of the boroughs have not
been accounted for. Hence, we plotted the same chart taking
into consideration the population of the boroughs. The count
per capita has been calculated by dividing the individual
count by the population of each borough. The updated
plot can be seen in Fig. 6. We can now notice that the Bronx
actually has the highest per capita number of complaints and
arrests, although Manhattan still leads the way in court summons and
incarcerations. Earlier, we had deemed Staten Island to be the
safest borough. However, we can now see that Staten Island is
the 2nd safest and Queens takes its place as the safest
borough in NYC.
Fig. 6. Count per Capita by Borough
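The per capita calculation is a simple division, and a small example shows how it can reorder the ranking. The borough names and figures below are placeholders chosen for easy arithmetic, not the project's data, and scaling to a rate per 1,000 residents is an assumption made for readability.

```python
def per_capita(counts, population, per=1000):
    """Count divided by population, scaled to a rate per `per` residents."""
    return {name: per * counts[name] / population[name] for name in counts}


def ranked(values):
    """Names ordered from the highest value to the lowest."""
    return sorted(values, key=values.get, reverse=True)


# Illustrative placeholder figures only.
counts = {"Borough A": 300, "Borough B": 200}
population = {"Borough A": 3000, "Borough B": 1000}

rates = per_capita(counts, population)
print(ranked(counts))  # ['Borough A', 'Borough B']
print(ranked(rates))   # ['Borough B', 'Borough A']
```

Borough A has more incidents in absolute terms, but Borough B has the higher rate once population is taken into account, which is the same kind of reversal seen between Fig. 5 and Fig. 6.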
The stacked bar charts in Figs. 7 and 8 show an analysis
of the crimes committed by age group and gender in each of
the categories by borough.
It can be inferred from these 2 figures that people belonging
to the age group ‘25-44’ commit the highest number of crimes
in every category and in every borough, and that more
crimes are committed by males than by females.
Fig. 7. Count by Age
Fig. 8. Count by Gender
We have also plotted a heatmap, as seen in Fig. 9, to capture
the areas where the number of arrests was high. It shows that
the areas along the borders have fewer arrests, whereas
a higher number of arrests is concentrated in the city centers.
Fig. 9. Heat Map of Arrests
V. CONCLUSIONS AND FUTURE WORK
We visualized all the datasets collected from multiple sources
and tried to find a relationship between criminal complaints
and judicial conviction. The visualizations provided a clear
picture of the proportion of each of the individual entities.
We found that the number of arrests made amounted to
around two-thirds of the number of criminal complaints lodged.
It could be inferred that, on average, 2 arrests
were made by the New York police for every 3 complaints lodged,
and the trend remains similar throughout the year. The most
common types of crime in the two datasets are observed to
be violence and harassment. Both complaints
and arrests declined towards the end of the year.
However, we find a steep drop in the numbers when it
comes to court proceedings. The number of court summons
is below 30% of the number of arrests. It could be inferred
that the cases might have been withdrawn by the victims or that
the arrests did not go to court. In the court summons dataset, the
most frequent crime types differ from those in the
complaints and arrests datasets.
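The stage-to-stage proportions discussed here are ratios of consecutive counts. The sketch below shows only the calculation. Its inputs are the approximate raw row counts quoted in the Data Collection section, so the resulting ratios are indicative and need not equal the proportions reported in this section.

```python
# Approximate raw row counts quoted in the Data Collection section.
STAGE_ROWS = {
    "complaints": 450_000,
    "arrests": 250_000,
    "court_summons": 89_000,
    "prison_admissions": 19_000,
}


def stage_ratios(counts):
    """Ratio of each stage's count to the count of the stage before it."""
    stages = list(counts)
    return {
        f"{later}/{earlier}": counts[later] / counts[earlier]
        for earlier, later in zip(stages, stages[1:])
    }


for name, ratio in stage_ratios(STAGE_ROWS).items():
    print(f"{name}: {ratio:.2f}")
```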
From the statistical graphs, it was also inferred that around
70% of the summons resulted in a conviction by the court. With
regard to offender characteristics, it was observed that males
commit more crimes than females across all the
boroughs. Most arrests and complaints involve
young adults and middle-aged people (age group 25-44 years).
Of the 5 boroughs in New York, the Bronx and Manhattan are
comparatively unsafe, as their per capita crime rates are higher
than those in Queens and Staten Island.
However, our research was limited to New York City,
and we analyzed data only for the year 2018. Additional
datasets are required to draw a firm conclusion about
crime patterns and criminal justice across the US. Data
explaining why arrests were not made for the complaints
lodged, why arrests were not taken to court, or why the accused
was not imprisoned, along with details of date and time of entry,
could help gain insights into which factors specifically affect the
proportions. With all this information available, a more
accurate crime prediction could be performed using machine
learning methods.
REFERENCES
[1] A. A. Alkhaibari and Ping-Tsai Chung. Cluster analysis for reducing
city crime rates. In 2017 IEEE Long Island Systems, Applications and
Technology Conference (LISAT), pages 1–6, May 2017.
[2] S. Bayoumi, S. AlDakhil, E. AlNakhilan, E. A. Taleb, and H. AlShabib.
A review of crime analysis and visualization. Case study: Maryland State,
USA. In 2018 21st Saudi Computer Society National Computer Conference
(NCC), pages 1–6, April 2018.
[3] M. Feng, J. Zheng, J. Ren, A. Hussain, X. Li, Y. Xi, and Q. Liu. Big data
analytics and mining for effective visualization and trends forecasting of
crime data. IEEE Access, 7:106111–106123, 2019.
[4] Hossein Hassani, Xu Huang, Emmanuel Silva, and Mansi Ghodsi. A
review of data mining applications in crime. Statistical Analysis and
Data Mining, 9, 04 2016.
[5] Quist-Aphetsi Kester. Visualization and analysis of geographical crime
patterns using formal concept analysis. International Journal of
Remote Sensing and Geoscience (IJRSG), 2, 07 2013.
[6] C. Ku, J. H. Nguyen, and G. Leroy. TASC - crime report visualization
for investigative analysis: A case study. In 2012 IEEE 13th International
Conference on Information Reuse and Integration (IRI), pages 466–473, Aug
2012.
[7] Darshan Shah and Ryan Leonard. San francisco crime visualization.
International Journal of Computer Applications, 181:13–19, 07 2018.
[8] Samiullah Shah, Vijdan Khalique, Salahuddin Saddar, and Naeem Mahoto.
A framework for visual representation of crime information. Indian
Journal of Science and Technology, 10:1–8, 12 2017.
[9] B. Sivanagaleela and S. Rajesh. Crime analysis and prediction using fuzzy
c-means algorithm. In 2019 3rd International Conference on Trends in
Electronics and Informatics (ICOEI), pages 595–599, April 2019.

Multiple time frame trading analysis -brianshannon.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

Database and Analytics Programming - Project report

Database and Analytics Programming

Sarthak Khare, School of Computing, National College of Ireland, Dublin, Ireland. Student ID: x18180485
Jayanta Behera, School of Computing, National College of Ireland, Dublin, Ireland. Student ID: x18188834
Darshana Gowda, School of Computing, National College of Ireland, Dublin, Ireland. Student ID: x18188842
Samruddhi Kanhere, School of Computing, National College of Ireland, Dublin, Ireland. Student ID: x18190634

Abstract: Crimes threaten social peace and create panic in society. It is not only the responsibility of law enforcement agencies to maintain law and order but also of civilians to remain vigilant and report any unlawful activities in their vicinity. In order to find a relationship between complaints lodged at the stations, the number of arrests and court summons, and the prison admissions in the city of New York, we have performed analysis on data for the year 2018. We have created visualizations based on features that were common to all 4 datasets. Overall, the number of arrests is lower than the number of complaints lodged, and the numbers dwindle further as we move to court summons and prison admissions. Analysis based on Age and Gender for all 5 boroughs of New York showed that there was a greater number of males than females at every stage and that most of the alleged criminals fell under the 25-44 age category. A comparative analysis of the count of crimes per capita for all the boroughs revealed that the highest number of crimes occurred in the Bronx, followed by Manhattan, Brooklyn, Staten Island and Queens.

Index Terms: Crime, New York, database, visualizations

I. INTRODUCTION

Regulation of crime rates and assurance of appropriate justice is essential not only for the victims but also for society as a whole. If justice is to prevail and criminals are to be punished, crimes need to be reported in the form of complaints, and the law enforcement department needs to act on those complaints to bring justice to the victims. As a part of this project, our objective is to gain insights from the patterns of the crimes that are reported in the city of New York. We will further explore the complaints, arrests, court summons and prison admissions data and perform a comparative analysis to understand the relationship between them. We will also investigate the crime rates for the 5 boroughs of New York City based on features such as Gender and Age Group.

Analysis of crimes is essential in helping the law enforcement agencies take effective measures for the prevention and reduction of crimes. It will allow them to gain a better perspective and boost pre-emptive actions such as increased patrolling and surveillance, which would help reduce criminal activities. The choice of data reflects the objective of comparing and individually analyzing the crime rate, the arrests and court summons for the crimes, and the incarcerations in a given borough. The research question that we aim to answer is: Is the number of prison admissions commensurate with the number of complaints registered for various crimes committed across the administrative districts of New York City?

II. RELATED WORK

Several studies have examined crime, and visualizations have been created to find patterns and trends in crimes based on location, time and type of crime, which help in predicting whether a crime could happen in a certain location or at a certain time of day or week. This has aided law enforcement personnel in being more vigilant and taking preventive measures to reduce the number of crimes. We have studied a variety of such research work and attempted to find studies that are similar or related to our objectives.
An analysis has been performed in which clustering techniques are applied to the Stop, Question and Frisk dataset. The analysis and prediction done as a part of this research helped in identifying locations which require a higher amount of police patrolling [1]. Visualizations have been used by Bayoumi et al. to identify the most common location, day of week and time of day for different categories of crime. As per their analysis, crimes against people occur more frequently at night than at any other time of the day, while crimes against property take place late in the morning or early in the afternoon. These insights were provided to law enforcement personnel to help them take quick decisions [2].

Using big data analytics, Feng et al. discovered facts and patterns in the criminal data of three major cities in the United States. Their aim was to help the police department understand crime better, knowledge which can be used for crime detection and for undertaking preventive measures [3]. Clustering techniques and association rules are also used in [4] for investigating crime data and finding means to prevent crime.

Formal concept analysis was performed on crime data for different geographical locations. Crimes were split into different categories based on common attributes. This helped build a more defined model for crime analysis based on geographical distribution [5]. Text analytics has also been used to perform crime analysis by Ku, Nguyen and Leroy [6]. Using natural language processing, they developed a decision control system for less-trained security personnel that provided an efficient way to investigate crime with better accuracy.

Analysis was performed using geospatial data from the years 2003 to 2015 for San Francisco [7], and it was observed that the western coast of San Francisco was far safer than the east coast. As per the analysis of crime over a period, it was observed that crime occurred mostly during the weekend. The author identified the three most unsafe regions of the city using Hotspot technology.

In order to deal with crime rates, Shah et al. [8] proposed a framework which takes crime-related data and transforms it into visual reports. They used graphical representations to summarize their findings. Live heatmaps of locations with a high density of crimes were created, and clustering algorithms were applied to geographical locations to identify patterns in the crimes committed. These assisted in taking pro-active measures for the reduction of crimes.

Many studies have also applied predictive analysis to predict future occurrences of crimes [9]. Based on the history of criminal incidents, Sivanagaleela and Rajesh performed clustering of criminal activities. They generated a pattern to identify crime areas based on data recorded prior to occurrence, which would eventually help reduce incidents related to crime.

All of these works address crime detection, emphasizing the importance of big data analysis and data mining methods, and carry out prediction of crime. However, a comparative analysis of the legal steps has not been carried out as such. From our research comparing the proportion of complaints and arrests against court summons and imprisonment, we can see that the proportions vary by a considerable amount. These results should help the law enforcement and judiciary authorities to look back and identify whether the reasons behind this need to be worked upon.

III. METHODOLOGY

To achieve the objectives of the project, a series of steps has been followed. A diagrammatic representation of the process flow can be seen in Fig. 1.

Fig. 1. Process Flow diagram

A. Data Collection

The first step in this process is gathering appropriate data.
The data for New York City for the year 2018 has been extracted for analysis. An outline of the 4 related datasets used for the project is as follows:

• Complaints: This dataset has all the criminal complaints lodged by victims and witnesses in New York City. It has about 450k records and 35 features.
• Arrests: This dataset contains information on all the arrests that took place in the selected year of interest. It has about 250k rows and 18 attributes.
• Court Summons: This dataset includes information on all the criminal summonses that were issued. It has about 89k rows and 16 features.
• Prison Admissions: This dataset contains information about all the prison admissions. It has a little above 19k rows and 9 attributes.

All four datasets are in JSON format and are programmatically extracted using open APIs. The first three datasets are obtained from New York open data (https://data.cityofnewyork.us/), whereas the fourth dataset is from the data.gov website. In addition, the population data of New York for the year 2018 is web scraped.

B. Unstructured Data Storage

As the collected data is in JSON format, a MongoDB database has been used for its storage. The data gathered has been split and pushed into MongoDB in the form of documents. MongoDB is a document-oriented database and is well suited to storing JSON-like structures.

C. Data Preprocessing

This is the most important step of the end-to-end process. In this step, the records have been fetched from MongoDB and converted to a dataframe for cleaning. All the preprocessing and transformation has been done using pandas dataframes.

• Feature Selection: All columns not required for the analysis are dropped. Columns such as Gender, Borough (Administrative District) and Age Group have been selected for the analysis.
• Feature Calculation: In the Prison Admissions dataset, the age data is continuous, whereas the other three datasets contain categorical age data. A new column has been added to the dataset to capture the age as categorical values that match the other 3 datasets. A Borough column has been introduced and populated with the boroughs corresponding to the county data present in the dataset. The complaints, arrests and court summons datasets have a date column, which has been used to calculate the day and month.
• Missing Data: The missing data are imputed based on the normal distribution.
• Dealing with Missing Data and NA values: For features containing a higher proportion of missing values, data has been imputed based on distribution plots to avoid loss of essential data. For features containing a small proportion of NA or missing values, the affected rows have been dropped.
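The feature calculation step for the Prison Admissions dataset can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the report performs this step on pandas dataframes (where `pandas.cut` could do the binning), and the field names `age` and `county`, the age-group labels other than "25-44", and the county-to-borough table are assumptions made here for illustration.

```python
from bisect import bisect_right

# Assumed age-group boundaries and labels; the report names only "25-44".
AGE_BREAKS = [18, 25, 45, 65]  # lower bound of every group after the first
AGE_LABELS = ["<18", "18-24", "25-44", "45-64", "65+"]

# Assumed lookup from county name to New York City borough.
COUNTY_TO_BOROUGH = {
    "KINGS": "BROOKLYN",
    "NEW YORK": "MANHATTAN",
    "QUEENS": "QUEENS",
    "BRONX": "BRONX",
    "RICHMOND": "STATEN ISLAND",
}


def age_group(age):
    """Map a continuous age to the categorical group that contains it."""
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


def add_derived_features(record):
    """Return a copy of one record with 'age_group' and 'borough' added.

    'age' and 'county' are hypothetical field names; counties outside
    the lookup table get a borough of None.
    """
    enriched = dict(record)
    enriched["age_group"] = age_group(record["age"])
    enriched["borough"] = COUNTY_TO_BOROUGH.get(record["county"].strip().upper())
    return enriched


if __name__ == "__main__":
    sample = {"age": 31, "county": "Kings", "gender": "M"}
    print(add_derived_features(sample))
```

Keeping the boundaries in a single sorted list means the group edges can be changed in one place if the categorical datasets turn out to use different bins.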
D. Structured Data Storage

In this step, the pre-processed data in the dataframe has been converted to CSV. To store this clean data, which is in a structured format, a PostgreSQL database has been used. PostgreSQL is an open-source relational database well suited to storing structured data.

E. Visualizations and Analysis

In this step, the data has been extracted from the PostgreSQL database into pandas dataframes. This data is used for further analysis and visualizations. Various visualizations are created such that they address the proposed objectives and research question.

All the steps are carried out using the Python programming language. Python is an open-source and easy-to-use language that provides a variety of packages for analyzing and visualizing data. To create the visualizations, Python packages such as Matplotlib, Seaborn and Altair have been used. Seaborn is built on top of Matplotlib. Altair is another user-friendly Python package used for visualizations.

The process has been programmed to accommodate user input: the year is taken as an input from the user, and the code has been written to accommodate any data with the same structure. GitHub has been used by the entire team as a version control tool for sharing and maintaining the code, data and visualization results throughout the project.

IV. RESULTS

This section covers the visualizations and the results obtained from the analysis described above. From Fig. 2, we can observe that the number of complaints received by the New York Police Department is the highest, followed by the number of arrests made by the department. However, the numbers of court summons and incarcerations are significantly lower than the other two.

Fig. 2. Monthly Crime Count

As the area chart in Fig. 2 gives just an overall trend, we have plotted the individual trends for all 4 datasets for a more detailed analysis. The line chart in Fig. 3 helps us deduce the trend for the year 2018. Here, we can see there is an overall decrease in crime as the year progresses, for all 4 categories. However, complaints are at their highest during the months of May to August and then decline towards the end of the year.

Fig. 3. Monthly Crime Count - Individual Analysis

As observed in Fig. 4, the top 10 crimes for complaints and arrests are very similar. 'Petit Larceny' is at the top in complaints and takes the 3rd spot in arrests; similarly, 'Assault 3' appears in the top 3 in both categories. However, if we look at the court summons and prison categories, we can see that the top 10 crimes are very dissimilar to those for complaints and arrests. Court summons are dominated by crimes like 'Motor vehicle Safety Regulations' and 'Marijuana Possessions', while incarcerations are mostly made in violent categories of crime such as 'Possession of Weapons' and 'Robbery'.

Fig. 4. Top 10 Crimes

Fig. 5 gives the total count in each of the categories for the different boroughs of NYC. Here we can see that Brooklyn has the highest number of complaints and arrests, whereas Manhattan leads in court summons and prison admissions. Staten Island appears to be the safest of all the boroughs, having the lowest counts in each of the categories.

Fig. 5. Count by Borough

The above analysis does not give an accurate picture of the proportions, as the population of the boroughs has not been accounted for. Hence, we plotted the same chart taking the population of the boroughs into consideration. The count per capita has been calculated by dividing the individual count by the population of each borough. The updated plot can be seen in Fig. 6. We can now notice that the Bronx actually has the highest number of complaints and arrests per capita, although Manhattan still leads the way in court summons and incarcerations. Earlier, we had deemed Staten Island to be the safest borough. However, we can now see that Staten Island is the 2nd safest and Queens takes its place as the safest borough in NYC.

Fig. 6. Count per Capita by Borough

The stacked bar charts in Figs. 7 and 8 show an analysis of the crimes committed by age group and gender in each of the categories, by borough. It can be inferred from these 2 figures that people belonging to the '25-44' age group commit the highest number of crimes in every category and in every borough, and that more of the crimes are committed by males than by females.

Fig. 7. Count by Age

Fig. 8. Count by Gender

We have also plotted a heatmap, as seen in Fig. 9, to capture the areas where arrests were high in number. It shows that the areas along the borders have fewer arrests, whereas a higher number of arrests is concentrated in the city centers.

Fig. 9. Heat Map of Arrests
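The per-capita normalization behind Fig. 6 divides each borough's count by its population, which can reorder the boroughs relative to the raw counts in Fig. 5. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the borough labels and numbers are placeholder values chosen to show a change in ranking, not figures from the datasets.

```python
def count_per_capita(counts, populations):
    """Divide each borough's count by that borough's population."""
    return {borough: counts[borough] / populations[borough] for borough in counts}


def rank_descending(values):
    """Return the borough names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


if __name__ == "__main__":
    # Placeholder values for illustration only, not taken from the report.
    counts = {"Borough A": 300, "Borough B": 200}
    populations = {"Borough A": 1000, "Borough B": 500}

    per_capita = count_per_capita(counts, populations)
    print("Ranked by raw count: ", rank_descending(counts))
    print("Ranked by per capita:", rank_descending(per_capita))
```

With these placeholder values, Borough A leads on raw count while Borough B leads per capita, the same kind of reversal the report describes between Brooklyn and the Bronx.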
V. CONCLUSIONS AND FUTURE WORK

We visualized all the datasets collected from multiple sources and tried to find a relationship between criminal complaints and judicial conviction. The visualizations provided a clear picture of the proportion of each of the individual entities. We found that the number of arrests made amounted to around two-thirds of the number of criminal complaints lodged. It could be inferred that an average of 2 arrests were made by the New York police for every 3 complaints lodged, and the trend remains similar throughout the year. The most common types of crime for these two entities are observed to be violence and harassment. Both complaints and arrests dropped by the end of the year.

However, we find a steep fall in the numbers when it comes to court proceedings. The number of court summons falls below 30% of the number of arrests. It could be inferred that the cases might have been withdrawn by the victims or that the arrests did not go to court. In the court summons dataset, the most frequent crime types differ from those in the complaints and arrests. From the statistical graphs, it was also inferred that around 70% of the summons result in a conviction by the court.

Regarding the profile of offenders, it was observed that males commit more crimes than females across all the boroughs. Most arrests and complaints involve young adults and middle-aged people (age group 25-44 years). Of the 5 boroughs in New York, the Bronx and Manhattan are comparatively unsafe, as their per capita crime rates are higher than those of Queens and Staten Island.

However, our research was limited to New York City, where we analyzed data only for the year 2018. Additional datasets are required to draw a firm conclusion about crime patterns and criminal justice across the US. Data explaining why arrests were not made for complaints lodged, why arrests were not taken to court, or why the accused was not imprisoned, along with details of the date and time of entry, could help gain insights into which factors specifically affect the proportions. Considering all the information available, a more accurate crime prediction could be performed using machine learning methods.

REFERENCES

[1] A. A. Alkhaibari and P.-T. Chung. Cluster analysis for reducing city crime rates. In 2017 IEEE Long Island Systems, Applications and Technology Conference (LISAT), pages 1–6, May 2017.
[2] S. Bayoumi, S. AlDakhil, E. AlNakhilan, E. A. Taleb, and H. AlShabib. A review of crime analysis and visualization. Case study: Maryland State, USA. In 2018 21st Saudi Computer Society National Computer Conference (NCC), pages 1–6, April 2018.
[3] M. Feng, J. Zheng, J. Ren, A. Hussain, X. Li, Y. Xi, and Q. Liu. Big data analytics and mining for effective visualization and trends forecasting of crime data. IEEE Access, 7:106111–106123, 2019.
[4] H. Hassani, X. Huang, E. Silva, and M. Ghodsi. A review of data mining applications in crime. Statistical Analysis and Data Mining, 9, April 2016.
[5] Q.-A. Kester. Visualization and analysis of geographical crime patterns using formal concept analysis. International Journal of Remote Sensing and Geoscience (IJRSG), 2, July 2013.
[6] C. Ku, J. H. Nguyen, and G. Leroy. TASC - crime report visualization for investigative analysis: A case study. In 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), pages 466–473, August 2012.
[7] D. Shah and R. Leonard. San Francisco crime visualization. International Journal of Computer Applications, 181:13–19, July 2018.
[8] S. Shah, V. Khalique, S. Saddar, and N. Mahoto. A framework for visual representation of crime information. Indian Journal of Science and Technology, 10:1–8, December 2017.
[9] B. Sivanagaleela and S. Rajesh. Crime analysis and prediction using fuzzy c-means algorithm. In 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pages 595–599, April 2019.