Dataset from the National Institute of Justice about crimes in San Francisco. We calculate the distances between crime points, treat those points as nodes of a city network, apply network analysis, and then present the approach.
2. PRECAP
An idea of how to analyze, visualize, and try to forecast crime.
A network is a set of defined nodes and edges. We study the
pattern of the edges between nodes depending on the type of crime
and the number of occurrences of the different crimes by date of
occurrence.
Our approach is deliberately simple: we visualize and plot the network
formed from the given datasets and then analyze it.
The network matrices we analyze are handled manually, much like a known
training and test set, and this can be implemented on a larger scale
later on.
3. Other Existing Approaches
Trivial approach: knowing what happened before and predicting from it
what will happen later on.
Data mining techniques: association rule mining, support vector
machines, linear regression methods, neural networks.
Machine learning techniques: the data mining techniques above can be
evaluated in practice using machine learning methods.
After plotting the datasets as a graph, we can visualize it and use
the metrics to reach a conclusion about the particular event we are trying
to forecast.
4. Problem Description
Predict the occurrence of crime using a set of existing datasets.
Plot the data as a network and analyze the network.
Understand the behavior of the networks that form upon
visualization.
Visualization speeds up analysis because a single image (the network)
conveys what would otherwise take 100 lines of description of
the data.
Perform node-to-node analysis and understand how paths move through
the network, to make prediction easier.
5. AGENDA
Collection of Datasets
Manual analysis of the datasets and of how the data are
represented.
Classifying datasets and Removing Unwanted data from the Dataset.
Finding Nodes and Formulating the Edges.
Plotting the Network and Visualization.
Analysis, and use of the results to predict the outcomes.
6. Dataset Description
We used the dataset from the National Institute of Justice.
The dataset contains records about
-- Crime Incidents that were recorded for February 2001.
-- (X and Y) coordinates of the places where the crimes occurred
-- Type of crime and the final case type on dispute.
-- Category of Crimes.
-- Call Groups
-- Occurrence Type
-- Census
7. Continued…
We considered two datasets of crime incidents, from 27 and 28 February 2017.
The total number of cases registered was 572 and 545, respectively.
The edges were formulated as a matrix by computing the distances
between the coordinates using the Euclidean distance formula.
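The distance matrix described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `distance_matrix` and the sample coordinates are hypothetical and are not taken from the dataset or from the original implementation.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance: sqrt((x1 - x2)^2 + (y1 - y2)^2)
            d = math.dist(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

# Hypothetical (X, Y) coordinates, used only to show the shape of the result.
sample_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
sample_matrix = distance_matrix(sample_points)
```

Each entry `sample_matrix[i][j]` is then the weight of the edge between crime points `i` and `j`; the diagonal stays at zero.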
8. Visualization
When plotted naively as a graph, the individual datasets looked like
what I have shown in the next slides.
Later we plotted the occurrences in Gephi for visualization and
analysis.
We used Plotly to chart the dataset without any classification.
We used Gephi since it is easy and adaptable, though the size of the files it handles is limited.
Number of nodes taken into consideration: 4 in both graphs.
Number of edges formed were :-
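For the Gephi step, one possible way to hand a distance matrix to the tool is a CSV edge list with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import recognizes. The sketch below is an assumption about the workflow, not the procedure actually used for these slides; the node labels and weights are hypothetical.

```python
import csv
import io

def edge_list_csv(labels, matrix):
    """Serialize the upper triangle of a distance matrix as a CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # One row per undirected edge; the distance is the edge weight.
            writer.writerow([labels[i], labels[j], matrix[i][j]])
    return buffer.getvalue()

# Hypothetical node labels and distances, for illustration only.
labels = ["A", "B", "C"]
matrix = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 5.0],
    [10.0, 5.0, 0.0],
]
csv_text = edge_list_csv(labels, matrix)
```

Writing `csv_text` to a file gives something that can be loaded into Gephi as an undirected edge table.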
13. Analysis
After the visualization,
we manually analyzed how the number of crimes changes
between the two networks by comparing the network weights, which are the
numbers of crimes committed on each of the two days.
Using that, as I have stated in the paper, we can predict whether
crimes are going to decrease or increase at particular coordinates that are
common to both networks.
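The manual comparison described above can be sketched as follows. This is a minimal illustration, assuming that each incident is reduced to its (X, Y) coordinate pair and that the "weight" of a coordinate is the number of incidents recorded there on a given day; the function name `compare_days` and the sample points are hypothetical.

```python
from collections import Counter

def compare_days(day1_points, day2_points):
    """Label each coordinate common to both days by how its crime count changed."""
    counts1 = Counter(day1_points)
    counts2 = Counter(day2_points)
    trend = {}
    # Only coordinates present on both days are compared.
    for point in sorted(counts1.keys() & counts2.keys()):
        diff = counts2[point] - counts1[point]
        if diff > 0:
            trend[point] = "increase"
        elif diff < 0:
            trend[point] = "decrease"
        else:
            trend[point] = "no change"
    return trend

# Hypothetical incident coordinates for two consecutive days.
day1 = [(1, 1), (1, 1), (2, 2), (3, 3)]
day2 = [(1, 1), (2, 2), (2, 2), (4, 4)]
trend = compare_days(day1, day2)
```

Coordinates that appear on only one of the two days are left out, matching the restriction to nodes common to both networks.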
14. Challenges
We figured out how to go about it very late and that was the main problem.
If we had more time, we would have been able to do a more thorough study. This would
include using more of the data set and creating more visualizations.
Another problem that we had during this study was our lack of familiarity with some of
the tools being used. With more experience of tools like Gephi, we would
be able to create more intricate models.
Without the limitation of resources, such as time and experience with tools, a more
advanced analysis of the possibility of crime prediction would have been possible. This would lead to the ability to
test our model on a larger scale.
15. Conclusion
The metrics we used were applied entirely by hand, and when we did so, we
found the approach effective, because the network we formed
was good enough to describe what we needed and expected from our approach.
This approach was implemented on just two days of data, but it can be
implemented for large datasets.
With this simple approach of comparing similar nodes, taken as coordinates,
the results we achieved could easily be verified as correct.
16. Future Scope
The approach we used works fine when evaluated manually, but
applying the same approach to a big dataset requires a lot of time and a substantial
programming implementation, which we could not do.
This can serve as the base for developing many more methods for forecasting
crime, just like data mining techniques.
17. References
Guidance: the most important part, by Professor Charalampos Chelmis.
Cell-to-Cell Activity Prediction for Smart Cities: IEEE paper by Blerim Cici,
Emmanouil Alimpertis, Alexander Ihler, Athina Markopoulou, University of
California, Irvine.
We used a number of resources for writing the report, which we have mentioned in
the report.