Hotspot Crime Detection Using Machine Learning

TABLE OF CONTENTS
01 PROBLEM IDENTIFICATION
02 APPROACH TO SOLVE THE PROBLEM
03 DATASET AND LIBRARIES
04 EDA (EXPLORATORY DATA ANALYSIS)
05 TRAIN TEST SPLIT
06 BUILDING CLASSIFICATION MODEL
07 RESULTS AND CONCLUSION

PROBLEM IDENTIFICATION
The goal is to develop a machine learning model that
accurately identifies crime hotspots within Los Angeles.
Using LAPD crime data, the model provides actionable
insights that support targeted security recommendations
for clients operating in high-risk areas. The project aims
to strengthen community safety initiatives, foster
collaboration with local law enforcement, and uphold a
commitment to data-driven solutions for public safety.

APPROACH TO SOLVE THE PROBLEM
01 Collect and preprocess data
02 Model selection
03 Modeling
04 Evaluation
05 Comparison and conclusion

Dataset and Libraries
"DATE OCC": "MM/DD/YYYY",
"TIME OCC": "In 24 hour military time.",
"AREA": "The LAPD has 21 Community Police
Stations referred to as Geographic Areas within
the department. These Geographic Areas are
sequentially numbered from 1-21.",
"AREA NAME": "The 21 Geographic Areas or
Patrol Divisions are also given a name
designation that references a landmark or the
surrounding community that it is responsible for.
For example 77th Street Division is located at the
intersection of South Broadway and 77th Street,
serving neighborhoods in South Los Angeles.",
"Rpt Dist No": "A four-digit code that represents
a sub-area within a Geographic Area.",
"Part 1-2": "Indicates the type of crime",
"Crm Cd": "Indicates the crime committed.
(Same as Crime Code 1)",
"Vict Age": "Two character numeric",
"Vict Sex": "F - Female, M - Male, X - Unknown",
"Vict Descent": "Descent Code: A - Other Asian,
B - Black, C - Chinese, D - Cambodian,
F - Filipino, G - Guamanian,
H - Hispanic/Latin/Mexican,
I - American Indian/Alaskan Native, J - Japanese,
K - Korean, L - Laotian, O - Other,
P - Pacific Islander, S - Samoan, U - Hawaiian,
V - Vietnamese, W - White, X - Unknown,
Z - Asian Indian",
"Premis Cd": "The type of structure, vehicle,
or location where the crime took place.",
"Premis Desc": "Defines the Premise Code
provided.",
"Weapon Used Cd": "The type of weapon
used in the crime.",
"Weapon Desc": "Defines the Weapon Used
Code provided.",
"LOCATION": "Street address of crime
incident rounded to the nearest hundred
block to maintain anonymity.",
"LAT": "Latitude",
"LON": "Longitude"

Dataset and Libraries
Libraries:
Pandas: Loads and processes the data, which is supplied in
CSV format.
Matplotlib and Seaborn: Used for data visualization,
creating various types of charts and plots.
Scikit-learn: Also known as sklearn, a Python library for
implementing machine learning models and statistical
modelling.
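
As a rough illustration of how pandas might load the columns described in the data dictionary, the sketch below reads a few rows and converts the date and time fields. The two sample rows are hypothetical stand-ins for the LAPD file, and it is an assumption that TIME OCC arrives as an integer such as 2230; the derived column names HOUR OCC and MINUTE OCC are invented for this example.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real project would read the LAPD CSV file.
sample_csv = io.StringIO(
    "DATE OCC,TIME OCC,AREA,AREA NAME,LAT,LON\n"
    "01/08/2020,2230,3,Southwest,34.0141,-118.2978\n"
    "02/13/2020,845,1,Central,34.0459,-118.2545\n"
)
df = pd.read_csv(sample_csv)

# DATE OCC is documented as MM/DD/YYYY.
df["DATE OCC"] = pd.to_datetime(df["DATE OCC"], format="%m/%d/%Y")

# TIME OCC is 24-hour military time; assumed here to be read as an integer,
# so 845 means 08:45. Split it into hour and minute (hypothetical columns).
df["HOUR OCC"] = df["TIME OCC"] // 100
df["MINUTE OCC"] = df["TIME OCC"] % 100

print(df[["DATE OCC", "HOUR OCC", "MINUTE OCC", "AREA NAME"]])
```

Splitting the time into an hour column is one possible way to make time of day usable as a model feature; the slides do not say how the authors handled it.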

EXPLORATORY DATA ANALYSIS
FUNCTION                OPERATION
df = pd.read_csv("")    Imports the dataset into a DataFrame stored in the variable df (pd refers to pandas).
df.head(), df.tail()    Display the first 5 rows and the last 5 rows.
df.shape                Attribute (no parentheses) giving the number of rows and columns of a DataFrame.
df.info()               Displays columns, data types, non-null counts, and memory usage.
df.describe()           Provides summary statistics such as count, mean, standard deviation, minimum, quartiles (including the median), and maximum.
df.isnull().sum()       Counts the missing/null values in each column.
df.duplicated().sum()   Counts the duplicate rows.
LabelEncoder()          Replaces categorical values with numerical codes.
StandardScaler()        Standardizes features to zero mean and unit variance so they share a comparable scale.
sns.histplot()          Displays the distribution of a continuous variable.
sns.boxplot()           Helps identify outliers.
sns.countplot()         Shows the number of records in each category.
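
The operations in the table can be sketched end to end on a small frame. The five rows below are hypothetical, not LAPD records, and the cleaning calls `drop_duplicates()` and `dropna()` as well as the column names AREA CODE and Vict Age Scaled are assumptions added for illustration; the plotting functions are left out so the example runs without a display.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample rows standing in for the LAPD dataset.
df = pd.DataFrame({
    "AREA NAME": ["Central", "77th Street", "Central", "Hollywood", "Central"],
    "Vict Age": [34, 27, 34, None, 51],
    "Vict Sex": ["F", "M", "F", "X", "M"],
})

print(df.head())      # first rows
print(df.shape)       # (rows, columns); an attribute, not a method
df.info()             # dtypes, non-null counts, memory usage
print(df.describe())  # summary statistics for numeric columns

missing_per_column = df.isnull().sum()  # missing values per column
duplicate_rows = df.duplicated().sum()  # number of fully duplicated rows

# Assumed cleaning step: drop duplicates and rows with missing values.
df_clean = df.drop_duplicates().dropna().copy()

# Encode a categorical column as integer codes.
encoder = LabelEncoder()
df_clean["AREA CODE"] = encoder.fit_transform(df_clean["AREA NAME"])

# Standardize a numeric column to zero mean and unit variance.
scaler = StandardScaler()
df_clean["Vict Age Scaled"] = scaler.fit_transform(df_clean[["Vict Age"]]).ravel()

print(df_clean)
```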

Building Classification Model
We used three algorithms and compared their accuracy on our
variables:
• Decision Tree
A decision tree is a machine learning model used for classification and
regression. It splits data into branches based on feature values, creating a
tree-like structure. It is easy to interpret but can overfit, so techniques
such as pruning are needed to control its complexity.
• Naïve Bayes
Naïve Bayes is a probabilistic machine learning model used for
classification. It applies Bayes' theorem under the assumption that features
are independent of each other given the class. It is efficient, particularly for
large datasets, and performs well on text tasks such as spam detection
despite its naive assumption.
• KNN
The K-Nearest Neighbors (KNN) algorithm is a supervised machine
learning algorithm that classifies a data point by a majority vote among its
k closest training points, as measured by a distance metric.
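
A minimal sketch of comparing the three algorithms with scikit-learn follows. It uses synthetic data from `make_classification` in place of the LAPD features, and the 80/20 split, the `GaussianNB` variant, `n_neighbors=5`, and the scaling step are all assumptions; the slides do not state which settings were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded crime features and the target label.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3, random_state=42
)

# Hold out 20% of the rows for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training rows only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_scaled))
    print(f"{name}: {accuracies[name]:.2f}")
```

Scaling matters mainly for KNN, whose distance calculation is otherwise dominated by features with large numeric ranges; the decision tree is unaffected by it.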

Decision Tree
Importing the algorithm and training the model
Classification report
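
The code shown on this slide is not recoverable from the text, so the following is only a sketch of what importing the algorithm, training it, and printing a classification report might look like. The synthetic data and the `max_depth=4` limit (a simple form of pre-pruning) are assumptions, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the project would use the prepared LAPD features.
X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0,
    n_classes=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Limiting the depth is one way to keep the tree from memorizing the data.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Per-class precision, recall, and F1-score on the held-out rows.
report = classification_report(y_test, tree.predict(X_test))
print(report)
```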

Naïve Bayes

KNN

Results and Conclusion
MODEL           ACCURACY   MODEL FIT
Decision Tree   1.00       Overfit
Naïve Bayes     0.71       Partially overfit
KNN             0.86       Good fit
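
The slides do not say whether these accuracies were measured on training or test data. One common way to judge model fit is to compare the two, as in the sketch below; the synthetic data and the unconstrained tree are assumptions chosen to illustrate the check, not a reproduction of the reported numbers.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for illustration only.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# An unconstrained tree can keep splitting until every training row is
# classified correctly, which is the classic overfitting pattern.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_accuracy = tree.score(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
gap = train_accuracy - test_accuracy

print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
print(f"gap:            {gap:.2f}")
```

A large gap between training and test accuracy suggests overfitting, while similar values on both suggest the model generalizes.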
