This document describes a capstone project to predict the severity of car accidents in Seattle using machine learning models. The project uses data from the Seattle Department of Transportation on accident characteristics and outcomes to build classification models including decision trees, KNN, and logistic regression. The logistic regression model achieved the best performance with an F1 score of 0.76 and R2 score of 0.03. In conclusion, the author states that such a model could help emergency services, insurers, and transit agencies respond to accidents.
APPLIED DATA SCIENCECAPSTONE
CAR ACCIDENT SEVERITY
OMID KARAMI
APPLIED DATA SCIENCE CAPSTONE
SEPTEMBER 2020
2.
PROBLEM DEFINITION
Predict theroad accident severity in the Seattle
Transportation System
Target the Right Audiences:
• Emergency Services
• Insurance and public transport companies
• Municipality
• Passengers
3.
DATA
Data Description
The datasetis available in Comma Separated Value (CSV) format
Source: Seattle Department of Transportation
The dataset consists of 39 independent variables and 221,525 rows.
The dependent variable or target variable, “SEVERITYCODE”,
contains value corresponds to the severity of the collision.
4.
DATA
Data Cleaning
Data cleaningis one of the most important steps in pre-processing stage.
Eliminating unwanted, duplicate and irrelevant observations significant
caused a good insights into the data and achieve better results.
• Missing target variable
• Redundant and useless data
• Convert categorical variable to numeric
• Standardization
• Normalization
METHODOLOGY
Predict the severityof accidents using the supervised
learning algorithms
Two different categories in Machine Learning Models.
Regression and Classification models.
• Decision Tree
• K-Nearest Neighbor (KNN)
• Logistic Regression
9.
RESULT AND EVALUATION
ModelF1 Score Jaccard Score R2 Score
Decision Tree 0.67 0.52 -0.33
K-Nearest Neighbor (KNN) 0.67 0.46 -0.39
Logistic Regression 0.76 0.61 0.03
11.
CONCLUSION
The purpose ofthis study is to predict a model for the severity of
road accidents.
Such a model can be adapted with any road traffic network
anywhere in the world.
The accuracy of this model is very good. It has 82% accuracy.
Therefore, based on this modeling, we can help the organizations
that are involved in road accidents such as:
• Emergency services
• Traffic center,
• Insurance and public transport