Predictive Analysis of Traffic violations
Group 1
Introduction
Road accidents are a major worldwide threat:
● Up to 1.27 million deaths/year [2].
● Up to 50 million injuries/year [2].
● Over 2.5 million/year involved in the US [2].
● Huge economic and social impact.
Data mining of traffic violations is arduous:
● Huge data size and high dimensionality [2-4].
● Classification methods are popular [3-5].
● Results depend heavily on the collected data [3-5].
● Multiple methods must be tested to choose the best [3-5].
Flow of Project
The project followed the CRISP-DM model and the provided instructions:
A. Select a dataset from any free source on the web.
B. Study the variables in the selected dataset; draw
preliminary conclusions; and develop at least three
initial research questions/hypotheses.
C. Develop a Business Use Case.
D. Use visualizations (minimum of 5) to explore the dataset;
present research hypotheses based on the
visualizations.
E. Produce a dataset satisfying all of the criteria in part C.
F. Present 3 modeling techniques for the hypotheses.
G. Develop models using 3 algorithms for each model.
H. Provide recommendations based on the modeling results.
Dataset Chosen
Traffic Violations of MCP: records from 2012 to 2018
from https://catalog.data.gov/ (Jan 17, 2018)
Attributes:
● Date of Stop
● Time of Stop
● Belts
● Contributed to Accident
● Description
● Phone Used
● Fatal
● Property damage
● Violation Type
● Hazmat
Data Preprocessing
Missing Data Handling
● Two attributes, “Agency” and “Accident”, were removed from the dataset because each
is single-valued, holding the value 'No' in every record.
● Records with null values were removed using XLMiner's missing-data treatment.
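The two cleaning steps above can be sketched in a few lines. The slides state that XLMiner was used, so the Python below is only an illustrative equivalent, not the project's actual procedure; the function names and the toy rows are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def drop_constant_columns(records: List[Record]) -> List[Record]:
    """Remove every column that holds the same value in all records."""
    if not records:
        return []
    constant = [
        col for col in records[0]
        if len({rec.get(col) for rec in records}) == 1
    ]
    return [{k: v for k, v in rec.items() if k not in constant} for rec in records]


def drop_incomplete_records(records: List[Record]) -> List[Record]:
    """Remove every record that has a missing (None or empty) value."""
    return [
        rec for rec in records
        if all(value not in (None, "") for value in rec.values())
    ]


# Hypothetical toy rows, not taken from the real dataset.
rows: List[Record] = [
    {"Accident": "No", "Belts": "Yes", "Alcohol": "No"},
    {"Accident": "No", "Belts": "No", "Alcohol": None},
    {"Accident": "No", "Belts": "No", "Alcohol": "Yes"},
]
cleaned = drop_incomplete_records(drop_constant_columns(rows))
```

Dropping the constant column first keeps a null in an uninformative column from discarding an otherwise usable record.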
New Attributes
● Three new binary-valued variables were introduced, namely "Phone Usage",
"Contributed to accident" and "Fatal", based on the visualizations made for the
dataset.
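The slides do not say how the three binary variables were derived. One plausible approach, shown here purely as an assumption, is to encode Yes/No fields as 1/0 and to flag phone usage from keywords in the violation description. The input column names, the output keys, and the keyword list are all hypothetical.

```python
from typing import Dict

# Hypothetical keywords; the real derivation rule is not given in the slides.
PHONE_KEYWORDS = ("PHONE", "HANDHELD")


def yes_no_to_flag(value: str) -> int:
    """Encode a Yes/No field as 1/0 (anything other than 'yes' becomes 0)."""
    return 1 if value.strip().lower() == "yes" else 0


def add_binary_flags(record: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the record with three binary-valued variables added."""
    description = record.get("Description", "").upper()
    flagged: Dict[str, object] = dict(record)
    flagged["phone_usage"] = int(any(word in description for word in PHONE_KEYWORDS))
    flagged["contributed_to_accident"] = yes_no_to_flag(
        record.get("Contributed To Accident", "No")
    )
    flagged["fatal"] = yes_no_to_flag(record.get("Fatal", "No"))
    return flagged


# Hypothetical example record.
example = add_binary_flags({
    "Description": "DRIVER USING HANDHELD TELEPHONE WHILE VEHICLE IS IN MOTION",
    "Contributed To Accident": "No",
    "Fatal": "No",
})
```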
Outlier Detection
● Outliers were identified for the attribute "Year" and treated.
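The slides do not specify how the "Year" outliers were treated. A minimal sketch, assuming that treatment means dropping records whose stop year falls outside the dataset's stated 2012 to 2018 range, could look like this. The column name "Date Of Stop" and the MM/DD/YYYY date format are assumptions.

```python
from datetime import datetime
from typing import Dict, List

# Range taken from the dataset description (records from 2012 to 2018).
FIRST_YEAR, LAST_YEAR = 2012, 2018


def stop_year(date_of_stop: str) -> int:
    """Extract the year from a date string, assumed to be MM/DD/YYYY."""
    return datetime.strptime(date_of_stop, "%m/%d/%Y").year


def drop_year_outliers(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records whose stop year lies within the expected range."""
    return [
        rec for rec in records
        if FIRST_YEAR <= stop_year(rec["Date Of Stop"]) <= LAST_YEAR
    ]


# Hypothetical sample rows.
sample = [
    {"Date Of Stop": "03/14/2015"},
    {"Date Of Stop": "01/01/1999"},
    {"Date Of Stop": "12/31/2018"},
]
kept = drop_year_outliers(sample)
```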
Filtering Datasets
● A separate dataset was created for each research question for modeling.
● R was used to balance each dataset so that the target classes carry
equal weight.
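The balancing itself was done in R, and the slides do not name the method. As an illustration only, random undersampling of the majority class is one common way to give the target classes equal weight. The function name and toy data below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def undersample(records: List[Row], target: str, seed: int = 0) -> List[Row]:
    """Randomly downsample every class to the size of the rarest class."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for rec in records:
        groups[rec[target]].append(rec)
    if not groups:
        return []
    smallest = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced: List[Row] = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data: 8 non-fatal records and 2 fatal ones.
data = [{"id": i, "Fatal": "No"} for i in range(8)]
data += [{"id": 8, "Fatal": "Yes"}, {"id": 9, "Fatal": "Yes"}]
balanced = undersample(data, target="Fatal")
```

Undersampling discards majority-class records, which matters when the rare class (for example, fatal violations) is very small; oversampling the minority class is the usual alternative.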
Visualizations - Violations based on hours
Visualizations - Child violations with year & month
Visualizations - Phone violations with year & month
Visualizations - Violations with Personal injury & seat belt
Research Hypotheses
1. Whether a violation contributed to an accident or not
2. Whether the violation that led to an accident was fatal or not
3. The geolocation at which a violation is likely to occur, based on
several factors
Modeling
Model 1
Predictors:
● Belt
● Alcohol
● Vehicle Type
● Year
● Phone Usage
Target variable:
Contributed to accident
Model 2
Predictors:
● Belt
● HAZMAT
● Alcohol
● Phone Usage
Target variable:
Fatal
Model 3
Predictors:
● Belt, Personal Injury
● Property Damage
● Alcohol
● Violation Type
Target variable:
Cluster ID (geo-coordinates)
Model I: contribution to accident
Algorithm     Single Tree   Random Trees   Naïve Bayes
Precision     0.626         0.503          0.621
Sensitivity   0.275         0.996          0.266
Specificity   0.836         0.023          0.838
F1-Score      0.383         0.669          0.373
Predictors shown in the chart: Belts, Phone usage, Alcohol, Type: Bus
Best model: Random Trees, due to the highest sensitivity and F1-score
Belts and phone usage are the top predictors
for contribution to an accident
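The F1-scores in the Model I table follow from its precision and sensitivity rows, since F1 is their harmonic mean. The sketch below recomputes them from the rounded values on the slide, so small differences in the third decimal are expected. The variable names are illustrative.

```python
from typing import Dict, Tuple


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity) pairs copied from the Model I table.
model_1: Dict[str, Tuple[float, float]] = {
    "Single Tree": (0.626, 0.275),
    "Random Trees": (0.503, 0.996),
    "Naive Bayes": (0.621, 0.266),
}

f1_by_model = {name: f1_score(p, s) for name, (p, s) in model_1.items()}
best_by_f1 = max(f1_by_model, key=f1_by_model.get)
```

Selecting by F1 and sensitivity, as the slide does, favours models that catch most positive cases; specificity does not enter the F1 formula at all.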
Model II: fatality of accident
Algorithm     Single Tree   Random Trees   Naïve Bayes
Precision     0.579         0.639          0.579
Sensitivity   0.79          0.383          0.543
Specificity   0.404         0.776          0.59
F1-Score      0.668         0.478          0.561
Predictors shown in the chart: Alcohol, Phone usage, HAZMAT
Best model: Single Tree, due to the highest
sensitivity and F1-score
Alcohol and phone usage are the top predictors for fatal accidents
Model III: geolocation for “likely” violations
● K-means clustering and Single Tree classification are the best
algorithms, due to their unbiased results
● Cluster 5 has the highest number of alcohol violations
and personal injuries
● Cluster 9 has the highest number of belt violations
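Model III turns geo-coordinates into a Cluster ID with K-means. The project used XLMiner and R for this, and the slides give neither the number of clusters nor the settings, so the following is only a minimal sketch of the algorithm itself (Lloyd's iterations). The farthest-point initialisation and the toy coordinates are assumptions chosen to keep the example deterministic.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; a rough proxy for distance over a small area."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def initial_centers(points: List[Point], k: int) -> List[Point]:
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def k_means(points: List[Point], k: int, max_iter: int = 100):
    """Assign each point a cluster ID by alternating assignment and update steps."""
    centers = initial_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # leave an empty cluster's center unchanged
                centers[i] = (
                    sum(lat for lat, _ in members) / len(members),
                    sum(lon for _, lon in members) / len(members),
                )
    return labels, centers


# Hypothetical stops forming two well-separated groups.
stops = [
    (39.00, -77.00), (39.01, -77.01), (39.02, -77.00),
    (39.30, -77.30), (39.31, -77.29), (39.29, -77.31),
]
cluster_ids, cluster_centers = k_means(stops, k=2)
```

Once every record carries a cluster ID, counting alcohol, belt, and personal-injury violations per cluster gives the kind of per-cluster profile reported above.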
Recommendations
> Recommend that MCP increase its attention to, and enforcement of, the rules
for belts and phone usage, the main contributors to accidents
> Recommend that MCP pay extra attention to, and take extra measures against,
drunk driving and phone usage while driving
> Alert Maryland police to the major areas where violations are likely to occur:
>> high number of alcohol violations with personal injuries in cluster 5
>> multiple belt violations in cluster 9
> Make similar recommendations to insurance companies in the state of
Maryland
Conclusion
● 5 visualizations were produced and 3 research hypotheses developed;
● The data was preprocessed into several datasets;
● 3 modeling techniques were applied to the hypotheses;
● Several classification, clustering and regression models were considered for
modeling: Single Tree, Random Trees, K-Means clustering, Multiple regression, etc.;
● Random Trees and Single Tree are the best algorithms for models 1 and 2 respectively,
given the importance of high sensitivity and a high F1-score;
● K-means clustering and Single Tree classification were judged the best
algorithms for model 3, providing the numbers of different types of violations along with the
number of injuries for the various clusters;
● XLMiner, Tableau, and R were used for the analysis.
References
● Larose, D. T., & Larose, C. D. Discovering Knowledge in Data: An Introduction to Data Mining,
2nd edition. Wiley. ISBN 978-0-470-90874-7.
● Abellan, J., Lopez, G., & De Oña, J. (2013). Analysis of traffic accident severity using Decision Rules via
Decision Trees. Expert Systems with Applications, 40, 6047–6054.
● Chang, L.-Y., & Chien, J.-T. (2015). Analysis of driver injury severity in truck-involved accidents using a
non-parametric classification tree model. Safety Science, 51(1), 17–22.
● Chen, W. H., & Jovanis, P. P. (2012). Method for identifying factors contributing to driver injury severity in
traffic crashes. Transportation Research Record, 1717, 1–9.
● Kashani, A. T., Rabieyan, R., & Besharati, M. M. (2014). A data mining approach to investigate the factors
influencing the crash severity of motorcycle pillion passengers. Journal of Safety Research, 51, 93–98.
● Kwon, O. H., Rhee, W., & Yoon, Y. (2015). Application of classification algorithms for analysis of road safety
risk factor dependencies. Accident Analysis and Prevention, 75, 1–15.
● Xie, Y., Zhang, Y., & Liang, F. (2009). Crash injury severity analysis using Bayesian ordered probit models.
Journal of Transportation Engineering ASCE, 135(1), 18–25.
● Mujalli, M. O., & De Oña, J. (2011). A method for simplifying the analysis of traffic accidents injury severity
on two-lane highways using Bayesian networks. Journal of Safety Research, 42, 317–326.
● De Oña, J., Lopez, G., & Abellan, J. (2013). Extracting decision rules from police accident reports through
decision trees. Accident Analysis and Prevention, 50, 1151–1160.
Questions?

Editor's Notes

  1. Attribute definitions:
     ● Belts: If the traffic violation involved a seat belt violation.
     ● Personal Injury: If the traffic violation involved personal injury.
     ● Property Damage: If the traffic violation involved property damage.
     ● Fatal: If the traffic violation involved a fatality.
     ● HAZMAT: If the traffic violation involved hazardous materials.
     ● Violation Type: Violation type (examples: Warning, Citation, SERO).
     ● Geolocation: Geo-coded location information.