SlideShare a Scribd company logo
TrafficViolations
Business Analytics with SAS
Group No: 12
Neha Gokhale
Raviraj Khatu
Rugved Gramopadhye
Nikhil Bagewadi
Sarvesh Marathe
Introduction
&
Overview
•About the Dataset
•Data Preprocessing
•DescriptiveAnalysis
•ModelingTechniques
•Model Comparison
•Results and Implications
About
The
Dataset
• Traffic Violation Data 2017
Source: Data.gov
• Groups of Attributes
1. Location
2. Causes of Violation
3. Vehicle Information
4. Driver’s Information
5. Consequences of Violation
• Type of Attributes
 Huge Data
 Excess of categories in a column
 Imbalanced categories inTargetVariable
 MissingValues
Data
Preprocessing
 Data Extraction
 Data Classification
 Oversampling
 Impute Operation
 Data Partition
Data Extraction& DataClassification
Oversampling& ImputeOperation
Data PreprocessingSummary
Descriptive
Analysis
 Seat Belt as a cause of Accident
 Gender involved in anAccident
 Relation between Commercial License andAccident
 Day-wise distribution ofTrafficViolation cases
 Different types ofViolations
 Vehicle types involved inTrafficViolation
DescriptiveAnalysis
Seat Belts as a cause of Accident Gender involved in an Accident
DescriptiveAnalysis
Day-wise distribution ofTrafficViolations
Relation between Commercial License and Accident
DescriptiveAnalysis
DifferentTypes ofViolations Vehicle types involved inTrafficViolation
Modeling
Techniques
 DecisionTree
 Logistic Regression
 Neural Network
 Auto Neural Network
Data Modeling
using all input
variables
DecisionTree
 Simple and Easy to use
 Suitable for binary target variable
ImportantVariables
 Type of violation
 Personal Injury
 Property Damage
 Charge
 Description
Data Modeling
using all input
variables
Logistic Regression
 Recommended for binary target variables
 Uses Maximum Likelihood to estimate the model
parameters
ImportantVariables
 Day ofWeek
 Hour of Day
 Personal Injury
 Property Damage
 ViolationType
 Description
Data Modeling
using all input
variables
Neural Network
• It is supervised machine learning algorithm
• Data partitioned into
 Train – 70%
 Validation – 15%
 Test – 15%
Train Validation Test
0.0394 0.5042 0.4938
Data Modeling
using all input
variables
Auto Neural Network
• More Flexible than Neural Network
• We can specify number of Hidden
Units
Number of
Hidden Units
Misclassification
Rate - Train
Misclassification
Rate - Validate
Misclassification
Rate - Test
1 0.0 0.50 0.48
2 0.49 0.5 0.5021
3 0.46 0.57 0.56
Significant
Variables
 Description
 Hour of Day
 ViolationType
 Day ofWeek
Data Modeling
using only
significant
variables as
inputs
DecisionTree Logistic Regression
Predicted
Positive Negative
Actual
Positive 106 12
Negative 31 87
Predicted
Positive Negative
Actual
Positive 106 12
Negative 33 85
Accuracy = 81.77% Accuracy = 80.77%
Data Modeling
using only
significant
variables as
inputs
Neural Network Auto Neural Network
Predicted
Positive Negative
Actual
Positive 106 12
Negative 31 87
Predicted
Positive Negative
Actual
Positive 99 19
Negative 30 88
Accuracy = 79.23%Accuracy = 81.77%
Final Model
Model
Comparison
Results &
Implications
1. Accidents can happen on any day
2. Men are more involved in an accident and not women
3. The Majority drivers involved in accidents were not having the
official driving license
4. Can be useful for driving license department to show the people
the importance of proper training and seat belts
References
 https://support.sas.com/kb/24/205.html
 https://catalog.data.gov/dataset/traffic-violations-56dda
 https://support.sas.com/resources/papers/proceedings15/SAS196
5-2015.pdf
 http://support.sas.com/publishing/pubcat/chaps/57587.pdf
 https://support.sas.com/rnd/app/stat/papers/logistic.pdf
ThankYou !

More Related Content

Similar to Traffic violations Data Analysis Using SAS

Predicting Cab Booking Cancellations- Data Mining Project
Predicting Cab Booking Cancellations- Data Mining ProjectPredicting Cab Booking Cancellations- Data Mining Project
Predicting Cab Booking Cancellations- Data Mining Project
raj
 
Introduction to Machine Learning and Data Science using Autonomous Database ...
Introduction to Machine Learning and Data Science using Autonomous Database  ...Introduction to Machine Learning and Data Science using Autonomous Database  ...
Introduction to Machine Learning and Data Science using Autonomous Database ...
Sandesh Rao
 
Data Driven Testing Is More Than an Excel File
Data Driven Testing Is More Than an Excel FileData Driven Testing Is More Than an Excel File
Data Driven Testing Is More Than an Excel File
Mehmet Gök
 
Data Mining.pptx
Data Mining.pptxData Mining.pptx
Data Mining.pptx
Treat Assignment help
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Impetus Technologies
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
SrushtiSuvarna
 
Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...
Sandesh Rao
 
Analyzing Road Side Breath Test Data with WEKA
Analyzing Road Side Breath Test Data with WEKAAnalyzing Road Side Breath Test Data with WEKA
Analyzing Road Side Breath Test Data with WEKA
Yogesh Shinde
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
Improving the understanding of safety performance of commercial motorcycles i...
Improving the understanding of safety performance of commercial motorcycles i...Improving the understanding of safety performance of commercial motorcycles i...
Improving the understanding of safety performance of commercial motorcycles i...
Institute for Transport Studies (ITS)
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
Aniket Patil
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
patilaniket2418
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps Practices
Peter Varhol
 
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
Agile Testing Alliance
 
Brighttalk high scale low touch and other bedtime stories - final
Brighttalk   high scale low touch and other bedtime stories - finalBrighttalk   high scale low touch and other bedtime stories - final
Brighttalk high scale low touch and other bedtime stories - finalAndrew White
 
Next Generation Fraud Solutions using Neo4j
Next Generation Fraud Solutions using Neo4jNext Generation Fraud Solutions using Neo4j
Next Generation Fraud Solutions using Neo4j
Neo4j
 
Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions
Neo4j
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
CSCJournals
 

Similar to Traffic violations Data Analysis Using SAS (20)

Predicting Cab Booking Cancellations- Data Mining Project
Predicting Cab Booking Cancellations- Data Mining ProjectPredicting Cab Booking Cancellations- Data Mining Project
Predicting Cab Booking Cancellations- Data Mining Project
 
Introduction to Machine Learning and Data Science using Autonomous Database ...
Introduction to Machine Learning and Data Science using Autonomous Database  ...Introduction to Machine Learning and Data Science using Autonomous Database  ...
Introduction to Machine Learning and Data Science using Autonomous Database ...
 
Data Driven Testing Is More Than an Excel File
Data Driven Testing Is More Than an Excel FileData Driven Testing Is More Than an Excel File
Data Driven Testing Is More Than an Excel File
 
Data Mining.pptx
Data Mining.pptxData Mining.pptx
Data Mining.pptx
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...
 
Analyzing Road Side Breath Test Data with WEKA
Analyzing Road Side Breath Test Data with WEKAAnalyzing Road Side Breath Test Data with WEKA
Analyzing Road Side Breath Test Data with WEKA
 
Data analysis
Data analysisData analysis
Data analysis
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Improving the understanding of safety performance of commercial motorcycles i...
Improving the understanding of safety performance of commercial motorcycles i...Improving the understanding of safety performance of commercial motorcycles i...
Improving the understanding of safety performance of commercial motorcycles i...
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps Practices
 
Data analysis
Data analysisData analysis
Data analysis
 
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
#Interactive Session by Vivek Patle and Jahnavi Umarji, "Empowering Functiona...
 
Brighttalk high scale low touch and other bedtime stories - final
Brighttalk   high scale low touch and other bedtime stories - finalBrighttalk   high scale low touch and other bedtime stories - final
Brighttalk high scale low touch and other bedtime stories - final
 
Next Generation Fraud Solutions using Neo4j
Next Generation Fraud Solutions using Neo4jNext Generation Fraud Solutions using Neo4j
Next Generation Fraud Solutions using Neo4j
 
Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
 

Recently uploaded

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

Traffic violations Data Analysis Using SAS