Traffic violations Data Analysis Using SAS

Traffic Violations
Business Analytics with SAS
Group No: 12
Neha Gokhale
Raviraj Khatu
Rugved Gramopadhye
Nikhil Bagewadi
Sarvesh Marathe

Introduction & Overview
• About the Dataset
• Data Preprocessing
• Descriptive Analysis
• Modeling Techniques
• Model Comparison
• Results and Implications

About the Dataset
• Traffic Violation Data 2017
Source: Data.gov
• Groups of Attributes
1. Location
2. Causes of Violation
3. Vehicle Information
4. Driver’s Information
5. Consequences of Violation
• Type of Attributes

• Huge data volume
• Excess of categories in a column
• Imbalanced categories in the target variable
• Missing values
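One of the challenges above is a column with too many categories. A common remedy, sketched here in Python, is to collapse rare levels into a single "Other" level. The slides do not say whether this step was used; the function name, the threshold, and the sample values are hypothetical.

```python
from collections import Counter

def collapse_rare_levels(values, min_count, other="Other"):
    """Replace every category seen fewer than min_count times with one shared level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical vehicle-type column with a long tail of rare levels
vehicle_types = ["AUTOMOBILE", "AUTOMOBILE", "LIGHT DUTY TRUCK", "AUTOMOBILE",
                 "MOTORCYCLE", "LIGHT DUTY TRUCK", "LIMOUSINE", "FARM VEHICLE"]
print(collapse_rare_levels(vehicle_types, min_count=2))
```

With `min_count=2`, the three levels that appear only once are merged into "Other", leaving three levels instead of five.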

Data Preprocessing
• Data Extraction
• Data Classification
• Oversampling
• Imputation of missing values
• Data Partition
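Two of the preprocessing steps above can be sketched in Python: filling missing categorical values with the most frequent level, and random oversampling, which duplicates rows of the smaller class until the target classes are balanced. This is an illustration under assumptions, not the SAS procedure the group ran; the column values, the `accident` target name, and both function names are hypothetical.

```python
import random
from collections import Counter

def impute_mode(values):
    """Replace missing (None) entries of a categorical column with its most frequent level."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def random_oversample(rows, target, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target], []).append(row)
    n_max = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement from the existing rows of this class
        balanced.extend(rng.choice(group) for _ in range(n_max - len(group)))
    return balanced

# Hypothetical data; treating "accident" (Yes/No) as the target is an assumption
gender = ["M", "F", None, "M", None, "M"]
print(impute_mode(gender))  # ['M', 'F', 'M', 'M', 'M', 'M']

rows = [{"accident": "Yes"} for _ in range(2)] + [{"accident": "No"} for _ in range(6)]
balanced = random_oversample(rows, target="accident")
print(Counter(row["accident"] for row in balanced))
```

Oversampling only copies existing minority rows, so it balances the classes without inventing new records; in practice it is applied to the training data only, so duplicated rows do not leak into validation or test.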

Data Extraction & Data Classification

Descriptive Analysis
• Seat belts as a cause of accidents
• Gender of drivers involved in accidents
• Relation between commercial license and accidents
• Day-wise distribution of traffic violation cases
• Different types of violations
• Vehicle types involved in traffic violations
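The day-wise distribution above is a frequency count of violations by day of the week. A minimal Python sketch, assuming each record carries a stop date; the sample dates and the function name are hypothetical.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def violations_by_weekday(stop_dates):
    """Count violations per day of week, including days with zero cases."""
    counts = Counter(DAY_NAMES[d.weekday()] for d in stop_dates)  # weekday(): Monday == 0
    return {day: counts[day] for day in DAY_NAMES}

# Hypothetical stop dates (1 January 2017 was a Sunday)
stops = [date(2017, 1, 2), date(2017, 1, 3), date(2017, 1, 9), date(2017, 1, 7)]
print(violations_by_weekday(stops))
```

Listing all seven days, even those with no cases, keeps the bar chart's axis complete.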

Descriptive Analysis
(Charts: Seat Belts as a Cause of Accidents; Gender Involved in Accidents)

Descriptive Analysis
(Charts: Day-wise Distribution of Traffic Violations; Relation between Commercial License and Accidents)

Descriptive Analysis
(Charts: Different Types of Violations; Vehicle Types Involved in Traffic Violations)

Modeling Techniques
• Decision Tree
• Logistic Regression
• Neural Network
• Auto Neural Network

Data Modeling Using All Input Variables
Decision Tree
• Simple and easy to use
• Suitable for a binary target variable
Important Variables
• Type of Violation
• Personal Injury
• Property Damage
• Charge
• Description
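A decision tree grows by choosing, at each node, the split that makes the child nodes purest with respect to the binary target. The Python sketch below scores a split with Gini impurity, which is one common criterion; we did not confirm which split criterion the SAS tree used, and the counts in the example are hypothetical.

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positive and neg negative cases: 2p(1 - p)."""
    total = pos + neg
    if total == 0:
        return 0.0
    p = pos / total
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def split_gain(parent, children):
    """Drop in size-weighted Gini impurity when a (pos, neg) parent is split into children."""
    n = sum(parent)
    weighted = sum((pos + neg) / n * gini(pos, neg) for pos, neg in children)
    return gini(*parent) - weighted

# Hypothetical split of 100 cases on a Yes/No input such as Property Damage
print(round(split_gain((50, 50), [(40, 10), (10, 40)]), 3))  # 0.18
```

The tree would compare this gain across candidate inputs and split on the largest, which is how variables such as Type of Violation end up near the top of the tree.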

Data Modeling Using All Input Variables
Logistic Regression
• Recommended for binary target variables
• Uses maximum likelihood to estimate the model parameters
Important Variables
• Day of Week
• Hour of Day
• Personal Injury
• Property Damage
• Violation Type
• Description
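Maximum likelihood picks the coefficients that make the observed 0/1 outcomes most probable. The Python sketch below fits a one-predictor logistic model by Newton-Raphson iterations on the log-likelihood; SAS's logistic procedures maximize the likelihood iteratively as well, but this is a simplified illustration, not their implementation, and the data are hypothetical.

```python
import math

def fit_logistic_newton(x, y, iterations=25):
    """Maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient, using the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Hypothetical, non-separable data: one numeric input and a 0/1 target
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1, log_likelihood(x, y, b0, b1))
```

The data must not be perfectly separable; otherwise the likelihood has no finite maximum and the coefficients diverge.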

Data Modeling Using All Input Variables
Neural Network
• A supervised machine learning algorithm
• Data partitioned into:
  - Train: 70%
  - Validation: 15%
  - Test: 15%
Misclassification Rate
Train     Validation   Test
0.0394    0.5042       0.4938
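The 70% / 15% / 15% partition above can be sketched as a shuffled split in Python. This is a simple random split; SAS's Data Partition node can also stratify on the target, and the seed and function name here are arbitrary choices for illustration.

```python
import random

def partition(rows, train=0.70, validate=0.15, seed=12345):
    """Simple random split into train/validation/test; the remainder goes to test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train_rows, valid_rows, test_rows = partition(range(1000))
print(len(train_rows), len(valid_rows), len(test_rows))  # 700 150 150
```

The model is fit on the training rows, the validation rows guide choices such as network size, and the test rows give the final, untouched estimate of error.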

Data Modeling Using All Input Variables
Auto Neural Network
• More flexible than the standard Neural Network
• The number of hidden units can be specified

Hidden    Misclassification Rate
Units     Train    Validate   Test
1         0.0      0.50       0.48
2         0.49     0.5        0.5021
3         0.46     0.57       0.56
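Choosing the number of hidden units is usually done on the validation partition rather than the training one. The Python sketch below applies that rule to the misclassification rates in the Auto Neural Network table; the tie-break in favor of the smaller network is our assumption, not something the slides state.

```python
# Misclassification rates from the Auto Neural Network table
results = {
    1: {"train": 0.0, "validate": 0.50, "test": 0.48},
    2: {"train": 0.49, "validate": 0.5, "test": 0.5021},
    3: {"train": 0.46, "validate": 0.57, "test": 0.56},
}

def pick_hidden_units(results):
    """Lowest validation misclassification wins; ties go to the smaller network."""
    return min(results, key=lambda h: (results[h]["validate"], h))

best = pick_hidden_units(results)
print(best, results[best])
```

The rule selects 1 hidden unit, but its training rate of 0.0 against 0.50 on validation is a sign of overfitting: the network memorizes the training data without generalizing.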

Significant Variables
• Description
• Hour of Day
• Violation Type
• Day of Week

Data Modeling Using Only Significant Variables as Inputs

Decision Tree
                   Predicted Positive   Predicted Negative
Actual Positive    106                  12
Actual Negative    31                   87
Accuracy = (106 + 87) / 236 = 81.78%

Logistic Regression
                   Predicted Positive   Predicted Negative
Actual Positive    106                  12
Actual Negative    33                   85
Accuracy = (106 + 85) / 236 = 80.93%
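Accuracy is the share of cases on the diagonal of the confusion matrix: (true positives + true negatives) / total. A short Python sketch computing it from the Decision Tree and Logistic Regression matrices; the argument order (TP, FN, FP, TN) follows the row layout of the tables and is our own convention.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified cases in a 2x2 confusion matrix."""
    return (tp + tn) / (tp + fn + fp + tn)

# (TP, FN, FP, TN): actual-positive row first, then actual-negative row
matrices = {
    "Decision Tree": (106, 12, 31, 87),
    "Logistic Regression": (106, 12, 33, 85),
}
for model, cells in matrices.items():
    print(f"{model}: {accuracy(*cells):.2%}")
```

Both matrices hold 236 cases, so the Decision Tree's 193 correct predictions give 81.78% and Logistic Regression's 191 give 80.93%.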

Data Modeling Using Only Significant Variables as Inputs

Neural Network
                   Predicted Positive   Predicted Negative
Actual Positive    106                  12
Actual Negative    31                   87
Accuracy = (106 + 87) / 236 = 81.78%

Auto Neural Network
                   Predicted Positive   Predicted Negative
Actual Positive    99                   19
Actual Negative    30                   88
Accuracy = (99 + 88) / 236 = 79.24%
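Two models with similar accuracy can still make different kinds of errors. The Python sketch below derives, from the Neural Network and Auto Neural Network matrices, the misclassification rate used in the earlier tables plus sensitivity and specificity; the function and dictionary names are our own.

```python
def rates(tp, fn, fp, tn):
    """Error rate and per-class hit rates from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "misclassification": (fn + fp) / total,
        "sensitivity": tp / (tp + fn),   # share of actual positives predicted positive
        "specificity": tn / (tn + fp),   # share of actual negatives predicted negative
    }

# (TP, FN, FP, TN) for each model, read row by row from the matrices
nn_matrices = {
    "Neural Network": (106, 12, 31, 87),
    "Auto Neural Network": (99, 19, 30, 88),
}
for model, cells in nn_matrices.items():
    print(model, {name: round(value, 4) for name, value in rates(*cells).items()})
```

Misclassification rate is simply 1 minus accuracy: 43/236 (about 0.182) for the Neural Network and 49/236 (about 0.208) for the Auto Neural Network. The Auto Neural Network catches fewer actual positives (99 of 118 against 106 of 118) while being slightly better on actual negatives (88 of 118 against 87 of 118).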

Results & Implications
1. Accidents can happen on any day of the week.
2. Men were involved in more accidents than women.
3. The majority of drivers involved in accidents did not hold an official driving license.
4. These findings can help the driving license department show people the importance of proper training and of wearing seat belts.

References
• https://support.sas.com/kb/24/205.html
• https://catalog.data.gov/dataset/traffic-violations-56dda
• https://support.sas.com/resources/papers/proceedings15/SAS1965-2015.pdf
• http://support.sas.com/publishing/pubcat/chaps/57587.pdf
• https://support.sas.com/rnd/app/stat/papers/logistic.pdf
