2. Outline
• Project Objective
• Data Source And Variables
• Research Question
• Data Preprocessing
• Method of Analysis
• Results
• Recommendations
3. Objective
• Evaluate the profile of drivers causing different
types of accidents.
• Detect anomalies present in the data.
• Detect missing values and outliers.
• Bin the ages of drivers and group the causes of accidents.
• Build a driver profile for each type of accident.
4. Data: List of variables
COMP_CODE Company Code
CLAIM_NO Claim number
CLAIM_YEAR Claim year
VEHICLE_TYPE Vehicle type
REG_NO Registration number
REG_DATE Registration Date
VECHICLE_MAKE Vehicle manufacturer
CAPACITY Capacity
SUM_ASSURED Sum assured
AGE_DRIVER Age of driver
SEX_DRIVER Gender of driver
SELF_PAID_OTHER Relation of the driver to the vehicle (self, paid driver or other)
LICENCE_ISSUE_YEAR Year the license was issued to the driver
CAUSE_ACCIDENT Major cause for the accident
PLACE_ACCIDENT Where the accident took place
LOSS_NATURE Nature of loss due to accident
ESTIMATED_LOSS Estimated loss due to accident
PLACE_OF_REPAIR Where was the vehicle taken for repair
OS_PROVISION OS_PROVISION
DATE_SURVEY When the survey was conducted
PARTS Cost of parts for repair
LABOUR Cost of labour for repair
DEPRECIATION_AMT The amount by which the value of the car will depreciate.
ASSESSED_LOSS The monetary loss assessed by the company
PAID_AMT The amount paid by the company
SALVAGE_AMT The amount recovered from the salvage of the damaged vehicle or parts
DATE_PAYMENT When was payment made by the company
MODE_SETTLEMENT Mode of settlement of paid amount
FEE_EXPENSE Fee paid for insurance processing
TIME_ACCIDENT Time of accident
DATE_ACCIDENT Date of accident
6. Research Questions
The data is explored with a focus on the following questions:
• City or highway: where do most accidents happen?
• Is it better if the owner drives their own car?
• Do all states show a similar pattern?
• Which causes of accident are most significant?
• Which “cause of accident” results in higher loss?
• Are vehicle makers producing low-quality products?
• Is the year the driver’s license was issued a good indicator of the risk of accident?
7. Methodology
• Started with data organization and time frame of study
• Exploratory analysis and descriptive statistics.
• Profile Development
• Evaluation and interpretation of results
• Recommendations from the analysis
8. Pre-processing
• Illegal values
  • A number of illegal values were found in variables such as REG_NO, VEHICLE_MAKE, CAPACITY, PARTS, LABOUR, DEPRECIATION_AMT, ASSESSED_LOSS, PAID_AMT and SALVAGE_AMT.
  • The amount columns (PARTS, LABOUR, DEPRECIATION_AMT, ASSESSED_LOSS, PAID_AMT, SALVAGE_AMT) contained unnaturally high values. On close examination these turned out to be dates stored as numbers. These date values probably belong to the DATE_PAYMENT variable and shifted into neighbouring columns when the file was converted between formats.
  • The state code was extracted from REG_NO, and illegal values were rectified using the standard code table available on Wikipedia.
  • VEHICLE_MAKE was cleaned and reduced to a manageable number of levels.
• Missing values
  • Variables with few missing values (PARTS, LABOUR) were imputed using the mean value for the type of vehicle.
  • Variables such as SALVAGE_AMT and FEE_EXPENSE have a high percentage of missing values, so they were excluded from the analysis.
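The state-code extraction and the group-mean imputation described above could look roughly like the sketch below. It is only an illustration under stated assumptions: records are plain dictionaries, missing values are `None`, the state code is taken to be the two leading letters of REG_NO, and `KNOWN_STATE_CODES` is a small illustrative subset rather than the full code table used in the study. The function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumption: illustrative subset of state codes, not the full reference table.
KNOWN_STATE_CODES = {"MH", "AP", "GJ", "MP", "KA", "UP", "DL", "RJ", "BR", "CG"}


def state_code(reg_no):
    """Return the two-letter state prefix of REG_NO, or None if it is not a known code."""
    match = re.match(r"\s*([A-Za-z]{2})", reg_no or "")
    if match is None:
        return None
    code = match.group(1).upper()
    return code if code in KNOWN_STATE_CODES else None


def impute_group_mean(records, column, group_key="VEHICLE_TYPE"):
    """Fill missing values of `column` with the mean observed for the same group."""
    # Collect the observed (non-missing) values per vehicle type.
    observed = defaultdict(list)
    for row in records:
        if row[column] is not None:
            observed[row[group_key]].append(row[column])
    group_means = {group: mean(values) for group, values in observed.items()}

    # Work on copies so the input records are left untouched.
    filled = []
    for row in records:
        new_row = dict(row)
        if new_row[column] is None and new_row[group_key] in group_means:
            new_row[column] = group_means[new_row[group_key]]
        filled.append(new_row)
    return filled


# Made-up rows for illustration only.
claims = [
    {"VEHICLE_TYPE": "CAR", "PARTS": 1000.0},
    {"VEHICLE_TYPE": "CAR", "PARTS": 3000.0},
    {"VEHICLE_TYPE": "CAR", "PARTS": None},
    {"VEHICLE_TYPE": "BUS", "PARTS": 9000.0},
]
filled = impute_group_mean(claims, "PARTS")
print(state_code("MH 12 AB 1234"))  # MH
print(filled[2]["PARTS"])           # 2000.0
```

A group with no observed values is left unfilled here; whether to fall back to an overall mean in that case is a separate design decision.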
9. Pre-processing (continued)
• Outliers
  • Detected in monetary variables such as SUM_ASSURED and ESTIMATED_LOSS.
• Age was binned into groups (Less than 25, 26-35, 36-45, 46-55, 56 and above).
• VEHICLE_TYPE was reduced by one level by merging MOPEDS/SCOOTERS and SCOOTERS into one category.
• All date variables were broken down into year, quarter and month.
• Anomalies were also detected in the data.
  • Some records have a claim year earlier than the accident year.
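As a rough sketch of the steps on this slide, the age binning, the date breakdown and the claim-year check could be written as below. The function names are hypothetical, and one boundary is an assumption: the slide's bins do not say where age 25 belongs, so this sketch places it in the first group.

```python
from datetime import date


def age_group(age):
    """Map a driver's age to the bins used in the slides."""
    if age <= 25:  # assumption: age 25 itself goes in the first bin
        return "Less than 25"
    if age <= 35:
        return "26-35"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    return "56 and above"


def date_parts(d):
    """Break a date into year, quarter and month."""
    return {"year": d.year, "quarter": (d.month - 1) // 3 + 1, "month": d.month}


def claim_before_accident(row):
    """Flag records whose claim year is earlier than the accident year."""
    return row["CLAIM_YEAR"] < row["DATE_ACCIDENT"].year


# Made-up rows for illustration only.
rows = [
    {"CLAIM_NO": 1, "CLAIM_YEAR": 2001, "DATE_ACCIDENT": date(2001, 8, 15)},
    {"CLAIM_NO": 2, "CLAIM_YEAR": 2000, "DATE_ACCIDENT": date(2001, 1, 5)},
]
parts = date_parts(rows[0]["DATE_ACCIDENT"])
anomalies = [r["CLAIM_NO"] for r in rows if claim_before_accident(r)]
print(age_group(31))  # 26-35
print(parts)          # {'year': 2001, 'quarter': 3, 'month': 8}
print(anomalies)      # [2]
```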
10. Time Frame of study
• As shown above, more than 80% of the sampled data was surveyed in 2000 and 2001.
• Most of the accidents also took place in 2000 and 2001.
• Going forward, only the significant years are considered for further analysis: the sample for other years is too small to give a reliable picture of those years.
• Also, because the surveys were conducted during 2000 and 2001, data for other years (+/- 2 years) is more prone to human error, since it depends on respondents’ memory.
11. • The analysis of claim years and payment years shows a similar pattern, concentrated around 2000 and 2001.
• Analysis of the vehicle registration year shows that the data is concentrated between 1998 and 2001.
• This also indicates that most vehicles were almost new when they met with an accident, possibly because the drivers were novices.
From here on, a filter is applied on survey year to consider only 1998-2002.
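A minimal sketch of how the year concentration could be checked and the survey-year filter applied is shown below. `SURVEY_YEAR` is a hypothetical derived field (the year part of DATE_SURVEY), and the sample years are made up.

```python
from collections import Counter


def year_shares(years):
    """Return the fraction of records falling in each year."""
    counts = Counter(years)
    total = len(years)
    return {year: counts[year] / total for year in sorted(counts)}


def filter_by_year(records, key="SURVEY_YEAR", first=1998, last=2002):
    """Keep only records whose year lies in the inclusive range [first, last]."""
    return [r for r in records if first <= r[key] <= last]


# Made-up survey years for illustration only.
survey_years = [1996, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002, 2005]
shares = year_shares(survey_years)
core_share = shares[2000] + shares[2001]

records = [{"SURVEY_YEAR": y} for y in survey_years]
kept = filter_by_year(records)
print(round(core_share, 2))  # 0.7
print(len(kept))             # 8
```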
12. Exploratory Analysis
• More accidents happen within city limits than on highways.
• Female drivers are involved in fewer accidents than male drivers.
• Only male drivers are involved in highway accidents.
These patterns may arise because:
1) In India, women drive less frequently than men.
2) People tend not to drive on highways until they are very confident in their driving skill; they generally hire a driver.
3) Drivers are very alert on highways, but their level of alertness is generally lower within city limits.
13. Locational details
• States with the most accidents: 1) Maharashtra 2) Andhra Pradesh 3) Gujarat
• AP, MP, Gujarat and Maharashtra have similar ratios for “Place of accident” (around 35% on highways and the rest within city limits).
• Karnataka and UP show a higher percentage of accidents within city limits (90% and 75% respectively).
• The remaining states show only one kind of accident:
  Within city limits: Delhi, Rajasthan
  Highway: Bihar, Chhattisgarh
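The per-state city/highway ratios quoted above amount to a row-percentage cross-tabulation. A minimal sketch follows, assuming a hypothetical `STATE` field derived from REG_NO and assuming PLACE_ACCIDENT takes the values "CITY" and "HIGHWAY"; the counts are made up.

```python
from collections import Counter, defaultdict


def place_share_by_state(records):
    """Return, per state, the percentage of accidents at each place."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["STATE"]][r["PLACE_ACCIDENT"]] += 1

    shares = {}
    for state, place_counts in counts.items():
        total = sum(place_counts.values())
        shares[state] = {place: 100 * n / total for place, n in place_counts.items()}
    return shares


# Made-up counts for illustration only.
claims = (
    [{"STATE": "KA", "PLACE_ACCIDENT": "CITY"}] * 9
    + [{"STATE": "KA", "PLACE_ACCIDENT": "HIGHWAY"}]
    + [{"STATE": "DL", "PLACE_ACCIDENT": "CITY"}] * 3
)
shares = place_share_by_state(claims)
print(shares["KA"])  # {'CITY': 90.0, 'HIGHWAY': 10.0}
print(shares["DL"])  # {'CITY': 100.0}
```

States with a single key in their result are the ones that show only one kind of accident.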
14. Exploratory Analysis (continued)
• Almost 48% of the accidents occur when the owner is driving the vehicle. This may be because most car owners live in urban areas and generally drive themselves within city limits.
• The fewest accidents happen when the vehicle is driven by someone who is neither the owner nor a paid driver, for example a relative of the owner or an unpaid driver.
• Driver Error/Negligence (26%), Collision (16%) and Traffic Congestion (14%) are the main causes of accidents.
• A large share of the causes of accident is uncategorized and grouped together as “Others” (32%).
• Over speeding and Mechanical Breakdown account for a very small share of the causes of accident.
15. Comparison between City and Highway accidents
• More accidents take place within city limits than on highways.
• The total sum assured is also higher within city limits, which suggests that the insurance sector has a better business opportunity in cities than on highways.
• The vehicle registration number gives an idea of the location of an accident, but this logic is mostly valid for accidents within city limits. Vehicles travelling on highways tend to cross state borders, so the assumption is less accurate for them.
• Accidents on highways tend to take longer to be recorded and, because of the lack of services on highways, may not always go through official processes.
Hence analysis of accidents within city limits is more useful, and profiling them should produce more accurate results.
16. Analyzing the city accidents
• 67% of the accidents happened within city limits.
• 60% of the accidents involve cars, followed by 30% involving mopeds/scooters/motorcycles.
• No taxi accidents happened within city limits.
• Most of the cars and 2-wheelers involved were registered between 1999 and 2001.
• 98% of the drivers are male, with an average age of 35 and an age range of 18 to 62 years.
• In 56% of cases the owner was driving the vehicle. This rises to nearly 60% for private vehicles such as cars, mopeds/scooters and motorcycles, while around 85% of the time public vehicles (Bus, Taxi, 3 Wheeler) are driven by paid drivers.
17. • 55% of the cars involved in accidents were made by Maruti, followed by 12% made by Tata Motors.
• 55% of the motorcycles involved were made by Hero Honda, followed by 21% by Bajaj and 10% by Yamaha.
• Mopeds/scooters were mostly made by Bajaj (26%), Kinetic (26%), LML (20%) or TVS (15%).
• A high number of accidents can indicate one of two things:
1. The quality of the vehicles manufactured is low
2. The manufacturer is the market leader and sales volumes are very high
• However, only 3.25% of the vehicles from these manufacturers met with an accident because of mechanical breakdown. This argues against a quality problem, so the high numbers above more likely reflect their market dominance.
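The check that separates the two explanations is the share of a manufacturer's claims caused by mechanical breakdown. A minimal sketch follows; the function name, the cause label "Mechanical Breakdown" and the sample rows are assumptions for illustration, and the manufacturer column name is passed in because the variable list and the pre-processing notes spell it differently.

```python
def breakdown_share(records, makes, make_key="VEHICLE_MAKE",
                    cause="Mechanical Breakdown"):
    """Percentage of claims for the given makes that were caused by `cause`."""
    # make_key is configurable: the variable list spells the column VECHICLE_MAKE.
    selected = [r for r in records if r[make_key] in makes]
    if not selected:
        return None
    hits = sum(1 for r in selected if r["CAUSE_ACCIDENT"] == cause)
    return 100 * hits / len(selected)


# Made-up rows for illustration only.
claims = (
    [{"VEHICLE_MAKE": "Maruti", "CAUSE_ACCIDENT": "Collision"}] * 39
    + [{"VEHICLE_MAKE": "Maruti", "CAUSE_ACCIDENT": "Mechanical Breakdown"}]
    + [{"VEHICLE_MAKE": "OtherMake", "CAUSE_ACCIDENT": "Mechanical Breakdown"}]
)
share = breakdown_share(claims, {"Maruti"})
print(share)  # 2.5
```

A low share for a market leader is consistent with the second explanation (high sales volume) rather than the first (low quality).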
18. • Maharashtra (MH) leads for car, motorcycle and CGV accidents.
• Madhya Pradesh (MP) leads for bus accidents, and Andhra Pradesh (AP) for 3-wheeler accidents.
• Gujarat has the most moped/scooter accidents and is also where most collisions happen.
• Accidents caused by AOG Perils, over speeding and mechanical breakdown mostly happen in Maharashtra.
• Andhra Pradesh (AP) leads for accidents caused by Driver Error/Negligence. It is also the only state where accidents have been registered for every type of vehicle.
• Karnataka (KA) leads for accidents involving parked vehicles.
19. • Most drivers causing accidents were issued their license within the last 5-6 years.
• Motorcycles are the major contributors to accidents due to negligence.
• The average age of motorcycle drivers is well below 35, which may be a major reason why most motorcycle accidents are caused by driver error/negligence.
20. • Among public vehicles, recent licenses have been issued only to 3-wheeler drivers.
• However, driver negligence is one of the major causes of accidents across all public vehicles.
• For 3-wheelers, over speeding is also a prominent factor.
• Apart from 3-wheeler drivers, the drivers are close to or above the average age of 35.
21. • Mechanical breakdowns are few in number, but their average estimated loss is the highest.
• A similar pattern is seen with AOG Perils.
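Comparing frequency with severity, as this slide does, means looking at the count and the average ESTIMATED_LOSS per cause side by side. A minimal sketch with made-up amounts (the function name is hypothetical):

```python
from collections import defaultdict
from statistics import mean


def loss_by_cause(records):
    """Return (cause, (count, average loss)) pairs, highest average loss first."""
    losses = defaultdict(list)
    for r in records:
        losses[r["CAUSE_ACCIDENT"]].append(r["ESTIMATED_LOSS"])
    summary = {cause: (len(values), mean(values)) for cause, values in losses.items()}
    return sorted(summary.items(), key=lambda item: item[1][1], reverse=True)


# Made-up amounts for illustration only.
claims = [
    {"CAUSE_ACCIDENT": "Collision", "ESTIMATED_LOSS": 10000.0},
    {"CAUSE_ACCIDENT": "Collision", "ESTIMATED_LOSS": 20000.0},
    {"CAUSE_ACCIDENT": "Mechanical Breakdown", "ESTIMATED_LOSS": 50000.0},
]
ranking = loss_by_cause(claims)
for cause, (count, avg_loss) in ranking:
    print(cause, count, avg_loss)
# Mechanical Breakdown 1 50000.0
# Collision 2 15000.0
```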
22. Results Interpretation
Profile for City accidents
| Vehicle type | Driver type | Age group of drivers causing accidents | % of licenses issued in the last 10 years | Top 3 causes of accident | Nature of loss and average estimated monetary value |
|---|---|---|---|---|---|
| Bus (1%) | Paid (100%) | 26-35 years (100%) | 0% | No prominent reason | Damage (INR 1,20,000.00) |
| Cars (59%) | Self (58%) | 26-35 years (~36%) | 75% | Traffic Congestion (33%), Others (33%), Driver Error/Negligence (23%) | Damage (INR 24,000.00) |
| Cars (59%) | Self (58%) | 36-45 years (~34%) | 50% | Others (45%), Driver Error/Negligence (25%), Collision (20%) | Damage (INR 18,000.00) |
| Cars (59%) | Paid (18%) | 26-35 years (~58%) | 55% | Driver Error/Negligence (55%), Others (28%) | Damage (INR 18,000.00) |
| Cars (59%) | Others (22%) | 26-35 years (~44%) | 70% | Others (40%), Traffic Congestion (20%) | Damage (INR 18,000.00), Theft-Partial (INR 17000.00) |
| Cars (59%) | Others (22%) | 36-45 years (~30%) | 57% | Driver Error/Negligence (43%), Collision (29%) | Damage (INR 11,000.00), Theft-Partial |
23. Profile for City accidents (continued)
| Vehicle type | Driver type | Age group of drivers causing accidents | % of licenses issued in the last 10 years | Top 3 causes of accident | Nature of loss and average estimated monetary value |
|---|---|---|---|---|---|
| CGV (6%) | Paid (100%) | 26-35 years (30%) | 60% | No prominent reasons | Damage (INR 1,99,000.00) |
| CGV (6%) | Paid (100%) | 36-45 years (40%) | 50% | No prominent reasons | Damage (INR 1,68,000.00) |
| Moped/Scooter (13%) | Self (64%) | Less than 25 years (35%) | 100% | Collision (40%), Others (40%) | Damage (INR 2562.00) |
| Moped/Scooter (13%) | Self (64%) | 46-55 years (28%) | 50% | Driver Error/Negligence (75%), Traffic Congestion (25%) | Damage (INR 2995.00) |
| Moped/Scooter (13%) | Self (64%) | 26-35 years (21%) | 100% | Traffic Congestion (65%), Others (33%) | Damage (INR 4717.00) |
| Moped/Scooter (13%) | Others (32%) | 36-45 years (57%) | 50% | Collision (50%), Driver Error/Negligence (25%), Others (25%) | Damage (INR 5165.00) |
| Moped/Scooter (13%) | Others (32%) | 26-35 years (29%) | 100% | Others (50%), Traffic Congestion (50%) | Damage (INR 3163.00) |
24. Profile for City accidents (continued)
| Vehicle type | Driver type | Age group of drivers causing accidents | % of licenses issued in the last 10 years | Top 3 causes of accident | Nature of loss and average estimated monetary value |
|---|---|---|---|---|---|
| Motorcycle (20%) | Self (64%) | Less than 25 years (43%) | 100% | Driver Error/Negligence (56%), Others (22%) | Damage (INR 7918.00) |
| Motorcycle (20%) | Self (64%) | 26-35 years (48%) | 100% | Driver Error/Negligence (30%), Traffic Congestion (30%), Collision (20%) | Damage (INR 3833.00) |
| Motorcycle (20%) | Others (33%) | Less than 25 years (55%) | 100% | Driver Error/Negligence (83%), Collision (17%) | Damage (INR 6808.00) |
| Three Wheelers (2%) | Self (68%) | 26-35 years (100%) | 50% | Driver Error/Negligence (50%), Over-speeding (50%) | Damage (INR 31900.00) |
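The "Top 3 causes" column of the profile tables can be derived per segment (vehicle type, driver type, age group) roughly as sketched below. `AGE_GROUP` is a hypothetical derived field holding the binned age, the function name is illustrative, and the sample rows are made up.

```python
from collections import Counter


def top_causes(records, vehicle_type, driver_type, age_group, n=3):
    """Return the n most common causes in a segment with their rounded percentages."""
    segment = [
        r for r in records
        if (r["VEHICLE_TYPE"], r["SELF_PAID_OTHER"], r["AGE_GROUP"])
        == (vehicle_type, driver_type, age_group)
    ]
    counts = Counter(r["CAUSE_ACCIDENT"] for r in segment)
    total = len(segment)
    # An empty segment has no causes, so the list below is empty and no division happens.
    return [(cause, round(100 * k / total)) for cause, k in counts.most_common(n)]


def make_rows(cause, k):
    """Build k identical made-up claims for one segment (illustration only)."""
    return [{"VEHICLE_TYPE": "CAR", "SELF_PAID_OTHER": "SELF",
             "AGE_GROUP": "26-35", "CAUSE_ACCIDENT": cause}] * k


claims = (make_rows("Driver Error/Negligence", 5)
          + make_rows("Others", 3)
          + make_rows("Collision", 2))
profile = top_causes(claims, "CAR", "SELF", "26-35")
print(profile)
# [('Driver Error/Negligence', 50), ('Others', 30), ('Collision', 20)]
```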
25. Main Message
• 26-35 years is the most common age group.
• Young drivers are riskier.
• Driver Error/Negligence is the most common cause of
accidents for private vehicles.
• Public vehicles don’t have any prominent reasons for
accident.
• Average estimated loss for public vehicles is about 10 times that of private vehicles.
26. Recommendation
• The person’s relationship with the vehicle should be considered.
• The year the license was issued should be a major factor.
• The analysis results can be used to better design insurance terms and premiums.
• Collect more data on the driver’s mental and physical condition for a better picture.
• Adding medical conditions would also add more clarity.
• Weather condition data such as rain and haze could also help.