Analysis of UK Road Accidents
Author - Krishnendu Das
Student id - 28980980
Tutor – Yalong Yang
Introduction:
Over a million people worldwide die in road accidents every year. This report explores road accident data for the United Kingdom (UK) from 2005 to 2014. The roughly 1.6 million accident records collected over these years provide significant insights into accident behaviour and the factors influencing accidents in the UK.
Motivation:
During my year-long stay in the United Kingdom in 2015, I had the privilege of making some lifelong friends who commute to work daily. One of the problems they complained about most often was frequent road accidents, which severely delayed their journeys and affected their work efficiency. As a data scientist, I expect that exploring the various aspects of crashes in the UK will allow me to extract critical information that citizens can use to avoid future accidents. The following questions guided the data analysis and exploration:
• Which conditions affect accidents in the UK?
• How do age, gender and the day of the week influence accidents?
• Which vehicles are more prone to accidents?
Data Source:
Under the UK’s data collection policy, STATS19 is a set of protocols that sets out the information to be collected when a crash happens. The full data set is not public due to data confidentiality. The information recorded in STATS19 has three distinct elements:
• Accidents.csv: Circumstances of the crash, comprising three components:
1. Date and time of the accident
2. Location of the accident, with junction details
3. Conditions in the area, such as weather and light
• Vehicles.csv: Record of the vehicles involved in the crash, comprising two essential elements:
1. Age of the driver
2. Sex of the driver
• Casualties.csv: Details of the casualties, comprising:
1. The severity of the injury
The metadata definitions of the variables in the data can be found at http://data.dft.gov.uk/road-accidents-safety-data/Road-Accident-Safety-Data-Guide.xls. Link to data: https://bit.ly/2HwfIBK
Additional data files were used to support the hypotheses derived from the analysis:
• VEH01: UK car sales data from the licensed vehicles and new registrations tables, produced by the Department for Transport.
Data: https://www.gov.uk/government/statistical-data-sets/all-vehicles-veh01
• RAS51012: UK reported drink-driving data obtained from the Department for Transport.
Data: https://www.gov.uk/government/statistical-data-sets/ras51-reported-drinking-and-driving
• SPE0111: Estimates of vehicle compliance with speed limits on roads.
Data: https://www.gov.uk/government/statistical-data-sets/spe01-vehicle-speeds
Wrangling:
Wrangling the primary data was somewhat challenging due to the number of records, but wrangling the supporting data was harder still. The main files were in CSV format, with around 1.6 million records in each file. Wrangling was done entirely in R. The data were mostly structured and contained more than 70 features. The following formatting strategies were used:
1. Columns were renamed, as several had blank names
2. The files were read using read_csv
3. The three files were merged using the dplyr package
4. Once combined, the selected columns were brought together in a single data frame
5. The column values are coded factors and had to be joined with the metadata definitions to be meaningful
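Step 5 amounts to a lookup join from numeric codes to the labels defined in the metadata guide. The report did this in R; the sketch below is only a minimal Python illustration, assuming the codes and labels from one metadata worksheet have already been loaded into a dictionary. The function name `decode_column` and the sample rows are hypothetical, and the three weather labels are placeholders rather than a transcription of the guide.

```python
from typing import Dict, List

# Hypothetical lookup built from one metadata worksheet: code -> label.
# The codes and labels here are illustrative placeholders only.
WEATHER_LOOKUP: Dict[int, str] = {
    1: "Fine no high winds",
    2: "Raining no high winds",
    3: "Snowing no high winds",
}


def decode_column(rows: List[dict], column: str, lookup: Dict[int, str],
                  unknown: str = "Unknown") -> List[dict]:
    """Return copies of `rows` where `column` holds the label instead of the code.

    Codes missing from the lookup become `unknown` instead of raising,
    which mimics a left join that keeps every accident record.
    """
    decoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[column] = lookup.get(int(row[column]), unknown)
        decoded.append(new_row)
    return decoded


accidents = [
    {"Accident_Index": "A1", "Weather_Conditions": "1"},
    {"Accident_Index": "A2", "Weather_Conditions": "3"},
    {"Accident_Index": "A3", "Weather_Conditions": "9"},
]
for row in decode_column(accidents, "Weather_Conditions", WEATHER_LOOKUP):
    print(row["Accident_Index"], row["Weather_Conditions"])
```

Mapping unmatched codes to ‘Unknown’ rather than failing keeps every record, and those values can then be handled by the imputation rules described under Data Checking.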
The metadata file was difficult to wrangle because it was in Excel format, and base R has no built-in function for reading Excel files. After evaluating several packages for this purpose, XLConnect was used to read the data. The Excel file has more than 40 worksheets containing the metadata definitions of the column values used in the primary data files.
It would have been difficult to convert all 40 worksheets into a single data frame, as each sheet has different dimensions. Hence, a function was written that reads all the sheets and stores them as a list of data frames. Some worksheet names contained blanks, but these required no wrangling, as the names were accessed by enclosing them in backticks.
All three files share a common, renamed column: ‘Accident_Index’. In addition, the vehicle and casualty files are related through an extra column, ‘vehicle reference’, which was also used to join them. Merging the files required about 15 minutes of processing. Three separate columns were derived from the Date column in Accidents.csv; preprocessing these columns up front improves dplyr performance, as the data set is large.
Wrangling the supporting files was more difficult because the data were unstructured. The files were in .ods format and required particular attention: they were converted to CSV and then read into R.
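The join keys and date preprocessing described above can be sketched in plain Python. This is an illustration under stated assumptions, not the report's dplyr code: the column names `Vehicle_Reference` and `Date`, the DD/MM/YYYY date format, and the choice of an inner join to accidents followed by a left join to casualties are all assumptions, and the tiny in-memory tables are made up.

```python
from datetime import datetime
from typing import Dict, List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def add_date_parts(row: dict) -> dict:
    """Derive Year, Month and Weekday once, so later grouping need not re-parse Date."""
    d = datetime.strptime(row["Date"], "%d/%m/%Y")  # assumed DD/MM/YYYY format
    return {**row, "Year": d.year, "Month": d.month, "Weekday": WEEKDAYS[d.weekday()]}


def merge_tables(accidents: List[dict], vehicles: List[dict],
                 casualties: List[dict]) -> List[dict]:
    """Join vehicles to accidents on Accident_Index, then casualties on
    (Accident_Index, Vehicle_Reference). Vehicles without casualties are kept."""
    acc_by_id = {a["Accident_Index"]: add_date_parts(a) for a in accidents}
    cas_by_key: Dict[Tuple[str, str], List[dict]] = {}
    for c in casualties:
        key = (c["Accident_Index"], c["Vehicle_Reference"])
        cas_by_key.setdefault(key, []).append(c)

    merged = []
    for v in vehicles:
        acc = acc_by_id.get(v["Accident_Index"])
        if acc is None:
            continue  # inner join: skip vehicles with no matching accident
        key = (v["Accident_Index"], v["Vehicle_Reference"])
        # Left join: an empty dict stands in for "no casualty on this vehicle".
        for c in cas_by_key.get(key, [{}]):
            merged.append({**acc, **v, **c})
    return merged


accidents = [{"Accident_Index": "A1", "Date": "14/01/2005"}]
vehicles = [
    {"Accident_Index": "A1", "Vehicle_Reference": "1", "Sex_of_Driver": "Male"},
    {"Accident_Index": "A1", "Vehicle_Reference": "2", "Sex_of_Driver": "Female"},
]
casualties = [
    {"Accident_Index": "A1", "Vehicle_Reference": "2", "Casualty_Severity": "Slight"},
]

for row in merge_tables(accidents, vehicles, casualties):
    print(row["Weekday"], row["Sex_of_Driver"], row.get("Casualty_Severity", "-"))
```

Building the two dictionaries first keeps the join linear in the number of rows, which matters when each file holds over a million records.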
• VEH01: The file has the following structure.
1. The file was read using XLConnect::loadWorkbook, skipping the initial rows
2. The date strings were cleaned using a regular expression
3. Garbage values such as ‘…’ or ‘.’ were replaced with 0
4. Rows after row 93 were removed, as they contained records for Britain only
5. Finally, the different vehicle columns were melted into a single column
• RAS51012: A snapshot of the file’s structure is provided below.
1. The same strategies as for the previous file were used
2. All the data were extracted from each worksheet and then transposed into a single data frame
3. Data were fetched for the years 2004 to 2015
• SPE0111: A snapshot of the file’s structure is provided below.
1. Wrangling this file was the most challenging of all
2. The same strategies as for the above files were followed
3. The heavy-vehicle columns had three sub-columns, which were aggregated into a single column
4. Data from all the sheets were merged into a single data frame, with the year as one column
5. All the vehicle category columns were then melted into a single column
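The cleaning, aggregation and melting steps listed for these supporting files can be illustrated with a short Python sketch. It assumes each worksheet has already been read into a list of row dictionaries (the XLConnect reading step is not reproduced), and the column names, heavy-vehicle sub-column names and sample values are all hypothetical.

```python
from typing import Dict, List

GARBAGE = {"…", "...", "."}  # placeholder strings treated as "no value"


def clean_value(raw: str) -> float:
    """Turn a cell into a number, replacing garbage placeholders with 0."""
    raw = raw.strip().replace(",", "")  # drop thousands separators
    return 0.0 if raw == "" or raw in GARBAGE else float(raw)


def melt_sheet(rows: List[Dict[str, str]], id_col: str, year: int,
               heavy_parts: List[str]) -> List[dict]:
    """Wide-to-long reshape of one worksheet.

    The heavy-vehicle sub-columns are first summed into one 'HGV' column;
    then every vehicle column becomes a (Vehicle, Value) row tagged with the year.
    """
    long_rows = []
    for row in rows:
        values = {k: clean_value(v) for k, v in row.items() if k != id_col}
        if heavy_parts:
            values["HGV"] = sum(values.pop(p, 0.0) for p in heavy_parts)
        for vehicle, value in values.items():
            long_rows.append({"Year": year, id_col: row[id_col],
                              "Vehicle": vehicle, "Value": value})
    return long_rows


# Hypothetical worksheets, one per year, already read as row dictionaries
HEAVY = ["HGV rigid 2 axle", "HGV rigid 3+ axle", "HGV artic"]
sheets = {
    2013: [{"Road": "30 mph", "Car": "45", "HGV rigid 2 axle": "12",
            "HGV rigid 3+ axle": "…", "HGV artic": "8"}],
    2014: [{"Road": "30 mph", "Car": "47", "HGV rigid 2 axle": ".",
            "HGV rigid 3+ axle": "5", "HGV artic": "9"}],
}
combined = [r for year, rows in sheets.items()
            for r in melt_sheet(rows, "Road", year, HEAVY)]
for r in combined:
    print(r)
```

Reshaping to one (Year, Vehicle, Value) row per observation is what makes these yearly series easy to plot alongside the accident counts.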
Data Checking:
The following data cleaning techniques were used:
1. The categorical values were already factorised
2. The driver age bands 0 to 5 and 6 to 10 had surprisingly high accident counts. As it is implausible for drivers of such ages to be involved in so many accidents, these values were treated as errors and replaced with the mode of the age-band distribution
3. Missing values in all columns were imputed: missing categorical values were replaced with the mode of their distribution, and missing quantitative values with the mean of theirs
4. A few records had unknown values for features such as Gender, Road Conditions and Light Conditions; these were also imputed with the mode
5. Missing values in the supporting data were imputed with 0, on the grounds that this would not materially affect the overall distribution
6. Percentages in the supporting files were converted to total aggregates
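The imputation rules in items 2 to 4 can be sketched as follows. This is a minimal Python illustration, not the report's R code; the age-band label strings and sample values are hypothetical, and breaking ties in the mode by first occurrence is an assumption the report does not specify.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, List, Optional


def mode_impute(values: List[Optional[str]],
                missing_labels: Iterable[str] = ("Unknown",)) -> List[str]:
    """Replace None and explicit 'unknown' labels with the most frequent observed value."""
    missing = set(missing_labels)
    observed = [v for v in values if v is not None and v not in missing]
    # most_common keeps first-seen order among equal counts
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None or v in missing else v for v in values]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace None with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def fix_implausible_age_bands(bands: List[Optional[str]],
                              implausible: Iterable[str] = ("0 - 5", "6 - 10")) -> List[str]:
    """Treat implausible driver age bands as missing, then mode-impute them."""
    bad = set(implausible)
    return mode_impute([None if b in bad else b for b in bands])


print(mode_impute(["Male", None, "Female", "Male", "Unknown"]))
print(mean_impute([30.0, None, 60.0]))
print(fix_implausible_age_bands(["26 - 35", "0 - 5", "26 - 35", "36 - 45", None]))
```

Mode and mean imputation are simple and preserve the most common category or the average, but they also make the distributions more peaked than the raw data, which is worth keeping in mind when reading the counts that follow.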
Exploration:
The exploration began by analysing the trend in accidents over the years.
Over the decade, the number of accidents has gradually fallen, although accidents increased progressively from 2012 to 2014, with a 5% rise in the accident rate from 2013. Overall, the accident rate fell by 26% over the decade to 2014. The number of car sales each year was analysed to explain this trend. Classifying accidents by severity, the total count of serious and fatal crashes has remained relatively stable over the years, whereas the number of slight accidents fell by more than 33%. Over the decade, the highest number of serious accidents occurred in 2006 and the highest number of fatal crashes in 2007. Slight accidents make up the largest share at 87.17%, followed by serious accidents at 11.81% and fatal accidents at 1.02%. Car sales data for the UK were plotted over the years.
The number of car sales per year in the above chart suggests that, due to the global financial crisis (recession), sales dropped significantly from 2007 and gradually recovered from 2011. In 2012, the UK saw a significant surge in car sales, which coincides with the accident trend over the years. This suggests that car sales have had a significant impact on the number of UK car accidents.
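The percentages quoted in this part of the exploration (the 26% decrease over the decade and the 87.17%, 11.81% and 1.02% split by severity) follow the standard percentage-change and share-of-total formulas. The Python sketch below only shows that arithmetic; the counts in it are invented round numbers, not STATS19 figures.

```python
from typing import Dict


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new` (negative means a decrease)."""
    return (new - old) / old * 100


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Each category's share of the total, in percent."""
    total = sum(counts.values())
    return {k: v / total * 100 for k, v in counts.items()}


# Invented counts, for illustration only
print(round(pct_change(200, 148), 2))  # -26.0
by_severity = shares({"Slight": 870, "Serious": 118, "Fatal": 12})
print({k: round(v, 2) for k, v in by_severity.items()})
```

Because shares are taken over the same total, they always sum to 100%, which is a quick sanity check on any severity breakdown.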
The highest number of accidents occurs in the district of Birmingham, followed by Leeds, Manchester and Glasgow. The top 20 cities affected by accidents in the UK are shown in the left chart. Although Birmingham has the highest number of accidents, its police force ranks lower when forces are compared by the total number of accidents they handle. The Metropolitan Police have dealt with the most accidents over the years because they supervise the 38 districts positioned around central London.
The timing of accidents was then analysed by month, day of the week and hour.
The total number of accidents remained more or less constant throughout the year, except in January and December. These two months saw more accidents than the others, likely because of the long holidays, when many people take breaks and plan road trips. Thus, holiday seasons also appear to affect the number of accidents in the UK.
Next, accidents were explored by day of the week. The figure above reveals that most accidents happen on Friday and the fewest on Sunday. One possible explanation is that many people return home late at night after recreation, having been drinking, and drunk driving increases the accident rate, whereas people tend to stay at home on Sundays. Exploring drink-driving data for recent years clarifies this. The graph below shows hourly drink-drive data for each day of the week from 2010 to 2016.
Although these data cover only recent years, they reveal a tendency to drink-drive more on Fridays and Saturdays. The trend suggests that drink driving peaks from 6:00 PM and continues until midnight. Excluding weekends, the following hourly accident trend is obtained.
The graph above shows that the accident trend follows office hours. The distribution is bimodal, with one peak at the start and one at the end of the working day. According to media reports (refer: http://www.bbc.com/news/uk-38026625), UK residents have an average commute time of 2 hours, which supports this argument. Accidents in the evening also outnumber those in the morning, which may be related to work stress. Thus, office commuters contribute significantly to the number of accidents in the UK. So which type of accident causes more casualties, and where does it occur most?
The index is the number of casualties divided by the number of accidents, expressed as a percentage: the higher the index, the more severe the category. The index for fatal and serious accidents is relatively higher than for slight accidents. What causes so many casualties in the first two categories, and why are there so many slight accidents? Factors such as road type, speed limit, weather, area of driving, age and sex are analysed below.
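The casualty index defined above is 100 × casualties ÷ accidents within each severity class. A minimal Python sketch, assuming each accident record carries a severity label and a casualty count; the field names `Accident_Severity` and `Number_of_Casualties` and the sample records are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def casualty_index(accidents: List[dict]) -> Dict[str, float]:
    """Casualties per accident, as a percentage, for each severity class."""
    n_acc: Dict[str, int] = defaultdict(int)
    n_cas: Dict[str, int] = defaultdict(int)
    for a in accidents:
        sev = a["Accident_Severity"]
        n_acc[sev] += 1
        n_cas[sev] += a["Number_of_Casualties"]
    return {sev: 100 * n_cas[sev] / n_acc[sev] for sev in n_acc}


# Made-up records, for illustration only
sample = [
    {"Accident_Severity": "Fatal", "Number_of_Casualties": 3},
    {"Accident_Severity": "Fatal", "Number_of_Casualties": 1},
    {"Accident_Severity": "Slight", "Number_of_Casualties": 1},
    {"Accident_Severity": "Slight", "Number_of_Casualties": 2},
    {"Accident_Severity": "Slight", "Number_of_Casualties": 1},
]
print(casualty_index(sample))
```

An index above 100 means that, on average, accidents in that class involve more than one casualty.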
Road Conditions:
The figure above shows that the most accidents occurred on single carriageways, followed by dual carriageways. However, speed limits played a significant role. On dual carriageways, 43.05% of crashes happen where the speed limit is 70 mph, whereas on single carriageways more than 50% of accidents happen at a 30 mph limit and 28% at 60 mph. For single carriageways, the safer speed limit band is 40 to 50 mph. It is worth noting that in 1977 the UK government set the speed limit for dual carriageways at 70 mph and for single carriageways at 60 mph. In light of the above findings, this decision has likely had an impact on the total number of accidents in the UK.
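Figures such as ‘43.05% of dual-carriageway crashes happen at 70 mph’ are shares computed within each road type rather than over all accidents. A minimal Python sketch of that row-wise percentage, assuming records with hypothetical `Road_Type` and `Speed_limit` fields and made-up counts:

```python
from collections import Counter, defaultdict
from typing import Dict, List


def share_within_group(records: List[dict], group_key: str,
                       sub_key: str) -> Dict[str, Dict[object, float]]:
    """For each group (e.g. road type), the percentage of its records in each sub-category."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        counts[r[group_key]][r[sub_key]] += 1
    return {
        group: {sub: 100 * n / sum(c.values()) for sub, n in c.items()}
        for group, c in counts.items()
    }


# Made-up records, for illustration only
records = (
    [{"Road_Type": "Single carriageway", "Speed_limit": 30}] * 5
    + [{"Road_Type": "Single carriageway", "Speed_limit": 60}] * 3
    + [{"Road_Type": "Single carriageway", "Speed_limit": 40}] * 2
    + [{"Road_Type": "Dual carriageway", "Speed_limit": 70}] * 4
)
print(share_within_group(records, "Road_Type", "Speed_limit"))
```

The same function applies to junction-level percentages, with a (hypothetical) junction-type field as the group key.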
Moreover, accidents are more likely near T or staggered junctions, followed by areas within 20 metres of T junctions and crossroads.
The figure above shows that at T junctions, 76% of accidents happen where the speed limit is 30 mph, compared with 40% for locations within 20 metres of a junction. Single carriageways with a 30 mph limit near T junctions are the most affected locations in the UK. This is also consistent with the fact that unclassified roads in the UK, which have a speed limit of 30 mph, experience the most accidents. The graph below confirms this.
Researchers around the world have also found that around two-thirds of crashes in which people are killed or injured occur on roads with a speed limit of 30 mph or less (refer: http://www.carsfatal4.com/the-fatal-four/amani/). On 30 mph roads in built-up areas, 45% of car drivers have been observed to exceed 30 mph and 15% to exceed 35 mph, which seriously increases the risk of a crash and of fatal injury, by 3.5 to 5.5 times (refer: https://www.rospa.com/rospaweb/docs/advice-services/road-safety/drivers/inappropriate-speed.pdf).
The above data exploration is consistent with these research outcomes. Drivers of all types of vehicle tend to speed, cars most of all, followed by heavy goods vehicles (HGVs). On 30 mph roads, almost 40% of cars exceed the limit, compared with around 38% of HGVs. ‘Car’ contributes more than 10% of speeding on 30 mph roads.
All the above conditions make unclassified roads the most dangerous in the UK.
Vehicles and Driving Manoeuvres:
Vehicles and drivers were explored by driving manoeuvre, age, sex and so on. The data show that cars account for 76% of the accident count, followed by pedal cycles at 6% and vans (3.5-ton goods) at 5%. Breaking the accident count down by severity, the chart on the right shows that pedal cycles contribute the most to slight accidents, at 21%. However, fatal accidents are mostly caused by goods carriers (7.5 tons), at 23%, followed by motorcycles (500 cc) at 22%. Serious accidents are also mainly caused by motorcycles (500 cc), at 20%.
Thus, motorbikes cause more of the severe accidents. The histogram on the left, showing the age distribution of motorbike riders involved in accidents, reveals that a large number of them are teenagers. The reason could be that in the UK, riders initially need to pass only a theory test, rather than a practical test, to obtain a two-wheeler licence. This policy may be having serious repercussions.
Extending this analysis to all vehicles involved in accidents gives the yearly trend in the chart on the right. The number of teenagers aged 11 to 15 involved in accidents increased substantially from 2011 to 2014, whereas the number aged 16 to 20 has been rising since 2013 after a steep decline over the preceding years. The abrupt decrease in accidents involving motorbike riders could be attributed to the stricter driving licence policies enforced over the years, which exclude 125cc motorbikes and pedal cyclists (refer: https://bit.ly/2r6Gx4q). As a result, more teenagers are licensing themselves on bikes smaller than 125cc, gradually increasing their accident count. The graph below shows the same trend.
However, vehicle type alone cannot explain the causes of accidents. Exploring vehicle manoeuvres, ‘overtaking’ or ‘going ahead of others’ accounts for the largest share of accidents, followed by ‘turning right’. Further analysis shows that accidents while turning right are more prevalent on unclassified roads. This could be because unclassified roads are mostly single carriageways with no central partition, so vehicles turning right can collide with oncoming traffic.
‘Going ahead of others’ contributes around 46% of accidents across all speed limits. Thus, the speed limit does not appear to affect overtaking: on all routes, drivers show the same tendency to go ahead of others, causing accidents. Overtaking on the offside on 15 mph roads accounts for 25%, the highest of all sections.
Gender of Driver:
The sex of the driver can play a crucial role in the number of accidents. Over the years, the trend in the number of male and female drivers involved has remained the same, with men the dominant contributor: men accounted for more than 63% of UK accidents. The hourly distribution by gender is explored below:
The first section of the graph (next page) shows the distribution of male and female drivers involved in accidents while commuting to work. The distribution follows office hours: the rush starts around 7:00 AM and gradually wanes at around 10:00 AM, then spikes again around 4:00 PM and ends at about 8:00 PM. This suggests that UK office hours run from 10:00 AM to 4:00 PM in most cases. The second section of the graph (next page) shows the distribution by gender when students drive themselves to school: young female drivers appear to be much safer drivers than young male drivers.
The third section reveals something unexpected. It shows the distribution of accidents by gender when parents drive their children to and from school: in this case, female drivers are more prone to crashes than male drivers. To explain this, the first section of the graph is revisited. Female drivers' commuting rush starts slightly later than that of male drivers, who leave earlier for the office. This could be because women drive their children to school more often than men and must still reach the office on time, which may lead to rash driving and accidents. This hypothesis, drawn from the above trend, needs further research and analysis.
Light (darkness) and Weather Conditions:
The chart below reveals that 30 mph unclassified roads are the most affected by accidents in darkness. Around 68% of accidents in darkness happen where lights are unlit, whereas 62% occur where there is no lighting. Wiltshire is the most affected area for lack of lighting, whereas the City of Edinburgh is the most affected by unlit lights.
The graph below represents the impact of different weather conditions on the number of accidents. It shows that ‘Darkness due to no lighting’ and ‘Fine, no high winds’ have the maximum number of accidents. Contrary to popular belief, ‘Snowing + high winds’ and ‘unlit lights’ do not contribute much to the accident percentage. This could be because people are reluctant to drive in such snowy weather.
Pembrokeshire is the most accident-prone area for ‘Snowing and High Winds’ combined with no lighting. Analysing the accident distribution on the map reveals that Scotland’s roads are better lit than those in the rest of Britain. Also, roads connecting to London have more unlit lights than other places, which calls for proper supervision by the area administrators. Moreover, roads along the UK coast lack lighting, which requires further investigation.
Conclusion:
From the above analysis, it is evident that Friday evenings experience more accidents than any other time of the week, although the number of accidents has fallen over the years. Further, the risk of a crash increases by 70% when driving on an unclassified road (30 mph speed limit). An accident is least likely when the driver maintains a speed of 40 to 50 mph on either single or dual carriageway roads.
In addition, drivers should be more careful when turning right on single carriageway roads. More rigorous training by driving schools could be a wise way to tackle this problem. Since the UK has many unclassified roads with a speed limit of 30 mph, it is high time to introduce more traffic lights near T-junctions for safer driving. To reduce motorbike accidents, the UK government should further restrict licensing for 125cc bikes, as accidents involving them have been increasing gradually since 2009.
England’s administration needs to supervise street light maintenance more closely on all roads connecting to London. Lastly, female drivers who drop their children at school on the way to work should take extra care. Most accidents were caused by cars on daily commutes to work. In this regard, the UK should introduce more trains in Birmingham, Leeds and Manchester to reduce the density of accidents involving private cars.
Reflection:
1. First hands-on experience in data exploration, which allowed me to learn its various aspects
2. Realised how supporting data can be used to correlate trends and draw conclusions
3. Used R extensively for wrangling and plotting, and became familiar with dplyr for large-scale data preparation and analysis
4. An excellent opportunity to explore the data through various charts in Tableau
5. Carried out an in-depth analysis of the time aspects of UK accidents, which helped me understand the impact of low-granularity data on the study
6. Learned to implement time-series plots for a given dataset
7. Needed to study the road system hierarchy of the United Kingdom in detail
8. The different road aspects of the accidents could have been analysed with a correlation matrix
9. Initially, the gender analysis was not insightful at all; a deep-dive analysis of the same element as a time series yielded more meaningful insights
10. The exploration took a holistic approach to the United Kingdom as a country; an interactive analysis of each city or district through rich visualisation would bring out more intelligent insights
11. Only selected features from the 70+ features in the dataset were analysed, due to time and page restrictions
12. Gained confidence to carry out future data exploration on large datasets in personal projects
Bibliography:
• https://www3.nd.edu/~steve/computing_with_data/24_dplyr/dplyr.html
• http://stat545.com/bit001_dplyr-cheatsheet.html
• https://github.com/tidyverse/dplyr/blob/master/R/colwise-mutate.R
• https://en.wikipedia.org/wiki/Roads_in_the_United_Kingdom
• https://en.wikipedia.org/wiki/Reported_Road_Casualties_Great_Britain
• http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
• https://www.statista.com/statistics/633052/share-vehicles-speeds-30-mph-roads-gb/
• https://www.express.co.uk/life-style/cars/790615/car-crash-UK-accidents-most-dangerous-roads-revealed
• https://www.licencebureau.co.uk/wp-content/uploads/road-use-statistics.pdf
• https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/390167/Birmingham_Evidence_Pack__for_publication__FINAL.pdf
• https://www.nomisweb.co.uk/reports/lmp/la/1946157186/report.aspx#tabrespop
• https://www.gov.uk/government/statistics/road-conditions-in-england-2017
• https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/4484/CPR1131-analysis-of-stats-19-data.pdf
• http://www.bbc.co.uk/news/uk-15975564

Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 

Recently uploaded (20)

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

UK Road Accident Analysis: 2005-2014

Analysis of UK Road Accidents
Author - Krishnendu Das
Student id - 28980980
Tutor – Yalong Yang
Introduction:
More than a million people worldwide die in road accidents every year. This report explores the accident data of the United Kingdom (UK) from 2005 to 2014. The records collected over these years (around 1.6 million per file) provide significant insights into accident behaviour and the factors influencing accidents in the UK.
Motivation:
During my year-long stay in the United Kingdom in 2015, I had the privilege of making some lifelong friends who commute to work daily. One of their most common complaints was the frequent road accidents, which delayed their journeys and severely affected their productivity at work. As a data scientist, exploring the various aspects of crashes in the UK allows me to extract critical information that citizens can use to avoid future accidents. The following questions guided the data analysis and exploration:
• Which conditions affect accidents in the UK?
• How do age, gender and the day of the week influence accidents?
• Which vehicles are most prone to accidents?
Data Source:
Under the UK's data collection policy, STATS19 is a set of protocols setting out the information to be collected when a crash happens. Not all of the data is public, for reasons of confidentiality. The information recorded in STATS19 has three distinct elements:
• Accidents.csv: the circumstances of the crash, comprising three components:
1. Date and time of the accident
2. Location of the accident, with junction details
3. Conditions in the area, such as weather and light
• Vehicles.csv: records of the vehicles involved in the crash, including two essential elements:
1. Age of the driver
2. Sex of the driver
• Casualties.csv: details of the casualties, comprising:
1. The severity of the injury
The metadata definitions of the variables can be found at http://data.dft.gov.uk/road-accidents-safety-data/Road-Accident-Safety-Data-Guide.xls. Link to data: https://bit.ly/2HwfIBK
Additional data files were used to support the hypotheses derived from the analysis:
• VEH01: UK car sales data from the licensed vehicles and new registrations tables, produced by the Department for Transport. Data: https://www.gov.uk/government/statistical-data-sets/all-vehicles-veh01
• RAS51012: UK reported drink-driving data, obtained from the Department for Transport. Data: https://www.gov.uk/government/statistical-data-sets/ras51-reported-drinking-and-driving
• SPE0111: estimates of vehicle compliance with speed limits on roads. Data: https://www.gov.uk/government/statistical-data-sets/spe01-vehicle-speeds
Wrangling:
Wrangling the primary data was somewhat challenging because of the number of records, but wrangling the supporting data was harder still. The main files were in CSV format and had around 1.6 million records each. All wrangling was done in R. The data were mostly structured and contained more than 70 features. The following steps were followed:
1. The columns were renamed, as several had blank names
2. The files were read using read_csv
3. The three files were merged using the dplyr package
4. Once combined, selected columns were extracted into a single data frame
5. The column values are coded factors, so they had to be joined with the metadata definitions to become meaningful
The metadata file was harder to wrangle because it is an Excel workbook and base R has no built-in function for reading Excel files. After evaluating several packages, XLConnect was used to read it. The workbook has more than 40 worksheets containing the metadata definitions of the column values in the primary data files. Converting all 40 worksheets into a single data frame would have been difficult, as each sheet has different dimensions. Hence, a function was written that reads every sheet and stores them as a list of data frames. Some worksheet names contained blanks but needed no wrangling, as they could be accessed using backticks (``). All three files share a common column, renamed to 'Accident_Index'. In addition, the vehicle and casualty files are related through an extra column, 'vehicle reference', which was also used to join them.
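The author performed this merge with dplyr in R. Purely as an illustration of the join keys, the sketch below shows the same two-step merge in plain Python (standard library only). The column names Accident_Index, Vehicle_Reference, Accident_Severity, Sex_of_Driver and Casualty_Severity are assumed to resemble the STATS19 files, and the toy rows are invented for the example.

```python
from collections import defaultdict


def inner_join(left, right, keys):
    """Hash-join two lists of dict rows, keeping only rows whose key values match."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left:
        # .get on a defaultdict does not insert a new key, so misses return [].
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Toy rows with hypothetical values; the real files have 70+ columns.
accidents = [
    {"Accident_Index": "A1", "Accident_Severity": 3},
    {"Accident_Index": "A2", "Accident_Severity": 1},
]
vehicles = [
    {"Accident_Index": "A1", "Vehicle_Reference": 1, "Sex_of_Driver": 1},
    {"Accident_Index": "A1", "Vehicle_Reference": 2, "Sex_of_Driver": 2},
    {"Accident_Index": "A2", "Vehicle_Reference": 1, "Sex_of_Driver": 1},
]
casualties = [
    {"Accident_Index": "A1", "Vehicle_Reference": 2, "Casualty_Severity": 3},
    {"Accident_Index": "A2", "Vehicle_Reference": 1, "Casualty_Severity": 1},
]

# Accidents and vehicles share Accident_Index; casualties also need Vehicle_Reference.
acc_veh = inner_join(accidents, vehicles, ["Accident_Index"])
merged = inner_join(acc_veh, casualties, ["Accident_Index", "Vehicle_Reference"])
print(len(merged))  # 2
```

An inner join keeps only vehicle rows that have a matching casualty; a left join (dplyr's left_join) would instead keep every vehicle and leave the casualty fields empty.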
Merging the files required 15 minutes of processing. Three separate columns were derived from the Date column in Accidents.csv; preprocessing these columns improves dplyr performance, as the data are large. Wrangling the supporting files was more difficult because their data were unstructured. The files were in .ods format and required particular attention: they were converted to CSV and then read into R.
• VEH01: the file has the following structure. The initial rows were skipped.
1. The file was read using XLConnect::loadWorkbook
2. The date strings were cleaned using a regular expression
3. Garbage values such as '…' or '.' were replaced with 0
4. Rows after row 93 were removed, as the accident data involves Britain records only
5. Finally, the separate vehicle columns were melted into a single column
• RAS51012: a snapshot of the file's structure is provided below.
1. The same strategies as for the previous file were used
2. The data were extracted from each worksheet and then transposed into a single data frame
3. Data from 2004 to 2015 were fetched
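The VEH01 clean-up steps (regex clean of the date labels, placeholder values set to 0, vehicle columns melted into one) can be sketched as follows. This is a hypothetical Python rendering, not the author's R code; the column names, the placeholder set and the sample values are assumptions made for illustration.

```python
import re

# Placeholder strings treated as "no data" and replaced with 0 (assumed set).
PLACEHOLDERS = {"…", "...", "..", "."}


def clean_value(raw):
    """Convert a cell such as '1,200' to a number, mapping placeholders to 0."""
    raw = raw.strip()
    if raw in PLACEHOLDERS:
        return 0
    return float(raw.replace(",", ""))


def clean_year(raw):
    """Pull the four-digit year out of a label such as '2012 (r)'."""
    match = re.search(r"\b(\d{4})\b", raw)
    return int(match.group(1)) if match else None


def melt(rows, id_col, value_cols, var_name="vehicle_type", value_name="value"):
    """Reshape wide rows (one column per vehicle type) into long rows."""
    return [
        {id_col: row[id_col], var_name: col, value_name: row[col]}
        for row in rows
        for col in value_cols
    ]


# Hypothetical wide-format rows (values invented for illustration).
raw_rows = [
    {"Year": "2011", "Cars": "1,000", "Motorcycles": ".."},
    {"Year": "2012 (r)", "Cars": "1,200", "Motorcycles": "30"},
]
vehicle_cols = ["Cars", "Motorcycles"]
cleaned = [
    {"Year": clean_year(r["Year"]), **{c: clean_value(r[c]) for c in vehicle_cols}}
    for r in raw_rows
]
long_rows = melt(cleaned, "Year", vehicle_cols)
```

Melting to one vehicle_type column is what makes the data easy to group and plot by vehicle category afterwards.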
• SPE0111: a snapshot of the file's structure is provided below.
1. Wrangling this file was the most challenging of all
2. The same strategies as for the above files were followed
3. The heavy-vehicle columns had three sub-columns, which were aggregated into a single column
4. Data from all the sheets were merged into a single data frame, with the year as one column
5. All the vehicle category columns were then melted into a single column
Data Checking:
The following data cleaning techniques were used:
1. The categorical values were already factorised
2. Some drivers fell into the age bands 0-5 and 6-10, with quite high accident counts. These values were treated as errors, since drivers that young cannot plausibly account for so many accidents, and were replaced with the mode of the age-band distribution
3. All columns had missing values, which were imputed: missing categorical values were replaced with the mode of their distribution and missing quantitative values with the mean of theirs
4. A few records had unknown values for features such as gender, road conditions and light conditions; these were also imputed with the mode
5. Missing values in the supporting data were imputed with 0, on the assumption that this would not materially affect the overall distribution
6. Percentages in the supporting files were converted to total counts
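The imputation rules above (mode for categorical columns, with "unknown" codes and implausible young age bands treated as missing; mean for quantitative columns) can be sketched in Python as below. The column names, the text labels used in place of numeric codes, and the choice of which bands count as implausible are hypothetical; this is not the author's R implementation.

```python
from collections import Counter
from statistics import mean


def impute(rows, categorical, numeric):
    """Fill gaps in place.

    categorical maps a column name to the set of codes treated as missing
    (e.g. 'Unknown' or an implausible age band); those and None become the
    column's mode. Numeric columns have None replaced by the column mean.
    """
    for col, bad_codes in categorical.items():
        valid = [r[col] for r in rows if r[col] is not None and r[col] not in bad_codes]
        fill = Counter(valid).most_common(1)[0][0]
        for r in rows:
            if r[col] is None or r[col] in bad_codes:
                r[col] = fill
    for col in numeric:
        fill = mean(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    return rows


# Hypothetical driver rows; real data stores these as numeric codes.
rows = [
    {"Age_Band_of_Driver": "26-35", "Sex_of_Driver": "Male", "Age_of_Driver": 30},
    {"Age_Band_of_Driver": "0-5", "Sex_of_Driver": "Unknown", "Age_of_Driver": None},
    {"Age_Band_of_Driver": "26-35", "Sex_of_Driver": "Female", "Age_of_Driver": 34},
    {"Age_Band_of_Driver": None, "Sex_of_Driver": "Male", "Age_of_Driver": 32},
]
impute(
    rows,
    categorical={"Age_Band_of_Driver": {"0-5", "6-10"}, "Sex_of_Driver": {"Unknown"}},
    numeric=["Age_of_Driver"],
)
```

Mode imputation keeps categorical columns within their valid levels, at the cost of inflating the most common level; the same trade-off applies to the report's cleaned age bands.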
Exploration:
The exploration began by analysing the accident trend over the years. Over the decade the number of accidents has gradually fallen, although accidents rose steadily from 2012 to 2014, with a 5% increase in the accident rate from 2013. Overall, the accident rate fell by 26% over the decade to 2014. The number of car sales each year was analysed to explain this trend.
When the accidents are classified by severity, the total count of serious and fatal crashes has remained relatively stable over the years, whereas the number of slight accidents has fallen by more than 33% overall. Over the decade, the most serious accidents occurred in 2006 and the most fatal crashes in 2007. Slight accidents contribute the most, at 87.17%, followed by serious accidents at 11.81% and fatal accidents at 1.02%. UK car sales data was plotted over the years.
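Figures such as the 87.17% share of slight accidents or the 26% decrease over the decade are simple ratios of counts. A minimal Python sketch of both calculations follows; the counts are hypothetical and chosen only to make the arithmetic easy to check, not taken from the report's data.

```python
def shares(counts):
    """Percentage share of each category in the total, rounded to 2 d.p."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}


def pct_change(old, new):
    """Relative change from old to new, in percent (negative = decrease)."""
    return 100 * (new - old) / old


def year_on_year(series):
    """Percentage change between consecutive years of a {year: count} mapping."""
    years = sorted(series)
    return {y: pct_change(series[p], series[y]) for p, y in zip(years, years[1:])}


severity_counts = {"Slight": 870, "Serious": 120, "Fatal": 10}  # hypothetical
print(shares(severity_counts))  # {'Slight': 87.0, 'Serious': 12.0, 'Fatal': 1.0}
print(pct_change(200, 148))  # -26.0
```

Note that the decade-long change compares only the first and last years, whereas year_on_year exposes short-term reversals such as the rise after 2012.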
The annual car sales in the above chart suggest that, owing to the global financial crisis (recession), sales dropped significantly from 2007 and gradually recovered from 2011. In 2012, the UK saw a significant surge in car sales, which coincides with the rise in accidents over the same years. This suggests that car sales have had a significant impact on UK car accidents.
The highest number of accidents occurs in the district of Birmingham, followed by Leeds, Manchester and Glasgow. The top 20 cities by accident count are shown in the left-hand chart. Although Birmingham has the most accidents, its police force ranks lower by the total number of accidents handled. The Metropolitan Police have dealt with the most accidents over the years because they cover 38 districts around central London.
The time of the accident was analysed next, by month, day of the week and hour.
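A top-20 ranking like the one in the left-hand chart amounts to counting accidents per district after translating district codes into names through the metadata. The Python sketch below assumes a hypothetical District_Code column and a made-up code-to-name lookup standing in for the corresponding metadata worksheet.

```python
from collections import Counter

# Hypothetical code -> name lookup standing in for the metadata guide's district sheet.
DISTRICT_NAMES = {1: "Birmingham", 2: "Leeds", 3: "Manchester", 4: "Glasgow"}


def top_districts(accidents, lookup, n=20):
    """Return the n districts with the most accidents as (name, count), highest first."""
    counts = Counter(lookup.get(row["District_Code"], "Unknown") for row in accidents)
    return counts.most_common(n)


sample = [{"District_Code": c} for c in [1, 2, 1, 4, 2, 1, 3]]
print(top_districts(sample, DISTRICT_NAMES, n=2))  # [('Birmingham', 3), ('Leeds', 2)]
```

Grouping by police force instead of district only requires a different key column, which is how the Birmingham and Metropolitan Police rankings can differ.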
Roughly the same number of accidents occurred in every month of the year, except January and December. These two months saw more accidents than the others, likely because of the long holidays, when many people take breaks and plan road trips. Thus, holiday seasons also affect the number of accidents in the UK.
Next, accidents are explored by day of the week. The figure above reveals that most accidents happen on Friday and the fewest on Sunday. One possible explanation is that many people return home late on Friday night after recreation, sometimes drunk; drunk driving increases the accident rate, whereas people tend to stay at home on Sundays. This becomes clearer when the drink-driving data for recent years is explored. The graph below shows the hourly drink-driving data for each day of the week from 2010 to 2016. Although this data is more recent, it reveals a tendency towards more drunk driving on Fridays and Saturdays, with the peak starting at 6:00 PM and continuing until midnight.
Excluding weekends, the following hourly accident trend is obtained. The graph above shows that the accident trend follows office hours: the distribution is bimodal, with one peak at the start and one at the end of the working day. According to media reports (Refer: http://www.bbc.com/news/uk-38026625), UK residents have an average commute time of 2 hours, which supports this argument. There are also more accidents in the evening than in the morning, which may be due to work stress. Thus, office commuters contribute significantly to the number of accidents in the UK.
So which type of accident causes more casualties, and where does it occur most? The index is the number of casualties divided by the number of accidents, expressed as a percentage; the higher the index, the more severe the category. The index for fatal and serious accidents is relatively high compared with slight accidents. What causes so many casualties in the first two categories, and why are there so many slight accidents? Factors such as road type, speed limit, weather, area of driving, age and sex are analysed below.
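The casualty index defined above (casualties divided by accidents, as a percentage) can be computed per severity category as in this short Python sketch. The counts are hypothetical, chosen so the results are easy to verify, and do not reproduce the report's figures.

```python
def casualty_index(casualties, accidents):
    """Casualties per accident as a percentage, for each severity category."""
    return {k: 100 * casualties[k] / accidents[k] for k in accidents}


accidents = {"Fatal": 10, "Serious": 100, "Slight": 1000}    # hypothetical counts
casualties = {"Fatal": 18, "Serious": 150, "Slight": 1300}  # hypothetical counts
index = casualty_index(casualties, accidents)
print(index)  # {'Fatal': 180.0, 'Serious': 150.0, 'Slight': 130.0}
```

An index above 100 simply means that, on average, each accident in that category produces more than one casualty.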
Road Conditions:
The figure above shows that the most accidents occurred on single carriageways, followed by dual carriageways. However, speed limits played a significant role. On dual carriageways, 43.05% of crashes happen where the speed limit is 70 mph, whereas on single carriageways more than 50% happen at 30 mph and 28% at 60 mph. For single carriageways, the safer speed-limit band is 40 to 50 mph. It is worth noting that the UK government set the speed limit for dual carriageways at 70 mph and for single carriageways at 60 mph in 1977. The findings above suggest that this decision has had an impact on the total number of accidents in the UK.
Moreover, accidents are more frequent near T or staggered junctions, followed by areas within 20 metres of a T-junction, and crossroads. The figure above shows that at T-junctions 76% of accidents happen where the speed limit is 30 mph, compared with 40% within 20 metres of a junction. Single-carriageway 30 mph roads near T-junctions are the most accident-prone locations in the UK. This also corroborates the fact that unclassified roads in the UK with a speed limit of 30 mph experience the most accidents, as the graph below confirms. Also, recent research around the world has found that around two-thirds of crashes in which people are killed or injured occur on roads with a speed limit of 30 mph or less (refer: http://www.carsfatal4.com/the-fatal-four/amani/). It has been observed that on 30 mph roads in built-up areas, 45% of car drivers exceed 30 mph and 15% exceed 35 mph, which increases the risk of a crash and of fatal injury by 3.5 to 5.5 times (refer: https://www.rospa.com/rospaweb/docs/advice-services/road-safety/drivers/inappropriate-speed.pdf).
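Statements such as "43.05% of dual-carriageway crashes happen at 70 mph" or "76% of T-junction accidents happen at 30 mph" are shares within a group, i.e. a row-normalised cross-tabulation. A minimal Python sketch follows; the Road_Type and Speed_limit column names and the rows are hypothetical.

```python
from collections import Counter, defaultdict


def share_within(rows, group_col, value_col):
    """Row-normalised crosstab: for each group, the % of its rows with each value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_col]][row[value_col]] += 1
    table = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        table[group] = {value: round(100 * n / total, 2) for value, n in counter.items()}
    return table


# Hypothetical accident rows (road type and speed limit only).
rows = (
    [{"Road_Type": "Single carriageway", "Speed_limit": 30}] * 5
    + [{"Road_Type": "Single carriageway", "Speed_limit": 60}] * 3
    + [{"Road_Type": "Single carriageway", "Speed_limit": 50}] * 2
    + [{"Road_Type": "Dual carriageway", "Speed_limit": 70}] * 2
    + [{"Road_Type": "Dual carriageway", "Speed_limit": 40}] * 2
)
table = share_within(rows, "Road_Type", "Speed_limit")
print(table["Single carriageway"])  # {30: 50.0, 60: 30.0, 50: 20.0}
```

Passing a junction-detail column as group_col gives the junction-by-speed-limit shares in the same way. Because each group is normalised separately, the percentages compare speed limits within a road type, not across road types.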
The data exploration above agrees with these research findings. Drivers of all vehicle types tend to speed, cars most of all, followed by heavy goods vehicles (HGVs): almost 40% of cars and around 38% of HGVs exceed the limit on 30 mph roads. 'Car' contributes more than 10% to over-speeding on 30 mph roads. All of these conditions make unclassified roads the most dangerous in the UK.
Vehicles and Driving Manoeuvres:
Vehicles and drivers are explored against driving manoeuvre, age, sex, etc. The data show that cars account for 76% of the accident count, followed by pedal cycles at 6% and vans (3.5-ton goods vehicles) at 5%. Splitting the accident count by severity, the chart on the right shows that pedal cycles contribute the most to slight accidents, at 21%. However, fatal accidents mostly involve goods carriers (7.5 tons), at 23%, followed by motorcycles (500 cc) at 22%. Serious accidents also mainly involve motorcycles (500 cc), at 20%.
Thus, motorbikes are involved in the more severe accidents. The histogram on the left, which shows the age distribution of motorbike riders involved in accidents, reveals that a large number of them are teenagers. One reason could be that in the UK, riders are initially required to pass a theory test rather than a practical test to get a two-wheeler licence, a policy that may have serious repercussions. Extending this finding to all vehicles involved in accidents gives the yearly trend in the chart on the right. The number of teenagers aged 11 to 15 involved in accidents increased substantially from 2011 to 2014, whereas the number aged 16 to 20 has been rising since 2013 after a steep decline over the preceding years. The abrupt decrease in accidents involving motorbike riders could be attributed to the stricter licensing policies enforced over the years, which exclude 125cc motorbikes and pedal cycles (Refer: https://bit.ly/2r6Gx4q). As a result, more teenagers are licensing themselves on bikes under 125cc, gradually increasing their accident count. The graph below shows the same trend.
Consequently, vehicle type alone cannot explain the cause of accidents. Exploring vehicle manoeuvres, it was found that 'going ahead of others' (overtaking) accounts for the largest share of accidents, followed by 'turning right'. Accidents while turning right are more prevalent on unclassified roads. This could be because unclassified roads are mostly single carriageways with no central divider, so right-turning vehicles collide with oncoming traffic. Going ahead of others accounts for around 46% of accidents on roads of every speed limit. Thus, the speed limit does not appear to affect overtaking: on all roads, drivers show the same tendency to go ahead of others and cause accidents. Overtaking on the offside on 15 mph roads accounts for 25%, the highest share of any speed limit.
Gender of Driver:
The sex of the driver can play a crucial role in the number of accidents. Over the years, the trend for male and female drivers has remained the same, with men the dominant contributor: men were involved in more than 63% of UK accidents. The hourly distribution by gender is explored below.
The first section of the graph (next page) shows the distribution of male and female drivers involved in accidents while commuting to work. It follows office hours: the rush starts around 7:00 AM and gradually wanes at around 10:00 AM, then spikes again around 4:00 PM and ends at about 8:00 PM. This suggests that UK office hours run from about 10 AM to 4 PM in most cases. The second section of the graph (next page) shows the distribution by gender when students/pupils drive themselves to school: girls are involved in far fewer accidents than boys.
The third section reveals something extraordinary: the distribution of accidents by gender when parents drive their children to and from school. In this case, female drivers are more prone to crashes than male drivers. To explain this, the first section of the graph is revisited: while commuting to work, the female drivers' rush starts a little later than that of the male drivers, who leave earlier for the office. This could be because women drive their children to school more often than men and must still reach the office on time, leading to accidents caused by rushed driving; this hypothesis needs more research and analysis.
Light (Darkness) and Weather Conditions:
The chart below reveals that 30 mph unclassified roads are the roads most affected by accidents in darkness. Around 68% of accidents in darkness with lights unlit, and 62% of those with no lighting, occur on these roads. Wiltshire is the most affected area for no lighting, whereas the City of Edinburgh is the most affected by unlit lights.
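Comparing when male and female drivers' accidents peak is easier if each sex's hourly counts are normalised to percentages of that sex's own total, so that men's larger overall share (over 63%) does not swamp the timing comparison. The Python sketch below assumes a hypothetical Hour column (derived from the accident time) and text labels for Sex_of_Driver; the rows are invented, and real data would first be filtered to weekdays.

```python
from collections import Counter


def hourly_profile(rows, sex):
    """Share (%) of one sex's accidents that fall in each hour 0-23."""
    hours = Counter(r["Hour"] for r in rows if r["Sex_of_Driver"] == sex)
    total = sum(hours.values())
    return [100 * hours[h] / total for h in range(24)]


def peak_hour(profile, start, end):
    """Hour in [start, end) with the largest share (earliest hour wins ties)."""
    return max(range(start, end), key=lambda h: profile[h])


# Hypothetical weekday accidents: hour of day and driver sex.
rows = (
    [{"Hour": h, "Sex_of_Driver": "Male"} for h in [7, 7, 8, 17, 18, 18]]
    + [{"Hour": h, "Sex_of_Driver": "Female"} for h in [8, 8, 9, 17, 17]]
)
male = hourly_profile(rows, "Male")
female = hourly_profile(rows, "Female")
print(peak_hour(male, 6, 12), peak_hour(female, 6, 12))  # 7 8
```

With real data, a later morning peak in the female profile than in the male one would be consistent with the school-run hypothesis, though it would not by itself confirm it.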
The graph below shows the impact of different weather conditions on the number of accidents. 'Darkness due to no lighting' and 'Fine, no high winds' have the highest number of accidents. Contrary to popular belief, 'Snowing + high winds' and 'unlit lights' do not contribute much to the accident percentage, possibly because people are reluctant to drive in such snowy weather. For 'Snowing + high winds' with no lighting, Pembrokeshire is the most accident-prone area. Analysing the accident distribution on the map reveals that Scotland's roads are better lit than those in the rest of Great Britain. Also, roads connecting to London more often have unlit lights than other places, which calls for proper supervision by the area administrators. Moreover, roads along the UK coast lack lighting and require more investigation.
Conclusion:
The above analysis shows that Friday evenings see more accidents than any other time of the week, although the number of accidents has fallen over the years. Further, the risk of a crash increases by 70% on an unclassified road (30 mph speed limit). An accident is least likely if the driver maintains a speed of 40-50 mph on either single or dual carriageway roads. Drivers should also take more care when turning right on single carriageway roads, and stricter training by driving schools could be a wise way to tackle this problem. Since the UK has many unclassified roads with a 30 mph speed limit, it is high time to introduce more traffic lights near T-junctions for safer driving. To reduce motorbike accidents, the UK government should further restrict licensing for 125cc bikes, as their accidents have been increasing gradually since 2009.
England's administration needs closer supervision of street-light maintenance on all roads connecting to London. Lastly, female drivers who drop their children at school on the way to the office should take extra care. Most accidents involved cars whose purpose of travel was the daily commute to work; in this regard, the UK should introduce more trains in Birmingham, Leeds and Manchester to reduce the density of accidents involving private cars.
Reflection:
1. First hands-on experience of data exploration, which taught me its various aspects
2. Realised how supporting data can be used to correlate trends and draw conclusions
3. Used R extensively for wrangling and plotting, which helped me become familiar with dplyr for large-scale data preparation and analysis
4. An excellent opportunity to explore the data through various charts in Tableau
5. Carried out an in-depth analysis of the time aspects of UK accidents, which helped me understand the impact of low-granularity data on the study
6. Learned to produce time-series plots of a given dataset
7. Needed to study the road system hierarchy of the United Kingdom in detail
8. The different road aspects of the accidents could have been analysed with a correlation matrix
9. Initially, the gender analysis was not insightful at all; a deeper analysis of the same element as a time series yielded more meaningful insights
10. The exploration took a holistic view of the United Kingdom as a country; an interactive analysis of each city/district through rich visualisation would bring out more intelligent insights
11. Only selected features out of the 70+ in the dataset were analysed, due to time and page restrictions
12. Gained confidence to carry out future data exploration on large datasets in personal projects
Bibliography:
• https://www3.nd.edu/~steve/computing_with_data/24_dplyr/dplyr.html
• http://stat545.com/bit001_dplyr-cheatsheet.html
• https://github.com/tidyverse/dplyr/blob/master/R/colwise-mutate.R
• https://en.wikipedia.org/wiki/Roads_in_the_United_Kingdom
• https://en.wikipedia.org/wiki/Reported_Road_Casualties_Great_Britain
• http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
• https://www.statista.com/statistics/633052/share-vehicles-speeds-30-mph-roads-gb/
• https://www.express.co.uk/life-style/cars/790615/car-crash-UK-accidents-most-dangerous-roads-revealed
• https://www.licencebureau.co.uk/wp-content/uploads/road-use-statistics.pdf
• https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/390167/Birmingham_Evidence_Pack__for_publication__FINAL.pdf
• https://www.nomisweb.co.uk/reports/lmp/la/1946157186/report.aspx#tabrespop
• https://www.gov.uk/government/statistics/road-conditions-in-england-2017
• https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/4484/CPR1131-analysis-of-stats-19-data.pdf
• http://www.bbc.co.uk/news/uk-15975564