Tushar Dalvi DWBI

Analysis of Road Accidents in United Kingdom
Tushar Shailesh Dalvi
x18134301
April 12, 2019
Abstract
Road traffic safety is one of the main Concern for department of transport also
for any country citizens. In order to provide maximum safety to everyone gov-
ernment bodies, local agencies, department of transport are continually evaluating
on current strategies of transportation. In order to reduce Accidents and avoid
causality, careful analysis of road traffic data should be done which can lead us to
safe driving environment where we can rigorously reduce casualties. This paper is
going to provide you a simple but useful analysis of UK Road Accident data which
can lead to reduce number of Road accident. In this paper I am going to study
how different road condition, road surface, Potholes are going to affect on driving
skills and leads to accidents. Another factors like weather condition, light situation
is how much does affect on the drivers visibility. How factors will really help us to
lead to analyse and understand the real cause of accident from first person perspec-
tive. This all analysis will also Overview the relationship between Vehicle collision
approach, vehicle condition, type of vehicle, drivers age group and Sexof thedriver.
To enhance this report, we will search in which are most accidents happened and
police officer or local authority attended accident spot or not.
1 Introduction
Road traffic crashes are one of the world’s largest public health and injury prevention
problems. The problem is all the more acute because the victims are overwhelmingly
healthy before their crashes and after crash they have injuries either physically or men-
tally. A report published by the WHO in 2004 estimated that some 1.2 million people
werekilledand50millioninjuredintrafficcollisionsontheroadsaroundtheworldeach
year and was the leading cause of death among children 10 to 19 years of age. The report
also noted that the problem was most severe in developing countries and that simple
prevention measures could halve the number of deaths Road traffic safety (2019)
Theproblemofdeathsandinjuryasaresultofroadaccidentsisnowacknowledgedto
beaglobalphenomenonwithauthoritiesinvirtuallyallcountriesoftheworldconcerned
about the growth in the number of people killed and seriously injured on their roads. In
recent years there have been twomajor studies of causes of death worldwide which have
beenpublishedintheGlobalBurdenofDisease(1996,WorldHealthOrganisation,World
Bank and Harvard University) and in the World Health Report Making a Difference’
(WHO 1999). These publications show that in 1990 road accidents as a cause of death
or disability were by no means insignificant, lying in ninth place out of a total of over
1

100 separately identified causes. However, by the year 2020 forecasts suggest that as a
cause of death, road accidents will move up to sixth place and in terms of years of life
lost (YLL) and disability adjusted life years (DALYs)1 will be in second and third place
respectively Jacobs & Aeron-Thomas (2000).
With the reference of this Reports and Findings I would like to contribute some
own findings that could help to reduce some road accidents and show some light in
Road Accident relationship with different factors. With the help of Data warehousing
technique, i am going to develop a system which can fulfil below Requirements:
(Req-1)Is there any relation between Road Death count and Population?
(Req-2)What is the relation between the Road Accident death and Car and Bikes Regis-
tration?
(Req-3)At what Rank United kingdom stands in Road Accident fatality count with other
European Countries?
2 Data Sources
To complete This project, I have used 3 data sets from 3 different Sources, of which 1st
is Structured data set, 2nd dataset is from Statista and 3rd data is unstructured which
is Scrape from PDF and Wikipedia. All data sources are explained below.
Source Type Brief Summary
Kaggle Structured Kaggle Data set contained all detailed in-
formation about number of records for year
from2011to2017withnumberofcasualties,
speed limit and other related information.
Statista Structured This Data set contained number of sales fro
Bike and Car from year 2011 to 2017
PDF Unstructured In this Data set we got information about
RoaddeathNumberforallEuropeanUnion
for year2017.
Wikipedia Unstructured In this Data set we got information about
United Kingdom Population from 2011 to
2017.
Table 2: Summary of sources of data used in the project

2.1 Source 1: Kaggle (Structure)
Toperformprojectmy1stDatasetisfromKaggleWhichcontainedtheinformationabout
United Kingdom Road Accident information and which Vehicle involved in Accident From
year 2005 to 2017. For the project work I download ONE data set which is Road Safety
Data - Accidents 2017, Data sets Created date on website are Tuesday, 14 August 2018
11:03:43 GMT+0100 (British Summer Time) and latest updated on Tuesday12 January
2019 12:23:24 GMT +0000 (Greenwich Mean Time), which fulfil the project requirement
that structured data should be created within 1 year. Data set was in Zip Format which
containing Accident Information.csv and Vehicle Information.csv format files. I have used
AccidentInformation.csvfiletoExploredatamoreandfindproperconclusion. Mainfile
which is Accident Information.csv which holds a data about Accident information in
which Accident Index, Accident Severity,Number of Casualties and year are very crucial
fields,expect from thesefieldother fields arelistedbelow which Ialso usedto get proper
insight of United Kingdom Road Accident data.
Figure 1: Source 1
Kaggle Source: https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles/
activity

2.2 Source 2: Statista
Second dataset is downloaded from this Statista website Statista Data set contained to-
tal number of Motorcycle and car registered in Unitedkingdom from year 2000 to 2017.
There were two excel sheets for Statista data. The first one was Overview, Mention-
ing all the details regarding the data and its source and the second sheet consisted
of actual yearly registration data. So, we removed the Overview sheet using the R
code(Appendix).The data was published in on April 2018 on Statista. This data in-
corporates Three columns Year, Cars, Motorcycles
Figure 2: Source 1
Statista Source: https://www.statista.com/statistics/312594/motorcycle-and-car-regist

2.3 Source 3: Wikipedia and Road Safety Authorities website
(Unstructured)
My third data set I used from two sources from www.wikipedia.org and www.rsa.ie. In
whichdataisprovidedinPDFformat. FromWikipediawehaveextractedthepopulation
of United Kingdom for years 2011 to 2017. When data is extracted from Wikipedia was
not in proper format, initially to scrape this data we use htmltab library. In first step we
scrape data, weremoved some unwantedrowsbecause this Wikipedia page-maintained
data from 1938 and to completed analysis we need data from year 2011 to 2017 also we
removed some extra Column, to identify the country weadded extra Country field. After
that we replaced the column name to understand the proper data which we are using
from specific column. In second stage we removed blank rows which was in population
columns, to complete this task we used is.na function which was best fitted for this
condition,andsomeofthepopulationcolumnhavingblankspacesinbetweennumbers,
because of that we are not able to get proper integer format in SSMS, hence we removed
that spaces with gsub function which is provided by R Programming.
SeconddatawhichwedownloadedfromwebsitewasinpdfformatfromWWW.rsa.ie
this website was maintained byRSA which is Road Safety Authority of Ireland was open
website. The data which we have downloaded is in pdf format accessible to anyone from
website. That pdf maintaining various data related Accidents in all European countries.
To scrape data from PDF we use libraries like tabulizer, rJava, tidyr. Those libraries
help to get proper data from pdf, Initially we set which data we need to download from
pdf, after we scrape proper data from pdf we need to clean that data for proper use.
After that we deleted some columns to get proper data which we can use. Later then we
renamed the column name to get proper insight about columns. Some of the countries
name containing junk values, to remove this value we use separate function. To identify
the year, we added Year field for better prospect. In table we removed population and
other field which was unwanted. Final and crucial step was to delete row which having
UK related accident death data, because we are getting this data from structured data,
I deleted those rows. Lastly whole cleaned data stored in csv format file.
Wikipedia Source: https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom
PDF Source: http://www.rsa.ie/en/Utility/News/2018/Ireland-4th-Safest-EU-Country-for-
3 Related Work
In this analysis which have been conducted based on the data and reports of Road
accidentsofUnitedkingdom, andbasedonthoseanalysis,actions are alreadytaken and
authority getting actions to prevent Road accidents. As we discussed in abstract Road
traffic safetyis oneof themain Concern for department of transport also for anycountry
citizens. Also as published in In 1998, road traffic injuries were estimated to be the 9th
leading cause of loss of healthy life globally and are projected to become the 3rd leading
causeby2020,The majorityofthisburden can be located inthedevelopingworldwhere
most of the projected Increase will occur Ghaffar et al. (2002).
Q1How have these (or similar) datasets been used before?
Q2What is generally known about the domain within which your requirements (in
Section 1) are situated?

Q3What significant results exist in this area, and how to you expect to add to them
by undertaking this project?
See ? for an example of a lit review that looks at specific challenges and approaches
within a given area.
4 Data Model
Inthissection, provide details of your dimensions, whytheyare in your datamodel, how
sources of data contribute to each of the dimensions, and present your star schema.
Adatamodel refers to thelogical inter-relationshipsanddata flow between different
dataelementsinvolvedintheinformationworld. Italsodocumentsthewaydataisstored
and retrieved. Data models facilitate communication business and technical development
byaccuratelyrepresentingtherequirementsoftheinformationsystemandbydesigning
the responses needed for those requirements. Data models help represent what data is
required and what format is to be used for different business processes What is a Data
Model? - Definition from Techopedia (n.d.). The reasons behind Kimballs Approach is,
it is easier to extend the data warehouse as it can easily accommodate new business
units. It is just creating new data marts and then integrating with other data marts
(n.d.a).Moreover, using Kimballs approach data storage is not an issue as the space
required is less which makes the data warehouse to process the queries much faster as
compared to Inmmons approach Yessad & Labiod (2016).
To achieve desire, need out of my dataset I connect my data with each other based
on unique factor they consist, so my primary data set 2.1 which consisting all detail
information about Road accidents is joined with my secondary data 2.2 statista based
on year and country. As well as my primary dataset 2.1 connected to unstructured 2.3
data on same factor to connect death count with countries and year. Based on these
datasetsIhavecreated3DimensionwhichisDimYear,DimCountryandDimRoadDeath
are explainedbelow:
DimYear: This dimension is created to holds data related to year which having in all
data source. In y structure dataset which consist the information about all Accidents
incidents by year, hence, to identify number of casualties year will be my main factor.
Same as Structure dataset my statista dataset 2.2 contained Car and bike registration
informationyearwise. hence,YearplayingoneofthemainfactorsinmyDatawarehouse.
The attribute which is used in this Dimension is YearID and Year, in which YearID is
generated using SSIS and used as Primary key in Year dimension.
DimCountry: In my structured and statista data source all county that we have com-
mon is United Kingdom by unlike these my Unstructured 2.3 data source contain all
European countries which I want to combine with Structure and statista, hence I have
created Country Dimension in which i added all the countries from my all sources. In
DimCountry I used CountryID, CountryCode and Country, CountryID is used as Pri-
mary key and generated using SSIS.
DimRoadDeath: this dimension consists of all the road accident count which I required.
WhilecreatingthesedimensionsIhavedeletedUnitedKingdomsRoadDeathcountfrom
Unstructred datatset 2.3 instead of unstructured I used my structure dataset 2.1 to get
the Road death count for each year from 2011 to 2017. This dimension consists Road-
DeathID, CountryCode and RoadDeath attributes, RoadDeathID is primary key for this
Table and Genrated by SSIS.

FactTable: The factTable which i made contain all the essential keys of the measure-
ment. FactTable plays essential role in setting up my business needs because it contain
all the aggregated values that I need to explain my BI queries. The attributes which I
used in FactTable in listed below: YearID: Primary Key of DimYear, used as Foreign key
in FactTable.
CountryID: Primary Key of DimCountry, used as Foreign key in FactTable
BikeRegistration: holds the Bike Registration Count for year 2011 to 2017.
RoadDeathID: Primary Key of DimRoadDeath, used as Foreign key in FactTable
Casulties: Casulties contain all the Death Count, which happened in all countries.
Population: Population holds data about Population count for each year.
CarRegistration: Contains Car Registration Count for year 2011 to 2017.
Figure 3: Star Schema

5 Logical Data Map
Below Logical Data Map used to show desired star schema, In this Data Map I have explained all dimensions and Fact table that I have
used and how to transformed before loading.
Table 3: Logical Data Map describing all transforma-
tions,sourcesanddestinationsforallcomponentsofthe
data model illustrated in Figure ??
Source Column Destination Column Type Transformation
1,2,3 Year DimYear Year Dimension Converted in integer from char format, collected dis-
tinct years from all 4 data sources.
1,2,3 CountryCode DimCountry CountryCode Dimension Collected all distinct countries from each source and
used in DimCountry Dimension $
1,2,3 Country DimCountry Country Dimension Collected all distinct countries with the reference from
countrycode from unstructred data and structured
source and used in DimCountry Dimension
1,3 Road Death
for Year2017
DimRoadDeath RoadDeath Dimension collected all Road casulties from PDF except United
Kingdom, Death Casualties for United Kingdom col-
lected from Structured data source
3 ISOCode DimRoadDeath CountryCode Dimension usedtolinkCasualtiesofspecificcountriestorespective
countries.
3 Population FactTable Population Fact Collected population from Wikipedia for United King-
dom for year 2011 to 2017.
2 CarSales FactTable CarRegistrationFact Data Collected to retrieve Car Registration count from
year 2011 to 2017.
2 BikeSales FactTable BikeRegistrationFact DataCollectedtoretrieveBikeRegistrationcountfrom
year 2011 to 2017.

6 ETL Process
ETL(Extract, Transform and Load) The process of extracting data from multiple source
systems,transformingittosuitbusinessneeds,andloadingitintoadestinationdatabase
is commonly called ETL, which stands for extraction, transformation, and loading
(n.d.b). ETL Process is Core process of any Data Warehouse Sysytem. Initially to Con-
struct a performance data warehouse the data is extracted from various sources. Next it
must be transformed and cleaned as per the requirement keeping in mind to remove all
kind of duplicate data, And then Load into Data warehouse, To complete my Project I
used below ETL Process.
Figure 4: Automated ETL Process
6.1 Extraction
First Structured Data downloaded from Kaggle, there was total 34 columns in data.
Which contained Accident Index, NumberOfCasulties, Speed Limit, and many other
fields. This is my primary data which consist of overall information about accident
occurred in United Kingdom from 2005 to 2017. Tocomplete this project mainly I need
data from year 2011 to 2017, hence removed excess Rows which I dont want with R Pro-
gramming Language. My second dataset was from statista, usually statista data comes
in proper cleaned format but carry multiple sheets in excel file. To remove and alter
field from statista data first thing I need to do is read that excel file in R, hence I used
readxl library to read file and provided sheet name that I wanted to read. Second sheet
containing data about total years from 2011 to 2017 with Total number of Motorcycles
and Car registered for respective years. Formy third dataset I used data from Wikipedia
and Road safety authority Annual report which was publically open to download for

anyone,finding out proper data from Wikipedia waseasy task but scraping that, specific
data and alter that wasnt that flexible. Raw data containing lots of unwanted fields like
fertility rate, Natural change to remove this fields I used transformation process. Same
like Wikipedia my second source for unstructured dataset was PDF which having lots of
data regarding Road accidents count all over European union, to find proper table from
pdf extract tables function from tabulizer really saved lots of time.
6.2 Transformation
After data is extracted, it must be physically transported to the target destination and
convertedintotheappropriateformat. Thisdatatransformationmayincludeoperations
such as cleaning, joining, and validating data or generating calculated data based on
existing values Kimball & Ross (2011). Initially I find out all important fields which I
need to keep and which I need to be removed. I used Readxl library to read my CSV
formatted file and allocated to variable, after that all changes has been done in that
variable only. Toremove excess rows which belongs from 2005 to 2011 I need to identify
thatrows,forthatusedgreplfunctionwithyearfieldthenusedexclamatorymarkinfrom
of grepl function so I can remove the records which containing year 2005 to 2010. After
that Ineed country fieldto identify the country,so Icreated new fieldusing country code
UK. Then I checked whole database for NULL values and blank spaces, in speed limit
fieldhadsomebankvalueswhichwascreatingirregularitywhileuploadingdatainSSMS,
hence I omitted all the blank and null values from the dataset with using function !is.na.
My second data asset was from statista, usually statista data comes in proper cleaned
format but carry multiple sheets in excel file. Toremoveand alter field from statista data
first thing I need to read that excel file in R, hence I used readxl library to read file and
provided sheet name that I wanted to read and insert all those rows and columns into
variable. After that I removed all unwanted rows in R so I can get proper content that
I need for data warehousing project. To identified data with respective country I added
country field, and finally write that data in csv formatted file. Formythird dataset I used
data from Wikipedia and Road safety authority Annual report which was publicly open
to download for anyone. In Transformation process I used htmltab library which helps
me to scrape proper data from Wikipedia. For cleaning process again !is.na function I
used. In this dataset some numeric values had spaces in-between, to find that spaces I
used gsub function which became very handy and easy to use. To write all cleaned data
in csv I used write.csv() function for proper output. Last but not the least data was to
scrape from PDF which was downloaded from pdf which is publicly available on Road
safety authority website which is maintained by Ireland RSA Team. Finding out proper
information wascrucial task andfor that Iused tabulizer, rJavaandtidyr librarieswhich
reduce my work. extract tables function from tabulizer I used to get proper data from
specific page, to get exact information even I used data coordinates to extract data from
pdf. The data which I got was raw data, hence cleaning data was main task for me. Some
of the column was having junk values in it, so using separate function from tidyr library
really saved a lot of time. this data containing United Kingdom death rate which was
already available in my structured data, hence we removed that row so I can get that
data from structure dataset.
A. Total number of libraries I used in cleaning process:

1. Readxl
2. Tabulizer
3. rJava
4. tidyr
5. htmltab
B. Total number of functions I used for cleaning process:
1. !grepl
2. !is.na
3. extract tables
4. data.frame
5. read excel
6. read.csv
7. write.csv
8. separate
9. setdiff
6.3 Load
The final step in the ETLprocess involvesloading thetransformed data into the destina-
tion target The final step in the ETL process involves loading the transformed data into
the destination target??.
To load cleaned and transformed data into Database through SSMS (SQL Manage-
ment Studio), I need to create a new database on which I can create data Warehouse,
the database I named for my project is UK which is created in SSMS. After creating
Database, we need to load all data in SSMS through SSIS (SQL Server Integration ser-
vice) on Staging Area. To complete this process, I used Data flow task which help to
completedataflowprocess. UnderdataflowtaskIusedflatfilesourcecomponentwhich
helps to load data from CSV file and carry that data SSMS using OLEDB destination
component. OLEDB component link data into SSMS and helps to create Table in SSMS
and load data into that table. In Flat file Source component need to set flat file con-
nection manager, in which we need to set name in connection manager name. after that
need to select file location from System. Once we set the file need to edit text qualifier
from NONEto Quotation marks which is depending upon thewhich separator isused in
Flatsourcefile. Intheleftpaneundercolumnssectionuabletopreview 100rowsofyour
Flat source file. Below Columns under the Advance row SSIS provides ability to change
data types for various field in Data Type option, same as data type OutputColumnWidth
helps to set character width for specific columns. Once done with this setting, we can
movetosomeSettingonOLEDBDestination. InsideOLEDBdestinationUnderconnec-
tion manager we can set the database in which we want to insert those values. We can
write a SQL query to create a new table in SSMS or SSIS provide us Accurate suggestion
to create table based on Flat file source. To create new table, we need to click on new
button, once create table query appears, we can change the table name as we want, I

used Flat file name so I cannot get confuse while creating dimension table. In left pane
wecan check the data mapping for columns from Flat file source to OLEDB destination.
Once westart the process in SSMS table will be get create inside the SSMS. Under single
Data flow task, we can Run multiple data flow task. For my project I had total 4 flat
files, so I used 4 data flow task components for each file.
After Files getting uploaded in SSMS we need to create Dimension table. Dimension
tables provide the context for fact tables and hence for all the measurements presented
in the Datawarehouse. Although dimension tables are usually much smaller than fact
tables, they are the heart and soul of the data warehouse because they provide entry
points to data Kimball & Ross (2011). To create Dimensions, I used SQL programming
language, with the help of SQL I created 2 Dimensions which was Country and Year. In
SSIS I used Execute SQL Task Component to run sql query, in that component under
construction section select SSMS instance and Database name and SQL statement used
towriteSQLquerythroughwhichIcreatedDimensiontables,whicharegoingtohelpme
tocreateFacttable. Facttablesholdthemeasurementsofanenterprise. Therelationship
between fact tables and measurements is extremely simple. If a measurement exists, it can
be modeled as a fact table row ??. To Create fact table again I used SQL Programming
language, creating Fact table was easy task because in fact table we only insert facts
that means numerical values but populating fact table with proper data was quite hard
and interesting job to perform, while populating fact table lots of duplicate data was
occurring in table and getting proper data in fact table was abstruse task. Using various
type of joins helps to populate proper data in fact table.
After creating Fact table its important that your fact table should interact with Di-
mension tables for that I have created Constraint to connect Fact table with Dimension
table. After that CUBE deployment was main task remaining to perform, because after
successful cube deployment we can able to see the desired Star schema. To complete
this process, I selected new cube from cube wizard and selected my existing Fact table
and dimension table, under fact table selected required measures which is required to
be perform for BI queries. Then I named my cube as UK and completed wizard. As
soon as wizard completed the star schema appeared with all chosen measures, to check
that fact table is populated or not I used explore data option after that displayed all
the measures with values which are required to perform BI Queries. I checked all the
values are properly showing or not and after all those processes I can say that my cube
is successfully deployed.
7 Application
In the above segment we saw fruitful Cube Deployment which we be utilizing to answer
our business inquiries in this segment. Following are the essential business inquiries which
I thought will be extremely helpful to survey the requirements which are talked about in
Section 3. The aftereffects of these quires as for the past related deal with the subject,
whichwasreferencedinsegment3,areexaminedindetailinarea7.4. Presently,givesusa
chancetoassessthe3businessquestionsandtheiroutcomestoaddressanddemonstrate
the attainment of your business requirements 1.

7.1 Does population of United Kingdom has any effect on Ca-
sualties throughout the year from 2011 to 2017 ?
For this query, the contributing sources of data are data source 2.1 and 2.3 Here we can
see that population is expanding subsequently from year 2011 towards to 2017. But if
we see there is no pattern in Casualty graph. At 2011 road casualties was at the highest
point with count 1797, but then it started falling at year 2013 casualty count was at least
point with 1608. As per our data casualties count for year 2017 is 1676 which lower that
2011 but higher than 2013. So from the graph we can say that population count does
not have any relation with number of road casualties for any year Figure 7.1.
Figure 5: Result of BI Query 1

7.2 BI Query 2:To Identify that the number of Casualties have
any effect on new Bike and Car Registration in United
Kingdom?
For this query, the contributing sources of data are source 2.1 and 2.2
AsGraphillustrate that casualties dont haveanyproperformat top most death count
as per graph is 1797 which was in 2011 but car and Bike registration Count was 28467
and1238respectivelyforsameyear. Eventhoughcasualtiesgraphisgoingupanddown,
Cars and bikes registration rate is increasing steadily every year. so, we can conclude
that united kingdom population is more intend to buy new vehicles by neglecting Road
Accidents. Figure ??. Figure 7.2

7.3 BI Query 3:To Find that in the count of number of ca-
sualties at what rank United Kingdom stands with other
European Union countries for year 2017.
For this query, the contributing sources of data are source 2.1 and 2.3
here we can see from world map representing that United Kingdom is on 7th place
onEuropeanunionroadcasualtycount,whetherFrancestandson1stnumberwith3448
Casualty count and Malta having least no. of casualties with 19 numbersFigure 7.3.

7.4 Discussion
Till now we are prepared with Result. Now its time to discuss the observation, so lets
discuss the BI Query 1[7.1] in details, this query provide us the stats for the death
casualties. even though population is increasing rapidly road casualties still going up
anddown, to reduce thecasualty countneedto improveroaduser behaviour. Improving
road user behaviour is fundamental to reducing road traffic injuries and fatalities. It
is one of five key pillars of the Global Plan for the Decade of Action for Road Safety
2011 2020 (alongside better road safety management, safer road networks, safer vehicles
and improved post-crash response) Organization et al. (2016). Street client conduct can
be improved by street well-being efforts, which in blend with social measures (e.g., law
implementation, instruction or preparing), can turn into an amazing method to induce the
general population to carry on more securely in traffic. The Global Plan for the Decade
ofActionisestablished intheSafeSystem approachwhichtendsto chanceelementsand
negotiation influencing street clients, vehicles and the street condition in a coordinated
manner, empowering increasingly powerful aversion. This methodology is known to be
fitting and compelling in settings around the world.
Secondqueryprovideusfactorswhichinformthatpeoplesareintendingtobuymore
vehicles for transportation. As per the report Over the past few decades regulation
and consumer demand have led to increasingly safe cars in high-income countries/ areas.
Many of the features that began as relatively expensive safety add-ons in high-end vehicles
have since become much more affordable and are now considered basic requirements
for all vehicles in some countries/areas. Rapid motorization in low and middle-income
countries/areas,where therisk of aroadtraffic crash ishighest andwheremotor vehicle
productionisincreasingintandemwitheconomicgrowth,meansthereisanurgentneed
for these basic requirements to be implemented globally Organization et al. (2017). To
reduce the death by vehicles, it is important to ensure that the vehicles design stick to
recognizedsafetystandards,butintheabsenceofsuchstandardsautomobilecompanies
can sell obsolete designs that are no longer legal in well-regulated countries. Alternatively,
automobile companies frequently de-specify life-saving technologies in newer models sold
in countries where regulations are weak or neglected.
8 Conclusion and Future Work
The result which we got from observation that help us to understand the Road death
casualtys status in United Kingdom from year 2011 to 2017, as per graphs new car
and bike registrations count does not depending Road accidents or not even road death
depending upon Countries population. Even we able to reduce some Accidents with
previous studies but did we able to improve our road safety systems? And we really
shaping our system to accident less cities? I tried to build system in which I combine
data from various factors and co relate the for better outcome. What I observed out
of graphs is Road accidents are damages peoples with different level like slight, serious
and fatal. In this Datawarehouse we covered only those count which had fatal accident
severity, with the addition of more detail data we can able make more dimension and
get detail insight about accident reasons. wecan check what wasthe drivers perspective,
road or Weather conditions, or maybe vehicle conditions can really help us to prevent
or reduce the total number of casualties. Because of this limitation in this database we
were not able to find the proper cause and conditions for accidents. I need toconsider

this for future aspects so we can build more robust system. In which we can relate and
identifymorecauseofroadaccidents,withlocations,weatherorroadconditions,vehicle
status, drivers perspective or maybe age, gender, as well as economical loss these points
could lead us to reduce Road accidents count.
References
(n.d.a).
(n.d.b).
Ghaffar, A., Hyder, A. A., Bishai, D. & Morrow, R. H. (2002), ‘Intervention for con-
trol of road traffic injuries: review of effectiveness literature’, JOURNAL-PAKISTAN
MEDICAL ASSOCIATION 52(2), 69–72.
Jacobs,G.&Aeron-Thomas,A.(2000),‘Areviewofglobalroadaccidentfatalities’,Paper
commissioned by the Department for International Development (United Kingdom) for
the Global Road Safety Partnership .
Kimball, R. & Ross, M. (2011), The data warehouse toolkit: the complete guide to dimen-
sional modeling, John Wiley & Sons.
Organization, W. H. et al. (2016), ‘Road safety mass media campaigns: A toolkit’.
Organization, W. H. et al. (2017), ‘Save lives: a road safety technical package’.
Road traffic safety (2019).
URL: https://en.wikipedia.org/wiki/Roadtraff icsafety
What is a Data Model? - Definition from Techopedia (n.d.).
URL: https://www.techopedia.com/definition/18702/data-model
Yessad, L. & Labiod, A. (2016), Comparative study of data warehouses modeling ap-
proaches: Inmon,kimballanddatavault,in‘2016InternationalConferenceonSystem
Reliability and Science (ICSRS)’, IEEE, pp. 95–99.
Appendix
R code example
#imporingInstalledLibrary library(
readxl ) #settingWorkingDirectory
setwd("E:/Final.Project/Statista")
#fetchingdatafromexcelintoSales
Sales <- data.frame(read_excel("UK.Car.and.Bike.Sales.xlsx", sheet ="Data"))
#DeletingFirstRows
Sales = Sales[-1:-2,]
#ChangingColumnsName

colnames( Sales) <-c("Year","Car.Sales","Bike.Sales")
#DeletingunwantedRow
Sales = Sales[-(1:11),]
#AddingNewField
Sales$Country <-’UK’
#writinginCSV
write.csv(Sales ,"Statisa_Sales.csv",row.names= F)
#ImportingInstalledLibrary
library( tab u lizer )
library( rJava )
library( tidyr )
#Setw orkingL ibrary
setwd("E:/Final.Project/Unstructure")
#load1sttablefrmPDF
ratio <-extract_tables("PIN_ANNUAL_REPORT_2018_final.pdf", pages =31, output ="da
#load2ndtablefrmPDF
country <-extract_tables("PIN_ANNUAL_REPORT_2018_final.pdf", pages =28, output ="
ratio <- ratio [[1]]
country <- country [[1]]
#toD eleteC olum ns
ratio <- ratio [, -c(5:8)]
#A ssignNametoC olum n
names(ratio) <-c("Country","Road_Death_for_Year_2017","Inhabitants","Death.per.Mil
#DeleteRows
ratio = ratio [ -(1:5) ,]
#seperateJunkValuesfromRow
ratio <- separate(ratio ,"Country", into =c("CountryCode","star"))
#DeleteJunkValuescolumn
ratio <- ratio [, -2]
#AddExtraYearField ratio
$ Year <-’2017’
#ratio$Country<-country$Country[match(ratio$Country,country$ISO.Code)]
ratio <- ratio [!( ratio$Country =="United.Kingdom"), ]
ratio <- ratio[setdiff(colnames( ratio),c(’Inhabitants’,’Death.per.Million.Inhabi
#DeleteRowswithUKName
ratio <- ratio [! grepl("UK", ratio$CountryCode ),]
#WriteallvaluesinCSV
write.csv(country ,"CountryName.csv",row.names= F)
write.csv(ratio ,"2018_Annual_Report.csv",row.names= F)
########################_ScrapingFromWikipedia_###############################
#SetworkingLibrary
setwd("E:/Final.Project/Unstructure")
library( htmltab)
url<-"https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom"
Population <- htmltab(doc =url,which=37)
Populatio n <- Populatio n[, -c(3:9)]
Population$Country <-’UK’
names( Population) <-c("Year","Population","Country")

Population <- Population [!is.na( Population$Population), ]
Populatio n = Populatio n[-(1:40),]
#Population$Population<-trim(Population$Population)
Population$Population <-gsub(".","",Population$Population )
write.csv(Population ,"population.csv",row.names= F)

Tushar Dalvi DWBI

Recommended

Recommended

More Related Content

Similar to Tushar Dalvi DWBI

Similar to Tushar Dalvi DWBI (20)

Recently uploaded

Recently uploaded (20)

Tushar Dalvi DWBI