Cyril Connolly, Lecturer, IADT, Dun Laoghaire: Visualising Road Accident Data

Cyril Connolly is a lecturer in mathematics and statistics at the Institute of Art, Design and Technology (IADT) in Dun Laoghaire, Co Dublin. Prior to this he was employed as a statistician in the National Roads Authority (NRA), Apple Inc. and Gallagher (Belfast) Ltd. In 1997 he was appointed to the Motor Insurance Advisory Board (MIAB). MIAB completed its work in 2004 culminating in a report to Government containing 63 recommendations for the reform of the motor insurance sector.




Visualising Road Traffic Accident Data

1. A Brief History of Data Visualisation

The power and importance of effective visualisation have long been recognised. William Playfair (1759-1823), the founder of statistical graphics, contrasted his new graphical method with the tabular presentation of data as follows: ‘Information, that is imperfectly acquired, is generally imperfectly retained; and a man who has carefully inspected a printed table, finds when done, that he has only a very faint and partial idea of what he has read’ (1).

This view has been echoed over the intervening 200 years. For example, Florence Nightingale (1820-1910) recognised the power of data visualisation as an effective aid for communicating issues of concern to a wide audience, particularly the impact of poor sanitation on mortality rates during the Crimean War. This is summarised in her statement of the power of graphics ‘to affect thro the eyes what we may fail to convey to the brains of the public through their word-proof ears’.

Graphical innovations were relatively absent in the first half of the 20th century, but renewed interest in visualisation followed the publication in 1962 of a paper entitled ‘The Future of Data Analysis’ (2) by the American statistician John W. Tukey. This paper is regarded as a landmark in data visualisation. Tukey suggested that we examine our data as a detective would examine the scene of a crime - not with a hypothesis (‘I’ll bet the butler did it’), but with an open mind and as few assumptions as possible. This approach was a radical departure from conventional data analysis (and research programmes in general), which tended to be based on the scientific principles of formulating a hypothesis, collecting appropriate data and finally using some test statistic to decide on the validity of the hypothesis. Tukey believed that by letting the data speak to us ‘we can learn the truths hidden beneath the random fluctuations, errors and general confusion seen in real data’.

The publication in 1967 of Jacques Bertin’s ‘Semiologie Graphique’ (3) was also an important milestone in the development of data visualisation. In his foreword to the English version of this text, published in 1983, Howard Wainer states that the text ‘is the most important work on graphics since the publication of William Playfair’s Atlas. While William Playfair illustrated good graphic practice over 200 years previously he did not explain why the specific structures of his graphic forms and formats work’.

The development of a variety of highly specialised interactive computer systems during the 1970s allowed data to be analysed in a dynamic, iterative and visual manner. One of the early systems was PRIM-9 (4), developed at the Stanford Linear Accelerator Center. PRIM stood for Picturing, Rotation, Isolation and Masking, and the system allowed the exploration of multidimensional data in up to nine dimensions. It ran on an IBM system, required a few million dollars’ worth of computer and display hardware (the display unit alone cost $400,000) and cost several hundred dollars an hour to use.

Later developments in hardware and software allowed PRIM technology to become generally available on desktop computers. The innovative Apple Macintosh hardware and software, first produced during the mid 1980s, led the way in these developments with applications like MacSpin (5) and DataDesk (6). These changes in computer systems have, as William Cleveland states in his text Visualising Data (7), ‘changed how we carry out visualisation but not its goals’.

2. Data Visualisation using DataDesk

DataDesk was originally developed on the Apple Macintosh platform by Apple research fellow Paul Velleman during the latter part of the 1980s and subsequently became available on the Windows platform. The principal feature of DataDesk, in contrast to other mainstream data analysis applications, is the ability to interact with multiple linked views of a dataset, so that, for example, selecting a subset of cases in one view highlights them in all other views. This ability to ‘slice and dice’ data using dynamic and interactive tools brings statistics to life, generating interest in the subject and an appreciation of its importance in the decision-making process.

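
The idea of linked views can be sketched in a few lines of Python. This is only an illustration of the concept, not DataDesk’s implementation: the record fields, the `easting` coordinate, the sample values and the function names `knife` and `linked_counts` are all assumptions made for the example. A selection is held as a set of case indices, and every linked view is recomputed from that same set.

```python
from collections import Counter

# Hypothetical accident records (illustrative values only).
# 'easting' is an assumed map coordinate; weekday uses Sunday = 1 ... Saturday = 7
# and month uses January = 1 ... December = 12, as in the text.
ACCIDENTS = [
    {"easting": 310, "weekday": 2, "month": 3},
    {"easting": 305, "weekday": 5, "month": 11},
    {"easting": 90, "weekday": 7, "month": 7},
    {"easting": 120, "weekday": 1, "month": 8},
    {"easting": 300, "weekday": 2, "month": 6},
    {"easting": 100, "weekday": 7, "month": 7},
]


def knife(records, lo, hi, field="easting"):
    """Return the indices of the cases whose `field` lies in the slice [lo, hi)."""
    return {i for i, rec in enumerate(records) if lo <= rec[field] < hi}


def linked_counts(records, selection, field):
    """Bar-chart counts of `field`, restricted to the currently selected cases."""
    return Counter(records[i][field] for i in selection)


# Slice the map once; both bar charts are derived from the same selection.
east = knife(ACCIDENTS, 250, 400)
west = knife(ACCIDENTS, 0, 250)

east_by_weekday = linked_counts(ACCIDENTS, east, "weekday")
west_by_weekday = linked_counts(ACCIDENTS, west, "weekday")
west_by_month = linked_counts(ACCIDENTS, west, "month")
```

Because each view depends only on the shared set of indices, moving the slice means recomputing one selection and redrawing every linked chart from it.
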
Some examples of the use of DataDesk to explore Irish road accident data are shown below.

i) Regional Variation of Road Accidents

The knife tool ‘slices’ over the east coast of Ireland’s accident scatterplot map in Figure 1. The two bar charts to the right of this plot illustrate the daily (Sunday = 1, Saturday = 7) and monthly (January = 1, December = 12) distribution of accidents. From the plot, the distribution of accidents along the east coast appears to be fairly constant by weekday and month.

Figure 1: Spatial distribution of east and west coast accidents

If the knife is moved to the west coast, as shown in Figure 1, the bar charts update automatically and the distribution of accidents by weekday and month reveals a different pattern from that of the east coast. Accidents by weekday are lowest during midweek and highest at the weekends, while accidents by month are highest during the summer months and lowest during the winter months.

ii) The Influence of Daylight Variation on Pedestrian Road Accidents

Figure 2 illustrates the number of pedestrians killed in Ireland by month between 2000 and 2006. The plot suggests a U profile, with fatalities higher in the winter months and lower in the summer months.

Figure 2: Monthly Distribution of Fatal Pedestrian Road Accidents

To investigate this pattern in more detail, a plot of the number of pedestrian fatalities by hour is generated. Browsing the hourly bar chart with the knife tool, it becomes clear that the U shape is explained by fatalities between 16:00 and 21:00 hours, as shown in Figure 3.

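
The split by time of day described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `(month, hour)` pairs are invented sample data, the function names are hypothetical, and the 16:00 to 21:00 window is assumed to include both end hours.

```python
# Hypothetical (month, hour) pairs for fatal pedestrian accidents.
# The values are invented for illustration and are not taken from the study.
FATALITIES = [
    (1, 17), (1, 18), (12, 17), (12, 20), (6, 2),
    (7, 23), (1, 9), (11, 19), (6, 12), (12, 3),
]


def in_evening_window(hour, start=16, end=21):
    """True if the hour falls in the window; both ends inclusive (an assumption)."""
    return start <= hour <= end


def monthly_profile(records, keep):
    """Count fatalities per month (1..12) for the records whose hour passes `keep`."""
    counts = {month: 0 for month in range(1, 13)}
    for month, hour in records:
        if keep(hour):
            counts[month] += 1
    return counts


# Monthly profile inside the window, and the profile with the window excluded.
evening = monthly_profile(FATALITIES, in_evening_window)
other = monthly_profile(FATALITIES, lambda hour: not in_evening_window(hour))
```

Comparing the two dictionaries month by month corresponds to comparing two monthly bar charts, one for the selected hours and one for the remaining hours.
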
Figure 3: Monthly distribution of fatal accidents between 16:00 and 21:00 (left) and excluding the hours 16:00-21:00 (right)

This is further illustrated by examining the distribution of accidents excluding the hours 16:00 to 21:00, as shown in Figure 3. The monthly bar chart now shows no evidence of a seasonal profile. The seasonal U profile of fatal pedestrian accidents between 16:00 and 21:00 is explained by the variation in the amount of daylight during these hours throughout the year (8). For the winter months of December and January there is virtually no daylight during these hours and the corresponding number of fatal accidents is highest. For the summer months of June and July there is almost complete daylight between 4pm and 10pm and the number of pedestrian accidents is lowest.

iii) Accident Profiling using Rotating Plots

The French cartographer Jacques Bertin stated in his groundbreaking text Graphics and Graphic Information Processing (9) that ‘it is not sufficient to have data, to have statistics, in order to arrive at a decision. Items of data do not supply the information necessary for decision making. What must be seen are the relationships which emerge from consideration of the entire set of data’.

This statement is illustrated by examining the age distribution of the driver, front seat passenger and rear seat passenger, coded as ageDr, ageFP and ageRP, respectively. If we are restricted to working in what Edward Tufte (10) refers to as two-dimensional Flatland, we would generate three scatterplots examining the relationship between driver and front seat passenger, driver and rear seat passenger, and front seat and rear seat passenger, as shown in Figure 4.

Figure 4: Scatterplots of the age of driver vs age of front seat passenger (left), age of driver vs age of rear seat passenger (centre) and age of front seat passenger vs age of rear seat passenger (right)

While these plots illustrate the presence of up to three clusters, it is through the use of a rotating plot that we can see the overall relationships emerging from consideration of the entire set of data, as shown in Figure 5. After spending a short time rotating the data, a star shape becomes evident, with each arm corresponding to a distinctive cluster.

Investigating the profile of each cluster is easy with DataDesk. By capturing a cluster with the lasso tool and dynamically linking it to the variables hour, primcoltype, ageDr, ageFP, ageRP, genderDr, genderFP and genderRP, the profile of that segment can be readily determined. For example, in Figure 5 the centre cluster is selected. The linked variables suggest that this profile comprises young vehicle occupants, with a substantial number of accidents in the early hours of the morning and a high proportion of primcoltype code 2 values, which correspond to single vehicle accidents. In addition, the drivers are primarily male, with an excess of male over female passengers. In summary, this accident profile is explained by young male drivers with passengers of a similar age who are involved primarily in single vehicle accidents. The principal causal factors associated with this profile are alcohol and/or excessive speed.

Figure 5: Centre of star cluster with dynamically linked variables hour, type of collision, age and gender of vehicle occupants

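
A rotating plot of three variables can be thought of as a sequence of two-dimensional projections of a rotated point cloud. The sketch below is an assumption-laden illustration rather than a description of how DataDesk draws its plots: it centres hypothetical (ageDr, ageFP, ageRP) triples, rotates them about the vertical axis and drops the depth coordinate. The sample ages and function names are invented for the example.

```python
import math

# Hypothetical (ageDr, ageFP, ageRP) triples; the values are illustrative only.
OCCUPANT_AGES = [(22, 21, 20), (45, 43, 8), (30, 60, 25)]


def centre(points):
    """Shift the cloud so that it rotates about its own centroid."""
    n = len(points)
    means = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - means[k] for k in range(3)) for p in points]


def rotate_about_vertical(point, angle):
    """Rotate (x, y, z) by `angle` radians about the y (vertical) axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def frame(points, angle):
    """One frame of the rotating plot: rotate, then drop the depth coordinate."""
    return [rotate_about_vertical(p, angle)[:2] for p in points]


CENTRED = centre(OCCUPANT_AGES)
front_view = frame(CENTRED, 0.0)          # horizontal axis is ageDr
side_view = frame(CENTRED, math.pi / 2)   # horizontal axis is now ageRP
```

At an angle of zero the frame is the ordinary scatterplot of ageFP against ageDr, and after a quarter turn it is ageFP against ageRP. The intermediate angles are the views that a fixed set of pairwise scatterplots cannot show.
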
In contrast, selecting the southern arm of the star in Figure 6, we see a considerably different profile. The early morning surge is absent, as is the dominance of code 2 primcoltype. The driver and front seat passengers are of a similar but older age profile, with a considerably younger rear seat passenger. The drivers are primarily male, the front seat passengers are primarily female, while the distribution of male and female rear seat passengers is virtually the same. It is clear that this profile represents accidents involving parents with a young child in the rear seat.

The ability to slice, brush and rotate data allows the analyst to discover hidden patterns and relationships, while also providing a framework for explaining more theoretical concepts, including the use of multivariate analysis techniques.

Figure 6: Southern arm of star cluster with dynamically linked variables hour, type of collision, age and gender of vehicle occupants

In summary, data visualisation is described by the American psychologist and statistician Michael Friendly as ‘an approach to data analysis that focuses on insightful graphical display. The word ‘insightful’ suggests that the goal is (we hope) to reveal some aspects of the data that might not be perceived, appreciated or absorbed by other means’ (11).

References

[1] Playfair, William, Commercial and Political Atlas, London, 1786, pp xiii-xiv. Reprinted as Playfair’s Commercial and Political Atlas and Statistical Breviary, edited and introduced by Howard Wainer and Ian Spence, 2005, Cambridge University Press.
[2] Tukey, J.W., 1962, The future of data analysis, Annals of Mathematical Statistics, 33: 1-67, 812.
[3] Bertin, J., Semiologie Graphique, 1967, Paris: Editions Gauthier-Villars. English translation by W.J. Berg as Semiology of Graphics, Madison, WI: University of Wisconsin Press, 1983 (reprinted in October 2010 by ESRI Press).
[4] Fisherkeller, M.A., Friedman, J.H., and Tukey, J.W., 1975, PRIM-9: an interactive multidimensional data display analysis system, Data: Its Use, Organisation and Management, 140-145. New York: The Association for Computing Machinery.
[5] Donoho, A.W., Donoho, D.L., and Gasko, M., 1988, MacSpin: Dynamic Graphics on a desktop computer. In W.S. Cleveland and M.E. McGill, eds., Dynamic Graphics for Statistics. Belmont, CA: Wadsworth, pp 331-351.
[6] Velleman, P.F., 1988, Data Desk. Ithaca, New York: Data Descriptions Inc.
[7] Cleveland, W.S., Visualising data, 1993, Hobart Press, page 2.
[8] Pedestrian Accidents in Ireland, Great Britain and Northern Ireland, 1998, National Roads Authority, Dublin.
[9] Bertin, J., La Graphique et le Traitement Graphique de l’Information, 1977, Paris: Flammarion. English translation by W.J. Berg and P. Scott as Graphics and Graphic Information Processing, 1981, Berlin: Walter de Gruyter & Co.
[10] Tufte, E.R., 1990, Envisioning information, Graphics Press, pp 12-30.
[11] Friendly, M., 2001, Visualizing Categorical Data, SAS Institute Inc., Cary, NC, USA.