FARS (Fatality Analysis Reporting System) Datamining

some interesting facts about highway fatalities that I found using WEKA

Published in: Automotive, Technology

  1. 1. Traffic Fatality Data Mining <ul><li>Final Presentation </li></ul><ul><li>For </li></ul><ul><li>IS 383 </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>By </li></ul><ul><li>Taimur Hassan </li></ul>
  2. 2. Overview <ul><li>Introduction </li></ul><ul><li>Understanding business data </li></ul><ul><li>Prepare and Process data </li></ul><ul><li>Transform data </li></ul><ul><li>Data Mining Results </li></ul><ul><li>Analysis and Interpretation </li></ul><ul><li>New Knowledge </li></ul>
  3. 3. Introduction & Understanding Business Data <ul><li>The KD assignment involved traffic safety </li></ul><ul><li>The National Highway Traffic Safety Administration (NHTSA) collects data about any traffic accident that involves one or more fatalities </li></ul><ul><li>FARS (Fatality Analysis Reporting System) data has been available to the public since 1975 </li></ul><ul><li>An FTP site allows the data to be downloaded in multiple formats, including database tables (DBF) and Excel spreadsheets </li></ul>
  4. 4. Prepare and Process Data <ul><li>The agency had done all the preprocessing work </li></ul><ul><li>The dataset was available as three separate tables: </li></ul><ul><ul><li>Accident: contains details about the accident itself </li></ul></ul><ul><ul><li>Person: contains most details about each person injured or killed in the accident </li></ul></ul><ul><ul><li>Vehicle: contains data about the vehicle type, make/model, VIN, etc. </li></ul></ul><ul><li>No personal information was provided, only information to help in analysis </li></ul><ul><li>A total of 198 fields of data were distributed among the three tables, some appearing in more than one table </li></ul><ul><li>The data was ready to be analyzed </li></ul>
  5. 5. Transformation of Data <ul><li>Values of attributes were coded in numeric form </li></ul><ul><li>A user guide was provided with the dataset to interpret the code values </li></ul><ul><li>Missing data values were encoded as such, e.g. ‘9’ meant ‘unknown’ </li></ul><ul><li>Some interesting attributes were separated out to compare and run through algorithms, which also saved memory and time </li></ul><ul><li>In order to consolidate results, some attribute values were grouped. An example: </li></ul><ul><ul><li>The AIR_BAG variable had about 33 distinct values, which could be consolidated into simple values such as NOT_DEPLOYED, DEPLOYED and NOT_AVAILABLE </li></ul></ul><ul><ul><li>This allowed a better look at the relationship between AIR_BAG and FATALITY (discussed later) </li></ul></ul>
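The grouping step described on this slide can be sketched as a simple lookup table. This is only an illustration: the numeric codes below are hypothetical placeholders (the real AIR_BAG code list is in the FARS user guide and is not reproduced in the slides), and the `UNKNOWN` fallback is an assumption added here for codes that are not mapped.

```python
# Hypothetical raw codes per group. The real FARS AIR_BAG codes are NOT
# shown in the slides; these numbers are placeholders for illustration.
AIR_BAG_GROUPS = {
    "DEPLOYED": {1, 2, 3},
    "NOT_DEPLOYED": {20},
    "NOT_AVAILABLE": {0, 30},
}

# Invert the table so each raw code points at its consolidated label.
CODE_TO_GROUP = {
    code: group
    for group, codes in AIR_BAG_GROUPS.items()
    for code in codes
}


def consolidate_air_bag(code, default="UNKNOWN"):
    """Map a raw AIR_BAG code to its consolidated label.

    Codes missing from the table fall back to `default`
    (an assumption of this sketch, not something the slides state).
    """
    return CODE_TO_GROUP.get(code, default)


raw_codes = [1, 20, 30, 99]
labels = [consolidate_air_bag(c) for c in raw_codes]
print(labels)
```

Collapsing many codes into a few labels before mining keeps the number of candidate itemsets small, which is presumably why the grouping made the AIR_BAG and FATALITY relationship easier to inspect.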
  6. 6. Results: Restraint and Fatality <ul><li>The attributes Seat Belt Usage (REST_USED) and Fatality were derived from the Person table and tested for associations with the Apriori algorithm in Weka. </li></ul><ul><li>The Seat Belt Usage attribute had the following values: </li></ul><ul><ul><li>00 None Used/Not Applicable </li></ul></ul><ul><ul><li>01 Shoulder Belt </li></ul></ul><ul><ul><li>02 Lap Belt </li></ul></ul><ul><ul><li>03 Lap and Shoulder Belt </li></ul></ul><ul><ul><li>04 Child Safety Seat </li></ul></ul><ul><ul><li>05 Motorcycle Helmet </li></ul></ul><ul><ul><li>06 Bicycle Helmet </li></ul></ul><ul><ul><li>08 Restraint Used - Type Unknown </li></ul></ul><ul><ul><li>13 Safety Belt Used Improperly </li></ul></ul><ul><ul><li>14 Child Safety Seat Used Improperly </li></ul></ul><ul><ul><li>15 Helmets Used Improperly </li></ul></ul><ul><ul><li>99 Unknown </li></ul></ul><ul><li>Fatality is a boolean value. The results revealed four interesting rules: </li></ul><ul><li>Conclusion: The results show that using only a shoulder belt is linked to a higher mortality rate than using it together with a lap belt, or using a lap belt alone. It can be hypothesized that a shoulder belt without a lap belt presents a danger to occupants during accidents, because occupants can choke when forced against the belt. </li></ul>
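Apriori ranks rules such as "REST_USED = 01 implies Fatality = YES" by support and confidence. The sketch below shows how those two measures are computed for a single rule. The records are invented toy data, not FARS figures, and `rule_metrics` is a hypothetical helper written for this illustration, not part of Weka.

```python
# Toy (REST_USED code, fatality flag) pairs. Invented for illustration;
# these are NOT FARS records.
records = [
    ("01", True), ("01", True), ("01", False),
    ("03", True), ("03", False), ("03", False), ("03", False),
    ("00", True), ("00", True), ("00", False),
]


def rule_metrics(records, rest_code):
    """Support and confidence of the rule REST_USED=rest_code -> Fatality=True.

    support    = share of all records with this code AND a fatality
    confidence = share of records with this code that are fatalities
    """
    total = len(records)
    with_code = sum(1 for code, _ in records if code == rest_code)
    with_code_and_fatal = sum(
        1 for code, fatal in records if code == rest_code and fatal
    )
    support = with_code_and_fatal / total if total else 0.0
    confidence = with_code_and_fatal / with_code if with_code else 0.0
    return support, confidence


for code in ("01", "03"):
    support, confidence = rule_metrics(records, code)
    print(f"REST_USED={code} -> FATAL: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Comparing the confidence of the shoulder-belt-only rule with that of the lap-and-shoulder rule is the kind of comparison the conclusion on this slide rests on.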
  7. 7. Results: Rollover and Make/Model <ul><li>Make/Model and Rollover, both derived from the Vehicle table, were tested for associations using the Apriori algorithm. </li></ul><ul><li>The values of the MAKE_MODEL variable were derived by merging the values of two separate variables, MAKE and MODEL. </li></ul><ul><li>The ROLLOVER variable is a boolean value indicating the vehicle’s orientation after the accident. </li></ul><ul><li>Five rules were found to be very helpful in understanding the tendency of certain make/models to roll over more than others during an accident. </li></ul><ul><li>Conclusion: On looking further into these particular make/models, we find that they are all in the light-truck category. </li></ul><ul><li>From safety reviews we find that center of gravity can affect the chances of rollover </li></ul><ul><li>These vehicles have a high center of gravity with a small wheelbase, making them more vulnerable to rollover than a car. </li></ul>
  8. 8. Results: Air Bag & Fatality <ul><li>Two variables, AIR_BAG and Fatality, produced very interesting results when checked for associations with Apriori. The AIR_BAG variable has the values: </li></ul><ul><li>DEPLOYED (the airbag(s) deployed on the victim’s side) </li></ul><ul><li>NOT_DEPLOYED (there was an airbag for the seat but, for various reasons, it did not deploy) </li></ul><ul><li>NOT_AVAILABLE (if motorcycle or late model car etc) </li></ul>Conclusion: However, we see that for the age groups listed above, it produces an opposite result.
  9. 9. Results: Pedestrian & State <ul><li>Apriori was run on both the STATE and PEDESTRIAN variables </li></ul><ul><li>PEDESTRIAN values: </li></ul><ul><ul><li>YES if pedestrians were injured/killed, else NO </li></ul></ul><ul><li>We find six states most likely to have pedestrians injured or killed during a vehicle accident. </li></ul>Conclusion: Washington DC and New York have the highest pedestrian injuries per accident. As DC is a city, it makes sense that pedestrians are involved in almost 29% of its accidents. The New York and Hawaii pedestrian accidents, however, still leave room for further investigation into the causes.
  10. 10. Results: Owner & Drunk Driving <ul><li>Apriori on the Owner [self-registered, biz/government, rental etc] and Drunk driving variables found an interesting, but predictable rule. </li></ul><ul><li>Only 6% of fatalities involving Biz/Gov vehicles involved drinking. </li></ul>Conclusion: From the second statistic, we see that about 35% of all drinking and driving fatalities involved victims driving vehicles not owned by them. This suggests that vehicles should be lent only to trustworthy people.
  11. 11. Results: State & Weather <ul><li>Association between states and weather (boolean) revealed that the fewest fatalities involving adverse weather such as rain, snow, fog, etc. were in Nevada, New Mexico and California. </li></ul><ul><li>The states that reported the most fatalities involving bad weather were New Jersey, Louisiana and Massachusetts. </li></ul><ul><li>This is not to say that bad weather caused the fatality. </li></ul><ul><li>Further research can reveal further information such as time of day, highway type, etc. </li></ul><ul><li>Conclusion: It makes sense that Nevada, New Mexico and California, being desert-like states, would have the lowest incidence of fatalities involving adverse weather. </li></ul><ul><li>It is revealing that, of all states, New Jersey had the most fatalities that occurred in bad weather. </li></ul><ul><li>Further investigation can lead to initiatives that reduce the fatality rate during adverse weather conditions. </li></ul>
  12. 12. Results: Pedestrian & Light <ul><li>Apriori testing with the variables HIT_RUN (hit and run), LGT_COND (light condition at time of accident) and PED (pedestrian injured/killed or not) </li></ul><ul><li>Reveals that pedestrians tend to be involved most when light conditions are dark, or dark but lighted (almost 62% combined). </li></ul><ul><li>Hit and runs also tend to happen during the dark or dark/lighted conditions, especially when a pedestrian is involved (70-72%) </li></ul><ul><li>Conclusion: We can conclude that being a pedestrian during such hours puts one at great risk of being struck by a vehicle. </li></ul><ul><li>Further research can answer questions about the types of highways most likely to have pedestrian accidents. </li></ul><ul><li>Another very important fact that police authorities can predict is that hit and run accidents will very likely involve dark conditions and pedestrians being hit. </li></ul><ul><li>One explanation for the hit and run behavior is that, after such an accident, drivers may find it easy to leave the scene because they think (and rightly so) that people would not have seen what happened, or would not be able to identify the driver or vehicle. </li></ul>
  13. 13. Decision Tree: Rollover <ul><li>Attributes: </li></ul><ul><ul><li>Make of vehicle </li></ul></ul><ul><ul><li>Body type of vehicle </li></ul></ul><ul><ul><li>Travel speed at time of accident </li></ul></ul><ul><ul><li>The method of avoidance </li></ul></ul><ul><ul><ul><li>BRAKE, STEERING, BRAKE + STEERING, NOT_USED, OTHER. </li></ul></ul></ul><ul><li>Class: ROLLOVER (YES or NO) </li></ul><ul><li>The decision tree helps predict which vehicle types are prone to rollover at certain speeds </li></ul><ul><li>It also indicates which maneuvers can be used to prevent a rollover from occurring. </li></ul><ul><li>The decision tree classified 82% of instances correctly. </li></ul><ul><li>BODY_TYPE = CAR: NO (26167.0/4134.0) </li></ul><ul><li>BODY_TYPE = LIGHT_TRUCK </li></ul><ul><li>| TRAV_SP = 30-59_MPH: NO (4164.0/854.0) </li></ul><ul><li>| TRAV_SP = 75-96_MPH: YES (685.0/211.0) </li></ul><ul><li>| TRAV_SP = 60-74_MPH </li></ul><ul><li>| | AVOID = BRAKES: NO (143.0/55.0) </li></ul><ul><li>| | AVOID = STEERING: YES (446.0/149.0) </li></ul><ul><li>| | AVOID = NOT_USED: NO (752.0/303.0) </li></ul><ul><li>| | AVOID = STEER_AND_BRAKES: NO (175.0/80.0) </li></ul><ul><li>| TRAV_SP = PARKED: NO (423.0/14.0) </li></ul><ul><li>| TRAV_SP = BELOW_30: NO (775.0/63.0) </li></ul><ul><li>| TRAV_SP = 96+_MPH </li></ul><ul><li>| | AVOID = BRAKES: NO (371.0/94.0) </li></ul><ul><li>| | AVOID = STEERING: YES (837.0/387.0) </li></ul><ul><li>| | AVOID = NOT_USED: NO (2192.0/459.0) </li></ul><ul><li>| | AVOID = STEER_AND_BRAKES: NO (318.0/142.0) </li></ul><ul><li>BODY_TYPE = VAN </li></ul><ul><li>| TRAV_SP = 30-59_MPH: NO (821.0/130.0) </li></ul><ul><li>| TRAV_SP = 75-96_MPH: YES (85.0/22.0) </li></ul><ul><li>| TRAV_SP = 60-74_MPH </li></ul><ul><li>| | AVOID = BRAKES: NO (33.0/10.0) </li></ul><ul><li>| | AVOID = STEERING: YES (54.0/20.0) </li></ul><ul><li>| | AVOID = NOT_USED: NO (139.0/45.0) </li></ul><ul><li>| | AVOID = STEER_AND_BRAKES: NO (29.0/13.0) </li></ul><ul><li>| TRAV_SP = PARKED: NO (113.0/3.0) </li></ul><ul><li>| TRAV_SP = BELOW_30: NO (260.0/14.0) </li></ul><ul><li>| TRAV_SP = 96+_MPH: NO (1397.0/251.0) </li></ul><ul><li>BODY_TYPE = HEAVY/LARGE_TRUCK: NO (4194.0/542.0) </li></ul><ul><li>BODY_TYPE = MEDIUM_TRUCK: NO (438.0/59.0) </li></ul>
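In a Weka J48 tree dump, the pair in parentheses at each leaf is (instances reaching the leaf / instances misclassified there). A rough check of the 82% figure can therefore be made by summing those pairs over the tree above. The sketch below assumes the slide shows every leaf, and it yields the accuracy on the training instances; the slides do not say whether the reported 82% was measured this way or by cross-validation.

```python
import re

# Leaf lines copied from the tree on the slide (internal nodes omitted,
# since only leaves carry the "(reached/misclassified)" pair).
TREE_LEAVES = """
BODY_TYPE = CAR: NO (26167.0/4134.0)
LIGHT_TRUCK | TRAV_SP = 30-59_MPH: NO (4164.0/854.0)
LIGHT_TRUCK | TRAV_SP = 75-96_MPH: YES (685.0/211.0)
LIGHT_TRUCK | 60-74_MPH | AVOID = BRAKES: NO (143.0/55.0)
LIGHT_TRUCK | 60-74_MPH | AVOID = STEERING: YES (446.0/149.0)
LIGHT_TRUCK | 60-74_MPH | AVOID = NOT_USED: NO (752.0/303.0)
LIGHT_TRUCK | 60-74_MPH | AVOID = STEER_AND_BRAKES: NO (175.0/80.0)
LIGHT_TRUCK | TRAV_SP = PARKED: NO (423.0/14.0)
LIGHT_TRUCK | TRAV_SP = BELOW_30: NO (775.0/63.0)
LIGHT_TRUCK | 96+_MPH | AVOID = BRAKES: NO (371.0/94.0)
LIGHT_TRUCK | 96+_MPH | AVOID = STEERING: YES (837.0/387.0)
LIGHT_TRUCK | 96+_MPH | AVOID = NOT_USED: NO (2192.0/459.0)
LIGHT_TRUCK | 96+_MPH | AVOID = STEER_AND_BRAKES: NO (318.0/142.0)
VAN | TRAV_SP = 30-59_MPH: NO (821.0/130.0)
VAN | TRAV_SP = 75-96_MPH: YES (85.0/22.0)
VAN | 60-74_MPH | AVOID = BRAKES: NO (33.0/10.0)
VAN | 60-74_MPH | AVOID = STEERING: YES (54.0/20.0)
VAN | 60-74_MPH | AVOID = NOT_USED: NO (139.0/45.0)
VAN | 60-74_MPH | AVOID = STEER_AND_BRAKES: NO (29.0/13.0)
VAN | TRAV_SP = PARKED: NO (113.0/3.0)
VAN | TRAV_SP = BELOW_30: NO (260.0/14.0)
VAN | TRAV_SP = 96+_MPH: NO (1397.0/251.0)
BODY_TYPE = HEAVY/LARGE_TRUCK: NO (4194.0/542.0)
BODY_TYPE = MEDIUM_TRUCK: NO (438.0/59.0)
"""

# Matches the trailing "(reached/misclassified)" pair on each leaf line.
LEAF_COUNTS = re.compile(r"\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)")


def leaf_accuracy(tree_text):
    """Return (instances, errors, accuracy) summed over all leaves."""
    pairs = LEAF_COUNTS.findall(tree_text)
    instances = sum(float(reached) for reached, _ in pairs)
    errors = sum(float(wrong) for _, wrong in pairs)
    return instances, errors, (instances - errors) / instances


instances, errors, accuracy = leaf_accuracy(TREE_LEAVES)
print(f"{instances:.0f} instances, {errors:.0f} misclassified, "
      f"accuracy {accuracy:.1%}")
```

The leaf counts sum to 45,011 instances with 8,054 misclassified, which is consistent with the 82% stated on the slide.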
  14. 14. New Knowledge <ul><li>Hit and runs very frequently happen in conditions of low light </li></ul><ul><li>Pedestrians are very likely to be injured/killed in low light conditions </li></ul><ul><li>Avoid using only steering to control a rollover </li></ul><ul><li>Make sure you always wear a seat belt, even if the vehicle has air bags </li></ul><ul><li>Position your shoulder belt properly and always wear it in conjunction with a lap belt </li></ul>
