Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information. Comparison of three unsupervised approaches


Over one billion cars interact with each other on the road every day. Each driver has their own driving style, which can affect safety, fuel economy and road congestion. Knowledge of a driver's style could be used to encourage "better" driving behaviour through immediate feedback while driving, or to scale auto insurance rates according to the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect different driving behaviours and thus to cluster drivers with similar behaviour.
This paves the way for new business models in the driving sector, such as Pay-How-You-Drive insurance policies and car rentals.
Driver behavioural characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the problem of detecting contextual scenes in car trips, i.e. the different behaviours along each trip. Drivers are then clustered into similar profiles based on these segments, and the results are compared with a human-defined ground truth on driver classification. The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.

Published in: Data & Analytics

  1. Driving Style Analysis based on Trip Segmentation: A Comparative Multi-Technique Approach. Marco Brambilla, Andrea Mauri, Paolo Mascetti, @marcobrambi
  2. Agenda: Intro; Problem Definition; Dataset; Data Exploration and Preliminaries; Trip Segmentation Techniques; Validation; Conclusions
  3. Intro: Relevance. 1.24 million traffic-related fatalities occur annually worldwide. Road traffic injuries are currently the leading cause of death for people aged between 15 and 29 years. The majority of cases are due to improper or risky driving behavior. Source: World Health Organization (WHO)
  4. Intro: Driving Process. Driving a car is a complex task that requires taking informed decisions based on information pertaining to different levels, such as the driver's own state and other drivers' behavior.
  5. Intro: Relevant Information. Vehicle's status; contextual info: • Road State • Weather Conditions • Traffic Info • Road Risk • Traffic
  6. Problem Statement: Data-driven driver profiling with respect to driving risk. Essentially: multivariate time series segmentation. Application scenarios in insurance, promoting pay-how-you-drive (PHYD) business models.
  7. State of the Art and Challenges. State of the art: many works on the identification and recognition of behavioural patterns (line following, accelerations, braking, etc.), maneuver recognition, behavioural scoring, and prediction of driver intentions. Supervised learning techniques require an intensive and expensive data-gathering process.
  8. Proposed Solution: Unsupervised techniques to profile driver behaviour based on recurrent patterns identified by segmenting driving paths. Comparison of 3 different approaches, and use of all of them for consolidated results: (1) Unsupervised Segmentation Based on Clustering; (2) Unsupervised Segmentation Based on HMM; (3) Unsupervised Topic Extraction.
  9. Contextual Scenes: Observed driving behaviours that recur within each driver's behaviour and across different drivers. A reduced representation of the original multivariate time series conveying a simplified characterization, on which further reasoning is then applied.
  10. ETL Process, in 3 steps. Extract: read the collected files and select candidate features. Transform: filtering and grouping, feature computation. Load: produce a unique dataset. [Diagram: Trip File.csv -> Extract -> Transform (PreProcessing) -> Load -> Global dataset.csv]
  11. Datasets. Collection devices: Xsens MTi-G-710 (27 users) and cell phones (10 users). Retrieved signals: acceleration measurements, altitude, GPS positioning, speed, orientation. Devices mounted in-vehicle, aligned with the direction of movement. No ground-truth knowledge.
  12. Features Selected: Acceleration (on Y and X axes), Speed (on Y and X axes), Difference in yaw
  13. Pre-Analysis 1: Data Exploration
  14. Pre-Analysis 1: Data Exploration
  15. Pre-Analysis 2: Application of Existing Driving Safety Analyses. Vaiana et al. propose a Driving Safety Diagram based on the analysis of longitudinal and lateral accelerations. Aggressiveness Index formulation (A = aggressive points, S = safe points). Graphical representation:
  16. DP-Means
  17. Approach 1: Unsupervised Segmentation Based on DP-Means Clustering. Problem: Bayesian nonparametric techniques require expensive sampling or variational methods. DP-means, proposed by Kulis et al., revisits k-means: a k-means-like objective function plus a penalty for each cluster. A new cluster is created whenever a point is farther than λ away from every existing centroid. Note: clustering results depend on data ordering.
  18. Clusters
  19. Silhouette
  20. Results: Centroids
  21. Results: Centroids
  22. Distribution of features across clusters
  23. Distribution of features across clusters
  24. Trip Segmentation Examples
  25. Trip Segmentation Examples
  26. Hidden Markov Models
  27. Unsupervised Segmentation based on HMM. Goal: identify latent structure in the observed data points, assuming the existence of hidden states with Gaussian observation distributions, and assign each observed point to its corresponding hidden state. Hidden Markov Models (HMM): observations and hidden states; Markov property; continuous observations.
  28. Unsupervised Segmentation based on HMM. Training: the Baum-Welch (EM) algorithm learns the model parameters. Decoding: Viterbi decoding assigns each observed point its hidden state in the most likely state sequence.
  29. HMM Results. A variation was also applied: the inertial HMM, with lower probabilities of switching between states, which enforces state persistence. This is sensible for driving.
  30. HMM Results: Clusters as hidden states.
  31. HMM Results: Clusters as hidden states.
  32. Example of Trip Segmentation
  33. Topic Extraction
  34. Topic Extraction Approach. What is topic extraction? It models the topical concepts of a set of textual documents: data are described as documents, and the components are distributions of terms that reflect recurring patterns, named topics. Hierarchical Dirichlet Processes (HDPs): a soft-clustering technique based on non-parametric Bayesian theory; the number of topics is not set a priori but learned from data. The posterior probability is approximated with the variational inference algorithm by Wang et al. Results: the most relevant topics for each document and the term distribution of each topic.
  35. Topic Extraction Process: Data Quantization -> Documents Creation -> Topics Extraction -> Topics Evaluation
  36. Quantization: Binning Process with static binning strategy
  37. Documents
  38. Terms Relevance on Top 7 Topics: Linguist…
  39. Terms Relevance on Top 7 Topics: … and data analyst perspectives …
  40. Comparison and Validation
  41. Big Issue: How to Compare? 1) Point-to-point or point distribution; 2) resulting grouping of trips; 3) perceived user similarity of trips
  42. Solution 1: Point-to-Point. Overlap of clusters? Per trip? Overall?
  43. Solution 1: Point-to-Point
  44. Solution 1: Point-to-Point
  45. Solution 2: Moving from Points to Trips. Can we cluster trips based on how their observation points have been clustered? -> Simple k-means clustering of trips for each approach. -> Comparison of the overlap between the different clusterings. Coherent with the original question: grouping of trips (and thus drivers) by driving behavior.
  46. Result of overlap analysis: k-means with K=6 clusters. DP-means vs. HMM: 74% overlap; DP-means vs. Topic: 44%; HMM vs. Topic: 48%
  47. Human Validation of Trip Groups. Experts (knowledgeable about driving styles and the recorded driving paths) identify possible groups of trips in the dataset. Problem: unable to distinguish 6 categories of groups; only 3 categories are feasible; best matching 6->3 categories for each method.
  48. Results
  49. Conclusions. Three different clustering techniques of driving behavior over trips -> segmentation. Clustering of trips based on behavior -> up to 74% overlap over 6 clusters -> 100% overlap over 3 clusters. User validation -> 96% precision over 3 clusters.
  50. Future Work. About the collection process: a gathering process including contextual information (road risk, traffic status, weather conditions); a larger dataset to improve inference performance. About the implemented methods: smarter data ordering for DP-means; relaxing the independence assumption in HMM; improvements in the data discretization process for HDP.
  51. Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it http://datascience.deib.polimi.it Thanks! Questions?
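
The ETL slide (slide 10) and the feature list (slide 12) describe the preprocessing only at a high level. The following Python sketch shows one way the three steps could look for a single trip file. The column names (`acc_x`, `acc_y`, `speed`, `yaw`), the stationary-sample filter and its `min_speed` threshold are hypothetical choices, not the authors' schema; "difference in yaw" is assumed to mean the change in heading between consecutive samples, wrapped into [-180, 180) degrees.

```python
import csv
import io

# Hypothetical column names: the slides do not give the actual file schema.
FEATURES = ["acc_x", "acc_y", "speed", "yaw"]
OUTPUT_FIELDS = ["trip_id", "acc_x", "acc_y", "speed", "yaw_diff"]


def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def extract(trip_csv_text):
    """Extract: read one trip file and keep only the candidate feature columns."""
    reader = csv.DictReader(io.StringIO(trip_csv_text))
    return [{name: float(row[name]) for name in FEATURES} for row in reader]


def transform(rows, trip_id, min_speed=1.0):
    """Transform: compute the yaw difference between consecutive samples,
    then drop near-stationary samples (a hypothetical filtering rule).
    The first sample has no predecessor, so it yields no output row."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        if cur["speed"] < min_speed:
            continue
        out.append({
            "trip_id": trip_id,
            "acc_x": cur["acc_x"],
            "acc_y": cur["acc_y"],
            "speed": cur["speed"],
            "yaw_diff": wrap_deg(cur["yaw"] - prev["yaw"]),
        })
    return out


def load(rows, sink):
    """Load: write the rows of all trips into one global dataset."""
    writer = csv.DictWriter(sink, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    trip = "acc_x,acc_y,speed,yaw\n0.1,0.0,12.0,350\n0.2,0.1,13.0,10\n0.0,0.0,0.5,12\n"
    rows = transform(extract(trip), trip_id="t1")
    buf = io.StringIO()
    load(rows, buf)
    print(buf.getvalue())
```

Wrapping matters because a raw difference between headings of 350 and 10 degrees would read as -340 degrees rather than a 20-degree turn.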
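
Slide 17 summarizes DP-means as k-means with a per-cluster penalty, where a new cluster is opened whenever a point lies farther than λ from every existing centroid. Below is a minimal pure-Python sketch of that batch procedure, assuming λ thresholds the squared Euclidean distance (as in Kulis and Jordan's formulation) and that the algorithm starts from a single cluster at the global mean; it is an illustration, not the code used in the study.

```python
from math import fsum


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(fsum(col) / len(points) for col in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Batch DP-means; `lam` thresholds the squared distance to the nearest centroid."""
    centroids = [mean(points)]  # start from a single cluster at the global mean
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:  # points are visited in list order, so results depend on ordering
            dists = [sq_dist(p, c) for c in centroids]
            k = min(range(len(centroids)), key=dists.__getitem__)
            if dists[k] > lam:  # too far from every centroid: open a new cluster at p
                centroids.append(tuple(p))
                k = len(centroids) - 1
            new_labels.append(k)
        # Drop clusters that ended up empty, renumber the rest, recompute centroids as means.
        used = sorted(set(new_labels))
        remap = {old: new for new, old in enumerate(used)}
        new_labels = [remap[k] for k in new_labels]
        centroids = [mean([p for p, k in zip(points, new_labels) if k == j])
                     for j in range(len(used))]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels, centroids
```

Because points are visited in list order, shuffling the input can change which points seed new clusters, which is the ordering dependence the slide notes (and the "smarter data ordering" item in Future Work).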
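
Slides 28 and 29 mention Viterbi decoding and an inertial variant that favours staying in the same state. The sketch below decodes a one-dimensional sequence with Gaussian emissions, assuming the model parameters are already known (for example from Baum-Welch training, which is not shown). The `sticky_transitions` helper is a hypothetical, simplified stand-in for the inertial HMM: it fixes a high self-transition probability instead of learning one under a persistence prior.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a univariate normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def sticky_transitions(n_states, p_stay):
    """Hypothetical inertial-style transition matrix: stay with probability p_stay
    (0 < p_stay < 1), spread the remaining mass evenly over the other states."""
    p_move = (1.0 - p_stay) / (n_states - 1)
    return [[p_stay if i == j else p_move for j in range(n_states)] for i in range(n_states)]


def viterbi(obs, start, trans, means, sigmas):
    """Most likely hidden-state sequence for 1-D observations with Gaussian emissions."""
    n = len(start)
    log_trans = [[math.log(p) for p in row] for row in trans]
    # delta[s]: best log-probability of any state path ending in state s at the current step
    delta = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], sigmas[s]) for s in range(n)]
    backpointers = []
    for x in obs[1:]:
        pointers, new_delta = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda r: delta[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s]
                             + gaussian_logpdf(x, means[s], sigmas[s]))
        backpointers.append(pointers)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With uniform transitions (`sticky_transitions(2, 0.5)`), an isolated reading of 1.6 in the sequence `[0.0, 0.0, 1.6, 0.0, 0.0]` is assigned to the state with mean 3; with a self-transition probability of 0.99 the decoder keeps the whole sequence in one state, which is the persistence effect the inertial variant aims for.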
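
Slides 35 to 37 list the topic-extraction pipeline and a static binning strategy for turning continuous signals into terms. A minimal sketch of the first two stages follows: each feature value is mapped to a discrete term using fixed bin edges, and the terms of a window of consecutive samples are counted as a bag-of-words document. The bin edges, feature names, term naming scheme and the choice of fixed-length windows as documents are all hypothetical, since the slides do not give these details; the HDP inference step itself is not reproduced.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical static bin edges per feature (illustrative values, not the authors').
BIN_EDGES = {
    "acc_x": [-2.0, -0.5, 0.5, 2.0],
    "acc_y": [-2.0, -0.5, 0.5, 2.0],
    "speed": [10.0, 30.0, 50.0, 90.0],
}


def quantize(sample):
    """Turn one sample into terms such as 'acc_x_2' (feature name + bin index).
    bisect_right counts the edges <= value, so a value on an edge goes to the upper bin."""
    return [f"{name}_{bisect_right(edges, sample[name])}" for name, edges in BIN_EDGES.items()]


def make_documents(trip, window):
    """Split a trip into windows of `window` consecutive samples; each window
    becomes a bag-of-words document (term -> count)."""
    documents = []
    for start in range(0, len(trip), window):
        terms = Counter()
        for sample in trip[start:start + window]:
            terms.update(quantize(sample))
        documents.append(terms)
    return documents
```

Once the terms are mapped to integer ids, documents in this form could be passed to an HDP implementation such as gensim's `HdpModel`; static edges keep the vocabulary identical across trips and drivers, at the cost of not adapting to each signal's distribution.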
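
Slide 45 moves from points to trips by clustering trips according to how their observation points were segmented. The slides do not say which trip features were fed to k-means; the sketch below assumes, hypothetically, that each trip is described by the fraction of its points assigned to each segment cluster, and then runs plain Lloyd's k-means. For determinism it seeds the centroids with the first k trips, which is a simplification rather than a recommended initialization.

```python
def label_histogram(point_labels, n_segments):
    """Fraction of a trip's observation points assigned to each segment cluster."""
    counts = [0] * n_segments
    for label in point_labels:
        counts[label] += 1
    return [c / len(point_labels) for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; centroids are seeded with the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_assignment == assignment:  # no trip changed cluster: converged
            break
        assignment = new_assignment
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

The slides use K=6; running this once per segmentation approach yields one trip grouping per method, which is what the overlap comparison on slide 46 operates on.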
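
Slides 46 and 47 report percentage overlaps between trip groupings and a best matching from 6 clusters to 3 expert categories, but not the exact formulas. One plausible reading, stated here as an assumption: the overlap between two k-cluster groupings is the share of trips on which they agree under the best one-to-one relabelling of clusters, and the 6-to-3 matching maps each cluster to the expert category that most of its trips belong to, which maximizes agreement for a many-to-one mapping.

```python
from collections import Counter
from itertools import permutations


def overlap(labels_a, labels_b, k):
    """Share of trips on which two k-cluster groupings agree, under the best
    one-to-one relabelling of the clusters of `labels_a` (brute force over k! permutations)."""
    best = 0
    for perm in permutations(range(k)):
        agree = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] == b)
        best = max(best, agree)
    return best / len(labels_a)


def many_to_one(cluster_labels, expert_labels):
    """Map each cluster to the expert category most of its trips belong to,
    and return the mapping together with the resulting agreement rate."""
    votes = {}
    for c, e in zip(cluster_labels, expert_labels):
        votes.setdefault(c, Counter())[e] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    agreement = sum(mapping[c] == e for c, e in zip(cluster_labels, expert_labels))
    return mapping, agreement / len(cluster_labels)
```

Brute force is affordable for K=6 (720 permutations); for larger K the same one-to-one matching could be computed with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.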
