Over one billion cars interact with each other on the road every day. Each driver has his own driving style, which could impact safety, fuel economy and road congestion. Knowledge about the driving style of the driver could be used to encourage "better" driving behaviour through immediate feedback while driving, or by scaling auto insurance rates based on the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour.
This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance policies and car rentals.
Driver behavioral characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problem on car trips, in order to detect different behaviours along each trip. Subsequently, drivers are clustered into similar profiles based on the detected behaviours, and the results are compared with a human-defined ground truth on driver classification. The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information. Comparison of three unsupervised approaches
Driving Style Analysis based on Trip Segmentation. A Comparative Multi-Technique Approach
Marco Brambilla, Andrea Mauri, Paolo Mascetti
Data Exploration and Preliminaries
Trip Segmentation Techniques
1.24 million traffic-related fatalities occur annually
Currently the leading cause of death for people aged
between 15 and 29 years
Majority of cases due to improper or risky driving
Source: World Health Organisation (WHO)
Intro: Driving Process
Driving Process: driving
a car is a complex task
that requires to take
based on information
levels such as his own
state and other drivers’
Intro: Relevant Information
• Road State
• Traffic Info
• Road Risk
Data-driven driver profiling
with respect to driving risk
Essentially: Multivariate Time Series Segmentation
Application scenarios in insurance, promoting
pay-how-you-drive (PHYD) business models
State of the Art and Challenges
State of the art: many works on identification and
recognition of behavioural patterns (line following,
accelerations, braking etc) and maneuvers
recognition, behavioural scoring, prediction of
Supervised Learning techniques require an intensive
and expensive data gathering process.
Unsupervised techniques to profile drivers
behaviour based on identified recurrent patterns
on driving path segmentation
Comparison of 3 different approaches and use of
all of them for consolidated results
1. Unsupervised Segmentation Based on Clustering
2. Unsupervised Segmentation Based on HMM
3. Unsupervised Topic Extraction
Observed driving behaviours that are
repeated in each driver's behaviour and
also across different drivers.
A reduced representation of the original
Multivariate Time Series conveying a
Further reasoning is then applied
Extract: read collected files and selection of candidate features
Filter and Grouping
Load: produce a unique dataset
Collection Device:
Xsens MTi-G-710 (27 users)
and cell phones (10 users)
Retrieved Signals :
Mounted in-vehicle aligned with
direction of movement.
No Ground truth knowledge
Acceleration (on Y and X axes),
Speed (on Y and X axes)
Difference in yaw
Pre-Analysis 2: Application of Driving
Safety Existing Analyses
Vaiana et al. propose a Driving Safety Diagram based on the analysis of
longitudinal and lateral accelerations.
Aggressiveness Index formulation:
(A = Aggressive, S = Safe points)
1. Unsupervised Segmentation Based
on DP-Means Clustering
Problem: Bayesian nonparametric techniques require expensive sampling methods or
DP-means: proposed by Kulis et al., revisiting k-means: K-means like objective function +
A new cluster is created whenever a point is farther than λ away from every already existing centroid.
Clustering results depend on data ordering.
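The cluster-creation rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `dp_means`, the initialisation with the global mean, and the comparison of plain Euclidean distance against λ (as worded on the slide) are all choices made for this example.

```python
import math

def dp_means(points, lam, max_iter=100):
    """Illustrative DP-means: k-means-style updates plus on-the-fly cluster creation.

    points: list of equal-length numeric tuples; lam: distance threshold (lambda).
    Returns (labels, centroids).
    """
    dim = len(points[0])
    # Assumption: start with a single cluster centred on the global mean.
    centroids = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            k = min(range(len(centroids)), key=dists.__getitem__)
            if dists[k] > lam:
                # Farther than lambda from every centroid: open a new cluster here.
                centroids.append(tuple(p))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i], changed = k, True
        # Drop clusters that ended up empty and renumber the remaining ones.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]
        # Recompute each centroid as the mean of its members.
        centroids = []
        for k in range(len(used)):
            members = [p for p, l in zip(points, labels) if l == k]
            centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(dim)))
        if not changed:
            break
    return labels, centroids

# Toy data: two well-separated groups of acceleration-like samples (made-up values).
demo_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
demo_labels, demo_centroids = dp_means(demo_points, lam=1.0)
```

Because new centroids are opened while the points are scanned in order, presenting the same points in a different order can yield a different partition, which is the ordering sensitivity noted above.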
Unsupervised Segmentation based on
Goal: identify latent structure given observed data points,
assuming existence of Gaussian hidden states.
Assign to each observed point the corresponding hidden state.
Hidden Markov Models (HMM):
Observation and hidden states
Unsupervised Segmentation based on
Baum-Welch EM algorithm to learn model parameters
Viterbi decoding to assign to each observed point the most
likely hidden state
A variant was also applied, the inertial HMM: lower transition
probabilities enforce state persistence, which is sensible for driving.
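The decoding step can be illustrated with a small Viterbi sketch in Python. It assumes the model parameters are already known (the Baum-Welch learning step is omitted), uses one-dimensional Gaussian emissions, and exposes a hypothetical `stay` parameter that sets the self-transition probability. That parameter only mimics the persistence idea behind the inertial variant; it is not the inertial HMM formulation itself.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def viterbi(obs, means, stds, stay=0.9):
    """Most likely hidden-state path for 1-D Gaussian emissions.

    Assumptions for this sketch: at least two states, a uniform initial
    distribution, and a transition matrix with probability `stay` (< 1) on the
    diagonal and the remainder spread evenly over the other states.
    """
    n = len(means)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    score = [math.log(1.0 / n) + gaussian_logpdf(obs[0], means[s], stds[s])
             for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for x in obs[1:]:
        prev, ptr, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + gaussian_logpdf(x, means[j], stds[j]))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Made-up signal with a single ambiguous sample in the middle.
signal = [0.0, 0.1, 1.6, 0.0, 0.1]
loose_path = viterbi(signal, means=[0.0, 3.0], stds=[1.0, 1.0], stay=0.5)
sticky_path = viterbi(signal, means=[0.0, 3.0], stds=[1.0, 1.0], stay=0.99)
```

With `stay=0.5` every transition is equally likely, so the ambiguous sample is free to flip state; with `stay=0.99` the cost of switching twice outweighs the small emission gain and the path stays in one state.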
Topic Extraction Approach
What is topic extraction ?
Model topical concepts belonging to a set of textual documents.
Data are described as documents and the components are distributions of
terms that reflect recurring patterns, named Topics.
Hierarchical Dirichlet Processes (HDPs)
soft-clustering technique based on non-parametric Bayesian theory.
number of topics is not set a priori, but learned from data.
Posterior probability approximated by a Variational Inference algorithm by
Most relevant topics for each document and terms distribution in each topic.
Topic Extraction Process
Quantization – Binning Process
with static binning strategy
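A static binning step of this kind could look as follows in Python. The sketch assumes that each sampled point is mapped to one "word" by concatenating the bin index of every signal, and that a trip becomes a bag-of-words document. The signal names and the bin edges in `BIN_EDGES` are hypothetical values chosen for illustration, not those used in the study.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical fixed (static) bin edges per signal; values are illustrative only.
BIN_EDGES = {
    "speed": [10.0, 30.0, 50.0],  # 4 bins
    "acc": [-1.0, 1.0],           # 3 bins
}

def to_word(sample):
    """Map one multivariate sample (a dict of signal values) to a discrete word."""
    return "_".join(f"{name}{bisect_right(edges, sample[name])}"
                    for name, edges in BIN_EDGES.items())

def trip_to_document(samples):
    """Represent a trip as a bag of words (word -> count)."""
    return Counter(to_word(s) for s in samples)

demo_trip = [
    {"speed": 5.0, "acc": 0.0},
    {"speed": 8.0, "acc": 0.2},
    {"speed": 35.0, "acc": 2.0},
]
demo_document = trip_to_document(demo_trip)
```

Documents in this form are what a topic model consumes; the choice of bin edges controls the vocabulary size and therefore how fine-grained the extracted topics can be.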
Solution 2: Moving from Points to
Can we cluster trips based on how observation points have
-> Simple K-means clustering of trips for each approach.
-> Comparison of overlap of the different clusters
Coherent with original question: grouping of trips (and thus
drivers) by driving behavior
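One way to move from points to trips, sketched here in Python, is to describe each trip by the relative frequency of the labels assigned to its observation points and then feed these vectors to K-means. The slides do not state the exact trip representation, so the histogram feature and the name `trip_histogram` are assumptions made for this example.

```python
from collections import Counter

def trip_histogram(point_labels, n_labels):
    """Relative frequency of each point-level label within one trip.

    point_labels: labels (0 .. n_labels - 1) assigned to the trip's samples
    by any of the segmentation approaches.
    """
    counts = Counter(point_labels)
    total = len(point_labels)
    return [counts.get(k, 0) / total for k in range(n_labels)]

# A made-up trip whose samples were labelled with 3 possible segment labels.
demo_features = trip_histogram([0, 0, 1, 2], n_labels=3)
```

Each trip then has a fixed-length feature vector regardless of its duration, which is what a standard K-means implementation expects.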
Result of overlap analysis
K-means with K=6 clusters.
DP-means vs. HMM: 74% overlap
DP-means vs. Topic: 44%
HMM vs. Topic: 48%
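The slides do not define how overlap is computed. One plausible definition, sketched below in Python, is the fraction of trips on which two clusterings agree after the cluster labels of one are renamed in the most favourable way. The function name `overlap` and the brute-force search over label permutations (feasible for K=6, i.e. 720 permutations) are assumptions for this example.

```python
from itertools import permutations

def overlap(labels_a, labels_b, k):
    """Best-case agreement between two clusterings of the same trips.

    labels_a, labels_b: cluster ids in 0 .. k - 1, one per trip.
    Cluster ids are arbitrary, so every renaming of labels_a is tried
    and the highest fraction of matching trips is returned.
    """
    best = 0
    for perm in permutations(range(k)):
        agree = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] == b)
        best = max(best, agree)
    return best / len(labels_a)

# Same partition with different label names: full overlap.
same_partition = overlap([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0], k=3)
# One trip assigned differently out of four.
partial = overlap([0, 0, 1, 1], [0, 1, 1, 1], k=2)
```

For larger K an assignment solver would be preferable to enumerating permutations, but the result is the same.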
Human Validation of Trip Groups
Experts (knowledgeable about driving styles and driving
paths recorded) identify possible groups of trips in the
- Unable to distinguish 6 categories of groups
- Only 3 categories are feasible
- Best matching 6 -> 3 categories for each method
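The matching of 6 clusters onto 3 expert categories is not detailed on the slides. A simple way to do it, sketched below in Python, is to map every cluster to the expert category that is most frequent among its trips and then measure the fraction of trips whose mapped category equals the expert label. The majority-vote rule, the function names and the category names "A", "B", "C" are assumptions for illustration.

```python
from collections import Counter, defaultdict

def best_mapping(cluster_labels, expert_labels):
    """Map each cluster id to its most frequent expert category."""
    counts = defaultdict(Counter)
    for c, e in zip(cluster_labels, expert_labels):
        counts[c][e] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

def mapped_precision(cluster_labels, expert_labels):
    """Fraction of trips whose mapped cluster category matches the expert label."""
    mapping = best_mapping(cluster_labels, expert_labels)
    hits = sum(1 for c, e in zip(cluster_labels, expert_labels) if mapping[c] == e)
    return hits / len(cluster_labels)

# Made-up example: 11 trips, 6 clusters, 3 expert categories.
demo_clusters = [0, 0, 1, 1, 2, 2, 3, 3, 3, 4, 5]
demo_experts = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]
demo_mapping = best_mapping(demo_clusters, demo_experts)
demo_precision = mapped_precision(demo_clusters, demo_experts)
```

Because the mapping is fitted on the same trips it is evaluated on, a score obtained this way is optimistic and is best read as an upper bound on agreement with the experts.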
Three different clustering techniques of driving
behavior over trips
Clustering of trips based on behavior
-> up to 74% overlap over 6 clusters
-> 100% overlap over 3 clusters
-> 96% precision over 3 clusters
About collection process:
Gathering process including contextual information (road
risk, traffic status, weather conditions)
Larger dataset to improve inference performance
About implemented methods:
Smarter data ordering for DP-means
Relax independence assumption in HMM
Improvements in data discretization process for HDP
Marco Brambilla, @marcobrambi, email@example.com