Driving Behaviour as a Telematic Fingerprint

NASDAG.org
Data Science in the Automotive Industry

I am an Automotive Management Professional and a Computer
Science Engineer from France, with extensive experience in managing
complex projects in Supply Chain and IT, as well as starting, developing
and acquiring businesses in France, Russia, the USA and the Middle East.
I came to Metis to understand, learn and practice how data science is
transforming the Automotive Business. During my projects, I focused on:
● Sentiment Analysis / Topic Modeling
● Predictive Behavior Modeling
● Driver Telematics
Philippe Dagher

Objective:
Categorize drivers based on their behaviour on the roads - their driving style
and the type of roads that they follow.
Challenge:
Uniquely identify a driver (and hence his or her own “driving behaviour”) based on
the GPS log of a mobile phone located inside the car.
Idea:
Experiment with Topic Modeling techniques, especially Latent Semantic
Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the
observed trips by the unobserved behaviour of drivers.
Final Project @ Metis

Machine learning approach (1/2)
❖ Preprocess the data using statistical smoothing and compression algorithms
➢ Kalman Filtering
➢ Ramer–Douglas–Peucker
❖ Extract road and driving style features
➢ per Segment: Length, Slip Angle, Convexity, Radius
➢ per Meter: Speed, Accelerations (tangential and normal), Jerk, Yaw, Pauses
❖ Bin the output and generate the Driving Alphabet
➢ e.g. d0, d1, d2… v0, v1, v2… a0, a1, a2… etc.
❖ Build the Driving Vocabulary - “Driving Slides” per meter
➢ e.g. d3L4v2n3y1
➢ for various preprocessing sensitivities or feature combinations (languages)
❖ Translate trips from GPS log into documents
➢ Tokenize, filter, … data is ready!
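The compression step listed above can be sketched as follows. This is a minimal illustration of Ramer–Douglas–Peucker, assuming the trip is already a list of planar (x, y) points in metres and using the distance to the chord segment as the deviation measure. The function names, the `epsilon` tolerance and the `segment_lengths` helper are illustrative choices, not the code used in the project.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all points are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline, keeping only points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every intermediate point is close to the chord: drop them all.
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


def segment_lengths(points):
    """Length of each segment of a (simplified) polyline."""
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


track = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 0.0),
         (4.0, 3.0), (5.0, 3.02), (6.0, 3.0)]
simplified = rdp(track, epsilon=0.2)
lengths = segment_lengths(simplified)
```

A larger `epsilon` yields fewer and longer segments, which is one way to obtain the "various preprocessing sensitivities" mentioned in the list.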

d1L6Br1 d1L8Sr1 d1L5Sr2 d1L6Ur2 d2L8Ur2 d3L4Sr3 d2L5Ur3 d3L4Ur4 d3L6Sr4 d3L7Sr3 d4L4Ur5 d4L3Ur5 d4L2Ur7 d5L4Sr6 d3L3Ur5 d4L3Sr6 d5L4Ur6 d4L3Ur7 d5L9Sr5
d2L5Ur4 d3L2Ur7 d6L1Sr9 d5L0Sr9 d5L1Sr9 d5L7Ur5 d2L6Ur2 d2L3Ur5 d4L1Ur8 d5L2Ur7 d6L10Sr5 d6L8Sr5 d2L4Ur3 d3L3Ur6 d5L4Srp1 v2a6n0j0y0p1 v1a6n0j3y0p1
v1a1n0j6y0p1 v1a11n0j6y0p1 v1a7n0j11y0p1 v1a16n0j7y0p1 v2a7n0j1y0p1 v2a6n0j2y0p1 v2a10n0j2y0p1 v3a6n1j3y0p1 v3a2n2j3y0p1 v3a5n2j3y0p1 v4a2n2j3y1p1
v4a5n2j5y1p1 v4a5n3j5y1p1 v4a4n3j1y1p1 v4a6n3j6y1p1 v4a5n4j5y1p1 v4a4n3j6y1p1 v4a5n4j0y1p1 v4a5n3j6y1p1 v4a5n2j9y1p1 v4a11n3j7y1p1 v3a2n2j7y0p1 v3a12n2j7y0p1
v4a5n3j6y1p1 v4a4n2j6y1p1 v4a6n3j1y1p1 v3a5n2j2y0p1 v3a3n2j6y0p1 v3a11n1j4y0p1 v2a8n1j0y0p1 v2a7n1j7y0p1 v2a17n1j1y0p1 v2a10p1 v6a0n3j4y0p1 v6a6n3j7y0p1
v6a6n0j7y0p1 v6a6n0j3y0p1 v6a0n0j2y0p1 v6a5n0j6y0p1 v6a5n0j7y0p1 v6a6n0j4y0p1 v6a0n1j3y1j3y0p1 v6a6n1j6y0p1 v6a5n1j2y0p1 v7a1n1j4y0p1 v5a3n1j1y0p1
v4a6n2j7y0p1 v5a6n2j4y0p1 v4a5n2j0y0p1 v4a5n2j2y0p1 v4a9n2j2y0p1 v5a5n2j3y0p1 v5a9n3j1y0p1 v5a9n3j1y0p1 v5a7n1j2y0p1 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0
d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v5n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v3n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v2n0y0
d5L0v2n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1vy1 d5L7v4n4y1 d5L7v4n3y1
d5L2v5n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d d6L10v3n2y0 d6L10v4n2y0
d6L10v3n1y0 d6L10v3n1y0 d6L10v2n1y0 d6L10v2n1y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v1n0y0
d4L6v2n6y3 d4L6v2n4y2 d4L6v2n3y2 d2L6v1n12y11 d2L6v1n10y10 d1L1v1n0y0 d3L3v1n1y1 d3L3v1n1y0 d3L3v1n0y0 d3L3v1n0y0 d3L3v1n0y0 d2L8v0n3y6
Example of a translated trip
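Tokens of this shape could be produced along the following lines. This is a sketch under assumptions: the bin edges are invented for illustration (the deck does not give the real thresholds), only three features are used, and the reading of the letters as v = speed, a = tangential acceleration, n = normal acceleration is a guess based on the feature list.

```python
from bisect import bisect_right

# Hypothetical bin edges; the real thresholds are not given in the deck.
BIN_EDGES = {
    "v": [2.0, 5.0, 10.0, 15.0, 20.0, 30.0],  # speed, m/s (assumed unit)
    "a": [-2.0, -1.0, -0.3, 0.3, 1.0, 2.0],   # tangential acceleration, m/s^2
    "n": [0.5, 1.0, 2.0, 3.0],                # normal acceleration, m/s^2
}


def to_letter(feature, value):
    """Map one feature value to a letter of the alphabet, e.g. 'v3'."""
    return f"{feature}{bisect_right(BIN_EDGES[feature], value)}"


def to_word(sample, features=("v", "a", "n")):
    """Concatenate the letters of one per-meter sample into a word ('Driving Slide')."""
    return "".join(to_letter(f, sample[f]) for f in features)


def trip_to_document(samples, features=("v", "a", "n")):
    """Translate a trip (a list of per-meter samples) into a list of tokens."""
    return [to_word(s, features) for s in samples]


trip = [
    {"v": 12.0, "a": 0.1, "n": 0.0},
    {"v": 1.0, "a": -2.5, "n": 3.5},
]
document = trip_to_document(trip)
```

Changing the `features` tuple or the bin edges gives a different "language" for the same trip.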

LDA: Bayesian Topic Model
(plate diagram; labels listed from the corpus level down to the observed data)
❖ Dirichlet parameter
➢ Corpus: possible “Driving Behaviour” distributions for trips
❖ Per-trip “Driving Behaviour” proportions
➢ for each trip, select a distribution of “Driving Behaviours”
❖ Per-“Driving Slide” “Driving Behaviour” assignment
➢ for each “Driving Slide”, select a “Driving Behaviour”
❖ Observed “Driving Slide”
➢ select the actual “Driving Slide” from the selected “Driving Behaviour”
❖ “Driving Behaviours”
➢ each “Driving Behaviour” is a distribution of “Driving Slides”
❖ “Driving Behaviour” hyperparameter
➢ possible “Driving Slide” distributions for “Driving Behaviours”
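The generative story described by these labels can be simulated in a few lines. This is only an illustrative sketch using the standard library: `alpha` and `eta` stand for the two Dirichlet parameters (symmetric values are assumed), and the vocabulary and function names are made up for the example.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_behaviours(n_behaviours, vocabulary, eta, rng):
    """Each 'Driving Behaviour' is a distribution over 'Driving Slides'."""
    return [sample_dirichlet([eta] * len(vocabulary), rng)
            for _ in range(n_behaviours)]


def generate_trip(n_slides, vocabulary, alpha, behaviours, rng):
    """Generate one trip as a list of (behaviour index, slide) pairs."""
    # Per-trip proportions of Driving Behaviours.
    theta = sample_dirichlet([alpha] * len(behaviours), rng)
    trip = []
    for _ in range(n_slides):
        # Pick a behaviour for this slide, then a slide from that behaviour.
        z = rng.choices(range(len(behaviours)), weights=theta)[0]
        w = rng.choices(vocabulary, weights=behaviours[z])[0]
        trip.append((z, w))
    return theta, trip


rng = random.Random(0)
vocabulary = ["v1a0n0", "v2a1n0", "v3a1n1", "v0a2n0"]  # toy Driving Slides
behaviours = sample_behaviours(3, vocabulary, eta=0.5, rng=rng)
theta, trip = generate_trip(20, vocabulary, alpha=0.5, behaviours=behaviours, rng=rng)
```

Inference runs this story backwards: only the slides `w` are observed, and the proportions `theta` and assignments `z` have to be estimated.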

Posterior Inference in LDA
❖ Goal is to obtain this posterior:
➢ How much of “Driving Behaviour” k a trip contains, and
➢ the “Driving Behaviour” assignments z of its “Driving Slides”
❖ Which means that I need to calculate:
❖ GENSIM Library
➢ a Python+NumPy implementation of online LDA for inputs larger than the available RAM
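For reference, in the usual LDA notation the posterior in question can be written as below. The symbols are assumptions taken from the standard formulation rather than from the deck: θ for the per-trip “Driving Behaviour” proportions, z for the assignments, w for the observed “Driving Slides”, and α, β for the two Dirichlet parameters.

```latex
p(\theta, z \mid w, \alpha, \beta)
  = \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)}
```

The denominator requires summing over every possible assignment z and integrating over θ, which is intractable in general. This is why approximate inference is used in practice; Gensim's online LDA relies on online variational Bayes.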

Example trip in the new LDA space

❖ 2736 drivers
❖ 200 trips/driver
Total: 547,200 CSV files (5.92 GB)
Challenge:
To come up with a "telematic fingerprint" capable of distinguishing whether a trip
was driven by a given driver, knowing that among the 200 provided trips of
each driver, a small number of trips were not driven by him/her.
Submissions are judged on area under the ROC curve calculated in a global manner (all predictions
together).
Validation on a Kaggle Competition
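The evaluation metric can be reproduced locally with a short sketch. It computes the area under the ROC curve through the rank-sum (Mann-Whitney) formulation and, to mimic the "global" scoring, pools the predictions of all drivers before computing a single AUC. The function name and the toy predictions are illustrative, and label 1 is assumed to mean that the trip belongs to the driver.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum formulation, with average ranks for tied scores."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at i and give it the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy predictions per driver: (label, predicted probability).
predictions = {
    "driver_a": [(1, 0.9), (0, 0.8)],
    "driver_b": [(1, 0.3), (0, 0.1)],
}
# Global scoring: pool every driver's predictions, then compute one AUC.
pooled = [p for driver in predictions.values() for p in driver]
global_auc = roc_auc([label for label, _ in pooled], [score for _, score in pooled])
```

In this toy case each driver taken alone is ranked perfectly, yet the pooled AUC is lower because driver_b's scores sit on a lower scale. With global scoring, probabilities therefore need to be comparable across drivers.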

❖ Project all trips into the new Driving Behaviours space
❖ Take the trips of a selected driver one by one
❖ Build a prediction model trained on all other trips in the dataset, labelled:
➢ True if they belong to the selected driver
➢ False if they do not belong to this driver
❖ Use the trained model to predict whether the selected trip belongs to the driver, then ensemble
several predictions obtained with various sensitivities to improve the score...
For performance reasons, I proceed in batches of 10 or 20 selected trips and compare each
batch to a limited number of randomly selected False trips
Other outlier detection / clustering techniques appear to perform worse
Machine learning approach (2/2)
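The batching scheme above can be sketched as follows. This only shows how a labelled training set might be assembled for one batch of held-out trips; the data layout (`trips_by_driver` as a dict of dicts of feature vectors) and the function names are assumptions made for the example, and the classifier itself is left out.

```python
import random


def batches(trip_ids, size):
    """Split the trip ids of a driver into consecutive batches."""
    ids = list(trip_ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_training_set(trips_by_driver, driver, held_out, n_negatives, rng):
    """Return (X, y) for one driver, excluding the held-out batch.

    trips_by_driver: dict driver_id -> dict trip_id -> feature vector
    (for instance the Driving Behaviour proportions of the trip).
    """
    # True examples: the driver's own trips outside the held-out batch.
    positives = [(vec, 1) for trip_id, vec in trips_by_driver[driver].items()
                 if trip_id not in held_out]
    # False examples: a limited random sample of trips from other drivers.
    pool = [vec for other, trips in trips_by_driver.items() if other != driver
            for vec in trips.values()]
    negatives = [(vec, 0) for vec in rng.sample(pool, min(n_negatives, len(pool)))]
    examples = positives + negatives
    rng.shuffle(examples)
    X = [vec for vec, _ in examples]
    y = [label for _, label in examples]
    return X, y


# Toy data: 3 drivers, 5 trips each, 2-dimensional feature vectors.
trips_by_driver = {d: {t: [float(d), float(t)] for t in range(5)} for d in range(3)}
first_batch = batches(trips_by_driver[0].keys(), size=2)[0]
X, y = build_training_set(trips_by_driver, driver=0, held_out=set(first_batch),
                          n_negatives=4, rng=random.Random(1))
```

Any probabilistic classifier can then be fitted on `(X, y)` and used to score the held-out batch. Repeating this over all batches gives one prediction per trip of the driver.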

MongoDB to hold the 3.3 million documents generated
Parallel processing setup on 4 DigitalOcean Droplets with 8 CPUs each
Gensim Library which implements three methods:
❖ latent semantic indexing (LSI, or LSA - A for Analysis)
❖ latent Dirichlet Allocation (LDA)
❖ random projections (RP)
Also, it implements online versions of each technique.
Setting the infrastructure

Predicting
❖ Achieving an AUC of 0.9 on Kaggle without any ensembling technique
which confirms the robustness of my approach...
