NASDAG.org
Data Science in the Automotive Industry
I am an Automotive Management Professional and a Computer
Science Engineer from France, with extensive experience in managing
complex projects in Supply Chain and IT, as well as starting, developing
and acquiring businesses in France, Russia, the USA and the Middle East.
I came to Metis to understand, learn and practice how data science is
transforming the Automotive Business. During my projects, I focused on:
● Sentiment Analysis / Topic Modeling
● Predictive Behavior Modeling
● Driver Telematics
Philippe Dagher
Objective:
Categorize drivers based on their behaviour on the roads - their driving style
and the type of roads that they follow.
Challenge:
Uniquely identify a driver (and hence his or her own “driving behaviour”) based on
the GPS log of a mobile phone located inside the car.
Idea:
Experiment with Topic Modeling techniques, especially Latent Semantic
Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the
observed trips by the unobserved behaviour of drivers.
Final Project @ Metis
Raw data for one trip
Machine learning approach (1/2)
❖ Preprocess the data using statistical smoothing and compression algorithms
➢ Kalman Filtering
➢ Ramer–Douglas–Peucker
❖ Extract road and driving style features
➢ per Segment: Length, Slip Angle, Convexity, Radius
➢ per Meter: Speed, Accelerations (tangential and normal), Jerk, Yaw, Pauses
❖ Bin the output and generate the Driving Alphabet
➢ ex: d0, d1, d2… v0, v1, v2… a0, a1, a2… etc
❖ Build the Driving Vocabulary - “Driving Slides” per meter
➢ ex: d3L4v2n3y1
➢ for various preprocessing sensitivities or feature combinations (languages)
❖ Translate trips from GPS log into documents
➢ Tokenize, filter, … data is ready!
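The compression step named above can be illustrated with a minimal sketch of Ramer–Douglas–Peucker. It assumes planar (x, y) coordinates in metres; the function names, the tolerance `epsilon` and the sample trace are hypothetical and do not come from the deck.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # a and b coincide: fall back to the point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / norm

def rdp(points, epsilon):
    """Simplify a polyline, keeping points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # every interior point is close to the chord: keep only the endpoints
        return [first, last]
    # otherwise split at the farthest point and simplify both halves
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

# Hypothetical trace: a nearly straight stretch followed by a sharp turn.
trace = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
simplified = rdp(trace, epsilon=0.5)
```

A larger `epsilon` yields fewer, longer segments, which is one plausible reading of the "preprocessing sensitivities" mentioned in the list.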
d1L6Br1 d1L8Sr1 d1L5Sr2 d1L6Ur2 d2L8Ur2 d3L4Sr3 d2L5Ur3 d3L4Ur4 d3L6Sr4 d3L7Sr3 d4L4Ur5 d4L3Ur5 d4L2Ur7 d5L4Sr6 d3L3Ur5 d4L3Sr6 d5L4Ur6 d4L3Ur7 d5L9Sr5
d2L5Ur4 d3L2Ur7 d6L1Sr9 d5L0Sr9 d5L1Sr9 d5L7Ur5 d2L6Ur2 d2L3Ur5 d4L1Ur8 d5L2Ur7 d6L10Sr5 d6L8Sr5 d2L4Ur3 d3L3Ur6 d5L4Srp1 v2a6n0j0y0p1 v1a6n0j3y0p1
v1a1n0j6y0p1 v1a11n0j6y0p1 v1a7n0j11y0p1 v1a16n0j7y0p1 v2a7n0j1y0p1 v2a6n0j2y0p1 v2a10n0j2y0p1 v3a6n1j3y0p1 v3a2n2j3y0p1 v3a5n2j3y0p1 v4a2n2j3y1p1
v4a5n2j5y1p1 v4a5n3j5y1p1 v4a4n3j1y1p1 v4a6n3j6y1p1 v4a5n4j5y1p1 v4a4n3j6y1p1 v4a5n4j0y1p1 v4a5n3j6y1p1 v4a5n2j9y1p1 v4a11n3j7y1p1 v3a2n2j7y0p1 v3a12n2j7y0p1
v2a1n1j3y0p1 v2a5n1j9y0p1 v2a11n1j9y0p1 v3a6n1j7y0p1 v3a5n1j7y0p1 v3a6n2j6y0p1 v3a6n1j34y0p1 v3a62n2j71y0p1 v8a56n11j38y2p1 v4a13n3j7y1p1 v4a4n3j4y1p1
v4a5n3j6y1p1 v4a4n2j6y1p1 v4a6n3j1y1p1 v3a5n2j2y0p1 v3a3n2j6y0p1 v3a11n1j4y0p1 v2a8n1j0y0p1 v2a7n1j7y0p1 v2a17n1j1y0p1 v2a10p1 v6a0n3j4y0p1 v6a6n3j7y0p1
v6a6n3j3y0p1 v6a1n3j3y0p1 v6a6n3j3y0p1 v6a5n2j1y0p1 v5a6n2j4y0p1 v5a6n2j3y0p1 v5a12n1j2y0p1 v4a9n1j0y0p1 v3a9n1j2y0p1 v3a5n0j3y0p1 v3a1n0j6y0p1 v3a11n0j6y0p1
v3a0n1j3y0p1 v3a6n1j0y0p1 v3a5n1j3y0p1 v3a11n0j6y0p1 v4a1n0j4y0p1 v4a6n0j3y0p1 v4a2n0j7y0p1 v4a13n0j11y0p1 v5a7n0j4y0p1 v5a1n0j0y0p1 v5a1n0j3y0p1
v5a6n0j6y0p1 v5a6n0j2y0p1 v5a2n0j7y0p1 v6a11n0j10y0p1 v6a6n0j3y0p1 v6a0n0j3y0p1 v6a5n0j6y0p1 v6a5n0j2y0p1 v6a1n0j1y0p1 v6a0n0j3y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1
v6a6n0j7y0p1 v6a6n0j3y0p1 v6a0n0j2y0p1 v6a5n0j6y0p1 v6a5n0j7y0p1 v6a6n0j4y0p1 v6a0n1j3y1j3y0p1 v6a6n1j6y0p1 v6a5n1j2y0p1 v7a1n1j4y0p1 v5a3n1j1y0p1
v5a6n1j3y0p1 v5a10n1j3y0p1 v4a8n0j0y0p1 v3a8n0j0y0p1 v3a8n0j3y0p1 v2a10n0j1y0p1 v2a7n0j3y0p1 v2a6n0j7y0p1 v3a7n0j3y0p1 v2a7n0j6y0p1 v3a14n0j7y0p1
v3a4n0j4y0p1 v3a2n0j6y0p1 v3a12n0j3y0p1 v3a8n0j2y0p1 v3a5n0j0y0p1 v3a6n0j4y0p1 v4a1n0j3y0p1 v4a5n0j2y0p1 v4a1n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p2
v4a1n0j3y0p1 v4a6n0j7y0p1 v4a6n0j10y0p1 v4a11n0j6y0p1 v3a2n0j0y0p1 v3a1n0j3y0p1 v3a6n0j0y0p1 v3a6n0j0y0p1 v2a5n0j2y0p1 v2a3n0j5y0p1 v2a10n0j5y0p1
v1a2n0j0y0p1 v1a1n0j3y0p1 v1a5n0j10y0p1 v1a11n0j7y0p1 v1a3n0j7y0p1 v1a12n0j7y0p1 v2a3n0j1y0p1 v2a1n0j6y0p1 v2a11n0j10y0p1 v3a6n0j10y0p1 v3a12n0j7y0p1
v4a1n0j3y0p1 v4a5n0j10y0p1 v3a11n0j6y0p1 v4a2n0j3y0p1 v4a6n0j3y0p1 v5a0n0j7y0p1 v5a12n0j8y0p1 v5a4n0j4y0p1 v5a2n3j3y0p1 v5a3n3j4y0p1 v5a6n3j7y0p1
v5a6n3j5y0p1 v5a4n3j2y0p1 v5a1n3j3y0p1 v5a6n3j2y0p1 v5a1n2j4y0p1 v5a6n2j3y0p1 v5a2n3j4y0p1 v5a6n3j2y0p1 v5a6n2j3y0p1 v4a0n2j1y0p1 v4a2n2j1y0p1 v4a0n2j4y0p1
v4a6n2j7y0p1 v5a6n2j4y0p1 v4a5n2j0y0p1 v4a5n2j2y0p1 v4a9n2j2y0p1 v5a5n2j3y0p1 v5a9n3j1y0p1 v5a9n3j1y0p1 v5a7n1j2y0p1 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0
d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v5n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v3n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v2n0y0
d5L0v2n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1vy1 d5L7v4n4y1 d5L7v4n3y1
d5L7v0n0y0 d5L7v0n0y0 d5L7v0n0y0 d5L7v1n0y0 d2L6v1n6y5 d2L6v2n8y6 d2L3v2n0y0 d2L3v2n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v4n0y0 d4L1v4n0y0
d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0
d5L2v5n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d d6L10v3n2y0 d6L10v4n2y0
d6L10v3n1y0 d6L10v3n1y0 d6L10v2n1y0 d6L10v2n1y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v1n0y0
d6L8v1n0y0 d6L8v1n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v3n1y0 d6L8v3n2y0 d6L8v3n2y0 d6L8v4n2y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n3y1
d6L8v4n4y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v3n2y0 d6L8v3n2y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v3n1y0 d2L5v1n3y2
d2L5v1n2y2 d3L5v1n2y1 d3L5v2n3y2 d3L5v2n4y2 d3L5v2n6y3 d3L5v2n2y1 d3L5v2n2y1 d3L5v3n4y2 d4L6v2n5y3 d4L6v2n6y3 d4L6v3n8y3 d4L6v3n7y3 d4L6v3n7y3
d4L6v2n6y3 d4L6v2n4y2 d4L6v2n3y2 d2L6v1n12y11 d2L6v1n10y10 d1L1v1n0y0 d3L3v1n1y1 d3L3v1n1y0 d3L3v1n0y0 d3L3v1n0y0 d3L3v1n0y0 d2L8v0n3y6
Example of a translated trip
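A minimal sketch of how per-meter measurements might be binned into alphabet letters and concatenated into words such as those in the translated trip. The bin edges, units and the choice of the three features v, n and y are assumptions made for illustration; the deck does not give the actual values.

```python
from bisect import bisect_right

# Hypothetical bin edges per feature letter (not taken from the deck).
BIN_EDGES = {
    "v": [2.0, 5.0, 10.0, 15.0, 20.0, 30.0],  # speed, m/s
    "n": [0.5, 1.0, 2.0, 4.0],                # normal acceleration, m/s^2
    "y": [0.05, 0.1, 0.2],                    # yaw rate, rad/s
}

def letter(feature, value):
    """Map the magnitude of a raw value to a letter, e.g. ('v', 7.3) -> 'v2'."""
    return f"{feature}{bisect_right(BIN_EDGES[feature], abs(value))}"

def driving_word(sample, features=("v", "n", "y")):
    """Concatenate the letters of one per-meter sample into a single word."""
    return "".join(letter(f, sample[f]) for f in features)

def trip_to_document(samples):
    """Translate a trip (list of per-meter samples) into a list of words."""
    return [driving_word(s) for s in samples]

samples = [
    {"v": 7.3, "n": 0.0, "y": 0.0},
    {"v": 12.0, "n": 1.5, "y": 0.07},
]
document = trip_to_document(samples)
print(document)  # ['v2n0y0', 'v3n2y1']
```

Changing the bin edges or the feature tuple produces a different "language" for the same trip, which is how several vocabularies could be derived from one GPS log.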
LDA: Bayesian Topic Model
❖ Per-trip “Driving Behaviour” proportions: for each trip, select a distribution of “Driving Behaviours”
❖ Dirichlet parameter (corpus level): possible “Driving Behaviour” distributions for trips
❖ Per-“Driving Slide” “Driving Behaviour” assignment: for each “Driving Slide”, select a “Driving Behaviour”
❖ Observed “Driving Slide”: select the actual “Driving Slide” from the selected “Driving Behaviour”
❖ “Driving Behaviours”: each “Driving Behaviour” is a distribution of “Driving Slides”
❖ “Driving Behaviour” hyperparameter: possible “Driving Slide” distributions for “Driving Behaviours”
Posterior Inference in LDA
❖ The goal is to obtain the posterior over:
➢ How much of each “Driving Behaviour” k a trip contains (the per-trip proportions), and
➢ The “Driving Behaviour” assignments z of the “Driving Slides”
❖ Which means that I need to calculate:
❖ GENSIM Library
➢ a Python+NumPy implementation of online LDA for inputs larger than the available RAM
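For reference, the posterior discussed above can be written in the usual LDA notation. Apart from z and k, the symbols are assumptions borrowed from the standard formulation, not from the deck: θ_d for the per-trip proportions, β_k for the “Driving Behaviours”, w for the observed “Driving Slides”, and α, η for the two Dirichlet parameters.

```latex
% Posterior over the hidden structure given the observed trips
p(\theta, z, \beta \mid w, \alpha, \eta)
  = \frac{p(\theta, z, \beta, w \mid \alpha, \eta)}{p(w \mid \alpha, \eta)}

% Joint distribution implied by the generative process
p(\theta, z, \beta, w \mid \alpha, \eta)
  = \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta)
```

The denominator, the marginal likelihood of the observed slides, cannot be computed exactly in practice, so it is approximated; Gensim's LDA does this with online variational Bayes.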
Example trip in the new LDA space
❖ 2736 drivers
❖ 200 trips/driver
Total: 547,200 CSV files (5.92 GB)
Challenge:
To come up with a "telematic fingerprint" capable of determining whether a trip
was driven by a given driver, knowing that among the 200 trips provided for
each driver, a small number were not driven by him/her.
Submissions are judged on area under the ROC curve calculated in a global manner (all predictions
together).
Validation on a Kaggle Competition
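To make the "global" scoring concrete, here is a small sketch that computes the area under the ROC curve from ranks and shows why pooling all predictions matters. The function name and the toy scores are hypothetical; label 1 stands for "trip driven by the given driver".

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity, using average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Two hypothetical drivers, each perfectly ranked on their own...
labels_a, scores_a = [1, 1, 0], [0.9, 0.8, 0.7]
labels_b, scores_b = [1, 1, 0], [0.4, 0.3, 0.2]
auc_a = roc_auc(labels_a, scores_a)
auc_b = roc_auc(labels_b, scores_b)
# ...but scored on different scales, so the pooled ("global") AUC is lower.
auc_global = roc_auc(labels_a + labels_b, scores_a + scores_b)
print(auc_a, auc_b, auc_global)  # 1.0 1.0 0.75
```

Because all predictions are ranked together, scores need to be comparable across drivers, not just well ordered within each driver.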
❖ Transform all trips into the new Driving Behaviours space
❖ Take the trips of a selected driver one at a time
❖ Build a prediction model trained on all other trips in the dataset, labeled:
➢ True if they belong to the selected driver
➢ False if they do not belong to this driver
❖ Use the trained model to predict whether the selected trip belongs to the driver, then ensemble
several predictions made with various sensitivities to improve the score...
For performance reasons, I proceed in batches of 10 or 20 selected trips and compare each batch
to a limited, randomly selected number of False trips
Other outlier detection / clustering techniques appear to perform less well
Machine learning approach (2/2)
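The batched procedure above might look like the following sketch. The deck does not name the prediction model, so a plain logistic regression is used here as a stand-in; the function names, hyperparameters and toy vectors are all hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + dot(w, xi)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def score_driver(driver_trips, other_trips, batch_size=10, n_false=50, seed=0):
    """Score each trip of one driver with a model that never saw that trip."""
    rng = random.Random(seed)
    scores = [0.0] * len(driver_trips)
    for start in range(0, len(driver_trips), batch_size):
        held_out = range(start, min(start + batch_size, len(driver_trips)))
        # True examples: the driver's remaining trips
        positives = [t for i, t in enumerate(driver_trips) if i not in held_out]
        # False examples: a limited random sample of other drivers' trips
        negatives = rng.sample(other_trips, min(n_false, len(other_trips)))
        w, b = train_logistic(positives + negatives,
                              [1] * len(positives) + [0] * len(negatives))
        for i in held_out:
            scores[i] = sigmoid(b + dot(w, driver_trips[i]))
    return scores

# Toy trips in a 2-behaviour space; the last trip of the driver is a planted outlier.
driver_trips = [[0.9, 0.1]] * 19 + [[0.1, 0.9]]
other_trips = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]] * 20
scores = score_driver(driver_trips, other_trips)
```

In this toy setup the planted trip receives the lowest score, which is the behaviour the procedure relies on; real trips would be vectors of “Driving Behaviour” proportions rather than hand-written pairs.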
MongoDB to hold the 3.3 million documents generated
Parallel processing setup on 4 DigitalOcean Droplets with 8 CPUs each
Gensim library, which implements three methods:
❖ Latent Semantic Indexing (LSI, or LSA, with A for Analysis)
❖ Latent Dirichlet Allocation (LDA)
❖ Random Projections (RP)
It also implements online versions of each technique.
Setting the infrastructure
Predicting
❖ Achieved an AUC of 0.9 on Kaggle without any ensembling technique,
which confirms the robustness of my approach...
Thank you
http://nasdag.org


Driving Behaviour as a Telematic Fingerprint

  • 1. NASDAG.org Data Science in the Automotive Industry
  • 2. I am an Automotive Management Professional and a Computer Science Engineer from France, with an extensive experience in managing complex projects in Supply Chain and IT, as well as starting, developing and acquiring businesses in France, Russia, USA and the Middle East. I came to Metis to understand, learn and practice how data science is transforming the Automotive Business. During my projects, I focused on: ● Sentiment Analysis / Topic Modeling ● Predictive Behavior Modeling ● Driver Telematics Philippe Dagher
  • 3. Final Project @ Metis
      Objective: Categorize drivers based on their behaviour on the road, that is, their driving style and the types of roads they take.
      Challenge: Uniquely identify a driver (and hence his or her own “driving behaviour”) from the GPS log of a mobile phone located inside the car.
      Idea: Experiment with Topic Modeling techniques, especially Latent Semantic Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the observed trips by the unobserved behaviour of drivers.
  • 4. Raw data for one trip
  • 5. Machine learning approach (1/2)
      ❖ Preprocess the data using statistical smoothing and compression algorithms
        ➢ Kalman Filtering
        ➢ Ramer–Douglas–Peucker
      ❖ Extract road and driving style features
        ➢ per Segment: Length, Slip Angle, Convexity, Radius
        ➢ per Meter: Speed, Accelerations (tangential and normal), Jerk, Yaw, Pauses
      ❖ Bin the output and generate the Driving Alphabet
        ➢ ex: d0, d1, d2… v0, v1, v2… a0, a1, a2… etc
      ❖ Build the Driving Vocabulary: “Driving Slides” per meter
        ➢ ex: d3L4v2n3y1
        ➢ for various preprocessing sensitivities or feature combinations (languages)
      ❖ Translate trips from GPS logs into documents
        ➢ Tokenize, filter, … data is ready!
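The Ramer–Douglas–Peucker compression step named in this slide can be sketched as follows. This is a minimal pure-Python version and not the author's code; the function names, the (x, y) tuple format and the tolerance `epsilon` are assumptions made for illustration.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide: fall back to the point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)

def rdp(points, epsilon):
    """Simplify a polyline, keeping only points farther than epsilon
    from the chord joining the endpoints of their sub-segment."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively
    left = rdp(points[: idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point

# Hypothetical trace: a nearly straight stretch followed by a left turn
trace = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3.05, 1), (3, 2)]
simplified = rdp(trace, epsilon=0.5)
```

Each pair of consecutive points in `simplified` is one road segment, from which per-segment features such as length can then be computed. The recursion is fine for a sketch; a very long trip may need an iterative version.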
  • 6. d1L6Br1 d1L8Sr1 d1L5Sr2 d1L6Ur2 d2L8Ur2 d3L4Sr3 d2L5Ur3 d3L4Ur4 d3L6Sr4 d3L7Sr3 d4L4Ur5 d4L3Ur5 d4L2Ur7 d5L4Sr6 d3L3Ur5 d4L3Sr6 d5L4Ur6 d4L3Ur7 d5L9Sr5 d2L5Ur4 d3L2Ur7 d6L1Sr9 d5L0Sr9 d5L1Sr9 d5L7Ur5 d2L6Ur2 d2L3Ur5 d4L1Ur8 d5L2Ur7 d6L10Sr5 d6L8Sr5 d2L4Ur3 d3L3Ur6 d5L4Srp1 v2a6n0j0y0p1 v1a6n0j3y0p1 v1a1n0j6y0p1 v1a11n0j6y0p1 v1a7n0j11y0p1 v1a16n0j7y0p1 v2a7n0j1y0p1 v2a6n0j2y0p1 v2a10n0j2y0p1 v3a6n1j3y0p1 v3a2n2j3y0p1 v3a5n2j3y0p1 v4a2n2j3y1p1 v4a5n2j5y1p1 v4a5n3j5y1p1 v4a4n3j1y1p1 v4a6n3j6y1p1 v4a5n4j5y1p1 v4a4n3j6y1p1 v4a5n4j0y1p1 v4a5n3j6y1p1 v4a5n2j9y1p1 v4a11n3j7y1p1 v3a2n2j7y0p1 v3a12n2j7y0p1 v2a1n1j3y0p1 v2a5n1j9y0p1 v2a11n1j9y0p1 v3a6n1j7y0p1 v3a5n1j7y0p1 v3a6n2j6y0p1 v3a6n1j34y0p1 v3a62n2j71y0p1 v8a56n11j38y2p1 v4a13n3j7y1p1 v4a4n3j4y1p1 v4a5n3j6y1p1 v4a4n2j6y1p1 v4a6n3j1y1p1 v3a5n2j2y0p1 v3a3n2j6y0p1 v3a11n1j4y0p1 v2a8n1j0y0p1 v2a7n1j7y0p1 v2a17n1j1y0p1 v2a10p1 v6a0n3j4y0p1 v6a6n3j7y0p1 v6a6n3j3y0p1 v6a1n3j3y0p1 v6a6n3j3y0p1 v6a5n2j1y0p1 v5a6n2j4y0p1 v5a6n2j3y0p1 v5a12n1j2y0p1 v4a9n1j0y0p1 v3a9n1j2y0p1 v3a5n0j3y0p1 v3a1n0j6y0p1 v3a11n0j6y0p1 v3a0n1j3y0p1 v3a6n1j0y0p1 v3a5n1j3y0p1 v3a11n0j6y0p1 v4a1n0j4y0p1 v4a6n0j3y0p1 v4a2n0j7y0p1 v4a13n0j11y0p1 v5a7n0j4y0p1 v5a1n0j0y0p1 v5a1n0j3y0p1 v5a6n0j6y0p1 v5a6n0j2y0p1 v5a2n0j7y0p1 v6a11n0j10y0p1 v6a6n0j3y0p1 v6a0n0j3y0p1 v6a5n0j6y0p1 v6a5n0j2y0p1 v6a1n0j1y0p1 v6a0n0j3y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1 v6a6n0j3y0p1 v6a0n0j2y0p1 v6a5n0j6y0p1 v6a5n0j7y0p1 v6a6n0j4y0p1 v6a0n1j3y1j3y0p1 v6a6n1j6y0p1 v6a5n1j2y0p1 v7a1n1j4y0p1 v5a3n1j1y0p1 v5a6n1j3y0p1 v5a10n1j3y0p1 v4a8n0j0y0p1 v3a8n0j0y0p1 v3a8n0j3y0p1 v2a10n0j1y0p1 v2a7n0j3y0p1 v2a6n0j7y0p1 v3a7n0j3y0p1 v2a7n0j6y0p1 v3a14n0j7y0p1 v3a4n0j4y0p1 v3a2n0j6y0p1 v3a12n0j3y0p1 v3a8n0j2y0p1 v3a5n0j0y0p1 v3a6n0j4y0p1 v4a1n0j3y0p1 v4a5n0j2y0p1 v4a1n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p2 v4a1n0j3y0p1 v4a6n0j7y0p1 v4a6n0j10y0p1 v4a11n0j6y0p1 v3a2n0j0y0p1 v3a1n0j3y0p1 v3a6n0j0y0p1 v3a6n0j0y0p1 v2a5n0j2y0p1 v2a3n0j5y0p1 v2a10n0j5y0p1 v1a2n0j0y0p1 v1a1n0j3y0p1 
v1a5n0j10y0p1 v1a11n0j7y0p1 v1a3n0j7y0p1 v1a12n0j7y0p1 v2a3n0j1y0p1 v2a1n0j6y0p1 v2a11n0j10y0p1 v3a6n0j10y0p1 v3a12n0j7y0p1 v4a1n0j3y0p1 v4a5n0j10y0p1 v3a11n0j6y0p1 v4a2n0j3y0p1 v4a6n0j3y0p1 v5a0n0j7y0p1 v5a12n0j8y0p1 v5a4n0j4y0p1 v5a2n3j3y0p1 v5a3n3j4y0p1 v5a6n3j7y0p1 v5a6n3j5y0p1 v5a4n3j2y0p1 v5a1n3j3y0p1 v5a6n3j2y0p1 v5a1n2j4y0p1 v5a6n2j3y0p1 v5a2n3j4y0p1 v5a6n3j2y0p1 v5a6n2j3y0p1 v4a0n2j1y0p1 v4a2n2j1y0p1 v4a0n2j4y0p1 v4a6n2j7y0p1 v5a6n2j4y0p1 v4a5n2j0y0p1 v4a5n2j2y0p1 v4a9n2j2y0p1 v5a5n2j3y0p1 v5a9n3j1y0p1 v5a9n3j1y0p1 v5a7n1j2y0p1 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v5n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v3n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1vy1 d5L7v4n4y1 d5L7v4n3y1 d5L7v0n0y0 d5L7v0n0y0 d5L7v0n0y0 d5L7v1n0y0 d2L6v1n6y5 d2L6v2n8y6 d2L3v2n0y0 d2L3v2n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d d6L10v3n2y0 d6L10v4n2y0 d6L10v3n1y0 d6L10v3n1y0 d6L10v2n1y0 d6L10v2n1y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L8v1n0y0 d6L8v1n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v3n1y0 d6L8v3n2y0 d6L8v3n2y0 d6L8v4n2y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v3n2y0 d6L8v3n2y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v3n1y0 d2L5v1n3y2 d2L5v1n2y2 d3L5v1n2y1 d3L5v2n3y2 d3L5v2n4y2 d3L5v2n6y3 d3L5v2n2y1 d3L5v2n2y1 d3L5v3n4y2 d4L6v2n5y3 d4L6v2n6y3 d4L6v3n8y3 d4L6v3n7y3 d4L6v3n7y3 d4L6v2n6y3 d4L6v2n4y2 d4L6v2n3y2 d2L6v1n12y11 d2L6v1n10y10 d1L1v1n0y0 d3L3v1n1y1 d3L3v1n1y0 d3L3v1n0y0 d3L3v1n0y0 
d3L3v1n0y0 d2L8v0n3y6 (Example of a translated trip)
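To make the translation from a GPS log to tokens such as `v2a6…` concrete, here is a minimal sketch that covers only two letters of the alphabet (speed `v` and tangential acceleration `a`). It assumes one (x, y) position in meters per second, and the bin widths are hypothetical; the slides do not give the actual bin edges or the per-meter resampling.

```python
import math

def bin_index(value, width):
    """Map a magnitude to an integer bin (the digit after the letter)."""
    return int(abs(value) // width)

def trip_to_words(points, v_width=5.0, a_width=0.5):
    """Turn a list of (x, y) positions sampled once per second into words.

    v_width and a_width are assumed bin widths (m/s and m/s^2).
    The sign of the acceleration is dropped to keep the sketch short.
    """
    # Speed between consecutive samples (distance per one-second step)
    speeds = [math.hypot(x2 - x1, y2 - y1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    words = []
    for v_prev, v_next in zip(speeds, speeds[1:]):
        accel = v_next - v_prev  # tangential acceleration over one second
        words.append(f"v{bin_index(v_next, v_width)}a{bin_index(accel, a_width)}")
    return words

# Hypothetical trip: a car accelerating along a straight line
document = trip_to_words([(0, 0), (3, 0), (9, 0), (19, 0)])
```

The resulting list of words plays the role of a document: one trip, one sequence of tokens, ready for a bag-of-words model.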
  • 7. LDA: Bayesian Topic Model
      ❖ Dirichlet parameter (corpus level): possible “Driving Behaviour” distributions for trips
      ❖ Per-trip “Driving Behaviour” proportions: for each trip, select a distribution of “Driving Behaviours”
      ❖ Per-“Driving Slide” “Driving Behaviour” assignment: for each “Driving Slide”, select a “Driving Behaviour”
      ❖ Observed “Driving Slide”: select the actual “Driving Slide” from the selected “Driving Behaviour”
      ❖ “Driving Behaviours”: each “Driving Behaviour” is a distribution of “Driving Slides”
      ❖ “Driving Behaviour” hyperparameter: possible “Driving Slide” distributions for “Driving Behaviours”
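The generative story on this slide can be simulated in a few lines. The sketch below uses only the standard library; the vocabulary, the number of behaviours and the values of `alpha` and `eta` are made up for illustration and do not come from the slides.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_trip(n_slides, behaviours, alpha, rng):
    """Generate one trip following the LDA generative process."""
    # Per-trip "Driving Behaviour" proportions
    theta = sample_dirichlet([alpha] * len(behaviours), rng)
    trip = []
    for _ in range(n_slides):
        # Per-slide assignment: pick a behaviour for this position
        z = rng.choices(range(len(behaviours)), weights=theta)[0]
        # Observed slide: pick a word id from the chosen behaviour
        w = rng.choices(range(len(behaviours[z])), weights=behaviours[z])[0]
        trip.append(w)
    return theta, trip

rng = random.Random(0)
vocabulary = ["v1a6", "v2a6", "v4a5", "v6a0"]  # hypothetical Driving Slides
n_behaviours, alpha, eta = 2, 0.5, 0.5         # assumed hyperparameters
# Each "Driving Behaviour" is a distribution over the vocabulary
behaviours = [sample_dirichlet([eta] * len(vocabulary), rng)
              for _ in range(n_behaviours)]
theta, trip_ids = generate_trip(20, behaviours, alpha, rng)
trip = [vocabulary[i] for i in trip_ids]
```

Inference runs this story backwards: only `trip` is observed, and the behaviours and their per-trip proportions have to be estimated.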
  • 8. Posterior Inference in LDA
      ❖ The goal is to obtain the posterior, which gives:
        ➢ how much of each “Driving Behaviour” k a trip contains, and
        ➢ the “Driving Behaviour” assignments z of the “Driving Slides”
      ❖ This posterior is what I need to calculate
      ❖ GENSIM Library
        ➢ a Python+NumPy implementation of online LDA for inputs larger than the available RAM
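The formulas of this slide are not part of the transcript. For reference, the posterior is usually written as below in the standard LDA notation. The symbols are an assumption, since the slide's own notation is not recoverable: θ are the per-trip behaviour proportions, z the per-slide assignments, β the behaviours, w the observed slides, α and η the two Dirichlet parameters, D the number of trips and K the number of behaviours.

```latex
p(\theta_{1:D}, z_{1:D}, \beta_{1:K} \mid w_{1:D}, \alpha, \eta)
  = \frac{p(\theta_{1:D}, z_{1:D}, \beta_{1:K}, w_{1:D} \mid \alpha, \eta)}
         {p(w_{1:D} \mid \alpha, \eta)}

p(\theta_{1:D}, z_{1:D}, \beta_{1:K}, w_{1:D} \mid \alpha, \eta)
  = \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta_{1:K})
```

The denominator sums over every possible assignment of behaviours to slides, which is intractable, so the posterior is approximated in practice, for example with the online variational method that Gensim implements.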
  • 9. Example trip in the new LDA space
  • 10. Validation on a Kaggle Competition
      ❖ 2736 drivers
      ❖ 200 trips/driver
      Total: 547,200 CSV files (5.92 GB)
      Challenge: Come up with a "telematic fingerprint" capable of telling whether a trip was driven by a given driver, knowing that among the 200 trips provided for each driver, a small number were not driven by him/her. Submissions are judged on the area under the ROC curve, calculated globally (all predictions together).
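A global AUC means that the scores of all drivers are pooled before the curve is computed, so scores must be comparable across drivers. The sketch below computes the AUC from ranks (the Mann–Whitney formulation) in pure Python; the per-driver scores are made-up numbers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.

    labels: 1 if the trip belongs to the driver, 0 otherwise.
    Tied scores receive the average of the ranks they span.
    """
    ranked = sorted(zip(scores, labels))
    n = len(ranked)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if ranked[k][1])
        i = j
    n_pos = sum(1 for label in labels if label)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Hypothetical predictions for two drivers, pooled before scoring
driver_a = {"labels": [1, 1, 0], "scores": [0.9, 0.4, 0.5]}
driver_b = {"labels": [0], "scores": [0.1]}
all_labels = driver_a["labels"] + driver_b["labels"]
all_scores = driver_a["scores"] + driver_b["scores"]
global_auc = roc_auc(all_labels, all_scores)
```

Here three of the four (true trip, false trip) pairs are ordered correctly, which gives an AUC of 0.75.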
  • 11. Machine learning approach (2/2)
      ❖ Project all trips into the new Driving Behaviours space
      ❖ Take each trip of a selected driver, one by one
      ❖ Build a prediction model trained on all other trips in the dataset, labelled:
        ➢ True if they belong to the selected driver
        ➢ False if they do not belong to this driver
      ❖ Use the trained model to predict whether the selected trip belongs to the driver, then ensemble several predictions obtained with various sensitivities to improve the score
      For performance reasons, I proceed in batches of 10 or 20 selected trips and compare each batch to a limited number of randomly selected False trips.
      Other outlier detection and clustering techniques appear to perform worse.
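The batching scheme described in this slide can be sketched as a generator that yields, for each batch of held-out trips, the labelled training set to fit a classifier on. The function name, the dictionary layout and the default sizes are assumptions; the choice of classifier is left open, as in the slides.

```python
import random

def batched_training_sets(trips_by_driver, driver, batch_size=20,
                          n_false=40, seed=0):
    """Yield (held_out, X, y) for each batch of the selected driver's trips.

    held_out: the trips to score in this round
    X, y:     the driver's remaining trips (label 1) plus a random
              sample of other drivers' trips (label 0)
    """
    rng = random.Random(seed)
    own = trips_by_driver[driver]
    others = [trip for d, trips in trips_by_driver.items()
              if d != driver for trip in trips]
    for start in range(0, len(own), batch_size):
        held_out = own[start:start + batch_size]
        true_trips = own[:start] + own[start + batch_size:]
        # A fresh, limited sample of False trips for every batch
        false_trips = rng.sample(others, min(n_false, len(others)))
        X = true_trips + false_trips
        y = [1] * len(true_trips) + [0] * len(false_trips)
        yield held_out, X, y

# Hypothetical data: trip identifiers stand in for behaviour vectors
data = {
    "A": [f"A{i}" for i in range(6)],
    "B": [f"B{i}" for i in range(4)],
    "C": [f"C{i}" for i in range(4)],
}
batches = list(batched_training_sets(data, "A", batch_size=2, n_false=3))
```

Each held-out trip is scored by a model that never saw it, which is what makes the predicted probability usable as a membership score for that trip.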
  • 12. Setting the infrastructure
      ❖ MongoDB to hold the 3.3 million documents generated
      ❖ Parallel processing setup on 4 DigitalOcean Droplets with 8 CPUs each
      ❖ Gensim library, which implements three methods:
        ➢ Latent Semantic Indexing (LSI, or LSA, with A for Analysis)
        ➢ Latent Dirichlet Allocation (LDA)
        ➢ Random Projections (RP)
      It also implements online versions of each technique.
  • 13. Predicting ❖ An AUC of 0.9 on Kaggle, achieved without any ensembling technique, supports the robustness of my approach.