Driver Location Intelligence at Scale using Apache Spark, Delta Lake, and MLflow on Databricks

TomTom’s mission is to create a world free of congestion and to offer a better driving experience. To do that, we need to understand how end users drive while also optimizing the operational costs of our services. However, because of the large scale of our vehicle probe data, providing insights and performing advanced analytics can be quite challenging.

During this discussion I will showcase two use cases where Databricks, Delta Lake and MLflow have enabled us to accelerate innovation. The first is the IQMaps use case. IQMaps is designed specifically for in-dash systems: it takes the same up-to-date user experience you expect from navigation apps and brings it to reliable in-car navigation. IQMaps learns drivers’ driving patterns and updates the map regions that are most relevant to each user over Wi-Fi or 4G. However, minimizing network data consumption, which can be costly, while keeping the map up to date for the best driving experience requires complex simulations over millions of location traces from vehicles. Apache Spark has been our key instrument for finding the best balance in this trade-off. The second use case is Destination Prediction. For many years, we have offered a personalized feature in our navigation products that predicts the driver’s next destination with high accuracy. With the exponential increase in available data and access to more sophisticated machine learning models, we have revisited this feature to take it to the next level. Both use cases take advantage of the latest frameworks and tools available on Databricks. With MLflow and Delta Lake we have been able to find the best models for predicting each individual driver’s destination and to track each of our KPIs.

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Sergio Ballesteros, TomTom; Kia Eisinga, TomTom. Driver Location Intelligence at Scale using Apache Spark, Delta Lake and MLflow on Databricks #UnifiedDataAnalytics #SparkAISummit
  3. Our vision: a safe, connected, autonomous world that is free of congestion and emissions.
  4. Big data drives our business, but data privacy always comes first
  5. Data • Anonymous location (GPS) traces
  6. 742,000,000 km every day. 18,000 x
  7.
  8. Data • Anonymous location (GPS) traces • Community inputs • User events • Journalistic data • Car sensor data
  9. Dataflow: ~150 trillion data points; ~80 billion data points per day
  10. Dataflow
  11. Dataflow
  12. In-dash systems are outperformed by smartphones. The embedded system is expected to be up to date, with no user interaction, and its most visible component is the map. Use case 1: IQMaps analytics
  13. Drivers do not update their maps. Today’s solutions provide manual updates, often requiring a drive to the dealer. This is far too complex and inefficient.
  14.
  15. OEMs require data-efficient solutions. While drivers expect an up-to-date system, carmakers are usually concerned about the data cost of map management.
  16. 98% of trips are driven within a 150 km radius. 99.8% of trips are driven within a 1000 km radius.
  17. When the radius is 0 km • User drives within 2 regions every weekday • Radius of 0 km • Download and install just the home regions • Cellular data usage kept to a minimum
  18. When the radius is 150 km • User drives within 2 update regions every weekday • Radius of 150 km • Home region: 6 update regions • Cellular data usage increased
  19. IQMaps demo with MLflow
  20.
  21. Real results using 0.5M trips. “This insight has led me to the conclusion that a default radius of 150 km is unnecessary, and a small radius of ~10 km would already satisfy most drivers while keeping cellular data usage low for OEMs.” - Rolf Dorland, PM at TomTom
  22. Going on holiday • User goes on holiday (to a less frequently updated region) • Once the user starts driving, updates for all update regions the route passes through are downloaded and installed.
  23. Destination prediction
  24.
  25. Opportunity. Past: Rule-based solution • Delta Lake pipelines • Present: Machine Learning
  26. Data • Original trace data from 1 source: 227K device serials • Filtering out invalid trips: 143K device serials • Users with at least 50 trips: 3.6K device serials • Devices feasible for modelling: 2.5K device serials
  27. Features. For each trip, we have the following information: • Where did the trip start? • At what speed were you driving when the trip started? • What was the time of day (morning/afternoon/evening) when the trip started? • Was it rush hour when the trip started? • What day of the week was it? • Was it a weekend day? • What was the season? • Which driver profile do you belong to? Historical information: • Which destination did you go to on your last trip? And the one before that? And the one before that? • If it is, say, a Monday, where did you go on the last Monday you made a trip? (Do this for every weekday.) To predict: to which destination are you going? What do we use in the end?
  28. Labels • We are given the latitude and longitude of the destination of a trip. • To find out which latitudes and longitudes belong to the same destination, we apply a clustering algorithm called DBSCAN. • DBSCAN clusters together destinations that are within 500 meters of each other. A destination needs at least 5 trips to count as a cluster. How do we define where you are going?
  29.
  30. Train, validation and test split. Data for 1 driver:
      Trip ID   Date          Destination
      Trip 1    January 1     Cluster 1
      Trip 2    January 22    Cluster 2
      Trip 3    February 3    Cluster 1
      Trip 4    February 15   Cluster 2
      Trip 5    March 2       Cluster 1
      Trip 6    March 14      Cluster 1
      Trip 7    March 27      Cluster 2
      Trip 8    April 4       Cluster 1
      Trip 9    April 16      Cluster 2
      Trip 10   May 8         Cluster 1
      The trips are split in time order into a train & validation dataset and a test dataset.
      Time-series cross-validation: iterative evaluation of the trips to avoid overfitting. At each step, the destination of the next trip is predicted from the earlier trips only:
      • Trips 1-2 known, predict Trip 3 (February 3)
      • Trips 1-3 known, predict Trip 4 (February 15)
      • …
      • Trips 1-9 known, predict Trip 10 (May 8)
  31. Rapid experimentation
  32.
  33. Majority baseline: distribution of precision on the test set with a majority baseline classifier
  34. Results: distribution of precision on the test set with a tuned classifier
  35. Accelerating the Future of Mobility by embracing Apache Spark, Databricks and the Azure cloud
  36. DON’T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH SPARK + AI SUMMIT
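The volumes on slide 9 are easier to grasp per second. The sketch below is plain arithmetic on the two figures from that slide and assumes a constant daily rate, which is a simplification: it is not a statement about how long the data has actually been collected.

```python
# Back-of-the-envelope scale figures from slide 9 (assumes a constant daily rate).
POINTS_PER_DAY = 80e9          # "~80 billion data points per day"
TOTAL_POINTS = 150e12          # "~150 trillion data points"
SECONDS_PER_DAY = 24 * 60 * 60

points_per_second = POINTS_PER_DAY / SECONDS_PER_DAY
days_at_current_rate = TOTAL_POINTS / POINTS_PER_DAY

print(f"{points_per_second:,.0f} points per second")
print(f"{days_at_current_rate:,.0f} days of data at today's rate")
```

That is roughly 926,000 points every second, and the total corresponds to about 1,875 days (a little over five years) of ingestion at the current daily rate.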
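Slide 16 reports the share of trips driven within a given radius. A minimal sketch of how such a share could be computed, assuming each driver has a known `home` location and that a trip counts as "within the radius" when both its start and end points are; the slides do not define the reference point, so `home`, the trip format and `share_of_trips_within` are hypothetical. Distances use the haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def share_of_trips_within(trips, home, radius_km):
    """Fraction of trips whose start and end both lie within radius_km of home.

    trips: list of ((lat, lon), (lat, lon)) start/end pairs (hypothetical format).
    """
    if not trips:
        return 0.0
    inside = sum(
        1 for start, end in trips
        if haversine_km(home, start) <= radius_km and haversine_km(home, end) <= radius_km
    )
    return inside / len(trips)

# Usage with three illustrative trips from central Amsterdam (not real data).
home = (52.37, 4.90)
trips = [
    (home, (52.40, 4.95)),   # short local trip
    (home, (52.09, 5.12)),   # roughly Utrecht
    (home, (48.85, 2.35)),   # roughly Paris
]
for radius in (10, 150, 1000):
    print(radius, share_of_trips_within(trips, home, radius))
```

At the 0.5M-trip scale mentioned on slide 21, the same per-trip test would run as a distributed job rather than a Python loop.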
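Slides 17 and 18 contrast a 0 km radius (only the home regions are kept up to date) with a 150 km radius (more regions, more cellular data). The sketch below models that trade-off on a simplified flat grid of square update regions; the 100 km tile size, the `(i, j)` tile indices and `regions_to_download` are assumptions for illustration, so the counts will not match the 6 regions quoted on slide 18.

```python
import math

def regions_to_download(home_tiles, radius_km, tile_km):
    """Update regions (square tiles on a simplified flat grid) to keep up to date.

    home_tiles: (i, j) indices of the tiles the user drives in every weekday.
    A tile is selected if its centre is within radius_km of any home tile
    centre, so a radius of 0 keeps only the home tiles.
    """
    reach = math.floor(radius_km / tile_km)  # larger offsets cannot qualify
    selected = set()
    for hi, hj in home_tiles:
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                if tile_km * math.hypot(di, dj) <= radius_km:
                    selected.add((hi + di, hj + dj))
    return selected

# The number of tiles serves as a rough proxy for cellular data usage.
home = [(0, 0), (1, 0)]  # user drives within 2 regions every weekday
for radius in (0, 10, 150):
    print(radius, len(regions_to_download(home, radius, tile_km=100)))
```

With these hypothetical 100 km tiles, a 10 km radius adds nothing beyond the 2 home tiles, while 150 km grows the set to 12 tiles; that growth is the data cost a larger radius trades against map freshness.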
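Slide 22 describes downloading every update region a holiday route passes through. Assuming update regions are 1° x 1° latitude/longitude tiles (a hypothetical layout; the slides do not describe the real region geometry), the regions along a route can be listed in driving order, which is also a natural download order. `tiles_along_route` and its parameters are illustrative names.

```python
import math

def tiles_along_route(route, tile_deg=1.0, step_deg=0.1):
    """Ordered, de-duplicated (row, col) tiles that a lat/lon polyline passes through.

    Each segment is sampled every ~step_deg degrees to reduce the chance that a
    tile crossed between two sparse GPS points is skipped. Interpolation is
    linear in lat/lon, which is an approximation.
    """
    ordered, seen = [], set()

    def visit(lat, lon):
        tile = (math.floor(lat / tile_deg), math.floor(lon / tile_deg))
        if tile not in seen:
            seen.add(tile)
            ordered.append(tile)

    for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]):
        span = max(abs(lat2 - lat1), abs(lon2 - lon1))
        steps = max(1, math.ceil(span / step_deg))
        for k in range(steps):
            t = k / steps
            visit(lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1))
    if route:
        visit(*route[-1])  # make sure the destination tile is included
    return ordered

# Usage: a holiday drive from Amsterdam towards Paris.
print(tiles_along_route([(52.37, 4.90), (48.85, 2.35)]))
```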
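Slide 26 shows how the device population shrinks at each filtering step. A minimal sketch of the first three steps, assuming each trip record carries a device serial and a validity flag (a hypothetical schema; the slides do not say what makes a trip invalid). The final step, "devices feasible for modelling", has no stated criterion and is left out. On the real data the same logic would be a Spark job, for instance a `filter` on validity, then `groupBy` on the serial column with `count()`, then a threshold `filter`; plain Python keeps the sketch self-contained.

```python
from collections import Counter

def trip_funnel(trips, min_trips=50):
    """Device counts at each filtering step, plus the devices that survive.

    trips: list of (device_serial, is_valid_trip) pairs (hypothetical schema).
    min_trips: 50, the "users with at least 50 trips" threshold from slide 26.
    """
    all_devices = {serial for serial, _ in trips}
    valid_counts = Counter(serial for serial, is_valid in trips if is_valid)
    frequent = {serial for serial, n in valid_counts.items() if n >= min_trips}
    funnel = {
        "original": len(all_devices),
        "after_invalid_trip_filter": len(valid_counts),
        "at_least_min_trips": len(frequent),
    }
    return funnel, frequent

# Synthetic data: "a" is frequent, "b" too sparse, "c" has only invalid trips.
trips = [("a", True)] * 60 + [("b", True)] * 10 + [("c", False)] * 80
print(trip_funnel(trips))
```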
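Slide 27 lists the per-trip features. A sketch of how most of them could be derived from one driver's chronologically sorted trips; the dictionary keys, the time-of-day and rush-hour boundaries and the Northern-Hemisphere seasons are assumptions, and the driver-profile feature is omitted because the slides do not define it. Only trips before the current one are read, so the history features cannot leak the label.

```python
from datetime import datetime

def time_of_day(ts):
    # Hypothetical boundaries: the slides only name the three buckets.
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "evening"

def is_rush_hour(ts):
    # Hypothetical rush-hour windows: 07:00-09:59 and 16:00-18:59.
    return ts.hour in (7, 8, 9, 16, 17, 18)

def season(ts):
    # Meteorological seasons for the Northern Hemisphere (assumption).
    return ("winter", "spring", "summer", "autumn")[(ts.month % 12) // 3]

def trip_features(trips, i, n_previous=3):
    """Features for trips[i]; trips are sorted by start time and use the
    hypothetical keys 'start' (datetime), 'speed_kmh', 'origin', 'destination'."""
    trip, past = trips[i], trips[:i]  # only earlier trips: no label leakage
    ts = trip["start"]
    feats = {
        "origin": trip["origin"],
        "start_speed_kmh": trip["speed_kmh"],
        "time_of_day": time_of_day(ts),
        "rush_hour": is_rush_hour(ts),
        "weekday": ts.weekday(),  # 0 = Monday
        "weekend": ts.weekday() >= 5,
        "season": season(ts),
    }
    # Previous destinations, most recent first; None when history is short.
    recent = [t["destination"] for t in reversed(past)][:n_previous]
    recent += [None] * (n_previous - len(recent))
    for k, dest in enumerate(recent, start=1):
        feats[f"prev_dest_{k}"] = dest
    # Destination of the most recent earlier trip on each weekday.
    for wd in range(7):
        feats[f"last_dest_weekday_{wd}"] = next(
            (t["destination"] for t in reversed(past) if t["start"].weekday() == wd), None)
    return feats

# Illustrative trips (not real data); 7 and 14 January 2019 are Mondays.
trips = [
    {"start": datetime(2019, 1, 7, 8, 15), "speed_kmh": 12, "origin": "home", "destination": "work"},
    {"start": datetime(2019, 1, 7, 17, 30), "speed_kmh": 8, "origin": "work", "destination": "home"},
    {"start": datetime(2019, 1, 12, 10, 0), "speed_kmh": 20, "origin": "home", "destination": "gym"},
    {"start": datetime(2019, 1, 14, 8, 5), "speed_kmh": 10, "origin": "home", "destination": "work"},
]
print(trip_features(trips, 3))
```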
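Slide 28 labels destinations with DBSCAN, using a 500 m neighbourhood and at least 5 trips per cluster. The minimal pure-Python DBSCAN below uses haversine distance; it is O(n²) and only meant to make the algorithm concrete. In practice a library implementation, such as scikit-learn's `DBSCAN` with `metric="haversine"`, `algorithm="ball_tree"`, `min_samples=5` and `eps = 0.5 / 6371.0088` on coordinates in radians, would be the usual choice; the slides do not say which implementation TomTom used. The point coordinates are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres
NOISE = -1

def haversine_m(a, b):
    """Great-circle distance in metres between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def dbscan(points, eps_m=500.0, min_samples=5):
    """Label each destination with a cluster id, or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes i itself, matching the usual min_samples convention.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep growing
        cluster += 1
    return labels

# Five arrivals near one place, five near another ~4 km away, one one-off trip.
home = [(52.3700, 4.9000), (52.3705, 4.9000), (52.3700, 4.9005), (52.3695, 4.9000), (52.3700, 4.8995)]
work = [(lat + 0.02, lon + 0.05) for lat, lon in home]
one_off = [(52.5000, 5.1000)]
print(dbscan(home + work + one_off))
```

Each resulting cluster id then serves as the destination label for the trips that ended in it.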
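Slide 30 splits each driver’s trips in time order and evaluates iteratively, always predicting a trip from earlier trips only. A sketch of both steps on the slide’s table; the year 2019, the two-trip test set and the helper names are assumptions. scikit-learn’s `TimeSeriesSplit` implements a similar expanding-window scheme, though it groups samples into a fixed number of folds rather than stepping one trip at a time.

```python
from datetime import date

def chronological_holdout(trips, test_size=2):
    """Split one driver's (date, destination) trips in time order: earlier trips
    for train & validation, the latest test_size trips for testing."""
    if test_size < 1:
        raise ValueError("test_size must be at least 1")
    ordered = sorted(trips, key=lambda trip: trip[0])
    return ordered[:-test_size], ordered[-test_size:]

def expanding_window_splits(n_trips, min_history=2):
    """Time-series cross-validation: yield (history_indices, target_index),
    so trip k is always predicted from trips 0..k-1 only."""
    for k in range(min_history, n_trips):
        yield list(range(k)), k

# The driver from slide 30 (the year is an assumption).
trips = [
    (date(2019, 1, 1), "Cluster 1"), (date(2019, 1, 22), "Cluster 2"),
    (date(2019, 2, 3), "Cluster 1"), (date(2019, 2, 15), "Cluster 2"),
    (date(2019, 3, 2), "Cluster 1"), (date(2019, 3, 14), "Cluster 1"),
    (date(2019, 3, 27), "Cluster 2"), (date(2019, 4, 4), "Cluster 1"),
    (date(2019, 4, 16), "Cluster 2"), (date(2019, 5, 8), "Cluster 1"),
]
train_val, test = chronological_holdout(trips)
for history, target in expanding_window_splits(len(train_val)):
    print(f"trips 1-{len(history)} known -> predict trip {target + 1}")
```

Because no fold ever sees a later trip, the validation score reflects how the model would behave when deployed on a driver’s future trips.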
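Slide 33 uses a majority baseline: for each driver, always predict the destination that driver visited most often. A sketch under the assumption that precision is computed per driver over the test trips; for single-label multiclass predictions, micro-averaged precision equals the fraction of correct predictions, which is what the function returns. The driver data and `majority_baseline_precision` are illustrative.

```python
from collections import Counter
from statistics import median

def majority_baseline_precision(train_destinations, test_destinations):
    """Always predict the driver's most frequent training destination and
    return the fraction of test trips where that prediction is correct
    (equal to micro-averaged precision for single-label predictions)."""
    majority = Counter(train_destinations).most_common(1)[0][0]
    hits = sum(dest == majority for dest in test_destinations)
    return hits / len(test_destinations)

# Illustrative per-driver (train, test) destination sequences, not real data.
drivers = {
    "driver_a": (["c1", "c2", "c1", "c2", "c1", "c1", "c2", "c1"], ["c2", "c1"]),
    "driver_b": (["home", "home", "work"], ["home", "home", "home"]),
}
precisions = {d: majority_baseline_precision(tr, te) for d, (tr, te) in drivers.items()}
print(precisions, "median:", median(precisions.values()))
```

Plotting `precisions.values()` across all drivers gives the kind of distribution shown on slide 33, against which the tuned classifier on slide 34 can be compared.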