
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonomous Driving



Getting cars to drive autonomously is one of the most exciting problems these days. One of the key challenges is making them drive safely, which requires processing large amounts of data. In this talk we focus on a single task of a self-driving car: road detection. Road detection is a software component that must be safe in order to keep the car in its current lane. Tracking the progress of such a component requires a well-designed KPI (key performance indicator) evaluation pipeline. We show how we incorporate Spark into our pipeline to handle huge amounts of data and operate under strict scalability constraints while gathering the relevant KPIs, and we share several lessons learned from using Spark in this environment.

Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonomous Driving

  1. 1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. 2. Gheorghe Pucea, BMW Group Jennifer Reinelt, BMW Group Lessons Learned from Using Spark for Evaluating Road Detection @ BMW Autonomous Driving #UnifiedDataAnalytics #SparkAISummit
  3. 3. BMW AUTONOMOUS DRIVING 3
  4. 4. Outline 4 • Evaluation of Lane Detection • Evaluation Pipeline • AI Based Ground Truth • Lessons Learned
  5. 5. BMW AUTONOMOUS DRIVING 5 Car Setup for Autonomous Driving
  6. 6. Outline 6 • Evaluation of Lane Detection • Evaluation Pipeline • AI Based Ground Truth • Lessons Learned
  7. 7. Evaluation of Lane Detection 7 How well does the car detect the lane markings? Real lane markings vs. detected lane markings – at 1m? At 50m? At 100m? At 150m?
  8. 8. Evaluation of Lane Detection 8 How well does the car detect the lane markings? Key Performance Indicator (KPI) – Lateral Offset [chart: lateral offset at 150m improving over functional development time across commits 70d9c31, c271a01, 4e0bcd3, 6e3bcd3]
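To make the KPI concrete, here is a minimal Scala sketch of how a lateral-offset metric could be computed. The polyline representation, the linear interpolation and the distance set (1m, 50m, 100m, 150m from the slide) are assumptions for illustration, not BMW's actual implementation.

    import scala.math.abs

    // Hypothetical sketch: lane markings as polylines of
    // (longitudinal distance ahead, lateral position) points in metres.
    object LateralOffsetKpi {
      type Polyline = Seq[(Double, Double)]

      // Linearly interpolate the lateral position of a marking at a given
      // look-ahead distance, if that distance is covered by the polyline.
      private def lateralAt(line: Polyline, x: Double): Option[Double] =
        line.sliding(2).collectFirst {
          case Seq((x0, y0), (x1, y1)) if x0 <= x && x <= x1 && x1 > x0 =>
            y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        }

      // Lateral offset between detected and ground-truth marking at each distance.
      def offsets(detected: Polyline, groundTruth: Polyline,
                  distances: Seq[Double] = Seq(1.0, 50.0, 100.0, 150.0)): Map[Double, Option[Double]] =
        distances.map { d =>
          d -> (for {
            yDet <- lateralAt(detected, d)
            yGt  <- lateralAt(groundTruth, d)
          } yield abs(yDet - yGt))
        }.toMap
    }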
  9. 9. Evaluation of Lane Detection 9 Challenges: • Where are the real lane markings? How do we get the ground truth? • How do we avoid making the same mistakes as the car when looking for real lane markings? • How do we scale this ground truth generation? Real lane markings vs. detected lane markings – at 1m? At 50m? At 100m? At 150m?
  10. 10. How do we get the ground truth? • From manual labels Evaluation of Lane Detection 10 Very accurate Manual Slow Expensive to scale up Bad for Occlusions
  11. 11. How do we get the ground truth? • From additional sensors Evaluation of Lane Detection 11 Automated Fast Accurate Expensive to scale up
  12. 12. How do we get the ground truth? • Using sophisticated algorithms in the backend Evaluation of Lane Detection 12 Scalable Automated Fast Cheap Lower accuracy
  13. 13. Outline 13 • Evaluation of Lane Detection • Evaluation Pipeline • AI Based Ground Truth • Lessons Learned
  14. 14. Evaluation Pipeline 14 [pipeline diagram: Data Collection, Data Ingestion, Ros Converter (ROS bag → ORC), Reprocessing, Ground Truth Generation, KPI Calculation, InfluxDB, plus other applications. Datacenter: > 230 PB capacity, > 1,500 TB raw data/day, > 100,000 cores, > 200 GPUs]
  15. 15. Outline 15 • Evaluation of Lane Detection • Evaluation Pipeline • AI Based Ground Truth • Lessons Learned
  16. 16. AI Based Ground Truth 16 [same pipeline diagram as slide 14, with the Ground Truth Generation stage highlighted as AI Based Ground Truth]
  17. 17. AI Based Ground Truth 17 3D lidar point clouds → lidar intensity in 2D bird's eye view → deep neural network for semantic segmentation → lane marking / no lane marking
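As a rough illustration of the projection step described on this slide (not the actual BMW code; grid size, resolution and the max-pooling of intensities are assumptions), a bird's-eye-view intensity image can be rasterised from the point cloud along these lines:

    // Hypothetical sketch: rasterise 3D lidar points into a 2D bird's-eye-view
    // intensity grid that a segmentation network could consume.
    object BirdsEyeView {
      final case class LidarPoint(x: Float, y: Float, z: Float, intensity: Float)

      def toIntensityGrid(points: Seq[LidarPoint],
                          cells: Int = 512,            // 512 x 512 output image (assumption)
                          metersPerCell: Float = 0.2f  // 0.2 m resolution (assumption)
                         ): Array[Array[Float]] = {
        val grid = Array.fill(cells, cells)(0.0f)
        val half = cells / 2
        for (p <- points) {
          val row = (p.x / metersPerCell).toInt + half   // longitudinal axis
          val col = (p.y / metersPerCell).toInt + half   // lateral axis
          if (row >= 0 && row < cells && col >= 0 && col < cells)
            grid(row)(col) = math.max(grid(row)(col), p.intensity)  // keep the strongest return per cell
        }
        grid
      }
    }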
  18. 18. Outline 18 • Evaluation of Lane Detection • Evaluation Pipeline • AI Based Ground Truth • Lessons Learned
  19. 19. Motivation of Lessons Learned 19 Source: https://twitter.com/bigdataborat?lang=en
  20. 20. Motivation of Lessons Learned 20 Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  21. 21. Lessons Learned – Spark Testing 21 [evaluation pipeline diagram as on slide 14]
  22. 22. Lessons Learned – Spark Testing Typical integration test 22
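A sketch of what such a "typical" integration test might look like, assuming a ScalaTest suite and an ORC fixture committed under src/test/resources; the path, schema and assertion are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    // Hypothetical integration test built around a static ORC file
    // committed to the repository.
    class FlexrayIngestionSpec extends AnyFunSuite {

      test("ingestion keeps all frames of the committed fixture") {
        val spark = SparkSession.builder()
          .master("local[2]")
          .appName("integration-test")
          .getOrCreate()
        try {
          val input  = spark.read.orc("src/test/resources/fixtures/flexray_sample.orc")
          val output = input.filter("busId IS NOT NULL")   // stand-in for the real transformation
          assert(output.count() == input.count())
        } finally {
          spark.stop()
        }
      }
    }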
  23. 23. Lessons Learned – Spark Testing Drawbacks of static ORC files committed in the source code 23
  24. 24. Lessons Learned – Spark Testing 24 Test data generation library: type classes, cats
  25. 25. Lessons Learned – Spark Testing Using a test data generation library for integration tests 25 cats FlatMap type class; ScalaCheck generators available via type classes
  26. 26. Lessons Learned – Spark Testing Sensor data streams as Scala ADT 26
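A minimal sketch of what such an ADT could look like; the type and field names are assumptions, not the actual BMW data model.

    // Hypothetical ADT for the sensor data streams flowing through the pipeline.
    sealed trait SensorMessage extends Product with Serializable

    final case class CanMessage(busId: Int, timestampNs: Long, payload: Array[Byte])
      extends SensorMessage

    final case class FlexrayMessage(busId: Int, timestampNs: Long, payload: Array[Byte])
      extends SensorMessage

    final case class CameraFrame(timestampNs: Long, width: Int, height: Int, data: Array[Byte])
      extends SensorMessage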
  27. 27. Lessons Learned – Spark Testing Example type class for generating CAN messages 27
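A hedged sketch of such a generator type class using ScalaCheck's Arbitrary, reusing the hypothetical CanMessage above; the value ranges are made up.

    import org.scalacheck.{Arbitrary, Gen}

    object SensorGenerators {
      // Generator for the hypothetical CanMessage, exposed through the
      // Arbitrary type class so tests can summon it implicitly.
      implicit val canMessageArb: Arbitrary[CanMessage] = Arbitrary(
        for {
          busId   <- Gen.chooseNum(0, 7)                                       // a handful of bus ids
          ts      <- Gen.posNum[Long]                                          // strictly positive timestamp
          payload <- Gen.listOfN(8, Arbitrary.arbitrary[Byte]).map(_.toArray)  // 8-byte CAN frame
        } yield CanMessage(busId, ts, payload)
      )
    }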
  28. 28. Lessons Learned – Spark Testing Implementing the cats.FlatMap type class 28
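A sketch of how a cats.FlatMap instance for ScalaCheck's Gen might be hand-rolled; the cats-scalacheck library already ships lawful instances, and this naive tailRecM is not stack safe, so it is only meant to illustrate the idea.

    import cats.FlatMap
    import org.scalacheck.Gen

    object GenInstances {
      implicit val genFlatMap: FlatMap[Gen] = new FlatMap[Gen] {
        def map[A, B](fa: Gen[A])(f: A => B): Gen[B] = fa.map(f)

        def flatMap[A, B](fa: Gen[A])(f: A => Gen[B]): Gen[B] = fa.flatMap(f)

        // Naive recursion: fine for illustration, but not stack safe,
        // so it does not satisfy the FlatMap laws for deeply nested loops.
        def tailRecM[A, B](a: A)(f: A => Gen[Either[A, B]]): Gen[B] =
          f(a).flatMap {
            case Left(next) => tailRecM(next)(f)
            case Right(b)   => Gen.const(b)
          }
      }
    }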
  29. 29. Lessons Learned – Testing Advantages of using code instead of static ORC files • Compiler helps with breaking changes • Improves test understandability • Flexible manipulation of data using monadic operations 29
  30. 30. Lessons Learned – Catalyst Optimizations 30 [evaluation pipeline diagram as on slide 14, with RDD usage highlighted]
  31. 31. Lessons Learned – Catalyst Optimizations Interested in testing the impact of RDD – Dataset – Dataframe conversion: • Test with 1 GB of Flexray data, ~ 20 runs/experiment • Count the data • Filter data by specific busId 31
  32. 32. Lessons Learned – Catalyst Optimizations Running count on ~1GB Flexray data 32 [bar chart: processing time (s) for RDD, Dataset and Dataframe]
  33. 33. Lessons Learned – Catalyst Optimizations How about filtering by busId before counting? 33
  34. 34. Lessons Learned – Catalyst Optimizations How about filtering by busId before counting? 34 [bar chart: processing time (s) for RDD, Dataset typed API, Dataset untyped API and Dataframe]
  35. 35. Lessons Learned – Catalyst Optimizations Running "explain" on the Dataset yields: 35 Dataset untyped API vs. Dataset typed API
  36. 36. Lessons Learned – Catalyst Optimizations Which version is applying push down filters? 36 a) left b) right c) both d) none
  37. 37. Lessons Learned – Catalyst Optimizations Which version is applying push down filters? 37 a) left b) right c) both d) none busIds: Array[Long] but busId is of type Int
  38. 38. Lessons Learned – Optimizations Catalyst optimizations • Types matter for push down filters • Conversion between Dataset Typed and Untyped API might hurt performance • Always check assumptions by looking at metrics/physical execution plan 38
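To make the push-down point concrete, here is a hedged sketch of the two filter styles being compared; the schema, path and bus ids are assumptions. Running explain() and looking for PushedFilters in the physical plan shows which variant the ORC reader can actually push down.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object PushDownCheck {
      final case class FlexrayRow(busId: Int, timestampNs: Long, payload: Array[Byte])

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("pushdown-check").getOrCreate()
        import spark.implicits._

        val rows = spark.read.orc("/data/flexray").as[FlexrayRow]   // hypothetical path
        val busIds: Array[Long] = Array(1L, 3L)

        // Typed API: the lambda is opaque to Catalyst, so it cannot be pushed down.
        rows.filter(r => busIds.contains(r.busId.toLong)).explain()

        // Untyped API: Catalyst sees the predicate, but note the Long values
        // against the Int column: the inserted cast may still defeat the push down.
        rows.filter(col("busId").isin(busIds: _*)).explain()

        spark.stop()
      }
    }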
  39. 39. Lessons Learned – Spark Configuration 39 [evaluation pipeline diagram as on slide 14; requirements on the converted ORC files: > 1GB, available fast, sorted]
  40. 40. Lessons Learned – Spark Configuration 40 Adding the feature of writing bags > 1GB to the rosbag converter resulted in: • increased processing time • shuffle.FetchFailedException The Spark UI showed: • lots of RACK_LOCAL tasks • tasks taking long
  41. 41. Lessons Learned – Spark Configuration Spark locality parameters 41
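For reference, a sketch of the locality-related settings involved; the values are placeholders, not the configuration used in production.

    import org.apache.spark.sql.SparkSession

    object LocalityTunedSession {
      // spark.locality.wait controls how long the scheduler waits for a more
      // local slot (process/node/rack) before accepting a less local one.
      def build(): SparkSession =
        SparkSession.builder()
          .appName("rosbag-converter")
          .config("spark.locality.wait", "10s")        // default is 3s
          .config("spark.locality.wait.node", "10s")
          .config("spark.locality.wait.rack", "3s")
          .getOrCreate()
    }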
  42. 42. Lessons Learned – Spark Configuration Tuning Spark locality yields improved processing time 42 [bar charts, ~140GB image data, ~20 runs: #RACK_LOCAL tasks and processing time (s), old config vs. optimized Spark locality; processing time drops from 100% to ~20%]
  43. 43. Lessons Learned – Spark Configuration Tuning shuffling parameters, spark.reducer.maxReqsInFlight 43 [bar chart: failed tasks, old config vs. optimized maxReqsInFlight (40%)]
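The shuffle-side knob mentioned here caps how many fetch requests a reducer keeps in flight at once. A sketch of setting it together with a related fetch limit; the values are placeholders.

    import org.apache.spark.SparkConf

    object ShuffleTuning {
      // Limiting concurrent fetch requests can relieve overloaded shuffle
      // services and reduce FetchFailedExceptions (values are placeholders).
      def conf(): SparkConf =
        new SparkConf()
          .set("spark.reducer.maxReqsInFlight", "16")    // default: unlimited (Int.MaxValue)
          .set("spark.reducer.maxSizeInFlight", "48m")   // default: 48m
    }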
  44. 44. Lessons Learned – Configuration Writing controlled size files from Spark: • Pay attention to data locality • Writing controlled sized files is hard • Tuning Spark configuration properly yields surprising results 44
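One hedged way to approximate "controlled size" output from Spark; the path, partition count and record cap are placeholders, and maxRecordsPerFile bounds records rather than bytes, so the record size still has to be estimated to hit a target like > 1GB.

    import org.apache.spark.sql.DataFrame

    object ControlledSizeWriter {
      // Repartition to a target number of files and cap records per file.
      def write(df: DataFrame, targetFiles: Int, path: String): Unit =
        df.repartition(targetFiles)
          .write
          .option("maxRecordsPerFile", 2000000L)   // available since Spark 2.2
          .orc(path)
    }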
  45. 45. Summary 45 • KPIs on lane marking detection • DNN for lidar based lane detection • Tips for testing, configuring and optimizing Spark
  46. 46. Video 46 https://youtu.be/wNAmxL25Bhk
  47. 47. Thank you for listening! 47#UnifiedDataAnalytics #SparkAISummit
  48. 48. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT
