Gheorghe Pucea, BMW Group
Jennifer Reinelt, BMW Group
Lessons Learned from Using Spark for Evaluating Road Detection @ BMW Autonomous Driving
#UnifiedDataAnalytics #SparkAISummit
BMW AUTONOMOUS DRIVING
Outline
• Evaluation of Lane Detection
• Evaluation Pipeline
• AI Based Ground Truth
• Lessons Learned
Car Setup for Autonomous Driving
Evaluation of Lane Detection
How well does the car detect the lane markings?
[Figure: real vs. detected lane markings, compared at 1 m, 50 m, 100 m, and 150 m]
Key Performance Indicator (KPI) – Lateral Offset
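As a rough illustration of the lateral offset KPI, the sketch below compares a detected lane marking with a ground-truth marking at the look-ahead distances named on the slide. The polynomial lane model, the coefficient values, and the function names are assumptions made for this example; the slides do not state how the offset is actually computed.

```python
from typing import Callable, Dict, Sequence

# Look-ahead distances in metres, as listed on the slide.
DISTANCES_M = (1.0, 50.0, 100.0, 150.0)


def polynomial_lane(coeffs: Sequence[float]) -> Callable[[float], float]:
    """Return y(x): lateral position (m) of a marking at longitudinal distance x (m).

    Assumption: the marking is modelled as a polynomial c0 + c1*x + c2*x^2 + ...
    """
    def lateral_position(x: float) -> float:
        return sum(c * x ** i for i, c in enumerate(coeffs))
    return lateral_position


def lateral_offsets(detected, ground_truth, distances=DISTANCES_M) -> Dict[float, float]:
    """Absolute lateral offset (m) between detection and ground truth per distance."""
    return {d: abs(detected(d) - ground_truth(d)) for d in distances}


# Hypothetical markings: the detection drifts away from the truth with distance.
ground_truth = polynomial_lane([1.8, 0.0, 1.0e-4])
detected = polynomial_lane([1.8, 0.001, 1.2e-4])

offsets = lateral_offsets(detected, ground_truth)
for distance, offset in offsets.items():
    print(f"{distance:>5.0f} m: {offset:.3f} m")
```

Tracking one such number per distance and per commit is one way to obtain a curve of lateral offset over development time.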
Evaluation of Lane Detection
[Figure: lateral offset at 150 m over functional development time, one point per commit (70d9c31, c271a01, 4e0bcd3, 6e3bcd3), annotated "improvement"]
Challenges:
• Where are the real lane markings? How do we get the ground truth?
• How do we avoid making the same mistakes as the car when looking for real lane markings?
• How do we scale this ground truth generation?
Evaluation of Lane Detection
How do we get the ground truth?
• From manual labels: very accurate, but manual, slow, expensive to scale up, and bad for occlusions
• From additional sensors: automated, fast, and accurate, but expensive to scale up
• Using sophisticated algorithms in the backend: scalable, automated, fast, and cheap, but with lower accuracy
Evaluation Pipeline
[Diagram: Data Collection, Data Ingestion, Ros Converter (Ros bag, ORC), Reprocessing, Ground Truth Generation, KPI Calculation, InfluxDB, Other Applications]
Datacenter:
• > 230 PB capacity and > 1,500 TB raw data/day
• > 100,000 cores and > 200 GPUs
AI Based Ground Truth
• 3D lidar point clouds
• Lidar intensity in 2D bird's-eye view
• Deep neural network
• Semantic segmentation: lane marking / no lane marking
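The step from 3D lidar point clouds to a 2D bird's-eye-view intensity image can be sketched roughly as follows. This is a minimal illustration only: the grid extent, the cell size, the max-intensity rule, and the function name are assumptions, and the neural network that consumes the image is not shown.

```python
from typing import Iterable, List, Tuple

# One lidar return: (x forward in m, y left in m, z up in m, intensity in [0, 1]).
LidarPoint = Tuple[float, float, float, float]


def intensity_birds_eye_view(
    points: Iterable[LidarPoint],
    x_range: Tuple[float, float] = (0.0, 150.0),   # assumed look-ahead window
    y_range: Tuple[float, float] = (-10.0, 10.0),  # assumed lateral window
    cell_size_m: float = 0.5,                      # assumed grid resolution
) -> List[List[float]]:
    """Project points onto the ground plane and keep the max intensity per cell."""
    rows = int((x_range[1] - x_range[0]) / cell_size_m)
    cols = int((y_range[1] - y_range[0]) / cell_size_m)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, _z, intensity in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the area covered by the image
        row = int((x - x_range[0]) / cell_size_m)
        col = int((y - y_range[0]) / cell_size_m)
        grid[row][col] = max(grid[row][col], intensity)
    return grid


# Hypothetical returns: two bright hits in one cell, one dim hit, one out of range.
points = [
    (10.2, 1.7, 0.02, 0.9),
    (10.3, 1.8, 0.01, 0.8),
    (10.2, 0.0, 0.00, 0.1),
    (200.0, 0.0, 0.00, 1.0),
]
bev = intensity_birds_eye_view(points)
```

Lane paint usually reflects more strongly than asphalt, so cells with high intensity are candidates for the "lane marking" class that the network then decides on per pixel.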
Motivation of Lessons Learned
Source: https://twitter.com/bigdataborat?lang=en
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Lessons Learned – Spark Testing
• Typical integration test
• Drawback of static ORC files committed in the source code
• Test data generation library: type classes, cats
• Using the test data generation library for integration tests: Cats FlatMap type class, Scalacheck generators available with type classes
• Sensor data streams as Scala ADT
• Example type class for generating CAN messages
• Implementing the cats.FlatMap type class
Lessons Learned – Testing
Advantages of using code instead of static ORC files
• Compiler helps with breaking changes
• Improves test understandability
• Flexible manipulation of data using monadic operations
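The talk builds its generators in Scala with cats and Scalacheck, and that code is not preserved in this transcript. As a loose analogue in Python, the sketch below shows the general idea of composing test data with map and flat_map instead of loading a static file. The Gen class, the CanMessage record, and its field names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Gen(Generic[A]):
    """A value generator driven by a seeded random.Random (hypothetical helper)."""

    def __init__(self, run: Callable[[random.Random], A]) -> None:
        self.run = run

    def map(self, f: Callable[[A], B]) -> "Gen[B]":
        return Gen(lambda rng: f(self.run(rng)))

    def flat_map(self, f: Callable[[A], "Gen[B]"]) -> "Gen[B]":
        # Monadic bind: the next generator may depend on the generated value.
        return Gen(lambda rng: f(self.run(rng)).run(rng))

    def list_of(self, n: int) -> "Gen[List[A]]":
        return Gen(lambda rng: [self.run(rng) for _ in range(n)])


def choose(low: int, high: int) -> Gen[int]:
    """Generate an integer in the inclusive range [low, high]."""
    return Gen(lambda rng: rng.randint(low, high))


def random_bytes(size: int) -> Gen[bytes]:
    """Generate a payload of exactly `size` random bytes."""
    return Gen(lambda rng: bytes(rng.randrange(256) for _ in range(size)))


@dataclass(frozen=True)
class CanMessage:  # hypothetical record; the field names are assumptions
    bus_id: int
    timestamp_ns: int
    payload: bytes


def can_message(bus_id: int) -> Gen[CanMessage]:
    """Generate messages for one bus; the payload size is itself generated."""
    return choose(0, 10**9).flat_map(
        lambda ts: choose(1, 8)
        .flat_map(random_bytes)
        .map(lambda payload: CanMessage(bus_id, ts, payload))
    )


# A test builds exactly the data it needs instead of loading a static file.
messages = can_message(bus_id=3).list_of(5).run(random.Random(42))
```

Because the record is ordinary code, renaming or retyping a field breaks the generator at the point of use instead of silently invalidating a committed file, and the fixed seed keeps test runs reproducible.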
Lessons Learned – Catalyst Optimizations
Testing the impact of RDD – Dataset – DataFrame conversion:
• Test with ~1 GB of FlexRay data, ~20 runs per experiment
• Count the data
• Filter the data by a specific busId
Running count on ~1 GB of FlexRay data
[Figure: bar chart of processing time (s), axis 0 to 350, for RDD, Dataset, and DataFrame]
How about filtering by busId before counting?
[Figure: bar chart of processing time (s), axis 0 to 350, for RDD, Dataset Typed, Dataset Untyped, and DataFrame]
Running "explain" on the Dataset yields:
[Figure: two physical plans side by side, Dataset untyped API (left) and Dataset typed API (right)]
Which version is applying push-down filters?
a) left
b) right
c) both
d) none
busIds: Array[Long], but busId is of type Int
Lessons Learned – Optimizations
Catalyst optimizations
• Types matter for push-down filters
• Conversion between the Dataset typed and untyped APIs might hurt performance
• Always check assumptions by looking at the metrics and the physical execution plan
Lessons Learned – Spark Configuration
[Diagram: pipeline overview, annotated "> 1 GB", "be available fast", and "be sorted"]
Adding the ability to write bags > 1 GB to the rosbag converter resulted in:
• Increased processing time
• shuffle.FetchFailedException
The Spark UI showed:
• Lots of RACK_LOCAL tasks
• Tasks taking a long time
Spark locality parameters
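The content of this slide is not preserved in the transcript. For orientation, the sketch below lists the locality-wait settings that Spark documents and turns them into spark-submit arguments. The values are placeholders for illustration, not the values used in the talk, and the helper name is hypothetical.

```python
from typing import Dict, List

# Spark's documented locality-wait settings. The values are placeholders chosen
# for illustration; the values used in the talk are not given in the slides.
LOCALITY_CONF: Dict[str, str] = {
    "spark.locality.wait": "3s",          # general wait; 3s is Spark's default
    "spark.locality.wait.process": "3s",  # wait for a PROCESS_LOCAL slot
    "spark.locality.wait.node": "10s",    # wait for a NODE_LOCAL slot (assumed raise)
    "spark.locality.wait.rack": "3s",     # wait for a RACK_LOCAL slot
}


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_spark_submit_args(LOCALITY_CONF)))
```

A longer node-level wait makes the scheduler hold a task back for a node-local slot before falling back to a rack-local one, which is one plausible lever when the Spark UI shows many RACK_LOCAL tasks.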
Tuning Spark locality yields improved processing time
[Figure: two bar charts comparing the old config with optimized Spark locality on ~140 GB of image data over ~20 runs: number of RACK_LOCAL tasks (axis 0 to 40) and processing time in seconds (axis 0 to 450); bars annotated 100% and 20%]
Tuning shuffling parameters: spark.reducer.maxReqsInFlight
[Figure: bar chart of failed tasks (axis 0 to 4), old config vs. optimized maxReqsInFlight; annotated 40%]
Lessons Learned – Configuration
Writing controlled-size files from Spark:
• Pay attention to data locality
• Writing controlled-size files is hard
• Tuning the Spark configuration properly yields surprising results
Summary
• KPIs on lane marking detection
• DNN for lidar-based lane detection
• Tips for testing, configuring, and optimizing Spark
Video
https://youtu.be/wNAmxL25Bhk
Thank you for listening!
Massive Data Processing in Adobe Using Delta Lake
 

Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonomous Driving

  • 2. Lessons Learned from Using Spark for Evaluating Road Detection @ BMW Autonomous Driving. Gheorghe Pucea, BMW Group; Jennifer Reinelt, BMW Group. #UnifiedDataAnalytics #SparkAISummit
  • 4. Outline: Evaluation of Lane Detection; Evaluation Pipeline; AI Based Ground Truth; Lessons Learned
  • 5. BMW Autonomous Driving: Car Setup for Autonomous Driving
  • 7. Evaluation of Lane Detection: How well does the car detect the lane markings? [Figure: real vs. detected lane markings at 1 m, 50 m, 100 m and 150 m]
  • 8. Evaluation of Lane Detection: How well does the car detect the lane markings? Key Performance Indicator (KPI): lateral offset. [Chart: lateral offset improvement at 150 m over functional development time, for commits 70d9c31, c271a01, 4e0bcd3 and 6e3bcd3]
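The lateral offset KPI named on slide 8 can be sketched as follows. This is a minimal illustration, not the evaluation code from the talk: it assumes (hypothetically) that both the detected and the ground-truth lane marking are available as cubic polynomials giving lateral position over longitudinal distance, and it evaluates the absolute difference at the distances shown on the slides. The names `LanePolynomial` and `lateral_offsets` are made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class LanePolynomial:
    """Lane marking as lateral position y (m) over longitudinal distance x (m).

    Hypothetical representation: y(x) = c0 + c1*x + c2*x^2 + c3*x^3.
    """
    coeffs: Sequence[float]

    def lateral_position(self, x: float) -> float:
        return sum(c * x ** i for i, c in enumerate(self.coeffs))


def lateral_offsets(
    detected: LanePolynomial,
    ground_truth: LanePolynomial,
    distances: Sequence[float] = (1.0, 50.0, 100.0, 150.0),
) -> Dict[float, float]:
    """Absolute lateral offset (m) between detection and ground truth per distance."""
    return {
        x: abs(detected.lateral_position(x) - ground_truth.lateral_position(x))
        for x in distances
    }


# Example: a straight ground-truth marking 1.75 m to the side, and a detection
# that drifts by 2 mm per metre of distance (values invented for illustration).
ground_truth = LanePolynomial([1.75, 0.0, 0.0, 0.0])
detected = LanePolynomial([1.75, 0.002, 0.0, 0.0])
offsets = lateral_offsets(detected, ground_truth)
```

Tracking such per-distance values for each software commit would give the kind of trend the slide plots, with the offset at 150 m being the hardest to keep small.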
  • 9. Evaluation of Lane Detection, challenges: Where are the real lane markings? How do we get the ground truth? How do we avoid making the same mistakes as the car when looking for real lane markings? How do we scale this ground truth generation? [Figure: real vs. detected lane markings at 1 m, 50 m, 100 m and 150 m]
  • 10. How do we get the ground truth? From manual labels: very accurate; manual; slow; expensive to scale up; bad for occlusions.
  • 11. How do we get the ground truth? From additional sensors: automated; fast; accurate; expensive to scale up.
  • 12. How do we get the ground truth? Using sophisticated algorithms in the backend: scalable; automated; fast; cheap; lower accuracy.
  • 14. Evaluation Pipeline. Datacenter: > 230 PB capacity and > 1,500 TB raw data/day; > 100,000 cores and > 200 GPUs. [Pipeline diagram; labels: Data Collection, Data Ingestion, Ros bag, Ros Converter, ORC, Reprocessing, Ground Truth Generation, KPI Calculation, InfluxDB, Other Applications]
  • 16. AI Based Ground Truth. [Pipeline diagram and datacenter figures repeated from slide 14]
  • 17. AI Based Ground Truth: 3D lidar point clouds; lidar intensity in 2D bird's eye view; semantic segmentation with a deep neural network; output classes: lane marking, no lane marking.
  • 19. Motivation of Lessons Learned. Source: https://twitter.com/bigdataborat?lang=en
  • 20. Motivation of Lessons Learned. Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  • 21. Lessons Learned – Spark Testing. [Pipeline diagram and datacenter figures repeated from slide 14]
  • 22. Lessons Learned – Spark Testing: Typical integration test
  • 23. Lessons Learned – Spark Testing: Drawback of static ORC files committed in the source code
  • 24. Lessons Learned – Spark Testing: Test data generation library (type classes, cats)
  • 25. Lessons Learned – Spark Testing: Using the test data generation library for integration tests (cats FlatMap type class; Scalacheck generators available with type classes)
  • 26. Lessons Learned – Spark Testing: Sensor data streams as Scala ADT
  • 27. Lessons Learned – Spark Testing: Example type class for generating CAN messages
  • 28. Lessons Learned – Spark Testing: Implementing the cats.FlatMap type class
  • 29. Lessons Learned – Testing. Advantages of using code instead of static ORC files: the compiler helps with breaking changes; tests are easier to understand; data can be manipulated flexibly using monadic operations.
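The idea of generating test data in code and combining generators with monadic operations can be sketched as below. The talk's library is written in Scala on top of cats and Scalacheck and its code is not reproduced in this transcript, so this is only a rough Python analogue under stated assumptions: `Gen`, `int_between`, `list_of`, `CanMessage` and `can_message` are hypothetical names, and the CAN message fields are simplified placeholders.

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Gen(Generic[A]):
    """A test-data generator: a function from a random source to a value."""
    run: Callable[[random.Random], A]

    def map(self, f: Callable[[A], B]) -> "Gen[B]":
        return Gen(lambda rng: f(self.run(rng)))

    def flat_map(self, f: Callable[[A], "Gen[B]"]) -> "Gen[B]":
        # Monadic bind: the next generator may depend on the previous value.
        return Gen(lambda rng: f(self.run(rng)).run(rng))

    def sample(self, seed: int) -> A:
        # A fixed seed makes the generated data reproducible across test runs.
        return self.run(random.Random(seed))


def int_between(lo: int, hi: int) -> Gen[int]:
    return Gen(lambda rng: rng.randint(lo, hi))


def list_of(n: int, gen: Gen[A]) -> Gen[List[A]]:
    return Gen(lambda rng: [gen.run(rng) for _ in range(n)])


@dataclass(frozen=True)
class CanMessage:
    """Hypothetical, simplified sensor record."""
    bus_id: int
    timestamp_ns: int
    payload: bytes


def can_message(bus_id: int) -> Gen[CanMessage]:
    # Draw a timestamp, then a payload size, then build the message.
    return int_between(0, 10**9).flat_map(
        lambda ts: int_between(1, 8).map(
            lambda size: CanMessage(bus_id, ts, bytes(size))
        )
    )


# A stream of 5 messages that all share one randomly chosen bus id.
can_stream: Gen[List[CanMessage]] = int_between(0, 3).flat_map(
    lambda bus: list_of(5, can_message(bus))
)

messages = can_stream.sample(seed=42)
```

Because the records are built by ordinary code, renaming or retyping a field surfaces as an error where the generator constructs the record rather than as a stale binary fixture, which is the first advantage listed on the slide (in Scala the compiler reports it; in this Python analogue a type checker or the test run would).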
  • 30. Lessons Learned – Catalyst Optimizations. [Pipeline diagram and datacenter figures repeated from slide 14, with an RDD label]
  • 31. Lessons Learned – Catalyst Optimizations. Interested in testing the impact of RDD – Dataset – DataFrame conversion: test with 1 GB of Flexray data, ~20 runs/experiment; count the data; filter data by a specific busId.
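The experiment setup on slide 31 (repeat each action about 20 times and compare processing times) can be sketched with a small timing harness. This is an assumption-laden illustration, not the speakers' benchmark: the in-memory `records` list stands in for the Flexray data, and the two lambdas stand in for the Spark actions (count, and filter by bus id then count) that would be run against an RDD, Dataset or DataFrame.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(action: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Run `action` `runs` times and summarise wall-clock durations in seconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(durations),
        "stdev_s": statistics.stdev(durations) if runs > 1 else 0.0,
        "min_s": min(durations),
        "max_s": max(durations),
    }


# Stand-in data (hypothetical); a real experiment would call Spark actions here.
records = [{"bus_id": i % 4, "payload": i} for i in range(100_000)]

experiments = {
    "count": lambda: len(records),
    "filter_then_count": lambda: sum(1 for r in records if r["bus_id"] == 2),
}

results = {name: time_runs(action, runs=20) for name, action in experiments.items()}
```

Reporting the spread alongside the mean helps judge whether a difference between two API variants is larger than run-to-run noise.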
  • 32. Lessons Learned – Catalyst Optimizations: Running count on ~1 GB Flexray data. [Bar chart: processing time (s) for RDD, Dataset, DataFrame]
  • 33. Lessons Learned – Catalyst Optimizations: How about filtering by busId before counting?
  • 34. Lessons Learned – Catalyst Optimizations: How about filtering by busId before counting? [Bar chart: processing time (s) for RDD, Dataset Typed, Dataset Untyped, DataFrame]
  • 35. Lessons Learned – Catalyst Optimizations: Running "explain" on Dataset yields: [plans for the Dataset untyped API and the Dataset typed API]
  • 36. Lessons Learned – Catalyst Optimizations: Which version is applying push down filters? a) left b) right c) both d) none
  • 37. Lessons Learned – Catalyst Optimizations: Which version is applying push down filters? a) left b) right c) both d) none. Note: busIds is an Array[Long], but busId is of type Int.
  • 38. Lessons Learned – Optimizations. Catalyst optimizations: types matter for push down filters; conversion between the Dataset typed and untyped API might hurt performance; always check assumptions by looking at metrics and the physical execution plan.
  • 39. Lessons Learned – Spark Configuration. [Pipeline diagram and datacenter figures repeated from slide 14; annotations: > 1 GB, be available fast, be sorted]
  • 40. Lessons Learned – Spark Configuration. Adding the feature of writing bags > 1 GB to the rosbag converter resulted in increased processing time and shuffle.FetchFailedException. The Spark UI showed lots of RACK_LOCAL tasks and tasks taking a long time.
  • 41. Lessons Learned – Spark Configuration: Spark locality parameters
  • 42. Lessons Learned – Spark Configuration: Tuning Spark locality yields improved processing time. [Two bar charts, old config vs. optimized Spark locality: number of RACK_LOCAL tasks and processing time (s); labels: 100%, 20%; ~140 GB image data, ~20 runs]
  • 43. Lessons Learned – Spark Configuration: Tuning shuffling parameters, spark.reducer.maxReqInFlight. [Bar chart: failed tasks, old config vs. optimized maxReqInFlight; label: 40%]
  • 44. Lessons Learned – Configuration. Writing controlled size files from Spark: pay attention to data locality; writing controlled sized files is hard; tuning Spark configuration properly yields surprising results.
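The two tuning knobs discussed in this section, scheduler locality waits and the number of in-flight shuffle fetch requests, map to standard Spark properties. The sketch below collects them in a dictionary and renders `spark-submit` arguments. The property names are the ones documented by Spark (the shuffle setting is documented as `spark.reducer.maxReqsInFlight`); the values are placeholders chosen for illustration and are not the settings used in the talk, which the transcript does not state. The helper `to_spark_submit_args` is hypothetical.

```python
from typing import Dict, List

# Documented Spark property names; values are illustrative assumptions only.
TUNED_CONF: Dict[str, str] = {
    # How long the scheduler waits for a more local slot before falling back
    # to a less local one (process -> node -> rack -> any).
    "spark.locality.wait": "10s",
    "spark.locality.wait.node": "10s",
    "spark.locality.wait.rack": "10s",
    # Upper bound on concurrent remote shuffle fetch requests per reduce task.
    "spark.reducer.maxReqsInFlight": "64",
}


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a config mapping as repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


submit_args = to_spark_submit_args(TUNED_CONF)
```

Longer locality waits trade scheduling delay for fewer RACK_LOCAL tasks, and a lower request cap reduces pressure on the nodes serving shuffle blocks, so any change is best validated against the Spark UI metrics, as the slides recommend.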
  • 45. Summary: KPIs on lane marking detection; DNN for lidar based lane detection; tips for testing, configuring and optimizing Spark.
  • 47. Thank you for listening! #UnifiedDataAnalytics #SparkAISummit