BMW at DataWorks Summit 2018 Berlin
18.04.2018
DATA DRIVEN DEVELOPMENT OF
AUTONOMOUS DRIVING AT BMW
ABOUT THE SPEAKERS
Felix Reuthlinger
• Data Engineer for AD
• Joined BMW in 2015
• Before joining AD: Big Data Architect at BMW central IT
• Focus: Data center and data flow architecture for AD
• Strong in: Spark, Scala
• Co-founder and member of http://munich-datageeks.de/
Dogukan Sonmez
• Software Engineer for AD
• Joined BMW in 2017
• Prior to BMW, worked on various big data and machine learning projects at SAP, Siemens and Sony
• Focus: Data and Simulation for AD
• Strong in: Distributed systems and software craftsmanship
• Hobbies: Building wooden furniture, painting, IoT
Data Driven Development of Autonomous Driving at BMW | DataWorks Summit Berlin | April 2018 Page 2
AGENDA
Why autonomous driving requires data
How we get data
How we process data
How we serve data
How we ensure data quality
WHY AUTONOMOUS DRIVING REQUIRES DATA
AUTONOMOUS DRIVING LEVELS
Level 0, NO SUPPORT (hands on)
  Driver's task: Driver has full control
Level 1, ASSISTANCE (hands on)
  Driver's task: Driver controls steering and checks forward motion
  Vehicle's task: Vehicle controls forward motion
Level 2, PARTLY AUTOMATED (hands temp. off, eyes temp. off)
  Driver's task: Driver checks forward and sideward motion
  Vehicle's task: Vehicle controls forward and sideward motion
Level 3, HIGHLY AUTOMATED (hands off, eyes off)
  Driver's task: Driver is ready to take control at any time
  Vehicle's task: Vehicle requests driver to take over control based on situations
Level 4, FULLY AUTOMATED (hands off, mind off)
  Driver's task: Driver only required for certain parts of the track
  Vehicle's task: Vehicle does not request driver to take over control
Level 5, AUTONOMOUS (passenger)
  No driver required
Vehicles: G11 / G30, iNEXT, iNEXT pilot series, tbd.
TRANSITION OF RESPONSIBILITY: HUMAN → MACHINE
TECHNOLOGICAL ‘MOONSHOT’
TECHNOLOGICAL QUANTUM LEAP
*Source: SAE (Society of Automotive Engineers) International Level of Automation
ADAS* SYSTEM SETUP
(* ADVANCED DRIVER ASSISTANCE SYSTEMS)
23 SENSORS
BMW 5 SERIES
Sensors:
Full Range Radar.
Night Vision.
Side View Camera.
Side Range Radar.
Surround View Camera.
Ultrasonic.
Stereo Front Camera.
Rear View Camera.
Side Range Radar.
Ultrasonic.
Functions:
STEERING AND LANE CONTROL ASSISTANT INCL. LANE CHANGE ASSISTANT.
SURROUND VIEW.
ACTIVE CRUISE CONTROL.
SPEED LIMIT ASSIST.
EMERGENCY STEERING ASSIST.
WRONG WAY ASSIST.
CROSSROAD ASSIST.
DATA DRIVEN DEVELOPMENT FOR AD @ BMW
Several TB/h
Up to 500 PB/a
ML Experiments/Training
Test drives
Data Ingest to Data Center
Organize
Structure
KPI report
Deployment of trained algorithms
ML data sets
Phase out / Balance datasets
Combinatorial boost of scenarios
Synthetic data
Focus of this talk
We have hundreds of PBs of data to crunch …
Have a lot of squirrels do it?
Probably not …
DATA JOURNEY OVERVIEW
Logger File Copy / Ingest Instance
Hadoop
File(s)
Meta store
InputFormat,
Defragmentation, Decoding
Speed Weather
25 km/h Sunny
30 km/h Sunny
Analytics, Functions,
Learning, …
"I want to work on data from a sunny drive in June, …"
HOW WE GET DATA
FILE FORMAT STANDARD
MDF4 (Measurement Data Format, version 4)
→ https://www.asam.net/standards/detail/mdf/
Standard in the automotive industry (by the ASAM organization, https://www.asam.net/)
Organized in binary blocks
MDF4 has multiple usage types:
  sorted / unsorted content
  for recording (hardware loggers) or calculated data
  for data exchange and long-term storage
BMW AG is one of the authors of the standard
FILE FORMAT – HOW WE USE IT
Logger centric:
  Main use case → hardware logger in the car
  Very high data bandwidth → write down data quickly (FIFO)
Our MDF4 files:
  Unsorted content
  Multiple small blocks for metadata
  One continuous big block for storing record payload data
* Example generated with our custom implementation of Mdf4Writer
* Example hardware logger in the car
FILE FORMAT
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A
Header (ID block)
  → this is MDF4 of version X
MDF block
  Block header → block description, size
  link[0], link[1], …, link[n]
  Data section → fields
MDF block
  Block header → block description, size
  link[0], link[1], …, link[n]
  Data section → fields
Data block
  Block header → block description, size
  Data section → records / payloads
    → Dynamic record size
    → No indexing
    ⇒ This causes the file to be non-splittable
Substructures, like structs, contain metadata down to the Data Block
We use only 1 data block here
It covers 99.99% of the total volume
…
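Why dynamic record sizes without an index prevent splitting can be shown with a toy format. The sketch below is only an illustration: it assumes a hypothetical layout in which each record carries a 4-byte little-endian length prefix, which is not the actual MDF4 block layout. The names write_records and read_records are invented for this example.

```python
import io
import struct


def write_records(payloads):
    """Serialize payloads as [4-byte length][payload] ... (toy layout, not MDF4)."""
    out = io.BytesIO()
    for payload in payloads:
        out.write(struct.pack("<I", len(payload)))
        out.write(payload)
    return out.getvalue()


def read_records(stream):
    """Yield payloads in file order.

    The start of record n+1 is only known after the length of record n
    has been read, so the data has to be scanned from the beginning.
    """
    while True:
        prefix = stream.read(4)
        if not prefix:
            return                      # clean end of data
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("<I", prefix)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield payload


data = write_records([b"\x0e\x12\x1a", b"\x87", b"\x00\x01\x2a\x04"])
records = list(read_records(io.BytesIO(data)))
```

A reader that starts at an arbitrary byte offset would interpret payload bytes as a length prefix and fail or return garbage, which is the reason such a file is handled as one unit.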
DATA COLLECTION FLEET
40 VEHICLES IN 2017
BMW 7 SERIES
DATA LOGGING IN THE CAR
Logger config: car 1, drive 1, …
Logger → SSD (FIFO): file set 1, file set 2
Roll over to next file at 2 GB (approx. 5 s of data)
Incoming data:
0E 1A 87 …
12 1B AA …
00 01 2A …
Files written:
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A
Header : car 1, drive 1, file set 1, file #2, … | 87 1B AA
Header : car 1, drive 1, file set 2, file #1, … | 00 01 2A
Header : car 1, drive 1, file set 2, file #2, … | 04 23 0A
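The rollover behaviour can be sketched in a few lines. This is a simplified illustration, not the logger's firmware: LogFile, write_with_rollover and the header fields are hypothetical names, and the arithmetic at the end assumes binary units (2 GB = 2 × 1024³ bytes) and exactly 5 s per file.

```python
from dataclasses import dataclass, field

ROLLOVER_BYTES = 2 * 1024**3   # "roll over to next file at 2 GB"
SECONDS_PER_FILE = 5           # "ca. 5 s of data" per file


@dataclass
class LogFile:
    header: dict
    payload: bytearray = field(default_factory=bytearray)


def write_with_rollover(chunks, car, drive, file_set, limit=ROLLOVER_BYTES):
    """Append chunks in arrival (FIFO) order; open a new file when the limit would be exceeded."""
    files = []
    current = None
    for chunk in chunks:
        if current is None or len(current.payload) + len(chunk) > limit:
            header = {"car": car, "drive": drive,
                      "file_set": file_set, "file": len(files) + 1}
            current = LogFile(header)
            files.append(current)
        current.payload.extend(chunk)
    return files


# Tiny limit so the example runs instantly: 3 bytes per file instead of 2 GB.
files = write_with_rollover(
    [b"\x0e", b"\x12", b"\x1a", b"\x87", b"\x1b", b"\xaa"],
    car=1, drive=1, file_set=1, limit=3,
)

# Data rate implied by the two numbers above, for a single stream.
bytes_per_second = ROLLOVER_BYTES / SECONDS_PER_FILE
tib_per_hour = bytes_per_second * 3600 / 1024**4
```

Under these assumptions one stream produces roughly 1.4 TiB per hour, and a one-hour drive yields 720 files per file set.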
HOW WE PROCESS DATA
DATA PROCESSING
Pipeline: read → defragment → decode → store
Input (Hadoop / HDFS):
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A
Header : car 1, drive 1, file set 1, file #2, … | 87 1B AA
Header : car 1, drive 1, file set 2, file #1, … | 00 01 2A
Processing (Hadoop / Spark): InputFormat → RDD / DF → RDD / DF → … → RDD / DF
Output (Hadoop / HDFS):
Speed Weather
25 km/h Sunny
30 km/h Sunny
Meta store (Hadoop / HBase): drive meta data, merged header information
Note: we parallelize by scaling out over multiple driving sessions
DEEP DIVE ABOUT REDUCING I/O
Continuous data collection requires continuous processing.
Challenges:
  Potentially thousands of files per driving session
  MDF4 uses dynamic record lengths, so there is no clear split point
  Seeks inside the file
  Defragmentation = grouping transformation
Goal: reduce network I/O.
MDF4 INPUT FORMAT
CUSTOM INPUT FORMAT IMPLEMENTATION
2 GB file size
dfs.blocksize=2G
⇒ 1 file = 1 input split
Mdf4InputFormat: one InputSplit per file, one Mdf4Reader per executor / partition
Mdf4Record = metadata + payload (binary)
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A → InputSplit → Mdf4Reader → RDD / DF
…
Header : car 1, drive 1, file set 2, file #1, … | 00 01 2A → InputSplit → Mdf4Reader → RDD / DF
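The effect of matching the HDFS block size to the file size can be illustrated with a small split planner. This is a plain-Python sketch, not Hadoop's FileInputFormat: InputSplit, plan_splits and the file paths are hypothetical, and 128 MiB is used as the comparison value because it is the usual HDFS default block size.

```python
from typing import NamedTuple


class InputSplit(NamedTuple):
    path: str
    start: int
    length: int


def plan_splits(files, block_size, splittable):
    """Return input splits for a mapping of path -> size in bytes."""
    splits = []
    for path, size in files.items():
        if not splittable:
            # Non-splittable format: one reader has to process the whole file.
            splits.append(InputSplit(path, 0, size))
            continue
        for start in range(0, size, block_size):
            splits.append(InputSplit(path, start, min(block_size, size - start)))
    return splits


GIB = 1024**3
MIB = 1024**2
files = {"set1/file1.mf4": 2 * GIB, "set2/file1.mf4": 2 * GIB}  # hypothetical paths

# Whole-file splits, as required for a format without split points.
whole_file_splits = plan_splits(files, block_size=2 * GIB, splittable=False)

# Number of HDFS blocks one 2 GiB file would occupy at a 128 MiB block size.
blocks_per_file_at_128mib = len(
    plan_splits({"set1/file1.mf4": 2 * GIB}, block_size=128 * MIB, splittable=True)
)
```

With a 2 GiB block size each whole-file split maps to a single block, so a reader scheduled on a node holding that block can read it locally. At 128 MiB the same file would be spread over 16 blocks, and a whole-file reader would likely fetch most of them over the network.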
DEFRAGMENTATION
DATA REPRESENTATION IN THE CAR BUS
Image from camera
→ Ethernet | IPv4 | UDP | SomeIP (UDP datagram)
→ split for transport into:
   Ethernet | IPv4 fragment
   Ethernet | IPv4 fragment
   Ethernet | IPv4 fragment
Next message:
→ Ethernet | IPv4 | UDP | SomeIP (UDP datagram)
…
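Reassembling a datagram from its IPv4 fragments follows a simple rule: sort by offset, check that there are no gaps, and check that the last fragment has been seen. The sketch below is a simplified illustration. It uses byte offsets directly (real IPv4 stores the offset in units of 8 bytes), and the Fragment type and reassemble function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Fragment(NamedTuple):
    datagram_id: int       # IPv4 identification field, shared by all fragments
    offset: int            # byte offset of this fragment's data in the datagram
    more_fragments: bool   # IPv4 MF flag: False only on the last fragment
    data: bytes


def reassemble(fragments: List[Fragment]) -> Optional[bytes]:
    """Rebuild one datagram payload, or return None if it is incomplete."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    if not ordered or ordered[-1].more_fragments:
        return None                    # last fragment has not arrived
    expected = 0
    parts = []
    for frag in ordered:
        if frag.offset != expected:
            return None                # gap: a fragment is missing
        parts.append(frag.data)
        expected += len(frag.data)
    return b"".join(parts)


frags = [
    Fragment(7, 8, True, b"BBBBBBBB"),
    Fragment(7, 0, True, b"AAAAAAAA"),
    Fragment(7, 16, False, b"CC"),
]
payload = reassemble(frags)
incomplete = reassemble(frags[:2])
```

The None result for an incomplete set is the property that matters later: a partially reassembled message is of no use downstream.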
DATA STRUCTURE FRAGMENTATION
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A
Header : car 1, drive 1, file set 1, file #1, … | 0E 12 1A
…… 12 1A 90 …
~98% of the data structures (e.g. an image from a camera) are within one MDF4 file
In ~2% of the cases, a data structure spans multiple files
WHY NOT USE WHAT IS ALREADY AVAILABLE
Reduce-by-key / group-by-key will shuffle most or all fragments.
The function applied during grouping still produces a huge result volume (partial images).
Defragmentation requires completeness; incomplete, partially defragmented results might require another shuffle.
Works for aggregation:
Key Value
A 12
A 23
B 54
A 47
B 24
→
Key Sum
A 82
B 78
Does not work for fragments (what if something is missing?):
Key Value
A (fragment)
A (fragment)
A (fragment)
→
Key Sum
A ?
This will not get us a result
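The difference between the two cases can be made concrete with the numbers from the table. In this sketch the assignment of rows to two partitions is an assumption made for illustration, and partial_sums and merge are hypothetical helper names.

```python
from collections import defaultdict

# Rows of the aggregation example, spread over two partitions (assumed split).
partition_1 = [("A", 12), ("A", 23), ("B", 54)]
partition_2 = [("A", 47), ("B", 24)]


def partial_sums(partition):
    """Map-side combine: one small number per key and partition."""
    sums = defaultdict(int)
    for key, value in partition:
        sums[key] += value
    return dict(sums)


def merge(left, right):
    """Merge two partial results; valid because addition is associative."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


totals = merge(partial_sums(partition_1), partial_sums(partition_2))

# Fragments behave differently: "combining" two of them does not shrink anything.
fragment_a, fragment_b = bytes(1000), bytes(1000)
partial_image = fragment_a + fragment_b
```

For sums, only one small value per key crosses the network. For image fragments, a partial result is as large as the fragments it contains, and it is still not a usable image until every fragment is present.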
DEFRAGMENTATION PROCESS
(example: completeness = 4 fragments)
RDD #1: RDD with fragments, distributed over the partitions of the executors
RDD #2: create local-complete RDD (messages whose fragments all sit in one partition)
RDD #3: reduce local-complete RDD
RDD #4: create local-incomplete RDD from remaining fragments
RDD #5: reduce local-incomplete RDD
Union RDD #3 and RDD #5, discard remaining incomplete fragments
[Diagram: fragments #1 to #4 of Message #1 distributed over Partition #1 and Partition #2 on two executors]
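The two-phase idea can be simulated without Spark by treating partitions as plain lists. This is a sketch of the principle under stated assumptions, not the implementation used on the cluster: fragments are (message key, index, data) tuples, every message has exactly 4 fragments, and defragment, group and assemble are hypothetical names.

```python
from collections import defaultdict

COMPLETENESS = 4   # fragments per message, as in the example above


def group(fragments):
    """Group (key, index, data) fragments by message key."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[frag[0]].append(frag)
    return groups


def assemble(frags):
    """Concatenate fragment data in index order."""
    return b"".join(data for _, _, data in sorted(frags, key=lambda f: f[1]))


def defragment(partitions):
    """Return (complete messages, number of fragments that had to be shuffled)."""
    messages = {}
    leftovers = []
    # Phase 1: inside each partition, assemble messages that are locally complete.
    for partition in partitions:
        for key, frags in group(partition).items():
            if len(frags) == COMPLETENESS:
                messages[key] = assemble(frags)
            else:
                leftovers.extend(frags)
    # Phase 2: only the leftovers are regrouped across partitions ("shuffled").
    for key, frags in group(leftovers).items():
        if len(frags) == COMPLETENESS:
            messages[key] = assemble(frags)
        # Anything still incomplete is discarded.
    return messages, len(leftovers)


partitions = [
    [("m1", 0, b"a"), ("m1", 1, b"b"), ("m1", 2, b"c"), ("m1", 3, b"d"),
     ("m2", 0, b"e"), ("m2", 1, b"f")],
    [("m2", 2, b"g"), ("m2", 3, b"h"), ("m3", 0, b"i")],
]
messages, shuffled = defragment(partitions)
```

In this toy input 5 of 9 fragments are shuffled. With about 98% of the structures complete inside one file, the share that needs a shuffle would be far smaller.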
SHUFFLE RESULTS: EXAMPLE
This example result shows limited shuffling
HOW WE SERVE DATA
THE DATA
Speed Weather Environment
85 km/h Rainy Highway
30 km/h Sunny Urban
V1
LIDAR
LIDAR: Light Detection and Ranging
Good for generating a precise 3D map
Not reliable during bad weather conditions
RADAR
Long and short range in the car
Good for detecting moving objects
Reliable during bad weather conditions
IMAGE
RCCC Image
RCCC format, compressed or uncompressed
Good for object recognition (traffic lights, street signs, lane lines)
WHO ARE THE DATA USERS
Machine Learning Engineer
Software Engineer
Algorithm Developer
Robotics Engineer
Applied Scientist
WHICH DATA USERS ARE INTERESTED IN
WANTED: Drive on a highway on a rainy day (Parquet or ORC)
WANTED: Camera images (jpeg)
WANTED: Sensory data, IMU, GPS (Rosbag)
WANTED: Lidar and radar data (DF or DS)
WANTED: Urban drive with traffic lights (HDF5)
WHAT OUR USERS DO WITH THAT
Building driving strategy
Signal processing, sensor fusion
Sensor validation
Simulation
OUR PHILOSOPHY FOR DATA PROVISIONING
Evangelize data-driven development
Big data trainings
Onboarding new users to use our cluster
Abstract away data cluster complexity, but also allow users to develop on top of it
DATA ACCESS CHALLENGES
Scalable way of accessing big data
Continuously changing data structures make it harder to work with the data
Variety and complexity of data and their formats
Data centers across the world and data shipping (in case privacy is not affected)
DATA ACCESS
Hadoop / HDFS
Speed Weather
25 km/h Sunny
60 km/h Sunny
Meta store
Hadoop / Spark / …
Data search API
Result (RDD / DF):
Speed Weather front_camera_image
60 km/h Sunny [image]
55 km/h Sunny [image]
Query: select (speed, front_camera_image) where (weather = sunny and speed > 50)
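What such a query means can be shown with an in-memory stand-in for the meta store. This is not the actual data search API: DriveRecord, META_STORE, search and the image identifiers are hypothetical, and the rows are sample values in the spirit of the tables above.

```python
from typing import List, NamedTuple, Tuple


class DriveRecord(NamedTuple):
    speed_kmh: int
    weather: str
    front_camera_image: str   # hypothetical reference to the stored image


META_STORE: List[DriveRecord] = [
    DriveRecord(25, "sunny", "image-0001"),
    DriveRecord(60, "sunny", "image-0002"),
    DriveRecord(55, "sunny", "image-0003"),
    DriveRecord(85, "rainy", "image-0004"),
]


def search(records: List[DriveRecord], weather: str,
           min_speed: int) -> List[Tuple[int, str]]:
    """select (speed, front_camera_image) where weather matches and speed > min_speed."""
    return [
        (rec.speed_kmh, rec.front_camera_image)
        for rec in records
        if rec.weather == weather and rec.speed_kmh > min_speed
    ]


result = search(META_STORE, weather="sunny", min_speed=50)
```

The point of filtering on metadata first is that only the matching image references have to be resolved against the bulk storage afterwards.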
HOW WE ENSURE DATA QUALITY
WHY DATA QUALITY IS IMPORTANT
We don’t want to waste time and resources on unnecessary test drives
We don’t want to store data that users cannot use
We don’t want to provide bad data
IT’S ALL ABOUT …
The GOOD The BAD The UGLY
WHAT COULD POSSIBLY GO WRONG
Logger
Image Frame drops
Calibration Errors
Configuration Errors
Corrupted sensory data
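One of these checks, detecting dropped image frames, can be sketched from timestamps alone. The approach below is an assumption for illustration and not the check used in the data quality framework: it supposes a fixed nominal frame period and flags every gap that is clearly longer. find_frame_drops and the 40 ms period are hypothetical.

```python
def find_frame_drops(timestamps_ms, period_ms, tolerance=0.5):
    """Return (timestamp before the gap, estimated number of missing frames)."""
    drops = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        gap = cur - prev
        if gap > period_ms * (1 + tolerance):
            missing = round(gap / period_ms) - 1
            drops.append((prev, missing))
    return drops


# Frames expected every 40 ms; two frames are missing after t = 80 ms.
timestamps = [0, 40, 80, 200, 240]
drops = find_frame_drops(timestamps, period_ms=40)
```

The tolerance keeps ordinary timing jitter from being reported as a drop.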
WHICH DATA IS INTERESTING TO USERS
Highway / urban drives
Drives at night
Drives on rainy days
Drives through crossroads
ENSURING DATA QUALITY
Centralized data quality framework
Built on top of Spark
Kafka for inter-application communication
CUSTOM INPUT DISCRETIZED STREAM IMPLEMENTATION
CustomInputDStream (a custom InputDStream)
Creates a new RDD once new data is available
Uses the streaming scheduler to run continuously
Triggered once a new message is sent
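The behaviour described here can be mimicked in plain Python. In Spark Streaming, an InputDStream is asked once per batch interval to compute an optional RDD. The sketch below only imitates that contract with lists: it is not the Spark API, and NewFileInputStream, notify and the file identifiers are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class NewFileInputStream:
    """Plain-Python stand-in for a custom input stream (not the Spark API)."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def notify(self, path: str) -> None:
        """Called when a message announces new data, e.g. one consumed from Kafka."""
        self._pending.append(path)

    def compute(self, batch_time: int) -> Optional[List[str]]:
        """Called once per batch interval; returns a batch only if there is new data."""
        if not self._pending:
            return None
        batch = list(self._pending)
        self._pending.clear()
        return batch


stream = NewFileInputStream()
batches = []
for tick in range(4):                      # stand-in for the streaming scheduler
    if tick == 1:
        stream.notify("drive1/fileset1/file1")
        stream.notify("drive1/fileset1/file2")
    batch = stream.compute(tick)
    if batch is not None:
        batches.append((tick, batch))
```

Intervals without new messages produce no batch, so the downstream quality checks only run when there is something to check.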
DATA QUALITY FRAMEWORK
HDFS
Hadoop / HDFS
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header : car 1, drive 1
file set 2, file #1, …
00 01 2A
DATA DRIVEN DEVELOPMENT FOR AD @ BMW
Several TB/h
Up to 500 PB/a
ML Experiments/Training
Test drives
Data Ingest to Data Center
Organize
Structure
KPI report
Deployment of trained algorithms
ML data sets
Phase out / Balance datasets
Combinatorial boost of scenarios
Synthetic data
Focus of this talk
WE ARE HIRING
The BMW AD organization is growing!
Visit our booth :)
We are also at Strata London in May
Autonomous Driving Campus
We've got PB data!
EXCITING TIMES AHEAD – THANK YOU FOR YOUR INTEREST.
