About my work at DeNA Co., Ltd. We built HD maps for self-driving cars from images taken by dashcams. Deep learning is used extensively to detect objects on the road, and SfM is used to reconstruct 3D points from 2D images. These slides were presented at TechCon 2019: https://techcon.dena.com/2019/
Building HD Maps with Dashcams
AI System Group
DeNA Co., Ltd.
• Who I am
• Our Goal
• Intro to DL and SfM
• 3D Point Reconstruction
• Recognizing Objects
• Putting It All Together
Who I am
• Kosuke Kuzuoka
• 22 years old
• June 2018 - Present
AI Research Engineer at DeNA Co., Ltd.
• March 2017 - June 2018
R&D manager at CONCORE’S, inc.
• Self Driving Cars
• Computer Vision
What I have done before
Detecting objects from construction
plans using deep learning algorithms
Patent pending algorithm that I
developed for detecting pillars across
multiple tiled images
● To create high definition maps at
a lower price
● 3D point reconstruction and
object detection in dashcam images
● No use of expensive equipment,
such as LiDAR
Isn’t it like Google Maps?
● A map designed for humans
● It has useful information for
● A map designed for machines
● It has useful information for cars,
such as where traffic signs exist
Is it for self-driving cars?
● It’s extensively used in self-driving cars,
such as for localization and path planning
● Therefore, the location accuracy of HD
maps needs to be within a few centimeters
● A self-driving car needs to know which
direction the lane is leading, where the
traffic signs are, etc.
Introduction to Deep Learning
● The idea behind deep learning has existed since the late 1950s, when it was introduced by Frank Rosenblatt.
● It was originally called the Perceptron, and it could solve linearly separable problems.
● Later, it turned out that a simple Perceptron could not solve non-linearly separable problems.
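To illustrate the point about linear separability, here is a minimal sketch of a single-layer Perceptron in plain Python. The function names, learning rate, and epoch count are illustrative assumptions and do not come from the talk. AND is linearly separable, so the classic update rule fits it. XOR is not, so no single-layer Perceptron can classify all four cases.

```python
def predict(w, b, x):
    """Step activation on a weighted sum of the two inputs."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic Perceptron rule: nudge the weights by the prediction error."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained Perceptron classifies correctly."""
    hits = sum(predict(w, b, x) == target for x, target in samples)
    return hits / len(samples)


# Truth tables as ((x1, x2), label) pairs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    for name, data in (("AND", AND), ("XOR", XOR)):
        w, b = train_perceptron(data)
        print(name, "accuracy:", accuracy(data, w, b))
```

Stacking layers with non-linear activations removes this limitation, which is where multi-layer networks come in.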
Why is deep learning popular nowadays?
● Large scale datasets such as ImageNet have been made public for research purposes
● High-performance computing resources such as GPUs are more accessible than ever before
Okay, but what can you do with DL?
● Using deep learning, we can
solve object detection and
instance segmentation problems
● Object detection detects
multiple objects in the image,
while instance segmentation
segments object boundaries
● Using deep learning, we can
solve image classification and
image localization problems
● Image classification classifies
what is in the image, while
image localization identifies
what is in the image and where
Okay, let’s sum that up
• Deep learning is not new
• Data is important for deep learning
• Substantial computational resources are necessary
• You can do so many things with deep learning
Introduction to SfM
SfM stands for Structure from
Motion, an algorithm that
reconstructs 3D points (the
structure) from images taken
from different angles or positions
(the motion). Large-scale
applications include, for example,
reconstructing all of Rome using
only images found on the web.
How does SfM work?
● Extracts features from images, e.g.
corners or edges
● Matches the features in images taken
from different positions
● Calculates the corresponding points
in 3D coordinates using triangulation
● Calculates camera position and
optimizes reconstructed 3D points
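The triangulation step above can be sketched in plain Python. This is a minimal illustration using the midpoint method, which is one simple triangulation technique and not necessarily what any particular SfM library uses. It assumes the two camera centers and the viewing rays toward the matched feature are already known. All names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions toward the
    matched feature. With noisy matches the rays rarely intersect,
    so we take the midpoint of their closest approach.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


if __name__ == "__main__":
    # Two cameras one unit apart, both looking at the same point.
    print(triangulate_midpoint((0, 0, 0), (1, 2, 10), (1, 0, 0), (0, 2, 10)))
```

The parallel-ray check reflects a practical limit: a feature seen from two nearly identical viewpoints gives a poorly constrained depth, so a dashcam has to move between frames for the point to be recoverable.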
What can you do with SfM?
SfM has been used to build a 3D representation of Rome within a day from images found on the web. It used
150k images, and the processing time was around 21 hours using 496 CPU cores.
Let’s sum that up
• SfM can reconstruct 3D shapes from 2D images
• A 3D representation of Rome can be built in a day
using images from the web
So we have tools. What now?
● Dashcam images are used for reconstructing 3D points by SfM
● The same images are used for detecting objects in 2D space
● Both results are integrated to get 3D representations of each object
3D Point Reconstruction
● Images are taken by driving in the
highlighted region in Minatomirai
● Dashcam images are used for SfM
and object detection
Overall shape looks good
● 3D modeling of a relatively small
region in Minatomirai
● Reconstructed shape matches the
highlighted region in the map
Slightly larger region, still good
● Red arrows indicate the direction
the car was driving
● The reconstructed shape matches
the highlighted region in the map
Hooray, the view from the top is good
● SfM was applied in a larger region
in the Minatomirai area
● Overall shape still matches the map
What about the closer view?
Details such as road markings and speed
limit signs are visible, though some of the
reconstructed information is unnecessary
Lanes are reconstructed well on the left
side, but the center lane markings on
the right are missing. This is caused by
Some findings with SfM are:
• Reconstructed 3D points contain small details
• GPUs can reduce the processing time significantly
• The more images, the better the result
● We chose Faster R-CNN for detecting
● Faster R-CNN was a state-of-the-art
detector in 2016
● Faster R-CNN is more accurate than
real-time detectors, but it is slower
Objects are detected correctly
● Most traffic signs are detected correctly, though
the detector misses one small traffic sign
● The network predicts the category for each box,
and there are more than 100 categories to choose from
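The per-box category prediction can be sketched as a softmax over class scores followed by picking the most likely label. This is a simplified illustration of the classification step only, not the full Faster R-CNN pipeline. The label names and scores below are made up, and a real model would have more than 100 classes.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_box(scores, labels):
    """Return the most likely label for one box and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label set, far smaller than the real one.
LABELS = ["stop", "speed_limit_40", "no_parking"]

if __name__ == "__main__":
    label, prob = classify_box([0.1, 2.5, -1.0], LABELS)
    print(label, round(prob, 3))
```

In practice a confidence threshold is applied to the winning probability, which is one reason a small, low-confidence sign can go undetected.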
Another example for traffic signs
What now for lane detection?
● We chose LaneNet published in 2018 as a lane detector
● LaneNet transforms an original image to a bird’s eye image with learned parameters
● It can detect multiple lane instances in real time with high accuracy
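A bird's eye transform of this kind can be expressed as a 3x3 perspective matrix (homography) applied to pixel coordinates. The sketch below shows only how such a matrix maps a point. The matrix values are invented for illustration, whereas in the approach described above the parameters are learned.

```python
def apply_homography(H, point):
    """Map an image point (x, y) through a 3x3 homography H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w is what makes the mapping perspective, not affine.
    return (u / w, v / w)


# Made-up matrix for illustration only; not learned parameters.
H_EXAMPLE = [
    [1.0, -0.5, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.01, 1.0],
]

if __name__ == "__main__":
    print(apply_homography(H_EXAMPLE, (100.0, 100.0)))
```

Lane lines that converge toward the horizon in the original image become closer to parallel after a suitable transform, which makes them easier to fit with simple curves.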
Deep learning can detect lanes!
● Different colors indicate different instances
● You can see that the lanes are detected correctly
● It can detect curved lanes as well, though
none appear in this image
Another example for lane detection
What about road markings?
on bird’s eye image
Faster R-CNN on
bird’s eye image
Deep learning works for road markings!
● Road markings are detected correctly
● It distinguishes the lane from the stop sign
● The detected boxes fit the objects, though not perfectly
Another example for road markings
Let’s sum that up
• Traffic sign recognition with more than 100
categories can be solved with deep learning
• Deep learning works well on complicated tasks
such as lane and road marking detection
• The more data, the better the results
Putting It All Together
● Green points indicate the region used for 3D
● The detection has to be done in frames where
the objects are highlighted in green
Results are now integrated
We can get a 3D representation of
detected objects by integrating both
results. The final result will look like
Now, objects are represented in 3D
● Detected traffic signs and road markings are
converted to 3D
● Each object has a 3D representation after
integrating both SfM and object detection results
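One plausible way to combine the two results is to project the reconstructed 3D points into the frame, keep the ones that land inside a detected 2D box, and summarize them as a single 3D position. The sketch below assumes a pinhole camera with known pose and intrinsics. It is an illustrative assumption about how such an integration could work, not the method used in the talk, and every name and number in it is hypothetical.

```python
def project(point, R, t, fx, fy, cx, cy):
    """Project a world point into pixel coordinates (pinhole model)."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None  # the point is behind the camera
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


def locate_object(points, box, R, t, fx, fy, cx, cy):
    """Average the 3D points whose projections fall inside a 2D box.

    box is (x_min, y_min, x_max, y_max) in pixels. Returns None when
    no reconstructed point projects into the box.
    """
    x_min, y_min, x_max, y_max = box
    inside = []
    for p in points:
        uv = project(p, R, t, fx, fy, cx, cy)
        if uv and x_min <= uv[0] <= x_max and y_min <= uv[1] <= y_max:
            inside.append(p)
    if not inside:
        return None
    n = len(inside)
    return tuple(sum(p[i] for p in inside) / n for i in range(3))


if __name__ == "__main__":
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # camera at origin
    cloud = [(0.0, 0.0, 10.0), (0.1, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
    print(locate_object(cloud, (600, 300, 700, 400), R, [0.0, 0.0, 0.0],
                        1000.0, 1000.0, 640.0, 360.0))
```

A plain average is sensitive to background points that happen to fall inside the box, so a median or a depth-based filter may be a more robust choice.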
We are done!
● Reconstructed 3D view seen from the top
● You can see the detected lanes and road
markings now have a 3D representation
Using this technique, we could:
• Automate the map creation process
• Create HD maps for other services
• Detect changes automatically