CAMERA-BASED ROAD LANE DETECTION
BY DEEP LEARNING
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
OUTLINE
 An Empirical Evaluation of Deep Learning on Highway Driving
 Real-Time Lane Estimation using Deep Features and Extra Trees Regression
 Accurate and Robust Lane Detection based on Dual-View Convolutional Neural Network
 DeepLanes: E2E Lane Position Estimation using Deep NNs
 Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene
 End-to-End Ego Lane Estimation based on Sequential Transfer Learning for Self-Driving Cars
 Deep Learning Lane Marker Segmentation From Automatically Generated Labels
 VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
 Spatial as Deep: Spatial CNN for Traffic Scene Understanding
 Towards End-to-End Lane Detection: an Instance Segmentation Approach
 LaneNet: Real-Time Lane Detection Networks for Autonomous Driving
AN EMPIRICAL EVALUATION OF DEEP LEARNING ON HIGHWAY DRIVING
Mediated Perception
Lane and vehicle Detection
mask detector
AN EMPIRICAL EVALUATION OF DEEP LEARNING ON HIGHWAY DRIVING
OverFeat mask detector; lane boundary ground truth; output of the lane detector after DBSCAN clustering.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
Lane prediction on a test image; lane detection in 3D.
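The slide above mentions clustering the lane detector's output with DBSCAN. As a rough illustration of that step (not the paper's exact pipeline), the sketch below groups hypothetical 2D lane-boundary points with scikit-learn's DBSCAN; the eps and min_samples values are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical detector output: (x, y) image points on lane boundaries.
points = np.array([[102, 480], [105, 470], [108, 460],   # left boundary
                   [412, 480], [409, 470], [406, 460]])  # right boundary

# Group points into lane-boundary instances; eps / min_samples are
# illustrative values, not taken from the paper.
labels = DBSCAN(eps=15.0, min_samples=2).fit_predict(points)

for lane_id in set(labels) - {-1}:            # -1 marks noise points
    boundary = points[labels == lane_id]
    print(f"lane boundary {lane_id}: {len(boundary)} points")
```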
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
 A real-time lane estimation algorithm that adopts a learning framework combining a CNN and extra trees regression.
 Using the learning framework, it predicts the ego-lane location in the given image even under lane marker occlusion or absence.
 The CNN is trained to extract robust features from the road images.
 The extra trees regression model is trained to predict the ego-lane location from the extracted road features.
 The extra trees are trained with input-output pairs of road features and ego-lane image points.
 The ego-lane image points correspond to the Bezier spline control points that define the left and right lane markers of the ego lane.
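A minimal sketch of the regression stage described above, assuming the CNN features have already been extracted; the feature dimension, number of control points, and random data are placeholders rather than values from the paper.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical training data: one 512-D CNN feature vector per road image,
# and 2 boundaries x 4 Bezier control points x (x, y) -> 16 target values.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))           # CNN features (placeholder)
y_train = rng.uniform(0, 640, size=(200, 16))   # control-point coordinates

# Extra trees: an ensemble of extremely randomized decision trees.
model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# At test time: extract features from a new image and regress the ego-lane.
control_points = model.predict(rng.normal(size=(1, 512))).reshape(2, 4, 2)
print(control_points.shape)   # (boundaries, control points, xy)
```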
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
training
testing
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
 A CNN is utilised to extract features from the entire road image;
 To extract road-specific features, the weights (filters) and biases of the CNN model pre-trained on the Places dataset are fine-tuned with a dataset of road images;
 To perform the fine-tuning for road feature extraction, the multiclass Places-CNN model is reformulated as a binary road classifier (a fine-tuning sketch follows below);
 Extra trees are an extension of the random forest regression model and belong to the class of decision-tree-based ensemble learning methods.
(a-b) curved, (c) shadowed, (d) shadowed and absent.
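A hedged PyTorch sketch of the fine-tuning idea from the bullets above: the pretrained network's multiclass head is replaced by a binary road / non-road classifier and fine-tuned on road images. An AlexNet-style torchvision model stands in for the actual Places-CNN, whose weights are not loaded here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for the Places-pretrained CNN; in practice the Places-CNN weights
# would be loaded into the backbone before fine-tuning.
net = models.alexnet(weights=None)
net.classifier[6] = nn.Linear(4096, 2)      # multiclass head -> binary road / non-road

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

# One hypothetical fine-tuning step on a batch of road images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))          # 1 = road, 0 = non-road
loss = criterion(net(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```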
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
(e-f) partially occluded, (g) partially occluded, absent with different
colored road surface, (h) occluded and absent, (i-l) absent.
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
 A Dual-View Convolutional Neural Network (DVCNN) framework for lane detection.
 DVCNN: the front-view image and the top-view image are optimized simultaneously.
 In the front-view image, false detections such as moving vehicles, barriers and curbs are excluded, while in the top-view image non-club-shaped structures such as ground arrows and words are removed.
 A weighted hat-like filter not only recalls potential lane line candidates but also alleviates the disturbance of gradual textures and reduces most false detections (a filter sketch follows below).
 A global optimization function is designed in which the lane line probabilities, lengths, widths, orientations and number are all taken into account.
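A schematic NumPy sketch of a hat-like filter response on one row of the top-view image: a lane-line pixel should be brighter than the road on both sides, so the center intensity is compared with the two side intensities and penalized by their asymmetry. The exact weighting in the paper's filter differs, so the formulation and threshold here are illustrative only.

```python
import numpy as np

def hat_filter_response(row, half_width):
    """Hat-like response along one row of a bird's-eye (top-view) image."""
    row = row.astype(np.float32)
    left = np.roll(row, half_width)        # intensity half_width pixels to the left
    right = np.roll(row, -half_width)      # intensity half_width pixels to the right
    return 2.0 * row - left - right - np.abs(left - right)

# Toy row: dark road with one bright 6-pixel-wide lane line.
row = np.full(200, 60.0)
row[95:101] = 220.0
response = hat_filter_response(row, half_width=6)
candidates = np.where(response > 100.0)[0]   # threshold is illustrative
print(candidates)
```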
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
Three steps of lane detection: lane line candidate extraction, DVCNN framework and global optimization.
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
DVCNN framework
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
DEEPLANES: E2E LANE POSITION ESTIMATION USING DEEP
NNS
 Positioning a vehicle between lane boundaries is a core capability of a self-driving car.
 Approach to estimate lane positions directly using a deep neural network that
operates on images from laterally-mounted down-facing cameras.
 To create a diverse training set, generate semi-artificial images.
 Estimate the position of a lane marker with sub-cm accuracy at 100 frames/s on
an embedded automotive platform, requiring no pre- or post-processing.
Setup with two laterally mounted cameras. The label ti ∈ [0, . . . , 316] for image Xi corresponds to the row containing the lane-marking pixel closest to the bottom border of the image.
DEEPLANES: E2E LANE POSITION ESTIMATION USING DEEP
NNS
 Lane position estimation is formulated as a classification task over image rows.
Using a real-world background, various types of lane markings are artificially placed to synthesize regular lane markings (a, b) and varying light conditions (c, d).
For a given image Xi, the deep NN computes a softmax probability output vector Yi = (y0, . . . , y316), where yk is the probability that row k of image Xi contains the lane marking.
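A minimal PyTorch sketch of the classification formulation above: the network outputs one logit per image row (317 classes, matching the label range ti ∈ [0, 316] on the earlier slide) and is trained with softmax cross-entropy. The architecture and input size are placeholders, not DeepLanes' actual network.

```python
import torch
import torch.nn as nn

class RowClassifier(nn.Module):
    """Toy network scoring every image row as the lane-marking position."""
    def __init__(self, num_rows=317):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.head = nn.Linear(32 * 8 * 8, num_rows)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))   # raw row logits

net = RowClassifier()
images = torch.randn(4, 3, 317, 316)                     # hypothetical input size
target_rows = torch.randint(0, 317, (4,))                # row of the nearest marking
logits = net(images)
loss = nn.CrossEntropyLoss()(logits, target_rows)        # softmax over rows
predicted_row = logits.softmax(dim=1).argmax(dim=1)
```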
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
 A multitask deep convolutional network, which simultaneously detects the
presence of the target and the geometric attributes (location and orientation) of
the target with respect to the region of interest.
 A recurrent neuron layer is adopted for structured visual detection. The recurrent
neurons can deal with the spatial distribution of visible cues belonging to an
object whose shape or structure is difficult to explicitly define.
 The multitask CNN provides auxiliary geometric information to help the
subsequent modeling of the given lane structures.
 The RNN automatically detects lane boundaries, including those areas containing
no marks, without any prior knowledge or secondary modeling.
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
Diagram of lane recognition using deep
neural networks. The workflow of the two
frameworks for lane recognition. Both
frameworks employ CNNs for feature
extraction. Framework I predicts both the
presence of the targets and the relevant
geometric attributes. Framework II first
processes an input image as a sequence
of ROIs, and applies two steps of feature
extraction on each ROI: by CNNs and by
RNNs. The latter can automatically
recognize global structures over multiple
ROIs. Optionally, higher level models of
the lane structures can be constructed
based on the predictions.
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
Structure-aware detector based on RNN. The initial steps
represent the input region using the convolution-max
pooling operations. The predictions are made from the layer
of recurrent neurons, which are computed from the
extracted features of the current instance and their own
current status. Since the status of the recurrent neurons
forms the system state, at any moment, the network takes
the previous observations into account to make the current
predictions.
Applying RNN detector on road surface image. The detection
process of lane boundaries on road surface. Each ROI is a
strip. The NN processes the strips in an image from left (near)
to right (far). For each ROI, multiple binary decisions are made,
corresponding to detecting the target (lane boundary) in small
patches within the ROI. Each ROI consists of a stack of such
small patches, and the red patches contain lane boundaries.
“C” - convolution maxpooling layer, “R” - recurrent hidden layer.
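A schematic PyTorch sketch of the strip-wise detector described in the caption above: per-strip CNN features, a recurrent layer across the strip sequence, and one binary decision per small patch within each strip. All sizes are illustrative; this is not the paper's architecture.

```python
import torch
import torch.nn as nn

class StripRNNDetector(nn.Module):
    """Structure-aware detector: CNN per ROI strip, RNN across strips."""
    def __init__(self, patches_per_strip=16):
        super().__init__()
        self.cnn = nn.Sequential(                    # "C": conv + max-pooling
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.rnn = nn.GRU(16 * 4 * 4, 64, batch_first=True)   # "R": recurrent layer
        self.head = nn.Linear(64, patches_per_strip)

    def forward(self, strips):                       # strips: (batch, n_strips, 1, H, W)
        b, n = strips.shape[:2]
        feats = self.cnn(strips.flatten(0, 1)).flatten(1).view(b, n, -1)
        hidden, _ = self.rnn(feats)                  # state carries earlier strips
        return torch.sigmoid(self.head(hidden))      # per-patch boundary probability

detector = StripRNNDetector()
strips = torch.randn(2, 10, 1, 32, 64)               # 10 strips, processed near to far
probs = detector(strips)                             # (2, 10, 16)
```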
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
IPM image. Each pixel corresponds to a 0.1 × 0.1 m² ground area. An IPM image can aggregate one or multiple camera images into a unified map of the road surface. The IPM image shown integrates three camera observations.
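A hedged OpenCV sketch of building an IPM image at 0.1 m per pixel, assuming the image-to-ground correspondences are known from calibration; all point coordinates below are made up for illustration.

```python
import numpy as np
import cv2

# Inverse perspective mapping sketch: warp a camera image onto a ground-plane
# grid where one output pixel covers 0.1 m x 0.1 m.
image = np.zeros((720, 1280, 3), dtype=np.uint8)      # placeholder camera frame

# Four image points of a known ground rectangle and their metric coordinates.
img_pts = np.float32([[560, 450], [720, 450], [1100, 700], [180, 700]])
ground_m = np.float32([[-2.0, 30.0], [2.0, 30.0], [2.0, 6.0], [-2.0, 6.0]])

# Scale metres to IPM pixels so that 1 pixel = 0.1 m.
ipm_pts = (ground_m - ground_m.min(axis=0)) / 0.1
H = cv2.getPerspectiveTransform(img_pts, ipm_pts)

width, height = np.ceil(ipm_pts.max(axis=0)).astype(int) + 1
ipm = cv2.warpPerspective(image, H, (int(width), int(height)))  # road-surface map
```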
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
 Autonomous cars establish driving strategies using the positions of ego lanes.
 A sequential end-to-end transfer learning method to estimate left and right ego
lanes directly and separately without any post-processing.
 Constructed an extensive dataset that is suitable for a deep neural network
training by collecting a variety of road conditions, annotating ego lanes, and
augmenting them systematically.
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 Train a DNN for detecting lane markers in images without manually labeling
any images.
 To project HD maps for automated driving (AD) into the image and correct for misalignments due to inaccuracies in localization and coordinate frame transformations.
 The corrections are performed by calculating the offset between features within the map and detected ones in the images.
 By using detections in the image to refine the projections, the labels are close to pixel-perfect.
 After a fast visual quality check, the projected lane markers can be used for training a fully convolutional network to segment lane markers in images.
 The network regularly detects lane markers at distances of ~150 meters on a 1-megapixel camera.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
Automatically generated label (blue) using an HD map for automated driving. Lanes are projected into the image up to a distance of 200 meters.
The labeling pipeline consists of 3 steps:
1.) Coarse pose graph alignment
using only GPS and relative motion
constraints;
2.) Lane alignment by adding lane
marker constraints to the graph;
3.) Pixel-accurate refinement in image space using re-projection optimization per image, starting from the corresponding graph pose.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
The graph pose vertices (blue) are connected by relative 6-DOF motion measurement
edges (thin solid black). Left: The graph state shows the result of an optimization with only
GPS measurement edges (yellow). The gray areas show the initial lane marker matches
between lane marker measurements (green) and lane marker map (thick solid black). Right:
The graph state after some iterations, with outlier lane marker matches removed based on
a decreasing distance threshold.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 To tightly align the graph to the road, add matches
of detected lane markers to all map lane markers
based on a matching range threshold;
 3D lane marker detections for alignment can be computed with simple techniques, such as a top-hat filter and a stereo camera setup;
 To extract line segments from these detections, run a Douglas-Peucker polygonization and add the resulting 3D line segments to the corresponding pose vertices for matching (a polygonization sketch follows below).
 To achieve pixel-accurate labels, a reprojection optimization with line segments is performed in image space;
After graph alignment of the projected map lane
markers (blue) and the detected lane markers from a
simple detector (green). The perpendicular average
distance between line segments is used as a matching
criterion for an optimization that solves for the pixel-
accurate corrected 6-DOF camera pose.
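A small NumPy sketch of the Douglas-Peucker polygonization mentioned in the bullets above, shown in 2D for brevity; the pipeline applies the same idea to the 3D lane-marker detections before matching them against the map.

```python
import numpy as np

def douglas_peucker(points, eps):
    """Simplify an ordered point sequence so the polyline stays within eps of it."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    dx, dy = end - start
    rel = points - start
    # Perpendicular distance of every point to the chord start-end.
    dists = np.abs(dx * rel[:, 1] - dy * rel[:, 0]) / max(np.hypot(dx, dy), 1e-9)
    idx = int(np.argmax(dists))
    if dists[idx] <= eps:
        return np.vstack([start, end])
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return np.vstack([left[:-1], right])

# Noisy marker centerline reduced to a few line-segment endpoints.
pts = np.column_stack([np.linspace(0, 20, 50), 0.05 * np.random.randn(50)])
print(douglas_peucker(pts, eps=0.2))
```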
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 It generates probability maps over the image without losing information such as marker width;
 Based on the pixel-wise output, it is possible to model the output differently, e.g. using splines;
 There are no assumptions about the number of lanes or type of marker, e.g. solid or dashed;
 Lane marker detection is cast as a semantic segmentation problem by employing FCNs (a minimal sketch follows below).
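A minimal PyTorch sketch of the segmentation setup from the bullets above: a small fully convolutional network producing a per-pixel lane-marker probability map, trained against the automatically generated labels. The real network is much larger; this only shows the formulation.

```python
import torch
import torch.nn as nn

# Minimal fully convolutional setup: per-pixel lane-marker logits, trained with
# binary cross-entropy against the automatically generated labels.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),                               # single-channel logit map
)

images = torch.randn(2, 3, 256, 512)                   # batch of road images
labels = torch.randint(0, 2, (2, 1, 256, 512)).float() # projected map markers

logits = fcn(images)
loss = nn.BCEWithLogitsLoss()(logits, labels)
prob_map = torch.sigmoid(logits)                       # preserves marker-width information
```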
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
Left: Lane markers detected in the image. Center: Correctly detected lane markers are shown in green, false
negatives in blue and false positives in red. Dashed lane markers are extended such that they end up being
completely detected after some distance. False positives are mainly found randomly, around cars, and at lane
markers that are not fully covered by the labels. Right: Number of misclassified pixels within each image line.
VPGNET: VANISHING POINT GUIDED NETWORK FOR LANE AND
ROAD MARKING DETECTION AND RECOGNITION
 A unified end-to-end trainable multi-task network that jointly handles lane and road marking detection and recognition, guided by a vanishing point, under adverse weather conditions.
 Images taken on rainy days are subject to low illumination, while wet roads cause light reflection and distort the appearance of lane and road markings.
 At night, color distortion occurs under limited illumination.
 A lane and road marking benchmark consisting of about 20k images with 17 lane and road marking classes under 4 different scenarios: no rain, rain, heavy rain, and night.
 VPGNet can detect and classify lanes and road markings, and predict a vanishing point, with a single forward pass.
VPGNET: VANISHING POINT GUIDED NETWORK FOR LANE AND
ROAD MARKING DETECTION AND RECOGNITION
VPGNet performs four tasks: grid regression, object detection, multi-label classification, and vanishing point prediction.
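A schematic multi-task head in the spirit of the caption above: a shared backbone feeding four branches. Channel counts, grid resolution, and the vanishing-point output format are placeholders, not VPGNet's actual configuration; only the class count (17) follows the slide.

```python
import torch
import torch.nn as nn

class FourTaskNet(nn.Module):
    """Shared backbone with four task branches (schematic, not VPGNet itself)."""
    def __init__(self, num_classes=17):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.grid_reg = nn.Conv2d(64, 4, 1)               # per-cell box offsets
        self.obj_det = nn.Conv2d(64, 2, 1)                # marking / background
        self.multi_label = nn.Conv2d(64, num_classes, 1)  # lane & road-marking classes
        self.vp_pred = nn.Conv2d(64, 5, 1)                # vanishing-point branch (channel count illustrative)

    def forward(self, x):
        f = self.backbone(x)
        return self.grid_reg(f), self.obj_det(f), self.multi_label(f), self.vp_pred(f)

outs = FourTaskNet()(torch.randn(1, 3, 480, 640))
print([o.shape for o in outs])                             # one map per task
```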
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
 Though CNNs have a strong capability to extract semantics from raw pixels, their capability to capture spatial relationships of pixels across rows and columns is not fully explored.
 The goal is to learn semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface.
 Spatial CNN (SCNN) generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns in a layer (a slice-by-slice sketch follows below).
 SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance cues, such as traffic lanes, poles, and walls.
Comparison between CNN and SCNN in (a) lane
detection and (b) semantic segmentation.
From left to right are: input image, output of
CNN, output of SCNN. It can be seen that
SCNN could better capture the long continuous
shape prior of lane markings and poles and fix
the disconnected parts in CNN.
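A PyTorch sketch of one direction of the slice-by-slice message passing described above (top to bottom); the paper applies four such passes (downward, upward, rightward, leftward). Kernel width and feature sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNNDown(nn.Module):
    """Downward slice-by-slice message passing over a feature map (schematic).

    Each row receives a convolved, ReLU-gated message from the row above and
    adds it to itself, so information propagates across the full height.
    """
    def __init__(self, channels, kernel_width=9):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=(1, kernel_width),
                              padding=(0, kernel_width // 2), bias=False)

    def forward(self, x):                       # x: (batch, C, H, W)
        rows = list(torch.split(x, 1, dim=2))   # H slices of shape (B, C, 1, W)
        for i in range(1, len(rows)):
            rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
        return torch.cat(rows, dim=2)

features = torch.randn(2, 128, 36, 100)         # backbone feature map (illustrative)
out = SCNNDown(128)(features)                   # same shape, rows now share context
```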
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
MRF/CRF based method
Spatial CNN
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
Message passing directions in (a) dense MRF/CRF and (b) Spatial CNN (rightward). For (a), only message passing to the inner 4 pixels is shown.
Figure labels: DeepLab; training model; lane prediction.
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
Comparison between probability maps of the baseline, ReNet, MRFNet, ResNet-101, and SCNN.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
 Lane detection approaches that leverage deep learning models trained for pixel-wise lane segmentation can detect lanes even when no markings are present in the image, thanks to their large receptive field.
 However, these methods are limited to detecting a pre-defined, fixed number of lanes, e.g. the ego-lanes, and cannot cope with lane changes.
 Lane detection is cast as an instance segmentation problem, in which each lane forms its own instance, that can be trained end-to-end.
 To parametrize the segmented lane instances before fitting the lane, a learned perspective transformation, conditioned on the image, is applied.
 This ensures a lane fitting that is robust against road plane changes, unlike existing approaches that rely on a fixed, pre-defined transformation.
 LaneNet’s architecture is based on the encoder-decoder network ENet, which is modified into a two-branched network.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Given an input image, LaneNet outputs a
lane instance map, by labeling each lane
pixel with a lane id. Next, the lane pixels are
transformed using the transformation matrix,
outputted by H-Net which learns a
perspective transformation conditioned on
the input image. For each lane a 3rd order
polynomial is fitted and the lanes are
reprojected onto the image.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
The segmentation branch is trained to produce a binary lane mask. The embedding branch generates an N-dimensional embedding per lane pixel, so that embeddings from the same lane are close together and those from different lanes are far apart in the manifold. After masking out the background pixels using the binary segmentation map from the segmentation branch, the lane embeddings are clustered and assigned to their cluster centers.
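A rough sketch of the post-processing described in the caption above: background pixels are masked out with the binary map and the remaining pixel embeddings are clustered into lane instances. The paper uses an iterative mean shift tied to the embedding-loss margin; scikit-learn's MeanShift and the toy data below stand in for it.

```python
import numpy as np
from sklearn.cluster import MeanShift

H, W, N = 32, 64, 4                                   # toy sizes, not the paper's
binary_mask = np.zeros((H, W), dtype=bool)
binary_mask[:, 10] = binary_mask[:, 50] = True        # two fake lanes
embeddings = np.random.randn(H, W, N) * 0.1
embeddings[:, 10] += np.array([2, 0, 0, 0])           # lane 1 cluster
embeddings[:, 50] += np.array([0, 2, 0, 0])           # lane 2 cluster

ys, xs = np.nonzero(binary_mask)                      # mask out background pixels
lane_ids = MeanShift(bandwidth=1.0).fit_predict(embeddings[ys, xs])

instance_map = np.full((H, W), -1)                    # -1 = background
instance_map[ys, xs] = lane_ids                       # per-pixel lane id
print(np.unique(lane_ids))
```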
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Curve fitting. Left: The lane points are transformed using
the matrix H generated by H-Net. Mid: A line is fitted
through the transformed points and the curve is evaluated
at different heights (red). Right: The evaluated points are
transformed back to the original image space.
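A NumPy sketch of the curve-fitting step in the caption above: lane pixels are warped with a homography, a 3rd-order polynomial x = f(y) is fitted, sampled at several heights, and mapped back to the image. The matrix H below is a fixed placeholder; in LaneNet's pipeline it would come from H-Net.

```python
import numpy as np

H = np.array([[1.0, -0.2,   0.0],
              [0.0,  1.5,   0.0],
              [0.0,  0.002, 1.0]])               # placeholder, not a learned output

lane_px = np.array([[300, 700], [320, 600], [345, 500], [380, 400]], float)

def apply_h(M, pts):
    """Apply a 3x3 homography to (x, y) points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return pts_h[:, :2] / pts_h[:, 2:3]

warped = apply_h(H, lane_px)
coeffs = np.polyfit(warped[:, 1], warped[:, 0], deg=3)    # x as a cubic in y
ys = np.linspace(warped[:, 1].min(), warped[:, 1].max(), 10)
fitted = np.column_stack([np.polyval(coeffs, ys), ys])
lane_in_image = apply_h(np.linalg.inv(H), fitted)         # reproject onto the image
```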
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Comparison between a fixed homography and
a conditional homography (using H-Net)
for lane fitting. The green dots can’t be
fitted correctly using a fixed homography
because of ground plane changes, which
can be resolved by using a conditional
homography using H-Net (last row).
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Top: ground-truth lane points. Middle: LaneNet output. Bottom: final lane predictions after lane fitting.
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
 A deep neural network based method, named LaneNet, to break down the lane detection
into two stages: lane edge proposal and lane line localization.
 Stage one uses a lane edge proposal network for pixel-wise lane edge classification;
 The lane line localization network in stage two detects lane lines based on lane edge proposals.
 There remain some difficulties in suppressing false detections on similar road markings such as arrows and characters.
 No assumptions are made about the number of lanes or the lane line patterns.
 The high running speed and low computational cost endow LaneNet with the capability of being deployed on vehicle-based systems.
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
The network is trained to predict the point locations where a lane intersects the top, middle and bottom lines of an image; the average of the l2 distances between the predicted key values and the real key values of each lane is used as the loss function, and this distance is minimized using stochastic gradient descent.
The network is trained to minimize a combination of the l2 loss and the min-distance loss (a sketch of the l2 term follows below).
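A minimal sketch of the l2 term described above, assuming each lane is summarized by three key values (its x-coordinates at the top, middle and bottom lines of the image); the min-distance term and the matching of predictions to ground-truth lanes are omitted here.

```python
import torch

def key_point_l2_loss(pred, target):
    """Average L2 distance between predicted and ground-truth lane key values.

    pred, target: (num_lanes, 3) x-coordinates at the top, middle and bottom
    image lines. Only the l2 term from the slide is covered.
    """
    return torch.norm(pred - target, dim=1).mean()

pred = torch.tensor([[310.0, 355.0, 420.0],
                     [620.0, 660.0, 735.0]])
target = torch.tensor([[305.0, 350.0, 430.0],
                       [615.0, 665.0, 725.0]])
loss = key_point_l2_loss(pred, target)        # scalar, differentiable
```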
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
Thanks
