CAMERA-BASED ROAD LANE DETECTION
BY DEEP LEARNING
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
OUTLINE
 An Empirical Evaluation of Deep Learning on Highway Driving
 Real-Time Lane Estimation using Deep Features and Extra Trees Regression
 Accurate and Robust Lane Detection based on Dual-View Convolutional Neural Network
 DeepLanes: E2E Lane Position Estimation using Deep NNs
 Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene
 End-to-End Ego Lane Estimation based on Sequential Transfer Learning for Self-Driving Cars
 Deep Learning Lane Marker Segmentation From Automatically Generated Labels
 VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
 Spatial as Deep: Spatial CNN for Traffic Scene Understanding
 Towards End-to-End Lane Detection: an Instance Segmentation Approach
 LaneNet: Real-Time Lane Detection Networks for Autonomous Driving
AN EMPIRICAL EVALUATION OF DEEP LEARNING ON HIGHWAY DRIVING
Mediated Perception
Lane and vehicle Detection
mask detector
AN EMPIRICAL EVALUATION OF DEEP LEARNING ON HIGHWAY DRIVING
OverFeat mask detector; lane boundary ground truth; output of the lane detector after DBSCAN clustering.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
Lane prediction on a test image; lane detection in 3D.
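The slide above mentions clustering the lane detector's output with DBSCAN. As a rough illustration of that step (not the paper's exact pipeline), the sketch below groups hypothetical 2D lane-boundary points with scikit-learn's DBSCAN; the eps and min_samples values are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical detector output: (x, y) image points on lane boundaries.
points = np.array([[102, 480], [105, 470], [108, 460],   # left boundary
                   [412, 480], [409, 470], [406, 460]])  # right boundary

# Group points into lane-boundary instances; eps / min_samples are
# illustrative values, not taken from the paper.
labels = DBSCAN(eps=15.0, min_samples=2).fit_predict(points)

for lane_id in set(labels) - {-1}:            # -1 marks noise points
    boundary = points[labels == lane_id]
    print(f"lane boundary {lane_id}: {len(boundary)} points")
```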
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
 A real-time lane estimation algorithm that adopts a learning framework combining a CNN and extra trees regression.
 Using the learning framework, it predicts the ego-lane location in the given image even under lane marker occlusion or absence.
 The CNN is trained to extract robust features from the road images.
 The extra trees regression model is trained to predict the ego-lane location from the extracted road features.
 The extra trees are trained with input-output pairs of road features and ego-lane image points.
 The ego-lane image points correspond to the Bezier spline control points that define the left and right lane markers of the ego lane.
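A minimal sketch of the regression stage described above, assuming the CNN features have already been extracted; the feature dimension, number of control points, and random data are placeholders rather than values from the paper.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical training data: one 512-D CNN feature vector per road image,
# and 2 boundaries x 4 Bezier control points x (x, y) -> 16 target values.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))           # CNN features (placeholder)
y_train = rng.uniform(0, 640, size=(200, 16))   # control-point coordinates

# Extra trees: an ensemble of extremely randomized decision trees.
model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# At test time: extract features from a new image and regress the ego-lane.
control_points = model.predict(rng.normal(size=(1, 512))).reshape(2, 4, 2)
print(control_points.shape)   # (boundaries, control points, xy)
```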
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
training
testing
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
 A CNN is utilised to extract features from the entire road image;
 To extract road-specific features, the weights (filters) and biases of the CNN model pre-trained on the Places dataset are fine-tuned with a dataset of road images;
 To perform the fine-tuning for road feature extraction, the multiclass Places-CNN model is reformulated as a binary road classifier (a fine-tuning sketch follows below);
 Extra trees are an extension of the random forest regression model and belong to the class of decision-tree-based ensemble learning methods.
(a-b) curved, (c) shadowed, (d) shadowed and absent.
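A hedged PyTorch sketch of the fine-tuning idea from the bullets above: the pretrained network's multiclass head is replaced by a binary road / non-road classifier and fine-tuned on road images. An AlexNet-style torchvision model stands in for the actual Places-CNN, whose weights are not loaded here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for the Places-pretrained CNN; in practice the Places-CNN weights
# would be loaded into the backbone before fine-tuning.
net = models.alexnet(weights=None)
net.classifier[6] = nn.Linear(4096, 2)      # multiclass head -> binary road / non-road

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

# One hypothetical fine-tuning step on a batch of road images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))          # 1 = road, 0 = non-road
loss = criterion(net(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```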
REAL-TIME LANE ESTIMATION USING DEEP
FEATURES AND EXTRA TREES REGRESSION
(e-f) partially occluded, (g) partially occluded, absent with different
colored road surface, (h) occluded and absent, (i-l) absent.
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
 A Dual-View Convolutional Neural Network (DVCNN) framework for lane detection.
 DVCNN: the front-view image and the top-view image are optimized simultaneously.
 In the front-view image, false detections such as moving vehicles, barriers and curbs are excluded, while in the top-view image non-club-shaped structures such as ground arrows and words are removed.
 A weighted hat-like filter not only recalls potential lane line candidates but also alleviates the disturbance of gradual textures and reduces most false detections (a filter sketch follows below).
 A global optimization function is designed in which the lane line probabilities, lengths, widths, orientations and number are all taken into account.
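A schematic NumPy sketch of a hat-like filter response on one row of the top-view image: a lane-line pixel should be brighter than the road on both sides, so the center intensity is compared with the two side intensities and penalized by their asymmetry. The exact weighting in the paper's filter differs, so the formulation and threshold here are illustrative only.

```python
import numpy as np

def hat_filter_response(row, half_width):
    """Hat-like response along one row of a bird's-eye (top-view) image."""
    row = row.astype(np.float32)
    left = np.roll(row, half_width)        # intensity half_width pixels to the left
    right = np.roll(row, -half_width)      # intensity half_width pixels to the right
    return 2.0 * row - left - right - np.abs(left - right)

# Toy row: dark road with one bright 6-pixel-wide lane line.
row = np.full(200, 60.0)
row[95:101] = 220.0
response = hat_filter_response(row, half_width=6)
candidates = np.where(response > 100.0)[0]   # threshold is illustrative
print(candidates)
```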
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
Three steps of lane detection: lane line candidate extraction, DVCNN framework and global optimization.
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
DVCNN framework
ACCURATE AND ROBUST LANE DETECTION BASED ON
DUAL-VIEW CONVOLUTIONAL NEURAL NETWORK
DEEPLANES: E2E LANE POSITION ESTIMATION USING DEEP
NNS
 Positioning a vehicle between lane boundaries is a core capability of a self-driving car.
 Approach to estimate lane positions directly using a deep neural network that
operates on images from laterally-mounted down-facing cameras.
 To create a diverse training set, generate semi-artificial images.
 Estimate the position of a lane marker with sub-cm accuracy at 100 frames/s on
an embedded automotive platform, requiring no pre- or post-processing.
Setup with two laterally mounted cameras. The label ti ∈ [0, . . . , 316] for image Xi corresponds to the row containing the lane-marking pixel closest to the bottom border of the image.
DEEPLANES: E2E LANE POSITION ESTIMATION USING DEEP
NNS
 Lane position estimation is formulated as a classification task over image rows.
Using a real-world background, various types of lane markings are artificially placed to synthesize regular lane markings (a, b) and varying light conditions (c, d).
For a given image Xi, the deep NN computes a softmax probability output vector Yi = (y0, . . . , y316), where yk is the probability that row k of image Xi contains the lane marking.
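A minimal PyTorch sketch of the classification formulation above: the network outputs one logit per image row (317 classes, matching the label range ti ∈ [0, 316] on the earlier slide) and is trained with softmax cross-entropy. The architecture and input size are placeholders, not DeepLanes' actual network.

```python
import torch
import torch.nn as nn

class RowClassifier(nn.Module):
    """Toy network scoring every image row as the lane-marking position."""
    def __init__(self, num_rows=317):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.head = nn.Linear(32 * 8 * 8, num_rows)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))   # raw row logits

net = RowClassifier()
images = torch.randn(4, 3, 317, 316)                     # hypothetical input size
target_rows = torch.randint(0, 317, (4,))                # row of the nearest marking
logits = net(images)
loss = nn.CrossEntropyLoss()(logits, target_rows)        # softmax over rows
predicted_row = logits.softmax(dim=1).argmax(dim=1)
```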
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
 A multitask deep convolutional network, which simultaneously detects the
presence of the target and the geometric attributes (location and orientation) of
the target with respect to the region of interest.
 A recurrent neuron layer is adopted for structured visual detection. The recurrent
neurons can deal with the spatial distribution of visible cues belonging to an
object whose shape or structure is difficult to explicitly define.
 The multitask CNN provides auxiliary geometric information to help the
subsequent modeling of the given lane structures.
 The RNN automatically detects lane boundaries, including those areas containing
no marks, without any prior knowledge or secondary modeling.
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
Diagram of lane recognition using deep
neural networks. The workflow of the two
frameworks for lane recognition. Both
frameworks employ CNNs for feature
extraction. Framework I predicts both the
presence of the targets and the relevant
geometric attributes. Framework II first
processes an input image as a sequence
of ROIs, and applies two steps of feature
extraction on each ROI: by CNNs and by
RNNs. The latter can automatically
recognize global structures over multiple
ROIs. Optionally, higher level models of
the lane structures can be constructed
based on the predictions.
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
Structure-aware detector based on RNN. The initial steps
represent the input region using the convolution-max
pooling operations. The predictions are made from the layer
of recurrent neurons, which are computed from the
extracted features of the current instance and their own
current status. Since the status of the recurrent neurons
forms the system state, at any moment, the network takes
the previous observations into account to make the current
predictions.
Applying RNN detector on road surface image. The detection
process of lane boundaries on road surface. Each ROI is a
strip. The NN processes the strips in an image from left (near)
to right (far). For each ROI, multiple binary decisions are made,
corresponding to detecting the target (lane boundary) in small
patches within the ROI. Each ROI consists of a stack of such
small patches, and the red patches contain lane boundaries.
“C” - convolution maxpooling layer, “R” - recurrent hidden layer.
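A schematic PyTorch sketch of the strip-wise detector described in the caption above: per-strip CNN features, a recurrent layer across the strip sequence, and one binary decision per small patch within each strip. All sizes are illustrative; this is not the paper's architecture.

```python
import torch
import torch.nn as nn

class StripRNNDetector(nn.Module):
    """Structure-aware detector: CNN per ROI strip, RNN across strips."""
    def __init__(self, patches_per_strip=16):
        super().__init__()
        self.cnn = nn.Sequential(                    # "C": conv + max-pooling
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.rnn = nn.GRU(16 * 4 * 4, 64, batch_first=True)   # "R": recurrent layer
        self.head = nn.Linear(64, patches_per_strip)

    def forward(self, strips):                       # strips: (batch, n_strips, 1, H, W)
        b, n = strips.shape[:2]
        feats = self.cnn(strips.flatten(0, 1)).flatten(1).view(b, n, -1)
        hidden, _ = self.rnn(feats)                  # state carries earlier strips
        return torch.sigmoid(self.head(hidden))      # per-patch boundary probability

detector = StripRNNDetector()
strips = torch.randn(2, 10, 1, 32, 64)               # 10 strips, processed near to far
probs = detector(strips)                             # (2, 10, 16)
```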
DEEP NEURAL NETWORK FOR STRUCTURAL PREDICTION
AND LANE DETECTION IN TRAFFIC SCENE
IPM image. Each pixel corresponds to a 0.1 × 0.1 m² ground area. An IPM image can aggregate one or multiple camera images into a unified map of the road surface. The IPM image shown integrates three camera observations.
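A hedged OpenCV sketch of building an IPM image at 0.1 m per pixel, assuming the image-to-ground correspondences are known from calibration; all point coordinates below are made up for illustration.

```python
import numpy as np
import cv2

# Inverse perspective mapping sketch: warp a camera image onto a ground-plane
# grid where one output pixel covers 0.1 m x 0.1 m.
image = np.zeros((720, 1280, 3), dtype=np.uint8)      # placeholder camera frame

# Four image points of a known ground rectangle and their metric coordinates.
img_pts = np.float32([[560, 450], [720, 450], [1100, 700], [180, 700]])
ground_m = np.float32([[-2.0, 30.0], [2.0, 30.0], [2.0, 6.0], [-2.0, 6.0]])

# Scale metres to IPM pixels so that 1 pixel = 0.1 m.
ipm_pts = (ground_m - ground_m.min(axis=0)) / 0.1
H = cv2.getPerspectiveTransform(img_pts, ipm_pts)

width, height = np.ceil(ipm_pts.max(axis=0)).astype(int) + 1
ipm = cv2.warpPerspective(image, H, (int(width), int(height)))  # road-surface map
```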
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
 Autonomous cars establish driving strategies using the positions of ego lanes.
 A sequential end-to-end transfer learning method to estimate left and right ego
lanes directly and separately without any post-processing.
 Constructed an extensive dataset that is suitable for a deep neural network
training by collecting a variety of road conditions, annotating ego lanes, and
augmenting them systematically.
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
END-TO-END EGO LANE ESTIMATION BASED ON SEQUENTIAL
TRANSFER LEARNING FOR SELF-DRIVING CARS
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 Train a DNN for detecting lane markers in images without manually labeling
any images.
 To project HD maps for automated driving (AD) into the image and correct for misalignments due to inaccuracies in localization and coordinate frame transformations.
 The corrections are performed by calculating the offset between features within the map and detected ones in the images.
 By using detections in the image to refine the projections, the labels are close to pixel-perfect.
 After a fast visual quality check, the projected lane markers can be used for training a fully convolutional network to segment lane markers in images.
 The network regularly detects lane markers at distances of ~150 meters on a 1-megapixel camera.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
Automatically generated label (blue) using an HD map for automated driving. Lanes are projected into the image up to a distance of 200 meters.
The labeling pipeline consists of 3 steps:
1.) Coarse pose graph alignment
using only GPS and relative motion
constraints;
2.) Lane alignment by adding lane
marker constraints to the graph;
3.) Pixel-accurate refinement in image space using re-projection optimization per image, starting from the corresponding graph pose.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
The graph pose vertices (blue) are connected by relative 6-DOF motion measurement
edges (thin solid black). Left: The graph state shows the result of an optimization with only
GPS measurement edges (yellow). The gray areas show the initial lane marker matches
between lane marker measurements (green) and lane marker map (thick solid black). Right:
The graph state after some iterations, with outlier lane marker matches removed based on
a decreasing distance threshold.
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 To tightly align the graph to the road, add matches
of detected lane markers to all map lane markers
based on a matching range threshold;
 3D lane marker detections for alignment can be computed with simple techniques, such as a top-hat filter and a stereo camera setup;
 To extract line segments from these detections, run a Douglas-Peucker polygonization and add the resulting 3D line segments to the corresponding pose vertices for matching (a polygonization sketch follows below).
 To achieve pixel-accurate labels, a reprojection optimization with line segments is performed in image space;
After graph alignment of the projected map lane
markers (blue) and the detected lane markers from a
simple detector (green). The perpendicular average
distance between line segments is used as a matching
criterion for an optimization that solves for the pixel-
accurate corrected 6-DOF camera pose.
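A small NumPy sketch of the Douglas-Peucker polygonization mentioned in the bullets above, shown in 2D for brevity; the pipeline applies the same idea to the 3D lane-marker detections before matching them against the map.

```python
import numpy as np

def douglas_peucker(points, eps):
    """Simplify an ordered point sequence so the polyline stays within eps of it."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    dx, dy = end - start
    rel = points - start
    # Perpendicular distance of every point to the chord start-end.
    dists = np.abs(dx * rel[:, 1] - dy * rel[:, 0]) / max(np.hypot(dx, dy), 1e-9)
    idx = int(np.argmax(dists))
    if dists[idx] <= eps:
        return np.vstack([start, end])
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return np.vstack([left[:-1], right])

# Noisy marker centerline reduced to a few line-segment endpoints.
pts = np.column_stack([np.linspace(0, 20, 50), 0.05 * np.random.randn(50)])
print(douglas_peucker(pts, eps=0.2))
```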
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
 It generates probability maps over the image without losing information such as marker width;
 Based on the pixel-wise output, it is possible to model the output differently, e.g. using splines;
 There are no assumptions about the number of lanes or type of marker, e.g. solid or dashed;
 Lane marker detection is cast as a semantic segmentation problem by employing FCNs (a minimal sketch follows below).
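A minimal PyTorch sketch of the segmentation setup from the bullets above: a small fully convolutional network producing a per-pixel lane-marker probability map, trained against the automatically generated labels. The real network is much larger; this only shows the formulation.

```python
import torch
import torch.nn as nn

# Minimal fully convolutional setup: per-pixel lane-marker logits, trained with
# binary cross-entropy against the automatically generated labels.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),                               # single-channel logit map
)

images = torch.randn(2, 3, 256, 512)                   # batch of road images
labels = torch.randint(0, 2, (2, 1, 256, 512)).float() # projected map markers

logits = fcn(images)
loss = nn.BCEWithLogitsLoss()(logits, labels)
prob_map = torch.sigmoid(logits)                       # preserves marker-width information
```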
DEEP LEARNING LANE MARKER SEGMENTATION FROM
AUTOMATICALLY GENERATED LABELS
Left: Lane markers detected in the image. Center: Correctly detected lane markers are shown in green, false
negatives in blue and false positives in red. Dashed lane markers are extended such that they end up being
completely detected after some distance. False positives are mainly found randomly, around cars, and at lane
markers that are not fully covered by the labels. Right: Number of misclassified pixels within each image line.
VPGNET: VANISHING POINT GUIDED NETWORK FOR LANE AND
ROAD MARKING DETECTION AND RECOGNITION
 A unified end-to-end trainable multi-task network that jointly handles lane and road marking detection and recognition, guided by a vanishing point, under adverse weather conditions.
 Images taken on rainy days are subject to low illumination, while wet roads cause light reflection and distort the appearance of lane and road markings.
 At night, color distortion occurs under limited illumination.
 A lane and road marking benchmark consisting of about 20k images with 17 lane and road marking classes under 4 different scenarios: no rain, rain, heavy rain, and night.
 VPGNet can detect and classify lanes and road markings, and predict a vanishing point, with a single forward pass.
VPGNET: VANISHING POINT GUIDED NETWORK FOR LANE AND
ROAD MARKING DETECTION AND RECOGNITION
VPGNet performs four tasks: grid regression, object detection, multi-label classification, and vanishing point prediction.
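A schematic multi-task head in the spirit of the caption above: a shared backbone feeding four branches. Channel counts, grid resolution, and the vanishing-point output format are placeholders, not VPGNet's actual configuration; only the class count (17) follows the slide.

```python
import torch
import torch.nn as nn

class FourTaskNet(nn.Module):
    """Shared backbone with four task branches (schematic, not VPGNet itself)."""
    def __init__(self, num_classes=17):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.grid_reg = nn.Conv2d(64, 4, 1)               # per-cell box offsets
        self.obj_det = nn.Conv2d(64, 2, 1)                # marking / background
        self.multi_label = nn.Conv2d(64, num_classes, 1)  # lane & road-marking classes
        self.vp_pred = nn.Conv2d(64, 5, 1)                # vanishing-point branch (channel count illustrative)

    def forward(self, x):
        f = self.backbone(x)
        return self.grid_reg(f), self.obj_det(f), self.multi_label(f), self.vp_pred(f)

outs = FourTaskNet()(torch.randn(1, 3, 480, 640))
print([o.shape for o in outs])                             # one map per task
```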
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
 Though CNNs have a strong capability to extract semantics from raw pixels, their capability to capture spatial relationships of pixels across rows and columns is not fully explored.
 The goal is to learn semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface.
 Spatial CNN (SCNN) generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns in a layer (a slice-by-slice sketch follows below).
 SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance cues, such as traffic lanes, poles, and walls.
Comparison between CNN and SCNN in (a) lane
detection and (b) semantic segmentation.
From left to right are: input image, output of
CNN, output of SCNN. It can be seen that
SCNN could better capture the long continuous
shape prior of lane markings and poles and fix
the disconnected parts in CNN.
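A PyTorch sketch of one direction of the slice-by-slice message passing described above (top to bottom); the paper applies four such passes (downward, upward, rightward, leftward). Kernel width and feature sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNNDown(nn.Module):
    """Downward slice-by-slice message passing over a feature map (schematic).

    Each row receives a convolved, ReLU-gated message from the row above and
    adds it to itself, so information propagates across the full height.
    """
    def __init__(self, channels, kernel_width=9):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=(1, kernel_width),
                              padding=(0, kernel_width // 2), bias=False)

    def forward(self, x):                       # x: (batch, C, H, W)
        rows = list(torch.split(x, 1, dim=2))   # H slices of shape (B, C, 1, W)
        for i in range(1, len(rows)):
            rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
        return torch.cat(rows, dim=2)

features = torch.randn(2, 128, 36, 100)         # backbone feature map (illustrative)
out = SCNNDown(128)(features)                   # same shape, rows now share context
```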
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
MRF/CRF based method
Spatial CNN
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
Message passing directions in (a) dense MRF/CRF and (b) Spatial CNN (rightward). For (a), only message passing to the inner 4 pixels is shown.
Figure labels: DeepLab; training model; lane prediction.
SPATIAL AS DEEP: SPATIAL CNN FOR TRAFFIC SCENE
UNDERSTANDING
Comparison between probability maps of the baseline, ReNet, MRFNet, ResNet-101, and SCNN.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
 Lane detection approaches that leverage deep learning models trained for pixel-wise lane segmentation can detect lanes even when no markings are present in the image, thanks to their large receptive field.
 However, these methods are limited to detecting a pre-defined, fixed number of lanes, e.g. the ego-lanes, and cannot cope with lane changes.
 Lane detection is cast as an instance segmentation problem, in which each lane forms its own instance, that can be trained end-to-end.
 To parametrize the segmented lane instances before fitting the lane, a learned perspective transformation, conditioned on the image, is applied.
 This ensures a lane fitting that is robust against road plane changes, unlike existing approaches that rely on a fixed, pre-defined transformation.
 LaneNet’s architecture is based on the encoder-decoder network ENet, which is modified into a two-branched network.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Given an input image, LaneNet outputs a
lane instance map, by labeling each lane
pixel with a lane id. Next, the lane pixels are
transformed using the transformation matrix,
outputted by H-Net which learns a
perspective transformation conditioned on
the input image. For each lane a 3rd order
polynomial is fitted and the lanes are
reprojected onto the image.
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
The segmentation branch is trained to produce a binary lane mask. The embedding branch generates an N-dimensional embedding per lane pixel, so that embeddings from the same lane are close together and those from different lanes are far apart in the manifold. After masking out the background pixels using the binary segmentation map from the segmentation branch, the lane embeddings are clustered and assigned to their cluster centers.
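A rough sketch of the post-processing described in the caption above: background pixels are masked out with the binary map and the remaining pixel embeddings are clustered into lane instances. The paper uses an iterative mean shift tied to the embedding-loss margin; scikit-learn's MeanShift and the toy data below stand in for it.

```python
import numpy as np
from sklearn.cluster import MeanShift

H, W, N = 32, 64, 4                                   # toy sizes, not the paper's
binary_mask = np.zeros((H, W), dtype=bool)
binary_mask[:, 10] = binary_mask[:, 50] = True        # two fake lanes
embeddings = np.random.randn(H, W, N) * 0.1
embeddings[:, 10] += np.array([2, 0, 0, 0])           # lane 1 cluster
embeddings[:, 50] += np.array([0, 2, 0, 0])           # lane 2 cluster

ys, xs = np.nonzero(binary_mask)                      # mask out background pixels
lane_ids = MeanShift(bandwidth=1.0).fit_predict(embeddings[ys, xs])

instance_map = np.full((H, W), -1)                    # -1 = background
instance_map[ys, xs] = lane_ids                       # per-pixel lane id
print(np.unique(lane_ids))
```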
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Curve fitting. Left: The lane points are transformed using
the matrix H generated by H-Net. Mid: A line is fitted
through the transformed points and the curve is evaluated
at different heights (red). Right: The evaluated points are
transformed back to the original image space.
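A NumPy sketch of the curve-fitting step in the caption above: lane pixels are warped with a homography, a 3rd-order polynomial x = f(y) is fitted, sampled at several heights, and mapped back to the image. The matrix H below is a fixed placeholder; in LaneNet's pipeline it would come from H-Net.

```python
import numpy as np

H = np.array([[1.0, -0.2,   0.0],
              [0.0,  1.5,   0.0],
              [0.0,  0.002, 1.0]])               # placeholder, not a learned output

lane_px = np.array([[300, 700], [320, 600], [345, 500], [380, 400]], float)

def apply_h(M, pts):
    """Apply a 3x3 homography to (x, y) points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return pts_h[:, :2] / pts_h[:, 2:3]

warped = apply_h(H, lane_px)
coeffs = np.polyfit(warped[:, 1], warped[:, 0], deg=3)    # x as a cubic in y
ys = np.linspace(warped[:, 1].min(), warped[:, 1].max(), 10)
fitted = np.column_stack([np.polyval(coeffs, ys), ys])
lane_in_image = apply_h(np.linalg.inv(H), fitted)         # reproject onto the image
```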
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Comparison between a fixed homography and
a conditional homography (using H-Net)
for lane fitting. The green dots can’t be
fitted correctly using a fixed homography
because of ground plane changes, which
can be resolved by using a conditional
homography using H-Net (last row).
TOWARDS END-TO-END LANE DETECTION: AN INSTANCE
SEGMENTATION APPROACH
Top: ground-truth lane points. Middle: LaneNet output. Bottom: final lane predictions after lane fitting.
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
 A deep neural network based method, named LaneNet, to break down the lane detection
into two stages: lane edge proposal and lane line localization.
 Stage one uses a lane edge proposal network for pixel-wise lane edge classification;
 The lane line localization network in stage two detects lane lines based on lane edge proposals.
 There remain some difficulties in suppressing false detections on similar road markings such as arrows and characters.
 No assumptions are made about the number of lanes or the lane line patterns.
 The high running speed and low computational cost endow LaneNet with the capability of being deployed on vehicle-based systems.
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
The network is trained to predict the point locations where a lane intersects the top, middle and bottom lines of an image; the average of the l2 distances between the predicted key values and the real key values of each lane is used as the loss function, and this distance is minimized using stochastic gradient descent.
The network is trained to minimize a combination of the l2 loss and the min-distance loss (a sketch of the l2 term follows below).
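A minimal sketch of the l2 term described above, assuming each lane is summarized by three key values (its x-coordinates at the top, middle and bottom lines of the image); the min-distance term and the matching of predictions to ground-truth lanes are omitted here.

```python
import torch

def key_point_l2_loss(pred, target):
    """Average L2 distance between predicted and ground-truth lane key values.

    pred, target: (num_lanes, 3) x-coordinates at the top, middle and bottom
    image lines. Only the l2 term from the slide is covered.
    """
    return torch.norm(pred - target, dim=1).mean()

pred = torch.tensor([[310.0, 355.0, 420.0],
                     [620.0, 660.0, 735.0]])
target = torch.tensor([[305.0, 350.0, 430.0],
                       [615.0, 665.0, 725.0]])
loss = key_point_l2_loss(pred, target)        # scalar, differentiable
```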
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
LANENET: REAL-TIME LANE DETECTION NETWORKS
FOR AUTONOMOUS DRIVING
Thanks
