ANALYSIS OF INSTANCE SEGMENTATION APPROACH FOR LANE DETECTION

NAME: RAJAT ROY
STUDENT ID: 975107
MSC DATA SCIENCE

Background
• The automobile industry is producing new vehicles
equipped with the latest technologies
• New cars come with built-in GPS, rear parking sensors,
blind-spot detection and cameras that give a
complete 360-degree view around the car
• Advanced Driving Assistance Systems and Automatic
Lane Keeping Systems help the driver make better
decisions while driving; autonomous vehicles also
rely on these systems to cruise through traffic
• Deep learning plays a vital role in designing these
systems because it has proven useful in many areas,
including image classification, image segmentation
and image synthesis

Aim & Objectives
• Use a deep learning method to develop a model for efficient lane detection
• Explore different hyper-parameters, activation functions and optimizers in the suggested hybrid
model
• Evaluate the performance of the suggested model by comparing it with advanced lane detection
algorithms

Significance of the Study
• Autonomous vehicles are the future of the automobile industry
• The AI models that act as the brain behind these high-tech systems should be generalizable enough to
understand human driving behaviors as well as the surrounding environment
• The proposed model performs accurate and efficient lane detection, which can be used in a
wide range of Advanced Driving Assistance Systems (ADAS) features

Introduction
• Lane detection is a challenging task in computer vision
• Models need to be accurate and also efficient enough to have a low runtime, giving the best results
in the shortest possible time
• This work discusses regression- and clustering-based methods and the end-to-end deep learning
methods implemented for lane detection
• It also covers newer deep learning techniques for general computer vision tasks and their improvement
over time with self-attention mechanisms embedded into encoder-decoder networks

Regression & Clustering based approach
• Many solutions combine a CNN with a
polynomial curve-fitting model or a
clustering-based model
• The CNN extracts lane features from the
image using segmentation and passes them
to the post-processing block
• The post-processing block either performs
regression with a higher-degree polynomial
or applies a model such as SVM or the
mean-shift clustering algorithm
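
The regression step in the post-processing block can be illustrated with a small least-squares fit. This is only a sketch under assumptions: the lane points are synthetic, the names `fit_polynomial`, `evaluate` and `IMAGE_HEIGHT` are hypothetical, the row coordinate is normalised by the image height to keep the fit numerically stable, and a degree of 2 is used purely for illustration (the surveyed methods may use higher degrees).

```python
def fit_polynomial(ts, xs, degree=2):
    """Least-squares fit of x = c[0] + c[1]*t + ... + c[degree]*t**degree."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(t**(i+j)), b[i] = sum(x * t**i).
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(x * t ** i for t, x in zip(ts, xs)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, t):
    """Evaluate the fitted polynomial at normalised row position t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


# Synthetic lane pixels: x as a quadratic function of the normalised row.
IMAGE_HEIGHT = 720
ys = list(range(240, 720, 40))
ts = [y / IMAGE_HEIGHT for y in ys]
true_coeffs = [900.0, -650.0, 180.0]
xs = [evaluate(true_coeffs, t) for t in ts]

coeffs = fit_polynomial(ts, xs, degree=2)
```

In a real pipeline the `(x, y)` pairs would come from the pixels the CNN labels as belonging to one lane, and the fitted curve would be sampled at fixed rows to produce the final lane points.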

End-to-End Deep Learning Methods
• Earlier deep learning research on lane
detection used sequential message passing
within a CNN architecture. One widely used
method is Spatial CNN (SCNN) for traffic
scene understanding
• Lane detection can also be framed as an
instance segmentation task
• In an encoder-decoder network, the encoder
down-samples the image and the decoder
up-samples it
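
The down-sampling and up-sampling roles of the encoder and decoder can be pictured with a toy example. This is a sketch only: real networks use learned layers (for example strided and transposed convolutions), whereas the hypothetical helpers `downsample_2x` and `upsample_2x` below use fixed 2x2 max pooling and nearest-neighbour repetition on a nested list.

```python
def downsample_2x(grid):
    """2x2 max pooling: halves the height and width of the grid."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]


def upsample_2x(grid):
    """Nearest-neighbour upsampling: doubles the height and width of the grid."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2],
]
encoded = downsample_2x(image)    # 2 x 2 summary of the 4 x 4 input
decoded = upsample_2x(encoded)    # back to 4 x 4, but with coarser detail
```

The decoded grid has the input's size but not its fine detail, which is why segmentation decoders rely on learned up-sampling rather than plain repetition.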

Data Description
• The TuSimple dataset contains 6408 images of vehicles driving on highways in the USA under different
weather, time-of-day and traffic conditions
• The images are captured at 20 FPS at a resolution of 1280 x 720 and contain multiple lanes.
Polyline annotations for the lane markings are available in JSON format
• A total of 3628 images are used for training and 200 images for testing the model performance.
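
As an illustration of the JSON annotations, the sketch below reads one label record and turns it into per-lane `(x, y)` points. It assumes the publicly documented TuSimple layout (one JSON object per line with `lanes`, `h_samples` and `raw_file`, where a negative x such as -2 means no lane at that row). The sample values, the file path and the helper name `parse_label_line` are made up for the example.

```python
import json


def parse_label_line(line):
    """Convert one label record into (raw_file, list of lanes as (x, y) points)."""
    record = json.loads(line)
    lanes = []
    for xs in record["lanes"]:
        # Pair each x with its row in h_samples and drop "no lane here" entries.
        points = [(x, y) for x, y in zip(xs, record["h_samples"]) if x >= 0]
        if points:
            lanes.append(points)
    return record["raw_file"], lanes


# Hypothetical record; real files contain one such object per line.
sample_line = json.dumps({
    "lanes": [[-2, 632, 625, 617], [-2, -2, -2, -2], [719, 734, 748, 762]],
    "h_samples": [240, 250, 260, 270],
    "raw_file": "clips/example/20.jpg",
})

raw_file, lanes = parse_label_line(sample_line)
```

Lanes with no visible points are dropped, so `lanes` holds only the polylines that actually appear in the frame.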

Proposed Method – Part 1
• The proposed solution adopts the FoloLane
model architecture. The encoder network
uses the ERFNet and BiSeNet architectures
as backbones
• The FoloLane architecture is based on a
key-point estimation technique
• Embedding self-attention into the existing
network can improve model performance
Proposed Method – Part 2
• As a contingency plan, an instance
segmentation approach can be applied to
solve the lane detection problem
• Many research papers apply instance
segmentation to object detection and use
ResNet as the base network
• Following a similar approach, the alternate
model can use ERFNet to perform the
instance segmentation task

Implementation Details
• Data Preparation – Polylines of lane points
are read from the labels JSON file along with
the image frame. Each coordinate array is
converted to an image using matplotlib and
stored in a separate labels folder
• Training Environment – The code is
implemented in Python using TensorFlow
v2.1. Training ran on Google Colab Pro
with 4 vCPUs, 25 GB RAM and a single
Tesla P100 GPU with 16 GB memory
Figure: Input Image and Label Image
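
The data preparation step can be sketched as follows. The slides state that matplotlib was used to draw the label images; the version below deliberately swaps in a dependency-free rasteriser with linear interpolation, so it illustrates the idea rather than the author's script. The function name `rasterize_lanes`, the tiny image size and the choice of giving each lane its own integer ID are assumptions.

```python
def rasterize_lanes(lanes, width, height):
    """Return a height x width grid: 0 = background, k = pixels of the k-th lane."""
    mask = [[0] * width for _ in range(height)]
    for lane_id, points in enumerate(lanes, start=1):
        # Draw a straight segment between each pair of consecutive points.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for s in range(steps + 1):
                x = round(x0 + (x1 - x0) * s / steps)
                y = round(y0 + (y1 - y0) * s / steps)
                if 0 <= x < width and 0 <= y < height:
                    mask[y][x] = lane_id
    return mask


# Two toy lanes on a 10 x 5 canvas: one vertical, one diagonal.
lanes = [
    [(1, 0), (1, 4)],
    [(5, 0), (8, 3)],
]
mask = rasterize_lanes(lanes, width=10, height=5)
```

Giving every lane a distinct ID is what makes the label usable for instance segmentation; for a binary lane/background label, all non-zero IDs would be mapped to 1.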

Limitations of FoloLane-SA model
• The decoder network in the FoloLane model predicts the geometry of the local curve using a
geometry construction technique
• The proposed method added a self-attention mechanism to the existing architecture to improve
performance and enrich the information available for geometry construction
• Because of the complexity of the architecture, the model was observed to underfit at a low
batch size, while a higher batch size led to an Out of Memory error due to hardware limitations
• The new approach implements instance segmentation with an ERFNet model, which was also
used as a base network for training the FoloLane model

Training Details
• Model training is performed with batch sizes of 8 and 16
• Two input image sizes (width x height), 320 x 240 and 640 x 480, are passed to the training script
for experimentation and evaluation
• The model is trained for 100 and 300 epochs
• The Adam optimizer is used with a learning rate of 5e-4 and a decay of 2e-4. The loss function
is categorical cross-entropy
• Training loss and m-IoU are calculated and logged at the end of each epoch using TensorBoard
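
For reference, categorical cross-entropy over a set of pixels can be written out by hand. This is a plain-Python sketch of the standard definition, not the TensorFlow implementation used in training. The function name, the clipping constant `eps` and the two-class toy data are assumptions.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over pixels of -sum_c y_true[c] * log(y_pred[c])."""
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        total -= sum(t * math.log(min(max(p, eps), 1.0 - eps))
                     for t, p in zip(truth, probs))
    return total / len(y_true)


# Two pixels, two classes (background, lane), one-hot ground truth.
y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8]]
loss = categorical_crossentropy(y_true, y_pred)
```

Only the probability assigned to the true class contributes for one-hot labels, so the loss here is the mean of `-log(0.9)` and `-log(0.8)`.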

Training Loss & m-IoU
Figures (best ERFNet model): Epoch vs Train Loss; Epoch vs Train m-IoU; Epoch vs Validation m-IoU

Model Performance vs Existing Models
SNO.  IMG_WIDTH  IMG_HEIGHT  BATCH_SIZE  EPOCHS  M-IOU% (Train)  M-IOU% (Val)  Accuracy% (Train)  Accuracy% (Val)
1     640        480         16          100     25.83           17.63         95.52              93.21
2     640        480         8           100     26.17           17.9          95.79              93.81
3     320        240         16          100     27.76           21.77         96.14              93.48
4     320        240         8           100     25.58           19.29         95.31              92.41
5     320        240         16          300     37.29           22.76         97.59              94.62
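
The gap between accuracy above 90% and m-IoU below 40% in these results is typical when background pixels dominate. The sketch below shows the effect on toy data. The slides do not give the exact metric implementation, so the standard confusion-matrix definitions are assumed, and the function names and the 95/5 class split are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts pixels of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix):
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(len(matrix))) - tp
        fn = sum(matrix[c]) - tp
        union = tp + fp + fn
        if union:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


# 100 pixels: 95 background (0), 5 lane (1). Only one lane pixel is found.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1, 0, 0, 0, 0]
matrix = confusion_matrix(y_true, y_pred, num_classes=2)
accuracy = pixel_accuracy(matrix)
miou = mean_iou(matrix)
```

Here four of the five lane pixels are missed, yet accuracy stays at 96% because background is counted as correct; the lane IoU of 0.2 pulls the mean IoU down to about 0.58. This makes m-IoU the more informative column for lane quality.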

Visualization of Predicted Samples
Figures: predicted samples at the First Epoch and the Last Epoch

Conclusion
• The research contributes a deep-learning-based lane detection model for predicting the lane points in
a given image
• The model is trained and tested on the TuSimple benchmark dataset, which consists of images
of vehicles driving on highway roads in the USA.
• A total of 3628 images are used for training and 200 images for testing the model performance.
• FoloLane with self-attention could not be trained because of the complexity of the architecture and
hardware limitations. As a contingency plan, the instance segmentation approach was selected
as the alternate model.
• The final ERFNet model reached an accuracy of 97.59% on the training set and 94.62% on the validation
set. This performance is on par with existing state-of-the-art deep learning models.

Future Works
• Simplify the FoloLane with self-attention architecture and train the model in a multi-GPU
environment
• Test the generalizability of the model on other benchmark datasets such as CULane and BDD100K
• Retrain the ERFNet model with a higher batch size and more epochs. Also try different
pre-processing techniques, such as converting the images from RGB to grayscale and from RGB to HSV.
