Towards Light-weight and Real-time Line Segment Detection

Geonmo Gu*, Byungsoo Ko*, SeoungHyun Go
Sung-Hyun Lee, Jingeun Lee, Minchul Shin
Naver Search Vision
* Equal contribution

Introduction
1. Line Segment Detection?
Line Segment Detection
Purpose: Detect line segments in the given images
Line Segment Detector (LSD)

Introduction
1. Line Segment Detection?
Applications of LSD
• Simultaneous Localization and Mapping (SLAM) [1]
• Relative Pose Estimation [2]
[1] Monocular-vision based SLAM using Line Segments, ICRA07
[2] Line-Based Relative Pose Estimation, CVPR11

Introduction
2. Real-time inference is limited
a) Existing methods exploit heavy backbone networks
[Figure: Input Image, then Backbone]
Backbones of existing methods
• Dilated ResNet50-based FPN
• Stacked hourglass network
• Atrous residual U-net
Backbone of ours
• Modified MobileNetV2

Introduction
b) Existing methods include multi-module line prediction process
[Figure: line prediction pipelines. In each pipeline the Input Image passes through the Backbone to give Feature Maps, from which Line Segments are predicted. Stages marked in the figure: Compute Line Proposal, Search for Candidates, Verify Candidates.]
Multi-module Processing (existing methods)
• Top-Down (TD) Strategy: Attraction Field Maps, Squeeze Module
• Bottom-Up (BU) Strategy: Junction Heatmap, Line Sampler, Line Proposal, LoI Pooling, Line Verification
• Tri-Point (TP) Strategy: Line Map, Center Map, Displacement Map, Mixture of Conv. (x2), Point Filter Module, Line Segmentation, Line Generation
Single Module Processing (ours)
• Center / Displacement Map, Line Generation

Introduction
Inference speed comparison
The inference speeds of the backbone and the prediction process on GPU are significantly improved.

Introduction
3. Mobile-LSD
Motivation
Existing LSDs are limited in real-time inference, especially on mobile devices.
• They exploit heavy backbone networks
• They include a multi-module line prediction process

Introduction
3. Mobile-LSD
In this paper
We design an efficient LSD for resource-constrained environments: Mobile LSD (M-LSD)
• Minimize the backbone network and adopt a single-module line prediction process
• Present novel training schemes: Segments of Line segment (SoL) augmentation and geometric learning schemes

Proposed Method
1. Light-weight Backbone
M-LSD-tiny
[Figure: M-LSD-tiny architecture]
Feature Extractor: blocks 1 to 11, followed by blocks 12 to 16
• Block type A (12, 14): 1x1 Conv, 1x1 Conv, Upscale, concatenation (C) with skip connection
• Block type B (13, 15): 3x3 Conv, 3x3 Conv, addition (+)
• Block type C (16): two 3x3 Conv (one with dilated rate=5) and a 1x1 Conv
Upscale, then Final Feature Maps (H/2 x W/2 x 16)
• TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• SoL Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• Segmentation Maps (H/2 x W/2 x 2): Junction map x1, Line map x1
Line Generation, then Line Segments

Proposed Method
M-LSD
[Figure: M-LSD architecture]
Feature Extractor: blocks 1 to 14, followed by blocks 15 to 23
• Block type A (15, 17, 19, 21): 1x1 Conv, 1x1 Conv, Upscale†, concatenation (C) with skip connection
• Block type B (16, 18, 20, 22): 3x3 Conv, 3x3 Conv, addition (+)
• Block type C (23): two 3x3 Conv (one with dilated rate=5) and a 1x1 Conv
Final Feature Maps (H/2 x W/2 x 16)
• TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• SoL Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• Segmentation Maps (H/2 x W/2 x 2): Junction map x1, Line map x1
Line Generation, then Line Segments
† denotes that block 15 skips upscale operation.
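As a rough illustration of the output layout, the 16 channels of the final feature maps can be read as 7 TP channels, 7 SoL channels, and 2 segmentation channels. The sketch below splits the 16 values predicted at one pixel into those groups. The channel order and the function name `split_final_features` are assumptions made for illustration; the slides only give the channel counts.

```python
def split_final_features(pixel):
    """Split the 16 values predicted at one pixel into named map groups.

    Assumed (hypothetical) channel order: 7 TP, 7 SoL, 2 segmentation.
    """
    if len(pixel) != 16:
        raise ValueError("expected 16 channels per pixel")

    def unpack(seven):
        # Order inside a 7-channel group is assumed for illustration:
        # center x1, displacement x4, length x1, degree x1.
        return {
            "center": seven[0],
            "displacement": list(seven[1:5]),
            "length": seven[5],
            "degree": seven[6],
        }

    return {
        "tp": unpack(pixel[0:7]),
        "sol": unpack(pixel[7:14]),
        "segmentation": {"junction": pixel[14], "line": pixel[15]},
    }


groups = split_final_features(list(range(16)))
```

The point of the split is only that 7 + 7 + 2 = 16, so a single output tensor can carry all three map groups.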

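Proposed Method
2. Line Segment Representation
Tri-Point (TP) representation
[Figure: a line segment with start point $l_s$, center point $l_c$, and end point $l_e$, and displacement vectors $d_s$ and $d_e$]
TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
Notation: $(x_{\alpha}, y_{\alpha})$ denotes the coordinates of a point $\alpha$, e.g. $(x_{l_s}, y_{l_s})$ for the start point $l_s$. $d_s(x_{l_c}, y_{l_c})$ and $d_e(x_{l_c}, y_{l_c})$ indicate the 2D displacements from the center point $l_c$ to the corresponding start $l_s$ and end $l_e$ points.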
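The notation above implies that the endpoints can be recovered by adding each displacement to the center point. A minimal sketch of that encoding and decoding follows. It assumes the center point is the midpoint of the segment; the function names `encode_tri_point` and `decode_tri_point` are hypothetical and not taken from the slides.

```python
def encode_tri_point(start, end):
    """Return (center, d_start, d_end) for a segment given by its endpoints.

    Assumption: the center point is the midpoint of the segment.
    """
    cx = (start[0] + end[0]) / 2
    cy = (start[1] + end[1]) / 2
    d_start = (start[0] - cx, start[1] - cy)  # displacement center -> start
    d_end = (end[0] - cx, end[1] - cy)        # displacement center -> end
    return (cx, cy), d_start, d_end


def decode_tri_point(center, d_start, d_end):
    """Recover both endpoints by adding each displacement to the center."""
    start = (center[0] + d_start[0], center[1] + d_start[1])
    end = (center[0] + d_end[0], center[1] + d_end[1])
    return start, end


center, d_s, d_e = encode_tri_point((2.0, 4.0), (10.0, 8.0))
start, end = decode_tri_point(center, d_s, d_e)
```

The four displacement channels in the TP maps correspond to the two 2D vectors `d_start` and `d_end` stored at the center pixel.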

Proposed Method
TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• For the center loss, we use a positive/negative separated binary classification loss.
• For the displacement loss, we use smooth L1 loss for regression learning.
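The two loss types named above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: "separated" is read here as averaging the binary cross-entropy over positive and negative pixels separately and summing the two terms, and `beta` is the usual smooth L1 transition point. Both function names are hypothetical.

```python
import math


def separated_bce(probs, targets, eps=1e-7):
    """Binary cross-entropy averaged separately over positives and negatives.

    Assumption: the two averaged terms are simply summed.
    """
    pos = [-math.log(max(p, eps)) for p, t in zip(probs, targets) if t == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, t in zip(probs, targets) if t == 0]
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


def smooth_l1(preds, targets, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(preds, targets):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(preds)


center_loss = separated_bce([0.9, 0.2, 0.1], [1, 0, 0])
disp_loss = smooth_l1([1.5, -0.2], [1.0, 0.0])
```

Averaging the positive and negative pixels separately keeps the few center pixels from being swamped by the many background pixels.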

Proposed Method
TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
The TP representation can be insufficient when:
• A line segment is too long to manage within the receptive field size.
• The center points of two distinct line segments are too close to each other.

Proposed Method
3. SoL Augmentation
Segments of Line segment (SoL) augmentation
[Figure: a line segment from $l_s$ to $l_e$ with internally dividing points $l_0$, $l_1$, $l_2$]
SoL Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
SoL augmentation directly increases the number and the size of line segments.
1. We compute $k$ internally dividing points and separate the line segment into subparts with overlapping portions.
2. Each subpart is trained as if it were a typical line segment.
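The splitting step can be sketched as below. The sketch assumes the $k$ dividing points are evenly spaced and that each subpart spans two adjacent pieces, so that neighbouring subparts overlap by one piece. This matches the point labels in the figure for $k = 3$, but how $k$ is chosen is not stated on the slide, so it is passed in as a parameter. The function name `sol_subparts` is hypothetical.

```python
def sol_subparts(start, end, k):
    """Split a segment into overlapping subparts using k internal points.

    Assumptions: evenly spaced dividing points, and each subpart covers
    two adjacent pieces. With k <= 1 the segment is returned unsplit.
    """
    if k <= 1:
        return [(tuple(start), tuple(end))]

    def point_at(t):
        # Point at fraction t along the segment (t=0 is start, t=1 is end).
        return (
            start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]),
        )

    # start, the k dividing points, end: k + 2 points in total.
    points = [point_at(i / (k + 1)) for i in range(k + 2)]
    # Subpart i runs from point i to point i + 2.
    return [(points[i], points[i + 2]) for i in range(k)]


subparts = sol_subparts((0.0, 0.0), (8.0, 0.0), 3)
```

With $k = 3$ the subparts are $l_s$ to $l_1$, $l_0$ to $l_2$, and $l_1$ to $l_e$, each of which would then be encoded like any other line segment.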

Proposed Method
4. Learning with Geometric Information
Matching loss $\mathcal{L}_{match}$
[Figure: a GT line segment with endpoints $l_s$, $l_e$ and a predicted line segment with endpoints $\hat{l}_s$, $\hat{l}_e$ and center point $C(\hat{l})$]
SoL Maps (H/2 x W/2 x 7) and TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
1. Take the endpoints of each prediction and measure the Euclidean distance $d(\cdot)$ to the endpoints of the GT.
2. Use these distances to match predicted line segments with GT line segments whose endpoint distances are under a threshold $\gamma$.
3. Compute the matching loss, which aims to minimize the geometric distance between the matched line segments.
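The three steps above can be sketched as follows. The exact formula is not recoverable from the slides, so this is only an assumed form: a prediction is matched to a GT segment when both endpoint distances are below $\gamma$, and the loss is the mean over matched pairs of the L1 distances between start points, end points, and center points. The function name `matching_loss` and the argument layout are hypothetical.

```python
import math


def matching_loss(pred_lines, pred_centers, gt_lines, gamma):
    """Mean geometric distance over matched (prediction, GT) pairs.

    pred_lines / gt_lines: lists of ((xs, ys), (xe, ye)).
    pred_centers: predicted center point of each predicted line.
    Assumed loss form: L1(start) + L1(end) + L1(center) per matched pair.
    """
    def l1(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    losses = []
    for (p_start, p_end), p_center in zip(pred_lines, pred_centers):
        for g_start, g_end in gt_lines:
            # Steps 1 and 2: Euclidean endpoint distances, thresholded by gamma.
            if math.dist(p_start, g_start) < gamma and math.dist(p_end, g_end) < gamma:
                g_center = ((g_start[0] + g_end[0]) / 2, (g_start[1] + g_end[1]) / 2)
                # Step 3: geometric distance of the matched pair.
                losses.append(l1(p_start, g_start) + l1(p_end, g_end) + l1(p_center, g_center))
    return sum(losses) / len(losses) if losses else 0.0


loss = matching_loss(
    pred_lines=[((0.0, 0.0), (10.0, 0.0))],
    pred_centers=[(5.0, 1.0)],
    gt_lines=[((1.0, 0.0), (10.0, 1.0))],
    gamma=5.0,
)
```

Unmatched predictions contribute nothing here, so the loss only refines predictions that are already close to a GT segment.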

Proposed Method
Junction and Line segmentation
Segmentation Maps (H/2 x W/2 x 2): Junction map x1 (loss $\mathcal{L}_{junc}$), Line map x1
• Center points and displacement vectors are highly related to pixel-wise junctions and line segments in the segmentation maps.
• For the junction and line losses, we use a positive/negative separated binary classification loss.

Proposed Method
Length and Degree regression
SoL Maps (H/2 x W/2 x 7) and TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• As displacement vectors can be derived from the length and degree of line segments, length and degree can serve as additional geometric cues.
• For the length and degree losses, we use smooth L1 loss for regression.
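The relation between displacement vectors and the length and degree of a segment can be sketched as below. The conventions are assumptions for illustration: the degree is measured in degrees from the x-axis, the center is the midpoint, and the start displacement is the negative of the end displacement. The function names are hypothetical.

```python
import math


def displacements_from_length_degree(length, degree):
    """Derive (d_start, d_end) from a segment's length and angle.

    Assumptions: angle in degrees from the x-axis, center at the midpoint.
    """
    theta = math.radians(degree)
    dx = 0.5 * length * math.cos(theta)
    dy = 0.5 * length * math.sin(theta)
    return (-dx, -dy), (dx, dy)


def length_degree_from_displacements(d_start, d_end):
    """Inverse mapping: recover length and angle from the two displacements."""
    vx = d_end[0] - d_start[0]
    vy = d_end[1] - d_start[1]
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))


d_s, d_e = displacements_from_length_degree(10.0, 0.0)
length, degree = length_degree_from_displacements(d_s, d_e)
```

Because the two descriptions carry the same geometry, regressing length and degree gives the network a second view of the quantity it already predicts as displacements.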

Proposed Method
5. Final Loss Functions
[Equations: loss for each map; final loss function]
• TP Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• SoL Maps (H/2 x W/2 x 7): Displacement map x4, Center map x1, Length map x1, Degree map x1
• Segmentation Maps (H/2 x W/2 x 2): Junction map x1, Line map x1

Experiments
1. Experimental Setting
Dataset
• Wireframe: 5,000 training images, 462 test images
• YorkUrban: 102 test images
Evaluation Metrics
• Heatmap-based metric $F^H$
• Structural average precision (sAP)
• Line matching average precision (LAP)
Optimization
• Tesla V100 GPU
• Input size: 320 or 512
• Used ImageNet pretrained MobileNetV2

Experiments
2. Ablation Study and Interpretability
Baseline and Augmentation
Performance
Input augmentations
• Horizontal / vertical flips
• Shearing
• Rotation
• Scaling

Experiments
Matching Loss
Saliency map
Performance
Saliency maps generated from TP center map

Experiments
Line & Junction segmentation
Saliency map
Performance
Saliency maps generated from each feature map

Experiments
Length & Degree regression
Saliency map
Performance
Saliency maps generated from each feature map

Experiments
SoL augmentation
Saliency map
Performance
Saliency maps generated from TP center map

Experiments
3. Comparison with Other Methods
Quantitative comparisons with existing LSD methods

Experiments
4. Visualization
Qualitative evaluation
[Figure: examples (a) to (d) comparing ground truth, M-LSD-tiny-320, and M-LSD-512]

Experiments
5. Deployment on Mobile Devices
Inference speed and memory usage on mobile devices
We use an iPhone (A14 Bionic chipset) and an Android phone (Snapdragon 865 chipset); FP denotes floating point.

Experiments
6. Applications
Real-time box detection on a mobile device
[Figure: (a) Input image (b) Line detection (c) Box candidates (d) Box detection]
Real-time LSD has further potential applications: book scanners, wireframe-to-image translation, SLAM, and pose estimation.

Conclusion
In this paper
We design an efficient LSD for resource-constrained environments: Mobile LSD (M-LSD)
• Minimize the backbone network and adopt a single-module line prediction process
• Present novel training schemes: Segments of Line segment (SoL) augmentation and geometric learning schemes
We achieve
• Compared to TP-LSD-Lite, M-LSD-tiny achieves competitive performance with 2.5% of the model size and a 130.5% increase in inference speed on GPU.
• Our model runs at 56.8 and 48.6 FPS on Android and iPhone, respectively, making it the first real-time method available on mobile devices.

Supplementary
- Metrics
Precision and Recall of Line Heat Maps ($F^H$)
1. Given a vectorized representation (line), a confidence heat map is generated.
2. The generated heat map is compared with the ground truth heat map: a bipartite matching that treats each pixel independently as a graph node is run to match the two heat maps.
3. Then, the precision and recall curve is computed according to the matching and confidence of each pixel.

Supplementary
- Metrics
Structural Average Precision (sAP)
• Defined on vectorized wireframes rather than on a heat map.
• Recall is the proportion of the correctly detected line segments (up to a cutoff score) to all the ground
truth line segments.
• Precision is the proportion of the correctly detected line segments above that cutoff to all the detected
line segments.

Supplementary
- Metrics
Line Matching Average Precision (LAP)
• Line Matching Score (LMS)
• Score_theta: the differences in angle and position
• Score_l: the matching degree in length
• LMS = Score_theta × Score_l
• Using LMS to determine true positives (a detected line segment is considered a true positive if LMS > 0.5), we can calculate the LAP on the entire test set.
• LAP is defined as the area under the precision-recall curve.
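The last two bullets can be sketched as a small computation. The sketch assumes each detection already has an LMS against its matched ground truth segment and a confidence score, and it approximates the area under the precision-recall curve with a step-wise sum. Handling of several detections matching the same ground truth segment is left out. The function name `line_matching_ap` is hypothetical.

```python
def line_matching_ap(confidences, lms_scores, n_gt, lms_threshold=0.5):
    """Area under the precision-recall curve, with LMS deciding true positives.

    confidences: detection confidence scores (higher is more confident).
    lms_scores: LMS of each detection (assumed precomputed).
    n_gt: number of ground truth line segments in the test set.
    """
    ranked = sorted(zip(confidences, lms_scores), key=lambda pair: -pair[0])
    true_pos = false_pos = 0
    area = 0.0
    prev_recall = 0.0
    for _, lms in ranked:
        if lms > lms_threshold:
            true_pos += 1
        else:
            false_pos += 1
        recall = true_pos / n_gt
        precision = true_pos / (true_pos + false_pos)
        # Step-wise integration: precision times the gain in recall.
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


lap = line_matching_ap([0.9, 0.8, 0.7], [0.9, 0.2, 0.6], n_gt=2)
```

Sorting by confidence first is what turns a single set of detections into a curve: each prefix of the ranked list gives one precision and recall point.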

Supplementary
- Details of M-LSD
Architecture

Supplementary
- Details of M-LSD
Final feature maps

Supplementary
- Extended Experiments
Ablation Study of Architecture

Supplementary
Needs of Offset Maps

Supplementary
Impact of SoL Augmentation

Supplementary
Threshold of Matching Loss

Supplementary
Precision and Recall Curve
