[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations (CVPR2021)

BoxInst: High-Performance Instance
Segmentation with Box Annotations
Zhi Tian et al (CVPR 2021)
Choi Dongmin

Abstract
• A high-performance mask-level instance segmentation 
- with only bounding-box annotations for training 
- dramatic improvement; AP 21.1% → 31.6 on COCO 
- redesign the loss of learning masks 
- can supervise the mask training without relying on mask annotations 
- narrows the performance gap between weakly and fully supervision

Introduction
• Instance Segmentation
https://ai-pool.com/d/could-you-explain-me-how-instance-segmentation-works

• Weakly-Supervised Instance Segmentation (WSIS) 
- Image-level annotation (e.g., CAM) 
- Bounding-box annotation
Introduction
Chang et al. Weakly-Supervised Semantic Segmentation via Sub-category Exploration. CVPR 2020

• BBTB: Previous S.O.T.A of WSIS 
- still low performance (21.1 AP on COCO benchmark)
Introduction
Hsu et al. Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior. NIPS 2019

• BoxInst 
- only with bounding-box annotation 
- COCO AP: 21.1 (previous S.O.T.A BBTA) → 31.6 
- with aug. and 3x scheduling, achieved 32.1 AP
Introduction
NOTE: Mask R-CNN 
achieved 37.5 AP

• FCOS 
- anchor-free one-stage object detector 
- FPN + Detection head
Related Work
Tian et al. FCOS: Fully Convolutional One-Stage Object Detection. ICCV 2019

• CondInst 
- solve instance segmentation in an RoI-free fully conv. way 
- employ dynamic
fi
lters instead of a
fi
xed mask head 
- FCOS + Mask head
Related Work
Tian et al. Conditional Convolutions for Instance Segmentation. ECCV 2020

• CondInst 
- FCOS + Mask head
Related Work

• CondInst 
- FCOS + Mask head
Related Work
predict the parameters of the mask head

• Two proposed loss term: 1. Projection loss term 
- projections of the mask and box should be the same
Approach
Lproj = L(Projx(m̃), Projx(b)) + L(Projy(m̃), Projy(b))
: Dice Loss

: the network predictions for the instance mask

: ground-truth bounding box
L( ⋅ , ⋅ )
m̃ ∈ (0,1)H×W
b

• Two proposed loss term: 2. Pairwise a
ffi
nity loss term 
- supervise the mask in a pairwise way
Approach
an undirected graph ; : the set of pixels, : the set of edges

the label for the edge

if the two pixels have the same G.T. label or if not 
 
G = (V, E) V E
ye ∈ {0,1}
ye = 1 ye = 0
P(ye = 1) = m̃i,j ⋅ m̃k,l + (1 − m̃i,j) ⋅ (1 − m̃k,l)

ffi
nity loss term 
Approach
ye
(i, j) (k, l)
m̃i,j m̃k,l
an undirected graph ; : the set of pixels, : the set of edges

the label for the edge

if the two pixels have the same G.T. label or if not 
 
G = (V, E) V E
ye ∈ {0,1}
ye = 1 ye = 0

ffi
nity loss term 
Approach
ye
(i, j) (k, l)
m̃i,j m̃k,l

P(ye = 0) = 1 − P(ye = 1)
Lpairwise = −
1
N ∑
e∈Ein
ye log P(ye = 1) + (1 − ye)log P(ye = 0)

ffi
nity loss term 
Approach
ye
(i, j) (k, l)
m̃i,j m̃k,l

P(ye = 0) = 1 − P(ye = 1)
Lpairwise = −
1
N ∑
e∈Ein
ye log P(ye = 1) + (1 − ye)log P(ye = 0)
However, we only have bounding box annotations…

ffi
nity loss term 
- Learning without mask annotations: color similarity 
- idea: if two pixels have similar colors, they are likely to have the
same labels as well
Approach

assign if (a color similarity threshold)
Se = S(ci,j, cl,k) = exp(−
||ci,j − cj,k ||
θ
)
ye = 1 Se > τ
: color similarity of the edge

: the color vector of the pixel (LAB color space)

: hyper-parameter
Se e
ci,j (i, j)
θ = 2
Lpairwise = −
1
N ∑
e∈En
1{Se>τ} log P(ye = 1)
only for the positive edges

• Two proposed loss term 
- Projection loss term + Pairwise a
ffi
nity loss term
Approach
Lpairwise = −
1
N ∑
e∈En
1{Se>τ} log P(ye = 1)
Lproj = L(Projx(m̃), Projx(b)) + L(Projy(m̃), Projy(b))
Lmask = Lproj + Lpairwise

• Projection and Pairwise A
ffi
nity Loss for Mask Learning 
- the redesigned mask loss can have similar performance to the
original pixel mask loss (e.g., Dice loss)
Experiments

• Box-supervised Instance Segmentation 
- Varying the threshold of color similarity
Experiments
proportion of true positive edges
only with box annotations

- Varying the neighborhood of the pixels
Experiments

- The contribution of each loss term
Experiments

- Failure cases
Experiments

• Comparison with State-of-the-art on COCO
Experiments

Experiments
BoxInst outperforms the previous S.O.T.A BBTP with a large margin

Experiments
BoxInst even outperforms the fully supervised YOLACT and PolarMask

Experiments
BoxInst vs. Mask R-CNN and CondInst

• Experiments on PASCAL VOC
Experiments

• Semi-supervised Instance Segmentation
Experiments

Abstract
• BoxInst 
- achieved high-quality instance segmentation w/ only box annotation 
- core idea: the projection and pair-wise a
ffi
nity mask loss 
- excellent instance segmentation performance w/o mask on COCO
and PASCAL VOC

[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations (CVPR2021)

More Related Content

What's hot

Similar to [Review] BoxInst: High-Performance Instance Segmentation with Box Annotations (CVPR2021)

More from Dongmin Choi

Recently uploaded

[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations (CVPR2021)