Weighted boxes fusion: Ensembling boxes from
different object detection models
2022/04/11, Changjin Lee
Introduction
● When real-time inference is not required, ensembling several different models can bring a
performance boost in object detection tasks
[Diagram: predictions from Model 1, Model 2, and Model 3 pass through an ensembling step to produce a single set of ensembled predictions]
Non-maximum suppression (NMS)
● Sort bounding boxes in decreasing order of their confidence scores
● For each class, starting from the highest-confidence box (box A), remove every “redundant” box
whose IoU with box A exceeds iou_thresh
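Greedy NMS as described above can be sketched as follows (a minimal single-class version; the box format [x1, y1, x2, y2] and the helper names are my own):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    areas_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + areas_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]          # decreasing confidence
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # keep only the boxes whose overlap with the best box is small
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep
```

With two near-duplicate boxes and one distant box, only the top-scoring duplicate and the distant box survive.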
Soft-NMS
● Instead of removing overlapping boxes, Soft-NMS decays their confidence scores in proportion
to their IoU with the selected box
● Soft-NMS shows a noticeable improvement over the plain NMS method
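The score-decay rule can be sketched as follows (both the linear and the Gaussian variants from the Soft-NMS paper; the function name and defaults are illustrative):

```python
import math

def soft_nms_decay(score, iou, method="gaussian", sigma=0.5, iou_thr=0.5):
    """Soft-NMS update for a box overlapping the currently selected box.

    linear:   s *= (1 - iou)            if iou > iou_thr, else unchanged
    gaussian: s *= exp(-iou**2 / sigma) (decays smoothly with overlap)
    """
    if method == "linear":
        return score * (1.0 - iou) if iou > iou_thr else score
    return score * math.exp(-(iou ** 2) / sigma)
```

Low-overlap boxes keep most of their confidence, while heavily overlapping ones are pushed below the final score threshold instead of being hard-deleted.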
Problems with NMS / Soft-NMS
Both NMS and Soft-NMS discard redundant boxes, so they cannot produce localization predictions averaged across different models.
Q) Rather than discarding “redundant” boxes, why don’t we let all of them contribute to the prediction?
Weighted Boxes Fusion (WBF)
● Unlike NMS and Soft-NMS, which discard redundant boxes, WBF lets all of them contribute to the
prediction in proportion to their confidence scores, producing an averaged prediction
○ Even if an individual prediction is not optimal, it still carries some
useful signal
Step 0 - N Different Models
● Suppose we have bounding boxes for the same image from N different models
● Bounding box coordinates must be normalized
● Construct boxes_list, scores_list, labels_list
[Diagram: predictions from model 1 and model 2 on the same image]
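A sketch of Step 0 (the image size and detection values here are made up for illustration):

```python
# Hypothetical raw detections (pixel coordinates) from two models
img_w, img_h = 640, 480
model1 = {"boxes": [[64, 48, 320, 240]], "scores": [0.90], "labels": [0]}
model2 = {"boxes": [[70, 50, 330, 245]], "scores": [0.80], "labels": [0]}

def normalize(boxes, w, h):
    """WBF expects [x1, y1, x2, y2] coordinates normalized to [0, 1]."""
    return [[x1 / w, y1 / h, x2 / w, y2 / h] for x1, y1, x2, y2 in boxes]

# One entry per model, as expected by the reference implementation [2]
boxes_list  = [normalize(m["boxes"], img_w, img_h) for m in (model1, model2)]
scores_list = [m["scores"] for m in (model1, model2)]
labels_list = [m["labels"] for m in (model1, model2)]
```

These three lists are what `weighted_boxes_fusion(boxes_list, scores_list, labels_list, ...)` from the repository in [2] takes as input.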
Step 1 - Merge bboxes to B
● Add each predicted box from each model to a single list B
○ Filter out boxes with confidence < score_thr
● Sort in decreasing order of the confidence score C
B = [b1, b2, b3, b4]
Step 2 - Boxes cluster L / Fused Box F
● Declare an empty list L of box clusters - each position of L holds a set of boxes
● Declare an empty list F of fused boxes - each position of F holds a single box
B = [b1, b2, b3, b4]
L = [ ] (2D list)
F = [ ] (1D list)
Step 3 - Loop through B and find a match from F
● Iterate over each box in B and look for the best matching box in F
○ the box in F with the highest IoU with the current box
○ a match requires best_iou > iou_thr
B = [b1, b2, b3, b4]
L = [ ]
F = [ ]
B = [b1, b2, b3, b4]
F = [f1, f2, f3] → best match: F[best_f_idx], IoU = best_iou
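Step 3’s search, sketched as a helper (the names are hypothetical; a real implementation would also restrict matching to boxes of the same class):

```python
def iou_xyxy(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def find_best_match(F, box, iou_thr=0.55):
    """Return (best_f_idx, best_iou) for `box` against the fused boxes in F;
    best_f_idx is -1 when no IoU exceeds iou_thr."""
    best_f_idx, best_iou = -1, iou_thr
    for idx, fused in enumerate(F):
        cur = iou_xyxy(fused, box)
        if cur > best_iou:
            best_f_idx, best_iou = idx, cur
    return best_f_idx, best_iou
```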
Step 4 - If no match found
● If no match is found, append the box from B to the end of both L (as a new single-box cluster) and F
● Then, go back to Step 3 for the next box
● If no box in B ever finds a match, every box ends up in its own cluster and the fusion in Steps 5-6 never runs
B = [b1, b2, b3, b4]
L = [ ]
F = [ ]
B = [b1, b2, b3, b4]
L = [[b1]]
F = [b1]
Step 5 - If match found
● If a match is found,
○ add b_i to the cluster L[best_f_idx] corresponding to the matching box in F
B = [b1, b2, b3, b4]
L = [[b1]]
F = [b1]
→ b2 matches F[best_f_idx] = b1 with IoU > iou_thr:
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [b1]
Step 6 - Perform WBF
● Perform WBF on the cluster in L that just gained a new box
● Fused confidence score = average confidence of all boxes in the cluster
● Fused box coordinates = confidence-weighted average of all boxes in the cluster
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [b1]
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [f1]
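Step 6’s fusion rule, as a sketch (a cluster is a list of (score, box) pairs; the representation and names are my own):

```python
import numpy as np

def fuse(cluster):
    """Fuse one cluster of (score, [x1, y1, x2, y2]) predictions.

    Fused score: plain average of the cluster's confidences.
    Fused coordinates: confidence-weighted average, so boxes with
    higher confidence pull the fused box toward themselves."""
    scores = np.array([s for s, _ in cluster])
    boxes = np.array([b for _, b in cluster], dtype=float)
    fused_score = scores.mean()
    fused_box = (scores[:, None] * boxes).sum(axis=0) / scores.sum()
    return fused_score, fused_box
```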
Step 7 - Re-scale confidence scores in F
● After all boxes in B are processed, re-scale each confidence score in F:
○ multiply by min(len(L[i]), N), where len(L[i]) is the number of boxes in cluster i
○ divide by the number of models N
● A fused box supported by many source boxes is likely to be more accurate, so clusters with little support are down-weighted
B = [b1, b2, b3, b4]
L = [[b1, b2, b3], [b4]]
F = [f1, f2]
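Step 7 as a sketch (boxes represented here as dicts with a "score" key; the representation is my own):

```python
def rescale_scores(F, L, N):
    """Down-weight fused boxes built from few source boxes:
    score *= min(len(L[i]), N) / N, where N is the number of models."""
    for i, fused in enumerate(F):
        fused["score"] *= min(len(L[i]), N) / N
    return F
```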
[No Match]
B = [b1, b2, b3, b4]
L = [ ]
F = [ ]
[No Match] - Add to L and F
B = [b1, b2, b3, b4]
L = [[b1]]
F = [b1]
[Match]
B = [b1, b2, b3, b4]
L = [[b1]]
F = [b1]
best_f_idx
[Match] - Add to L[best_f_idx]
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [b1]
best_f_idx
[Match] - WBF
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [f1]
[No Match]
B = [b1, b2, b3, b4]
L = [[b1, b2]]
F = [f1]
[No Match] - Add to L and F
B = [b1, b2, b3, b4]
L = [[b1, b2], [b3]]
F = [f1, b3]
[Match]
B = [b1, b2, b3, b4]
L = [[b1, b2], [b3]]
F = [f1, b3]
best_f_idx
[Match] - Add to L
B = [b1, b2, b3, b4]
L = [[b1, b2, b4], [b3]]
F = [f1, b3]
best_f_idx
[Match] - WBF
B = [b1, b2, b3, b4]
L = [[b1, b2, b4], [b3]]
F = [f1’, b3]
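The walkthrough above can be condensed into a minimal single-class sketch of the whole algorithm (Steps 1-7; class labels and per-model weights are omitted for brevity, and all names are my own):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def fuse(cluster):
    """Average score; confidence-weighted average of coordinates."""
    scores = np.array([s for s, _ in cluster])
    boxes = np.array([b for _, b in cluster])
    return scores.mean(), (scores[:, None] * boxes).sum(axis=0) / scores.sum()

def wbf(boxes_list, scores_list, iou_thr=0.55, score_thr=0.0):
    N = len(boxes_list)                 # number of models
    # Step 1: merge all predictions into B, filter, sort by decreasing score
    B = [(s, np.asarray(b, dtype=float))
         for bs, ss in zip(boxes_list, scores_list)
         for b, s in zip(bs, ss) if s >= score_thr]
    B.sort(key=lambda t: -t[0])
    L, F = [], []                       # Step 2: clusters / fused boxes
    for s, b in B:                      # Step 3: match b against F
        best_f_idx, best_iou = -1, iou_thr
        for i, (_, fb) in enumerate(F):
            cur = iou(fb, b)
            if cur > best_iou:
                best_f_idx, best_iou = i, cur
        if best_f_idx == -1:            # Step 4: no match -> new cluster
            L.append([(s, b)])
            F.append((s, b))
        else:                           # Steps 5-6: grow cluster, re-fuse
            L[best_f_idx].append((s, b))
            F[best_f_idx] = fuse(L[best_f_idx])
    # Step 7: re-scale confidences by cluster support
    return [(fs * min(len(L[i]), N) / N, fb)
            for i, (fs, fb) in enumerate(F)]
```

The reference implementation in [2] additionally handles class labels, per-model weights, and several confidence-averaging variants.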
WBF Visualization
Performance: Ensemble of two different models
Performance: Ensemble of same model with TTA
Performance: Ensemble of many different models
Personal Notes
● Section 2.1 NMS IoU threshold
● How does WBF perform better for side-by-side objects than NMS?
References
[1] R. Solovyev, W. Wang, T. Gabruseva, “Weighted boxes fusion: Ensembling boxes from different object detection models.” https://arxiv.org/pdf/1910.13302.pdf
[2] ZFTurbo, Weighted-Boxes-Fusion (reference implementation). https://github.com/ZFTurbo/Weighted-Boxes-Fusion
