B-FPGM: Lightweight Face Detection via
Bayesian-Optimized Soft FPGM Pruning
Nikolaos Kaparinos, Vasileios Mezaris
CERTH-ITI, Thermi, Thessaloniki, Greece
Real-World Surveillance
Workshop @ WACV 2025
The Growing Demand for Compact AI Models
● The deployment of AI models on mobile devices, such as smartphones and
drones, is increasingly common.
● Thus, the need for compact and efficient AI models has dramatically
increased.
● Face detectors are a type of model commonly deployed on mobile devices.
● Lightweight face detectors have been proposed in the literature.
● They utilize lightweight backbone networks and other optimization techniques,
such as pruning.
2
Network Pruning
● Network pruning is a technique used to reduce the number of parameters in a
model.
● Pruning methods can be classified into structured and unstructured approaches, and also into uniform and non-uniform ones.
● FPGM pruning is a structured pruning approach that has demonstrated high
performance in the literature.
● Soft Filter Pruning (SFP) is a pruning method that allows the pruned filters to
be updated during subsequent training steps.
3
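The two ingredients above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: `fpgm_select` approximates the geometric median by the filter minimising the summed distances to all others, and `soft_prune` zeroes (rather than removes) the selected filters, as in SFP.

```python
import numpy as np

def fpgm_select(filters, prune_rate):
    """FPGM criterion: pick the filters closest to the layer's geometric median.

    `filters` has shape (n_filters, k), each row a flattened filter. The
    geometric median is approximated by the filter with the smallest sum of
    Euclidean distances to all other filters; filters nearest to it are the
    most redundant and are selected for pruning.
    """
    dists = np.linalg.norm(filters[:, None, :] - filters[None, :, :], axis=-1)
    median_idx = np.argmin(dists.sum(axis=1))
    n_prune = int(round(prune_rate * len(filters)))
    return np.argsort(dists[median_idx])[:n_prune]

def soft_prune(filters, prune_rate):
    """Soft Filter Pruning: zero the selected filters instead of removing
    them, so later gradient updates can still revive them."""
    pruned = filters.copy()
    pruned[fpgm_select(filters, prune_rate)] = 0.0
    return pruned
```

Because the filters are only zeroed, the layer keeps its shape during training; the zeroed filters are physically removed only after training finishes.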
B-FPGM
● This work proposes B-FPGM, a novel non-uniform face detection network pruning technique.
● This work represents the first application of Bayesian optimization to
structured pruning as well as non-uniform pruning in the literature.
● B-FPGM divides the network layers into 6 groups and employs Bayesian optimization to optimize the pruning rate of each group.
● The optimal pruning rates are then applied alongside FPGM pruning and
SFP.
4
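The non-uniform step above can be sketched as follows. This is a hypothetical illustration: `layer_groups` maps each group to its layers' filter tensors, and for brevity a simple L2-norm criterion stands in for the FPGM distance criterion described earlier.

```python
import numpy as np

def prune_network(layer_groups, group_rates):
    """Soft-prune each layer group at its own (non-uniform) rate.

    `layer_groups`: dict mapping group name -> list of filter tensors,
    each of shape (n_filters, k). `group_rates`: one pruning rate per
    group, e.g. the rates found by Bayesian optimization.
    """
    pruned = {}
    for (name, layers), rate in zip(layer_groups.items(), group_rates):
        pruned[name] = []
        for filters in layers:
            # Zero the `rate` fraction of filters with the smallest L2 norm
            # (norm-based stand-in for the FPGM criterion, for brevity).
            norms = np.linalg.norm(filters.reshape(len(filters), -1), axis=1)
            n_prune = int(round(rate * len(filters)))
            out = filters.copy()
            out[np.argsort(norms)[:n_prune]] = 0.0
            pruned[name].append(out)
    return pruned
```

Each group can thus be pruned more or less aggressively, which is exactly the flexibility that the per-group Bayesian search exploits.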
B-FPGM Advantages
● B-FPGM offers flexibility through its non-uniform pruning approach.
● It eliminates the need for engineering expertise to define rules for optimal
pruning rates, effectively taking the ‘human out of the loop’.
● At the same time, it avoids utilizing Reinforcement Learning, which comes
with significant drawbacks.
5
B-FPGM overall pipeline
6
Bayesian optimization step
● The Bayesian optimization step is employed to identify the optimal pruning
rate for each layer group, given a target overall sparsity.
● In each iteration, the pre-trained network is soft-pruned and trained for one
epoch.
● The objective function is the validation loss, plus an additional term that ensures the network is pruned at approximately the target overall sparsity.
7
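The objective described above can be sketched as follows. This is an illustrative sketch, not the paper's code: the validation loss is passed in as a number (in the real pipeline it comes from soft-pruning and training for one epoch), and the penalty `weight` is a hypothetical hyperparameter.

```python
import numpy as np

def sparsity(rates, group_sizes):
    """Overall fraction of parameters removed when each layer group i
    (with group_sizes[i] parameters) is pruned at rate rates[i]."""
    rates = np.asarray(rates, dtype=float)
    sizes = np.asarray(group_sizes, dtype=float)
    return float((rates * sizes).sum() / sizes.sum())

def objective(rates, group_sizes, target, val_loss, weight=10.0):
    """Bayesian-optimization objective: validation loss plus a quadratic
    penalty keeping the achieved overall sparsity near the target."""
    return val_loss + weight * (sparsity(rates, group_sizes) - target) ** 2
```

A Bayesian optimizer then proposes candidate per-group rate vectors, evaluates this objective on each, and converges toward rates that minimise the validation loss while respecting the target sparsity.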
Network Layer Groups
8
The number of parameters
in each network layer group.
EResFD model architecture
and layer groups.
Overall B-FPGM algorithm
9
Experimental Setup
● All our experiments were applied to EResFD, currently the smallest (in number of parameters) well-performing face detector in the literature.
● A small ablation study with a second small face detector, EXTD, is also reported.
● The experiments were performed using the WIDER FACE dataset.
○ 12941 training images
○ Three validation subsets based on difficulty: Easy (1146 images), Medium (1079 images), Hard (1001
images)
● Experiments were conducted with target pruning rates ranging from 10% to 60%.
10
Results on EResFD using the WIDER FACE dataset
11
Hard Subset
Group pruning rates determined by Bayesian optimization. T is the target pruning rate (10% to 60%, in 10% steps).
Comparison with SoA models
12
Robustness to Randomness
13
Mean mAP ± standard deviation of B-FPGM on
EResFD across five runs, using different random
seeds, for 20% target pruning rate.
Number of layer groups ablation
14
mAP of B-FPGM on EResFD, on WIDER FACE (Easy, Medium,
Hard subsets), for different network layer groupings. N is the
number of layer groups and T is the target pruning rate.
Inference visual example
15
EResFD 50% pruned using B-FPGM
Thank you for your attention!
Questions?
Nikolaos Kaparinos, kaparinos@iti.gr
Vasileios Mezaris, bmezaris@iti.gr
Source code and pruned models available at:
https://github.com/IDT-ITI/B-FPGM
This work was supported by the EU Horizon Europe and Horizon 2020 programmes
under grant agreements 101070093 vera.ai and 951911 AI4Media, respectively.
16
