Thesis presentation Slides Ph.D. Aliouat Ahcen

Study and Implementation of an Object-based Video Encoder
for Embedded Wireless Video Surveillance Systems
Thesis defense
Aliouat Ahcen
Supervisors: Dr. Nasreddine Kouadria & Dr. Saliha Harize
LASA Laboratory, Electronics Department, Faculty of Technology, Badji Mokhtar - Annaba University
June 12, 2023
Ahcen Badji Mokhtar - Annaba University 1 / 91

Outline
1 Introduction
2 Related work
3 Dataset / Setup
4 Region-of-Interest based Video Coding Strategy for Low Bitrate
Surveillance Systems (Contribution 1)
5 An efficient Low Complexity Region-of-Interest Detection for
Video Coding in Wireless Visual Surveillance (Contribution 2)
6 Multi-Threshold-based frame segmentation for content-aware
video coding in WMSN (Contribution 3)
7 Region-of-interest based video coding strategy for rate/energy-
constrained smart surveillance systems using WMSNs
(Contribution 4)
8 General Conclusion


Working framework
1 In our problem, we consider a surveillance system that uses a wireless multimedia
sensor network (WMSN) as a backbone for capturing and delivering multimedia
data.
2 We also consider wireless connections, which raise challenges in terms of bandwidth
requirements and energy consumption.
3 We are addressing this problem by developing low-cost pre-encoders to reduce the
overall cost of the video encoder in terms of bitrate and energy consumption.
Footnote 1: Conservation X Labs product (Edge Cloud AI solution)

Framework
This thesis has been conducted as part of the Franco-Algerian Cooperation
project PHC Tassili.
The PHC Tassili project aims to propose solutions for migratory waterbird
monitoring using WMSN and Artificial Intelligence (AI).
This project proposes a combination of image, video, and audio solutions.
In this scope, the thesis contributes to the project by detecting and
compressing birds’ ROIs prior to transmission.
PHC Tassili project

[Figure: Wireless Multimedia Sensor Network. Visual sensor nodes -> transmission network -> display system, connected through wireless links and wired links.]
Nodes continually capture images / Equipped with batteries (limited energy
source)/ Wireless communication.
Advantages: Surveillance using WMSN
Ability to cover critical and remote zones (military, wild, lakes...) without intervention /
Coverage of larger zones / Real-time communication of the data / Cooperation of network
nodes
Challenges: WMSN
High data size / Limited energy / Limited bandwidth / High network congestion
Let's consider one sensor node...

Wireless Visual Sensor Node
[Figure: standard coding chain in a wireless visual sensor node. Captured frame -> standard video/image encoder -> compressed data -> buffering and radio transmission -> bitstream.]
The whole frame is encoded, which leads to:
High energy consumption
High data rate
Fast battery drop
Equal priority to important and non-important regions in the frame
Coding efficiency influences directly: Energy/bitrate/memory usage/image quality
The standard approach: processing the whole frame equally, for all blocks,
without priority.
Alternative: adding a pre-processing step before compression,
called the Region-of-Interest detection step.

Image/video coding in sensor node
Considering a ROI detection as pre-encoder for video compression
The pre-processing step is an aid to the encoder to achieve the desired tradeoff.
To overcome the challenges of the complexity/quality/bitrate trade-off, the
encoder must:
Be source side-friendly (the sensor node as a source).
Ensure very low bitrate output.
Achieve an acceptable frame rate.
Applying a ROI detection means applying moving object detection in the video
sequence . . . So, what are moving object detection approaches?

Moving object detection in video sequence
[Figure: moving object detection approaches. Background subtraction (frame F(n) against a background model), frame difference (F(n) against F(n - 1)), and edge detection (on F(n)).]
Other techniques
Most of the other techniques are a combination or variant of those methods
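As a rough illustration of the first two approaches named above, the sketch below computes a pixel-wise motion mask by frame difference and by background subtraction. It is a minimal example under stated assumptions, not the method used in the thesis: frames are small grayscale grids stored as nested lists, the function names and the threshold are hypothetical, and the running-average background update is one common choice that the slides do not specify.

```python
def frame_difference(prev, curr, threshold):
    """Binary motion mask: 1 where |curr - prev| exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


def update_background(background, frame, alpha=0.05):
    """Running-average background model (assumed): B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def background_subtraction(background, frame, threshold):
    """Same comparison as frame difference, but against the background model."""
    return frame_difference(background, frame, threshold)


if __name__ == "__main__":
    previous = [[10, 10, 10], [10, 10, 10]]
    current = [[10, 80, 10], [10, 10, 12]]
    # Only the pixel that changed by more than the threshold is flagged.
    print(frame_difference(previous, current, threshold=20))
```

Frame difference needs only the previous frame in memory, whereas background subtraction must store and update a model, which matters on a memory-constrained node.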

Moving Object as Region of Interest (ROI)
What is a region of Interest?
In a video frame, the different regions are not of the same interest
The human eye is interested in the object in the frame, either the moving or the still
object.
Example: ROI can include
A pedestrian walking / a car in the street / a flying bird / any object that creates
movement between frames
Moving Objects as Region of Interest
How to process the ROI?
Block-based processing of the ROI is better suited to compression and allows
high detection accuracy.

Impact of ROI detection on compression
To encode frame based on ROIs
We are trying to avoid energy/bitrate wasting in encoding unnecessary data.
Unnecessary data are those blocks with no or negligible changes.
Benefit of coding the frame based on ROI
Important gain in data rate and energy / Achieving real-time conditions with High
ROI quality.
What are the conditions?
High accuracy in detecting all the moving regions to avoid artifacts.
[Figure: ROI-based coding chain. Captured frame -> pre-encoder (ROI detector) -> ROI / non-ROI recommendation -> ROI-based video encoder -> compressed data -> buffering and transmission. ROI: Region-Of-Interest.]
The research question then arises...

Research question of the thesis
[Figure: research scope. ROI detection for video coding in WMSN (our approaches) lies at the intersection of video/image coding, wireless multimedia sensor networks, and object detection (region of interest). Criteria shown: energy, network lifetime/rate, accuracy, complexity, QoS/QoE, bitrate.]
Research question
How can we detect ROI in a captured video to ensure high-quality encoding and
transmission over a WMSN while minimizing bitrate and energy consumption?
The context and objective are then clear...

Expected results from this thesis
[Figure: overall scheme. Wireless visual sensor node (transmitter): captured frame -> pre-encoder (ROI detector) -> ROI / non-ROI -> ROI-based video encoder -> compressed data -> buffering and radio transmission -> bitstream, subject to channel conditions. Destination (receiver): compressed data -> video decoder based on ROI -> recovered data -> video analysis (decision?): decide, recognize, classify, monitor, recommend. ROI: Region-Of-Interest.]
Overall scheme of the thesis contribution conditions

Organization of the contributions
[Figure: thesis contributions plan. Contributions 1 to 4 are organized in two parts (Part 1, Part 2), covering binary classification and multi-class classification.]


Related work: ROI based video coding
1 Kouadria et al.: ‘Region-of-interest based image compression using the discrete
Tchebichef transform in wireless visual sensor networks’ - 2019.
Detect and transmit only the ROI using SAD.
Gain:
Very low bitrate (about 2kB needed for an image of size 320x360).
Very low complexity adapted for WMSN.
Limits:
Limited accuracy of the ROI detection algorithm.
Validated on small dataset/Limited number of evaluation metrics.

Related work: ROI based video coding
2 Rehman et al.: “A novel energy efficient object detection and image transmission
approach for wireless multimedia sensor networks” – 2016.
Separate the frame into 4 blocks and transmit only the active blocks.
Gain:
Moderate bitrate for transmission with simple detection.
Limits:
High detection and compression complexity.
Validated on small dataset.
Limited number of evaluation metrics.
Can be optimized to have lower bitrate.


Used Dataset over the contributions
Surveillance datasets used in our experiments.
Work | State / Source | Number of sequences
Contribution 1 [1] | Multiple sequences / multiple datasets | 9 video sequences
Contribution 2 [2] | Complete dataset: CDnet 2014 | 51 videos (15000 frames)
Contribution 3 [3] | Multiple sequences / multiple datasets | 3 video sequences
Contribution 4 [4] | Multiple sequences / multiple datasets | 9 video sequences
Conditions of the captured scenes in the datasets
Indoor/Outdoor surveillance sequences.
Human, highway, pedestrians, battlefield . . . objects are contained in the sequences.
Color, gray-scale, thermal images.
Weather conditions: rain, snowfall (noisy background), sunny . . .
QCIF, CIF, . . . HD resolutions.
Night and day time capturing.

Embedded environment conditions
For the sake of precise validation, embedded environment conditions are applied.
We assume an STM32 ARM Cortex-M3 board as the embedded system.
The energy consumption of basic arithmetic operations is considered
(addition/subtraction/division/multiplication).
STM32 ARM Cortex-M3 characteristics (contributions [2] and [4])
Sensor processor | Cortex-M3
Clock rate | 72 MHz
Processor power | 23 mW
Cycle counts | Add. (1), Sub. (1), Mult. (1 or 2), Div. (1 to 12)
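The characteristics above are enough to turn an operation count into an energy estimate. The sketch below assumes the simplest possible model: the processor draws a constant 23 mW at 72 MHz, so each cycle costs power divided by clock rate, and the energy of an operation is its cycle count times that value. The function names and the operation counts in the usage note are hypothetical, and the thesis may use a more detailed model.

```python
CLOCK_HZ = 72e6   # clock rate from the table above
POWER_W = 23e-3   # processor power from the table above

# (min, max) cycles per operation, from the table above
CYCLES = {"add": (1, 1), "sub": (1, 1), "mul": (1, 2), "div": (1, 12)}


def energy_per_cycle():
    """Energy of one clock cycle in joules, assuming constant power draw."""
    return POWER_W / CLOCK_HZ


def energy_bounds(op_counts):
    """Return (min, max) energy in joules for a dict {operation: count}."""
    e_cycle = energy_per_cycle()
    low = sum(n * CYCLES[op][0] for op, n in op_counts.items()) * e_cycle
    high = sum(n * CYCLES[op][1] for op, n in op_counts.items()) * e_cycle
    return low, high


if __name__ == "__main__":
    # Hypothetical workload: 1000 additions and 10 divisions.
    low, high = energy_bounds({"add": 1000, "div": 10})
    print(f"{low * 1e9:.1f} nJ to {high * 1e9:.1f} nJ")
```

Under this assumption one cycle costs roughly 0.32 nJ, and the spread between the two bounds comes entirely from the variable cycle counts of multiplication and division.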

Performance evaluation and used metrics
Metrics: PSNR | SSIM | MS-SSIM | VIF | Balanced Accuracy | Recall | Precision | Sensitivity | Specificity | FPR | FNR | PWC | TP | FP | TN | FN | F-measure
[Figure: the reference frame (original frame / ground truth) and the resulting frame are fed to the metric, which returns a value/score.]
Evaluation metrics used.
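The detection metrics in this list all derive from the four confusion counts (TP, FP, TN, FN). The sketch below uses the usual definitions of these quantities; the function name is hypothetical, and the counts are assumed to be non-zero where they appear in a denominator.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard detection scores from confusion counts (denominators assumed non-zero)."""
    recall = tp / (tp + fn)            # also called sensitivity or TPR
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "fnr": fn / (tp + fn),
        # Percentage of wrong classifications over all samples.
        "pwc": 100.0 * (fp + fn) / (tp + fp + tn + fn),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    for name, value in detection_metrics(tp=8, fp=2, tn=88, fn=2).items():
        print(f"{name}: {value:.4f}")
```

Balanced accuracy is the mean of recall and specificity, so it can be checked directly against tables that report those two columns.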


Contribution 1:
Title: Region-of-Interest based Video Coding Strategy for Low Bitrate
Surveillance Systems.
Conference paper published at the 19th IEEE International Multi-Conference on
Systems, Signals and Devices (SSD).
Region-of-Interest based Video Coding Strategy for Low Bitrate Surveillance Systems
Ahcen Aliouat (1), Nasreddine Kouadria (1), Moufida Maimour (2), and Saliha Harize (1)
(1) LASA Laboratory, Badji Mokhtar University, Annaba, Algeria, {ahcen2300,kouadria.n,shrz.dj}@gmail.com
(2) CRAN laboratory, Lorraine University, Nancy, France, {moufida.maimour}@univ-lorraine.fr
Abstract—In this work, we propose a fast and efficient Region-of-Interest based
video coding strategy for surveillance systems involving low bitrate. The
proposed algorithm is based on a combination of three major techniques, namely,
edge detection, frame differencing and sum of absolute differences. We improve
the algorithm accuracy through the use of morphological operations. A
thresholding is performed to classify the frame blocks into moving and
non-moving blocks. This allows compressing and sending to the destination only
the moving blocks in an object-based video coding scenario. The obtained results
prove the efficiency of our proposal in terms of accurate detection, data
reduction and bitrate saving.
Index Terms—ROI, Object Detection, WMSN, Video Coding
I. INTRODUCTION
Video coding techniques can be divided into two main approaches, namely,
noise-robust video coding and non-noise-robust video coding [1]. Noise-robust
video coding like Optical Flow [2] and blocks matching approaches [3], perform
motion estimation approaches based on relatively high com[...]
[...] video coding strategy using a ROI coding. We start with a ROI detection
phase where we exploit the efficiency of the absolute difference between edge
maps to extract the difference between successive frames using Edge detection
(ED) technique on each frame. The map of absolute difference of ED is enhanced
by summing up squared (typically 4 × 4) non overlapping blocks to construct a
smaller activity map. The activity map scores are morphologically changed to
widen the high score zones and get a larger ROI after the thresholding step. The
last step consists of establishing a strategy to avoid image quality degradation
and eliminates error propagation at the destination.
The remainder of this paper is organized as follows. The background and the
related work are presented in Section II. The proposed method is detailed in
Section III and its evaluation results on different data sets are presented and
discussed in Section IV. Finally, a conclusion is drawn in Section V.
19th International Multi-Conference on Systems, Signals & Devices (SSD)
654-7108-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/SSD54932.2022.9955963

Model/Method
We propose an ROI detection method based on edge detection + frame
difference.
The detection is enhanced by the ROF + FGS filters.
The method resolves the error propagation problem by proposing an error
correction approach.
Block diagram of the proposed ROI detection method

Model/Method
Proposed algorithm for ROI detection/compression
Using Sobel for edge detection
Using SAD on edge features
2-D ROF and FGS to enhance the detection
Error propagation avoidance by whole-frame compression after each GOP
Compress and transmit only the ROI blocks
Receive, decode, and update the display
[Flowchart: input frames n-1 and n -> Sobel edge detector -> edge difference map calculator -> sum of 4x4 elements (SAD) -> 2-D rank order filter with 8x8 window -> fast global smoother -> score above threshold? Yes: binary mask(block) = 1, compress then transmit the block. No: binary mask(block) = 0, skip the block. GOP achieved? Yes: compress and transmit the whole frame. No: continue with the next frame.]
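The transmit/skip decision and the whole-frame refresh at each GOP boundary can be sketched as follows. This is only an illustration of the decision logic: the activity map is assumed to be already computed and filtered (edge difference, 4x4 sums, ROF, FGS), the function name and arguments are hypothetical, and treating frame indices that are multiples of the GOP size as refresh frames is an assumption.

```python
def blocks_to_send(frame_index, activity, threshold, gop_size):
    """Return the (row, col) blocks to compress and transmit for one frame."""
    rows, cols = len(activity), len(activity[0])
    if frame_index % gop_size == 0:
        # GOP boundary (assumed convention): refresh the whole frame so that
        # earlier missed detections do not keep propagating at the receiver.
        return [(r, c) for r in range(rows) for c in range(cols)]
    # Otherwise send only blocks whose activity score exceeds the threshold.
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if activity[r][c] > threshold
    ]


if __name__ == "__main__":
    activity_map = [[0, 5], [9, 1]]
    print(blocks_to_send(3, activity_map, threshold=4, gop_size=10))
    print(blocks_to_send(10, activity_map, threshold=4, gop_size=10))
```

A shorter GOP bounds how long an artifact can survive, at the price of more whole-frame transmissions.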

Results and discussion
ROI detection results
The ROI mask includes all the objects, the results show high accuracy for ROI detection.

Quality evaluation
PSNR, SSIM, MS-SSIM and VIF results for the used dataset
SSIM, MS-SSIM, PSNR and VIF for: atrium

Data reduction
Mean number of blocks to be transmitted for each strategy
Sequence name | Sequence size | ROI-based (ours) | Classical approach | Saving (%) (w.r.t. classical)
Traffic2 | 640x360 | 4695 | 14400 | 67.4%
Atrium | 640x360 | 589 | 14400 | 96%
Highway | 320x240 | 1345 | 4800 | 72%
freeway | 316x236 | 530 | 4661 | 88.6%
peds | 232x152 | 719 | 2204 | 67.4%
rain | 308x228 | 2132 | 4389 | 51.5%
traffic | 378x282 | 1768 | 6662 | 73.5%
traffic3 | 160x120 | 428 | 1200 | 64.3%
Advantage
Energy saving: between 51% and 96%.
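The saving column follows from the two block counts. The sketch below reproduces that arithmetic under the assumption that the classical approach transmits every 4x4 block of the frame, so the total is the frame area divided by 16 (this matches the classical counts listed in the table); the function names are hypothetical.

```python
def total_blocks(width, height, block=4):
    """Number of block x block units in a frame (frame area / block area, floored)."""
    return (width * height) // (block * block)


def block_saving_percent(roi_blocks, all_blocks):
    """Share of blocks that are NOT transmitted, in percent."""
    return 100.0 * (1.0 - roi_blocks / all_blocks)


if __name__ == "__main__":
    # Traffic2 row: 640x360 frame, 4695 ROI blocks on average.
    n_all = total_blocks(640, 360)
    print(n_all, round(block_saving_percent(4695, n_all), 1))  # 14400 67.4
```

The saving is a reduction in transmitted blocks; how closely it tracks the energy saving depends on how much of the node's energy goes to compression and radio transmission.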

Limits in terms of visual Quality
[Figure: ROI mask example. Blocks to be transmitted: correct detection (white pixels). Blocks to be skipped: no information change. Missing information: due to wrong moving-region detection. Error propagation: due to continuous wrong object detection.]
Within each GOP (between two whole-frame refreshes), a wrong detection of the ROI
leads to a propagation of the visual artifacts.

Limits in the context of ROI detection for video
coding in WMSN
Some limits
1 High energy consumption is expected due to the used edge detection method.
2 Limited size of the used dataset
3 Quantitative evaluation of the detection performances is not performed
4 The energy consumption in an embedded environment is not evaluated.
5 A comparison to the state of the art is not shown.
How to solve this?
The next contribution shows a very low energy consumption method evaluated on
a large corpus dataset while the energy consumption is modeled and the results
are compared to the state of the art.

Recap.
The proposed approach is efficient when applied to surveillance cameras.
It relies on edge feature detection and error correction using fixed GOP intervals.
Sobel edge detection is effective in identifying frame changes, and SAD
accurately locates the ROI.
Achieves significant bandwidth savings, energy reduction (51.5%-96%), and
high-quality frame reconstruction.
The Sobel edge detector consumes excessive energy in the sensor node.
The study lacks accuracy and energy consumption analysis.
The upcoming contribution will address these limitations.


Title: "An Efficient Low Complexity Region-of-Interest Detection for Video Coding
in Wireless Visual Surveillance", IEEE Access, IF=3.41.

Main contribution
Block-based movIng Region Detection (BIRD)
Method: Low complexity ROI detection method for video coding in constrained WVS.
Accuracy: Improved detection accuracy through a combination of fast Gaussian
smoother and rank-order filter.
Performance: Algorithm assessed using several metrics to evaluate detection
performance and confirm superiority over SOTA techniques in constrained WVS.
Benefits: Bitrate and energy savings achieved using algorithm as a pre-encoder of a
baseline JPEG compression chain.
Viability: Algorithm’s viability for implementation in WVS demonstrated based on
energy/memory consumption modeling using ARM Cortex M3 characteristics.
[Figure: captured frame -> pre-encoder (ROI detector) -> ROI / non-ROI -> ROI-based video encoder -> compressed data -> buffering and transmission.]
Accurate pre-encoder to encode only moving frames

Proposed Method
Block diagram of the proposed algorithm (BIRD)
SFD: $\phi_n(x, y) = \frac{1}{w^2} \sum_{u=0}^{w-1} \sum_{v=0}^{w-1} F_n(wx + u, wy + v)$ (1)

Proposed method
Difference between maps:
$\Delta(x, y) = |\phi_n(x, y) - \phi_m(x, y)|$ (2)
Impact of FGS and ROF
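Equations (1) and (2) can be read as a block-mean downsampling of each frame followed by an absolute difference between the two reduced maps. The sketch below follows that reading on nested lists; the function names are hypothetical, the first index is taken as the row, and the FGS, ROF, and thresholding stages of BIRD are not included.

```python
def block_mean_map(frame, w):
    """phi(x, y): mean of each non-overlapping w x w block of the frame (Eq. 1)."""
    rows, cols = len(frame) // w, len(frame[0]) // w
    return [
        [
            sum(frame[w * x + u][w * y + v] for u in range(w) for v in range(w))
            / (w * w)
            for y in range(cols)
        ]
        for x in range(rows)
    ]


def difference_map(phi_n, phi_m):
    """Delta(x, y) = |phi_n(x, y) - phi_m(x, y)| (Eq. 2)."""
    return [
        [abs(a - b) for a, b in zip(row_n, row_m)]
        for row_n, row_m in zip(phi_n, phi_m)
    ]


if __name__ == "__main__":
    frame_m = [[1, 3, 6, 6], [5, 7, 6, 6]]
    frame_n = [[1, 3, 10, 10], [5, 7, 10, 10]]
    print(difference_map(block_mean_map(frame_n, 2), block_mean_map(frame_m, 2)))
```

Because the difference is taken on the reduced maps, its cost scales with the number of blocks rather than the number of pixels.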

Performances of BIRD over the CDnet 2014
Visual results (binary mask): Recall (TPR) is optimized to cover all objects (ROI)
A. Aliouat et al.: Region-of-Interest Detection for Wireless Visual Surveillance
TABLE 3: Samples of ROI extraction mask results
Columns: Sequence | Original | Ground-truth mask | ROI
Rows: Highway #1475, SnowFall #2784, Pedestrians #476, Blizzard #1406, WinterDriveway #1860, tunnelExit #2329, Sofa #1185, PTZ #1240

Quantitative results: All the categories
Category Recall Specificity FPR FNR PBC Precision F-Measure
PTZ 0.9662 0.6443 0.3556 0.0337 35.3016 0.0401 0.0753
badWeat. 0.9208 0.8948 0.1051 0.0791 10.1795 0.2747 0.3904
baseline 0.7619 0.9437 0.0562 0.2380 6.6360 0.3268 0.4047
cameraJ. 0.8504 0.6446 0.3553 0.1495 34.5590 0.1383 0.2238
dynamic. 0.7593 0.9512 0.0487 0.2406 4.9399 0.1962 0.2801
intermi. 0.4186 0.8603 0.1396 0.5813 16.4228 0.1566 0.2242
lowFram. 0.8161 0.7905 0.2094 0.1838 20.2242 0.1315 0.1919
nightVi. 0.9455 0.8374 0.1625 0.0544 15.9206 0.1193 0.2108
shadow 0.8775 0.8500 0.1499 0.1224 14.8039 0.2416 0.3740
thermal 0.7548 0.8894 0.1105 0.2451 13.4618 0.3575 0.4095
turbule. 0.8216 0.8870 0.1129 0.1783 11.3767 0.1000 0.1607
Overall 0.8084 0.8357 0.1642 0.1915 16.7115 0.1893 0.2678
Detection results of the proposed algorithm over CDnet 2014 dataset

Quantitative results: Compared to SOTA
Technique Recall Specificity FPR FNR PWC F-Measure Precision
KNN [1] 0.6650 0.9802 0.0198 0.3350 3.3200 0.5937 0.6788
GMM1 [2] 0.6846 0.9750 0.0250 0.3154 3.7667 0.5707 0.6025
KDE [3] 0.7375 0.9519 0.0481 0.2625 5.6262 0.5688 0.5811
MahaD [4] 0.1644 0.9931 0.0069 0.8356 3.4750 0.2267 0.7403
GMM2 [5] 0.6604 0.9725 0.0275 0.3396 3.9953 0.5566 0.5973
EucD [4] 0.6803 0.9449 0.0551 0.3197 6.5423 0.5161 0.5480
BIRD 0.8084 0.8357 0.1642 0.1915 16.7115 0.1893 0.2678
Comparison of BIRD with classical techniques over CDnet 2014 dataset

Quantitative results: Comparison with SOTA
Category-wise comparison of BIRD with SOTA on CDnet 2014 dataset
Columns: Category | Recall (BIRD, Savas [6], Cwizar [7]) | Specificity (BIRD, Savas [6], Cwizar [7]) | Balanced Acc. (BIRD, Savas [6], Cwizar [7])
Dynamic. 0.7593 0.6436 0.8144 0.9512 0.9962 0.9985 0.8553 0.8199 0.9064
PTZ 0.9662 0.7685 0.3833 0.6443 0.9977 0.9968 0.8053 0.8831 0.6901
BadWeat. 0.9208 0.5647 0.6697 0.8948 0.9985 0.9993 0.9078 0.7816 0.8345
Baseline 0.7619 0.6214 0.8972 0.9437 0.8213 0.9980 0.8528 0.7213 0.9476
CameraJ. 0.8504 0.4567 0.7436 0.6446 0.9788 0.9931 0.7475 0.7177 0.8683
Intermi. 0.4186 0.5547 0.8324 0.8603 0.9979 0.9911 0.6394 0.7763 0.9118
LowFram. 0.8161 0.5490 0.6659 0.7905 0.7464 0.9949 0.8033 0.6477 0.8304
nightVi. 0.9455 0.4593 0.4511 0.8374 0.9583 0.9874 0.8915 0.7088 0.7193
Shadow 0.8775 0.8365 0.8786 0.8500 0.9828 0.9910 0.8638 0.9097 0.9348
Thermal 0.7548 0.4650 0.7268 0.8894 0.9647 0.9949 0.8221 0.7148 0.8609
Turbule. 0.8216 0.7421 0.7122 0.8870 0.9883 0.9997 0.8543 0.8652 0.8559
Overall 0.8084 0.6056 0.6608 0.8357 0.9483 0.9948 0.8220 0.7770 0.8509
*bold values are the best category-wise, red values are the best overall, blue values are the second best

Energy consumption
The total energy consumption in the node is equal to:
$E_{total} = E_{Detection} + E_{compress}$ (3)
While $E_{compress}$ is estimated from [8] (footnote 2), $E_{Detection}$ is equal to:
$E_{Detection} = E_{SFD} + E_{FGS} + E_{ROF} + E_{Threshold}$ (4)
Footnote 2: Dong-U Lee et al., "Energy-efficient image compression for resource-constrained platforms", IEEE Transactions on Image Processing, 2009.

Energy consumption: proportion of blocks
considered for transmission, and the gain
Statistics of the energy gain for variable threshold values.
Each sequence cell lists: mean (ROI), ratio (ROI), ∆ energy.
Threshold | Highway | Pedestrians | Snowfall
10 | 149, 12.41%, +87.59% | 49, 03.63%, +96.37% | 68, 01.26%, +98.74%
9 | 160, 13.33%, +86.67% | 52, 03.85%, +96.15% | 74, 01.37%, +98.63%
7 | 192, 16.00%, +84.00% | 60, 04.44%, +95.56% | 87, 01.61%, +98.39%
5 | 249, 20.75%, +79.25% | 76, 05.63%, +94.37% | 110, 02.04%, +97.96%
3 | 291, 24.25%, +75.75% | 120, 08.89%, +91.11% | 190, 03.52%, +96.48%
1 | 621, 51.75%, +48.25% | 273, 20.22%, +79.78% | 1857, 34.39%, +65.61%
0 | 1003, 83.58%, +16.42% | 598, 44.30%, +55.70% | 4360, 80.74%, +19.26%
Max | 1200, 100%, - | 1350, 100%, - | 5400, 100%, -

Energy consumption: results
Per-frame $E_{Detection}$ cost of the method compared to the state of the art for a
frame size of 240 × 320 (footnote 3)
Method | Energy budget (mJ/frame)
MoG [2] | 649.95
CS-MoG [9] | 116.44
CoSCS-MoG [10] | 125.96
EBSCAM [11] | 3.4
FD | 0.5069
BIRD (proposed) | min (Cycles_div = 1): 0.3723; max (Cycles_div = 12): 0.6891
Footnote 3: While we have calculated the best and the worst case, the other techniques
have not reported extreme values. Reported values of other works are shown here.

Limitations / Open challenges
Limitations of the method:
Critical step: threshold value selection.
ROI prioritization requires multi-class classification.
Multi-level classification of the block by its importance is not applied
Solutions
Adaptive and automatic threshold selection resolved in next contribution.
Multi-class ROI classification problem solved in next contribution.

Recap.
We have proposed an energy-efficient ROI detection method for WVS.
The method showed good balance between accuracy, efficiency, and memory
when evaluated on a standard dataset.
The method reduces the processing and compression burden for
resource-constrained surveillance devices.
The next contribution focuses on the drawbacks of this method: multi-class
classification of the ROI, and automatic threshold selection.


Title: Multi-Threshold-based frame segmentation for content-aware video coding
Book Chapter

Context and Motivation
What is the problem?
Adaptive thresholding for ROI extraction is challenging.
Multi-class region classification based on activity improves the encoder process.
ROI resource allocation improves QoS and delivery.
Exploring activity statistics improves classification accuracy.
Needed improvements
Multi-class region classification based on activity (can improve the encoder
process).
ROI resource allocation can improve QoS and delivery.
Exploring activity statistics can improve classification accuracy.
Limits of the SOTA
SOTA ROI detection methods use binary classification.
Fixed threshold is a drawback of SOTA methods.
Accurate adaptive threshold selection in WMSN conditions is challenging.
Solution: What we propose

Compress then analyze (CTA) vs. Analyze then
Compress (ATC).
Compress-Then-Analyze paradigm (CTA): image acquisition -> image compression -> image transmission -> bitstream reception -> bitstream decompression -> image visualization and analysis.
Compress the frame, then analyze it at the destination (CTA).
Analyze-Then-Compress paradigm (ATC): image acquisition -> ROI detection and analysis -> ROI-based compression -> ROI-based transmission -> bitstream reception -> bitstream decompression -> image visualization.
Analyze the frame, then compress it adaptively (ATC).

Proposed Method
[Flowchart, ATC pipeline: image acquisition -> ROI detection and analysis -> ROI-based compression -> ROI-based transmission.
ROI detection: frames n-1 and n -> Canny edge detector -> sum of absolute differences over 8x8 blocks -> fast global smoother + maximum rank order filtering -> automatic thresholding (Otsu multi-threshold, 2 thresholds) -> activity map masks.
Coding, for each 8x8 block: first ROI? Yes: set QF=X. No: second ROI? Yes: set QF=Y (Y < X). No: set QF=Z (Z < Y). Then DCT + quantization + Huffman coding, with one bitstream buffer per ROI class.]
Proposed multi-class classification of the ROI in an ATC paradigm.

Setup and parameters
Multi-level Otsu thresholding
Used parameters/techniques for each step
Parameter | Value
Edge detector | Canny
SAD | 8
FGS | window size = 8, σ = 0.05
ROF | n = 4, p = 100
Thresholding | multi-class Otsu
JPEG | compression technique: 8-DCT; entropy coding: Huffman
Classes QF | X = 90, Y = 50
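To make the thresholding step concrete, the sketch below implements a brute-force two-threshold Otsu search and then maps each activity score to a quality factor. It is an illustration under assumptions: activity scores are taken to be integers in [0, levels), the exhaustive search is a textbook formulation rather than the implementation used in the work, the function names are hypothetical, and the quality factors 90, 50, and 10 are the ones reported for ROI1, ROI2, and ROI3 on the results slide.

```python
def multi_otsu_two_thresholds(values, levels=256):
    """Exhaustive two-threshold Otsu: pick (t1, t2) maximizing between-class variance.

    Classes are [0, t1), [t1, t2), [t2, levels). Maximizing the sum of
    w_k * mu_k^2 is equivalent because the global mean is fixed.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    # Prefix sums of probability (p) and first moment (m).
    p = [0.0] * (levels + 1)
    m = [0.0] * (levels + 1)
    for i in range(levels):
        p[i + 1] = p[i] + hist[i] / total
        m[i + 1] = m[i] + i * hist[i] / total
    best_score, best_pair = -1.0, (1, 2)
    for t1 in range(1, levels - 1):
        for t2 in range(t1 + 1, levels):
            score = 0.0
            for lo, hi in ((0, t1), (t1, t2), (t2, levels)):
                weight = p[hi] - p[lo]
                if weight > 0:
                    mean = (m[hi] - m[lo]) / weight
                    score += weight * mean * mean
            if score > best_score:
                best_score, best_pair = score, (t1, t2)
    return best_pair


def quality_factor(score, t1, t2, qf_high=90, qf_mid=50, qf_low=10):
    """Assumed mapping: highest activity gets the highest quality factor."""
    if score >= t2:
        return qf_high
    if score >= t1:
        return qf_mid
    return qf_low


if __name__ == "__main__":
    scores = [2] * 10 + [100] * 10 + [200] * 10
    t1, t2 = multi_otsu_two_thresholds(scores)
    print(t1, t2, [quality_factor(s, t1, t2) for s in (2, 100, 200)])
```

The exhaustive search is quadratic in the number of levels, so on a sensor node the activity scores would likely be quantized to far fewer levels than 256.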

Classification results
Results of multi-QF based coding (from "Frame Segmentation for Content-Aware Video Coding in WMSN"). From left to right: 1- original frame, 2- segmentation results, 3- decompression results (JPEG chain with ROI1: QF=90, ROI2: QF=50, ROI3: QF=10; PSNR=33.9308 dB, SSIM=0.7618), 4- ROI visual quality for the same bitrate (proposed left, MJPEG right).
[...] chosen for comparison due to its low complexity compared to recent encoders and because it is widely implemented in WMSN. The PSNR value is lower for the case of multi-QF in comparison with MJPEG. [...]

Results and discussion: Quality evaluation
[Figure: PSNR (dB) versus frame number, proposed method (R1-QF=90 | R2-QF=50 | R3-QF=10) against MJPEG. (a) Hall sequence, (b) Traffic sequence.]
PSNR value of ROI-based coding compared to MJPEG (at the reception)
The quality of the whole frame shows a degradation of about 9 dB compared to the
classical method, due to the QF values used.

Results and discussion: Quality evaluation
[Figure: mean PSNR (dB) at a bitrate of 1 bpp for MJPEG, the proposed multi-QF method, and ROI-1, for the traffic and hall sequences.]
The mean PSNR of the whole frame of the proposed method, the high priority ROI, and
the whole frame of MJPEG at the same bitrate
The ROI quality is guaranteed, and is higher than that of the non-ROI region and of
the classical method for a fixed bitrate of 1 bpp.

Results and discussion: Bitrate saving
[Figure: bitrate (kB) versus frame number, ROI-based strategy against MJPEG (QF=90); panel (a): Hall sequence.]
Bitrate needed for the ROI-based strategy against MJPEG-based coding in a wireless
sensor node
A bitrate gain of almost 50%
Gain and benefits
Discussion
Large reduction of transmission bitrate (generally more than 50%) with respect to
MJPEG.
Reduced bandwidth usage leads to less contention in the WMSN channel.
In a multi-hop scenario, energy-constrained nodes relay frames.
For this scenario, the method offers increasing energy savings.
As the number of hops increases, the energy savings increase.
The bit reduction propagates across the network.
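The multi-hop argument above can be illustrated with a simple linear radio model. The sketch assumes that every hop costs one transmission and one reception per bit and that the per-bit energies are constant; the function names and all numeric values in the usage note are hypothetical, not measurements from the work.

```python
def relay_energy_joules(bits, hops, e_tx_per_bit, e_rx_per_bit):
    """Radio energy to carry `bits` over `hops` hops (one send and one receive per hop)."""
    return bits * hops * (e_tx_per_bit + e_rx_per_bit)


def energy_saved(bits_full, bits_roi, hops, e_tx_per_bit, e_rx_per_bit):
    """Network-wide radio energy saved by sending the ROI bitstream instead of the full one."""
    full = relay_energy_joules(bits_full, hops, e_tx_per_bit, e_rx_per_bit)
    roi = relay_energy_joules(bits_roi, hops, e_tx_per_bit, e_rx_per_bit)
    return full - roi


if __name__ == "__main__":
    # Hypothetical numbers: 50% bit reduction, 100 nJ/bit for both send and receive.
    for hops in (1, 3, 5):
        saved = energy_saved(64_000, 32_000, hops, 1e-7, 1e-7)
        print(hops, f"{saved * 1e3:.2f} mJ")
```

In this model the saving grows linearly with the hop count, which is the sense in which the bit reduction propagates across the network.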

Limits and open question
Limits
The proposed method:
Has been tested on limited dataset (3 sequences) / Can be improved in complexity.
Shows relatively large complexity as it consists of many steps.
Has not been evaluated in terms of energy consumption.
Solution
The next contribution classifies the frame into multiple regions using a novel
addition-based method.
It includes and evaluates detailed energy consumption model.


Title: Region-of-interest based video coding strategy for rate/energy-constrained
smart surveillance systems using WMSNs: Ad Hoc Networks Journal (Elsevier), IF= 4.9

Methodology: Context and Motivation
Problem statement
Reducing the bitrate can affect the quality of the image at the reception.
Enabling some smart tasks at the reception is needed for new network paradigms.
The better the quality of the ROI, the better the accuracy of the monitoring tasks.

Objectives
Reduce power consumption in in-node processing.
Minimize required bandwidth for transmission.
Maintain a high QoS level.
Solution
A novel ROI detector named successive summation of the absolute differences
(S-SAD).
Advantages of the solution:
A tradeoff between quality, bitrate, energy consumption, and object recognition.
Efficient human-based and machine-based smart monitoring tasks.
The method outperforms state-of-the-art and MJPEG techniques when evaluated with a
YOLOv3 model.
The energy consumption model confirms the method’s feasibility for IoT nodes.

ROI Detection
[Figure: captured video -> frame difference between frames k-1 and k -> map1: sum of 8x8 pixels -> map2: sum of 4x4 map1 scores -> map3: sum of 2x2 map2 scores -> thresholding and block labeling: ROI-1, ROI-2, ROI-3, non-ROI.]
Proposed ROI detection step

Mathematically speaking
$SAD_{map}(x, y) = \frac{1}{w_1^2} \sum_{u=0}^{w_1-1} \sum_{v=0}^{w_1-1} D(w_1 x + u, w_1 y + v)$ (5)
$R_{map}(x, y) = \frac{1}{w_2^2} \sum_{u=0}^{w_2-1} \sum_{v=0}^{w_2-1} SAD_{map}(w_2 x + u, w_2 y + v)$ (6)
$G_{map}(x, y) = \frac{1}{w_3^2} \sum_{u=0}^{w_3-1} \sum_{v=0}^{w_3-1} R_{map}(w_3 x + u, w_3 y + v)$ (7)
ROI-1 ⊂ ROI-2 ⊂ ROI-3
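Equations (5) to (7) apply the same block averaging three times, each time on the previous map. The sketch below follows that structure on nested lists; the function names are hypothetical, D is taken to be the absolute frame difference, the first index is taken as the row, and the default window sizes 8, 4, and 2 come from the ROI detection figure. The thresholding and labeling steps are left out.

```python
def block_mean(src, w):
    """Mean of each non-overlapping w x w block of a 2-D map."""
    rows, cols = len(src) // w, len(src[0]) // w
    return [
        [
            sum(src[w * x + u][w * y + v] for u in range(w) for v in range(w))
            / (w * w)
            for y in range(cols)
        ]
        for x in range(rows)
    ]


def ssad_maps(prev, curr, w1=8, w2=4, w3=2):
    """Return (SADmap, Rmap, Gmap) as in Eqs. (5)-(7)."""
    # D: absolute difference between the two frames (assumed definition).
    diff = [
        [abs(c - p) for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]
    sad_map = block_mean(diff, w1)   # Eq. (5)
    r_map = block_mean(sad_map, w2)  # Eq. (6)
    g_map = block_mean(r_map, w3)    # Eq. (7)
    return sad_map, r_map, g_map


if __name__ == "__main__":
    previous = [[0] * 4 for _ in range(4)]
    current = [[16, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
    # Small windows so the toy 4x4 frame is large enough.
    print(ssad_maps(previous, current, w1=2, w2=2, w3=1))
```

Apart from the initial absolute difference, every stage uses only additions and one scaling per output, which is consistent with the method being described as addition-based.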

ROI Detection
Pyramidal view of the calculation and decision

Coding Strategy
[Flowchart, ATC pipeline: image acquisition -> ROI detection and analysis -> ROI-based compression -> ROI-based transmission.
Successive Sum of Absolute Differences (SSAD) block: frames k-1 and k -> sum of absolute differences of each block (SADmap) -> sum of each block of SADmap to get the region activity map (Rmap) -> sum of each block of Rmap to get the global activity map (Gmap).
Thresholding yields the ROI-1 mask, the ROI-2 mask, and the GMR mask.
Coding: each block is tested by class. A block in a coded class gets its QF set and goes through DCT + quantization + Huffman coding into the high-priority or low-priority bitstream buffer; otherwise the block is dropped.]
Complete scheme: Coding strategy

Coding Strategy
1 The first priority class C1 = ROI-1 represents the blocks that are
in, and only in, the first ROI. Class C1 blocks, having the highest interest, are
coded with a higher MJPEG quality factor Q1 before being transmitted.
2 The second priority class C2 = ROI-2 - ROI-1 includes the labeled moving
blocks that are in ROI-2 but not in ROI-1. Class C2 blocks, having a medium
interest, are coded, prior to their transmission, with a lower MJPEG quality
factor Q2 < Q1.
3 The third priority class C3 = GMR - ROI-2 includes the blocks that are
in the GMR but are not in ROI-2. These blocks are considered to be of
low interest and are simply dropped.
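The three classes are plain set differences over block labels, which the sketch below makes explicit. Blocks are identified by hypothetical (row, col) tuples, the function names are illustrative, and the default quality factors 50 and 20 are the values shown in the result legends (QF=50-20); blocks absent from the returned plan are dropped.

```python
def classify_blocks(roi1, roi2, gmr):
    """Split labeled blocks into the priority classes C1, C2, C3."""
    c1 = set(roi1)              # C1 = ROI-1
    c2 = set(roi2) - c1         # C2 = ROI-2 - ROI-1
    c3 = set(gmr) - set(roi2)   # C3 = GMR - ROI-2
    return c1, c2, c3


def coding_plan(roi1, roi2, gmr, q1=50, q2=20):
    """Map each block to be transmitted to its quality factor (C3 is dropped)."""
    c1, c2, _c3 = classify_blocks(roi1, roi2, gmr)
    plan = {block: q1 for block in c1}
    plan.update({block: q2 for block in c2})
    return plan


if __name__ == "__main__":
    roi1 = {(0, 0)}
    roi2 = {(0, 0), (0, 1)}
    gmr = {(0, 0), (0, 1), (1, 1)}
    print(coding_plan(roi1, roi2, gmr))
```

Because the regions are nested, each block falls into at most one class, so the plan never assigns two quality factors to the same block.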

Coding Performances
Performances: assessed for both human-based/machine-based monitoring.
Human-based monitoring: Image quality metrics with and without reference.
Machine-based monitoring: Using Deep Learning model (YOLOv3) for object
recognition.

Results: Visual results
Table 2 (from Ad Hoc Networks 140 (2023)): Visual binary mask for the moving region.
Footnote 4: Kouadria et al., "Region-of-interest based image compression using the discrete Tchebichef
transform in wireless visual sensor networks", 2019.

Comparison with Other Methods: Image Quality
Overall mean quality metrics
Columns: Sequence | Proposed (PSNR, SSIM, VIF) | [12] (PSNR, SSIM, VIF) | MJPEG (PSNR, SSIM, VIF)
Highway 31.7414 0.7865 0.6042 30.4667 0.7053 0.5385 32.7808 0.7700 0.7351
HighwayI 28.8923 0.6716 0.5744 31.8053 0.6138 0.4874 37.9583 0.8374 0.7934
HighwayII 28.4600 0.7055 0.4637 29.3400 0.6965 0.4506 33.0100 0.8208 0.7207
campus 31.3614 0.7055 0.4637 29.5200 0.6965 0.4501 35.7400 0.8208 0.7207
intellegentroom 31.7727 0.8036 0.5916 30.4667 0.7053 0.5385 32.7808 0.7700 0.7351
laboratory 32.1748 0.6214 0.5748 30.9583 0.5894 0.5297 34.6275 0.6790 0.7492
Traffic 30.0569 0.6559 0.6230 28.2246 0.5710 0.5030 30.4093 0.6625 0.6493
StreetCornerAtNight 33.3806 0.4686 0.4775 32.1323 0.4402 0.4258 42.4140 0.9034 0.9216
Results
MJPEG performs best for most sequences and metrics due to high-quality factor encoding.
Our proposed method generally achieves the second-best results.
It outperforms MJPEG in SSIM for the Intelligentroom and Highway sequences.
Superiority in SSIM is due to a stable background with no change over time.
Frame quality decreases for HighwayI and HighwayII due to low fps and high motion.
High movement leads to a lower quality of ROI-2 (Q2 = 20), decreasing the overall quality.

Comparison with other Methods: PSNR
[Figure: PSNR (dB) versus frame number, one panel per sequence. Curves: Proposed (QF=50-20) and MJPEG QF=50; the first panel also shows Kouadria et al. (2019) QF=50.]
PSNR results for the Traffic, Highway and StreetCornerAtNight sequences
Results
The quality is comparable to the standard (MJPEG) and better than the state of the art.

Comparison with Other Methods: SSIM
[Figure: SSIM versus frame number, one panel per sequence. Curves: Proposed (QF=50-20) and MJPEG QF=50.]
SSIM results for the Traffic, Highway and StreetCornerAtNight sequences
Results
The quality is comparable to the standard (MJPEG) and better than the state of the art.

Comparison with other Methods: VIF
[Figure: VIF versus frame number, one panel per sequence. Curves: Proposed (QF=50-20) and MJPEG QF=50.]
VIF results for the Traffic, Highway and StreetCornerAtNight sequences
Results
The quality is comparable to the standard (MJPEG) and better than the state of the art.

Comparison with other Methods: BRISQUE
[Figure: BRISQUE score versus frame number, one panel per sequence. Curves: Original, Proposed (QF=50-20) and MJPEG QF=50.]
BRISQUE results for the Traffic, Highway and StreetCornerAtNight sequences
Results
Comparable to the state of the art: the method shows no degradation under
no-reference evaluation.

Comparison with other Methods: data size
[Figure: data size (kB) versus frame number, one panel per sequence. Curves: Proposed (QF=50-20) and MJPEG QF=50.]
Required data size for the Traffic, Highway and StreetCornerAtNight sequences
Results
For the campus sequence (fps = 10), our method requires a mean bitrate of 3.358
kB/s, about 27 times lower than the bitrate required by MJPEG (93.06 kB/s).
This represents a saving of 96.4%.
For the highway sequence (fps = 25), we achieve a saving of about 76.3% of the
required bitrate.
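
The campus figures can be checked with a few lines of arithmetic. This sketch assumes the saving is defined as 1 - proposed/baseline; the helper name `bitrate_saving` is illustrative and not from the thesis.

```python
def bitrate_saving(proposed_kB_s, baseline_kB_s):
    """Return (reduction factor, saving in percent) of proposed vs. baseline."""
    if proposed_kB_s <= 0 or baseline_kB_s <= 0:
        raise ValueError("bitrates must be positive")
    factor = baseline_kB_s / proposed_kB_s
    saving_pct = 100.0 * (1.0 - proposed_kB_s / baseline_kB_s)
    return factor, saving_pct


# Mean bitrates for the campus sequence, as quoted on the slide.
factor, saving = bitrate_saving(3.358, 93.06)
print(f"{factor:.1f}x lower, {saving:.1f}% saved")  # 27.7x lower, 96.4% saved
```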

Bounding box insertion results (at the reception)
[Figure: Table 4. Bounding box insertion results for the used dataset.]
Video quality remains intact while achieving higher recognition accuracy.

Recognition Accuracy
[Figure: number of recognized objects (logarithmic axis, ticks 10^0 to 10^4) for the Traffic, Highway, HighwayI, HighwayII, campus, intelligentroom, laboratory and streetCornerAtNight sequences. Series: Original (no compression), MJPEG QF=50, Proposed (QF=50-20).]
Mean number of detected objects
Results
Preserving a high quality only for the ROI while ensuring a good ROI detection is
sufficient to enable more accurate smart tasks at the destination like recognition.

Recognition Accuracy
[Figure: recognition score (axis from 0.5 to 1) for the Traffic, Highway, HighwayI, HighwayII, campus, intelligentroom, laboratory and streetCornerAtNight sequences. Series: Original (no compression), MJPEG QF=50, Proposed (QF=50-20).]
Recognition probability
Results
We achieved higher recognition accuracy at lower bitrate and energy budgets, enabling
more accurate smart machine-based tasks (22% enhancement).

Energy consumption: model
Approach
The energy consumption model is computed based on the arithmetic operations
performed in each step.
In our case, the energy is proportional to the size of the frame and to the size
of the window used in each step (SAD, Rmap, Gmap), denoted w1, w2 and w3.
$E_{processing} = E_{detection} + E_{compression}$ (8)
$E_{detection} = E_{SADmap} + E_{Rmap} + E_{Gmap} + E_{Thresh}$ (9)
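
Equations (8) and (9) describe a simple additive energy budget. The sketch below is only an illustration under stated assumptions: each windowed step is taken to perform one arithmetic operation per pixel per window element, at a fixed energy per operation. The constant `ENERGY_PER_OP_MJ`, the window sizes, the thresholding and compression costs, and all function names are hypothetical and are not taken from the thesis.

```python
ENERGY_PER_OP_MJ = 1e-6  # hypothetical energy per arithmetic operation (mJ)


def step_energy(width, height, window, energy_per_op=ENERGY_PER_OP_MJ):
    """Energy of one windowed step, assumed proportional to frame size x window size."""
    return energy_per_op * width * height * window


def detection_energy(width, height, w1, w2, w3, e_thresh):
    """Eq. (9): E_detection = E_SADmap + E_Rmap + E_Gmap + E_Thresh."""
    e_sad_map = step_energy(width, height, w1)
    e_r_map = step_energy(width, height, w2)
    e_g_map = step_energy(width, height, w3)
    return e_sad_map + e_r_map + e_g_map + e_thresh


def processing_energy(e_detection, e_compression):
    """Eq. (8): E_processing = E_detection + E_compression."""
    return e_detection + e_compression


# Hypothetical example: a 352 x 288 frame with window sizes w1 = w2 = w3 = 4.
e_det = detection_energy(352, 288, w1=4, w2=4, w3=4, e_thresh=0.1)
e_total = processing_energy(e_det, e_compression=13.0)
print(f"E_detection = {e_det:.4f} mJ, E_processing = {e_total:.4f} mJ")
```

With this structure, the detection overhead grows with the frame and window sizes, while the compression term depends on how much of the frame is actually encoded.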

Energy consumption: Results on ROI detection
Per-frame energy cost (mJ) of our ROI detection
Sequence   Edetection   Eprocessing   % extra cost
campus     0.8827       14.24         6.20%
highway    0.6699       16.02         4.18%
traffic    0.1671       20.22         0.83%

Per-frame energy consumption (mJ)
           Proposed                            MJPEG    Saving w.r.t.
Sequence   max      min    std. dev.   mean    mean     MJPEG (%)
campus     211.14   0.90   24.93       14.24   205.92   93.08
highway    53.38    0.90   10.96       16.02   156      89.74
traffic    40.18    0.23   16.11       20.22   39       48.16
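
The derived columns of both tables can be reproduced from the mean energies. This sketch assumes that the extra cost is Edetection / Eprocessing and that the saving is 1 - Eproposed / EMJPEG. These definitions are inferred from the tabulated numbers, not stated on the slides, and the results match the table to within rounding (about 0.01 percentage points).

```python
# (E_detection, mean E_processing, mean MJPEG energy) in mJ per frame,
# copied from the two tables on this slide.
ENERGY_MJ = {
    "campus": (0.8827, 14.24, 205.92),
    "highway": (0.6699, 16.02, 156.0),
    "traffic": (0.1671, 20.22, 39.0),
}


def extra_cost_pct(e_detection, e_processing):
    """Share of the processing energy spent on ROI detection, in percent."""
    return 100.0 * e_detection / e_processing


def saving_pct(e_proposed, e_mjpeg):
    """Energy saved by the proposed coder relative to MJPEG, in percent."""
    return 100.0 * (1.0 - e_proposed / e_mjpeg)


for name, (e_det, e_proc, e_mjpeg) in ENERGY_MJ.items():
    print(
        f"{name}: detection overhead {extra_cost_pct(e_det, e_proc):.2f}%, "
        f"saving vs. MJPEG {saving_pct(e_proc, e_mjpeg):.2f}%"
    )
```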

Energy consumption: Results
[Figure: energy consumption (mJ) and PSNR (dB) versus frame number, one panel per sequence. Curves: proposed ROI-based coding and MJPEG.]
Total processing energy consumption and the corresponding PSNR.
Results analysis
Analysis of energy consumption and quality in terms of PSNR.
Three sequences with frame sizes 352 × 288, 320 × 240 and 160 × 120 are considered.
Energy consumption oscillates with the size of the ROI.
Our method uses less energy than MJPEG while maintaining quality (30-35 dB).
The classical method (MJPEG) shows a fairly stable but higher energy consumption.
Less data processed and sent = less energy consumed.

Limitations and Future Work
Limitations
The channel conditions are not considered here.
The AI-based inference is done only for one task.
Future work
Further study is recommended to examine unusual coding conditions and
issues such as the occurrence of outlier frames and/or outlier blocks
during the processing and transmission of the frame.

Recap.
It has been shown that sacrificing the quality of the non-ROI does not
harm the intelligent tasks at the destination; it enhances them by virtue
of the content-aware strategy used.
The proposed video coding strategy could be adopted for large-scale video
monitoring in an edge-cloud processing paradigm using WMSN, where
in-network scenarios should be elaborated and assessed.

Scientific output of the thesis

General Conclusion
In this thesis, we have addressed the problem of ROI detection and its
implementation as a pre-encoder in wireless embedded surveillance systems.
This problem has been studied in the literature and still poses many challenges
related to efficiency and accuracy.
We have proposed multiple contributions that develop a pre-encoder
with very low overhead on the total system budget.
The pre-encoder contributes to saving energy and bitrate, achieving
a gain of 98%.

General Conclusion
Both the detection efficiency and the gain have been assessed and validated
through the conducted evaluation strategy and the dataset and metrics used.
The developed system can enable easy monitoring over a long lifetime
and with acceptable QoS.
The developed system can also enable both human-based monitoring and
machine-based monitoring, opening the door to edge-cloud AI applications
for wireless surveillance.

Perspectives
The work can be extended to cover other modules of the wireless sensor node,
especially the compression algorithm, which can be replaced by fast
transform algorithms.
It can also be extended to cover the adaptation of low-cost transmission protocols
to the context of ROI-based video coding.
Now that the software has been validated through this thesis, the work
opens the door to an implementation in embedded systems.

Peer Reviewed Journal Articles:
[J1] Ahcen Aliouat, Nasreddine Kouadria, Moufida Maimour, Saliha Harize, and
Noureddine Doghmane. "Region-of-interest based video coding strategy for
rate/energy-constrained smart surveillance systems using WMSNs." Ad Hoc
Networks 140 (2023): 103076. IF: 4.9
[J2] Ahcen Aliouat, Nasreddine Kouadria, Saliha Harize, and Moufida Maimour. "An
Efficient Low Complexity Region-of-Interest Detection for Video Coding in
Wireless Visual Surveillance." IEEE Access, 11, 26793-26806. IF: 3.41
[J3] Ahcen Aliouat, Nasreddine Kouadria, and Doru Florin Chiper. "x-DTT: A package for
calculating Real and Integer Discrete Tchebichef Transform kernels based on
Orthogonal Polynomials." SoftwareX (minor revision). IF: 2.89
[J4] Ahcen Aliouat, Nasreddine Kouadria, Moufida Maimour, and Saliha Harize.
"EVBS-CAT: Enhanced Video Background Subtraction with a Controlled
Adaptive Threshold for Constrained Wireless Video-surveillance." Journal of
Real-Time Image Processing (Springer), under review. IF: 2.29

Peer-reviewed Conference Publications/Proceedings
[C1] Ahcen Aliouat, Nasreddine Kouadria, Moufida Maimour, and Saliha Harize.
"Region-of-interest based video coding strategy for low bitrate surveillance
systems." In 2022 19th International Multi-Conference on Systems, Signals &
Devices (SSD), pp. 1357-1362. IEEE, 2022.
[C2] Ahcen Aliouat, Nasreddine Kouadria, Saliha Harize, and Moufida Maimour.
"Multi-threshold-based frame segmentation for content-aware video coding in
WMSN." In Advances in Computing Systems and Applications: Proceedings of the
5th Conference on Computing Systems and Applications, pp. 337-347. Cham:
Springer International Publishing, 2022.

Poster
[P1] Ahcen Aliouat, Nasreddine Kouadria, and Saliha Harize. "Low-Cost
Region-of-Interest Detection for Wireless Video Sensor Nodes." In Doctoral Days of
the LASA Laboratory, UBMA, June 2021.

References
Z. Zivkovic and F. Van Der Heijden, “Efficient adaptive density estimation
per image pixel for the task of background subtraction,” Pattern recognition
letters, vol. 27, no. 7, pp. 773–780, 2006.
C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for
real-time tracking,” in Proceedings. 1999 IEEE computer society conference
on computer vision and pattern recognition (Cat. No PR00149), vol. 2.
IEEE, 1999, pp. 246–252.
A. Elgammal, D. Harwood, and L. Davis, “Non-parametric model for
background subtraction,” in European conference on computer vision.
Springer, 2000, pp. 751–767.
Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger,
“Comparative study of background subtraction algorithms,” Journal of
Electronic Imaging, vol. 19, no. 3, p. 033003, 2010.
Z. Zivkovic, “Improved adaptive gaussian mixture model for background
subtraction,” in Proceedings of the 17th International Conference on Pattern
Recognition, 2004. ICPR 2004., vol. 2. IEEE, 2004, pp. 28–31.
M. F. Savaş, H. Demirel, and B. Erkal, “Moving object detection using an

Thank You!
