When Remote Sensing Meets Artificial Intelligence
1. When Remote Sensing Meets Artificial Intelligence
Taoyuan, Taiwan
4 December 2021
UNIVERSITAS NEGERI MAKASSAR
WAHYU RAHMANIAR, Ph.D.
Postdoctoral Researcher
National Taipei University of Technology
2. Biography
Education
▪ S.Si.: Electronics and Instrumentation (ELINS), Universitas Gadjah Mada
▪ M.Sc. - Ph.D.: Electrical Engineering, National Central University
Work Experience
▪ Laboratory Assistant, ELINS, Universitas Gadjah Mada
▪ Research Assistant, Fasilkom, Universitas Indonesia
▪ Research Assistant, FTIF, Institut Teknologi Sepuluh Nopember
▪ Freelance: Image Processing/Computer Vision Algorithms, Issa Technology
Internship Experience
▪ Ubiik, Hsinchu, Taiwan
▪ Shibaura Institute of Technology, Toyosu, Tokyo, Japan
Occupation
Office: National Taipei University of Technology, Taiwan
Position: Postdoctoral Researcher
Research Interest
Robotics, image processing, computer vision, intelligent control, and artificial intelligence
3. Biography
Undergraduate Research
Robotic Prototype For Volcano Eruption Monitoring Based on Temperature Detection Using Amplitude Shift Keying
Modulation Telemetry
Master Research
▪ Sensor Integration for Real-Time Data Acquisition in Aerial Surveillance
▪ A Novel Object Detection Method based on Fuzzy Sets Theory and Speed-Up Robust Feature
Doctoral Research
▪ Real-Time Automated Segmentation and Classification of Calcaneal Fractures in CT Images
▪ Real-Time Detection and Recognition of Multiple Moving Objects for Aerial Surveillance
▪ Online Digital Image Stabilization for an Unmanned Aerial Vehicle
▪ Real-Time Bi-Directional People Counting using an RGB-D Camera
▪ Distance Measurements of Unmanned Aerial Vehicles using Vision-Based Systems in Unknown Environments
H-index: 4 (Google Scholar); 4 (Scopus)
Journal Reviewer: Journal of Robotics and Control (JRC), Big Data and Cognitive Computing - MDPI, Sensor Review (IF: 1.442), Circuit World (IF: 1.698), Journal of Electronic Imaging (IF: 0.78), Scientific Reports – Nature (IF: 4.379)
GitHub: https://github.com/wahyurahmaniar
7. Image Processing Example
1) Rescaling: zoom in, zoom out, cropping
2) Correcting illumination: brightness and contrast
3) Color manipulations: black and white, gray-scale, HSV, RGB, BGR
4) Filters: mean, median, low-pass, high-pass, Gaussian, Laplacian
5) Edge detection (a minimal OpenCV sketch of these operations follows)
Edge detection examples: Original, Sobel, Laplacian, Canny.
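The operations in items 1)-5) map directly onto OpenCV calls. Below is a minimal Python sketch; the file name "input.jpg" and the crop coordinates are placeholders, not values from the slides.

```python
import cv2

img = cv2.imread("input.jpg")                              # placeholder image, loaded as BGR

# 1) Rescaling: zoom out, zoom in, and cropping a region
small = cv2.resize(img, None, fx=0.5, fy=0.5)
large = cv2.resize(img, None, fx=2.0, fy=2.0)
crop = img[50:200, 100:300]

# 2) Correcting illumination: new_pixel = alpha * pixel + beta (contrast, brightness)
adjusted = cv2.convertScaleAbs(img, alpha=1.2, beta=30)

# 3) Color manipulations
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
_, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # black and white

# 4) Filters: mean, median, Gaussian (low-pass), Laplacian (high-pass)
mean_f = cv2.blur(img, (5, 5))
median_f = cv2.medianBlur(img, 5)
gauss_f = cv2.GaussianBlur(img, (5, 5), 0)
lap_f = cv2.Laplacian(gray, cv2.CV_64F)

# 5) Edge detection: Sobel and Canny (the Laplacian above also responds to edges)
sobel = cv2.Sobel(gray, cv2.CV_64F, 1, 1, ksize=3)
canny = cv2.Canny(gray, 100, 200)
```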
8. Image Processing Example
6) Morphology (see the sketch after this list):
- Erode (local minimum)
- Dilate (local maximum)
- Opening (erosion then dilation)
- Closing (dilation then erosion)
- Morphology Gradient (MG): difference between dilation and erosion
- Top Hat (TH): difference between image and opening
- Black Hat: difference between image and closing
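A minimal OpenCV sketch of these morphology operations, assuming a binary mask as input; "mask.png" and the 5x5 kernel size are placeholders.

```python
import cv2
import numpy as np

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)              # placeholder binary image
kernel = np.ones((5, 5), np.uint8)                                # structuring element

eroded   = cv2.erode(mask, kernel)                                # local minimum
dilated  = cv2.dilate(mask, kernel)                               # local maximum
opening  = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)         # erosion then dilation
closing  = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)        # dilation then erosion
gradient = cv2.morphologyEx(mask, cv2.MORPH_GRADIENT, kernel)     # dilation minus erosion
tophat   = cv2.morphologyEx(mask, cv2.MORPH_TOPHAT, kernel)       # image minus opening
blackhat = cv2.morphologyEx(mask, cv2.MORPH_BLACKHAT, kernel)     # closing minus image
```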
10. Artificial Intelligence (AI)
▪ AI is intelligence demonstrated by machines, as
opposed to the natural intelligence displayed by
humans or animals.
▪ Goals: reasoning, problem solving, knowledge
representation, planning, learning, natural
language processing, perception, prediction,
motion and manipulation, social intelligence, and
general intelligence.
▪ Media: image, sound, and data.
▪ AI before ML and DL: rules engines, expert
systems, knowledge graphs or symbolic AI.
▪ ML: supervised learning, unsupervised learning,
semi-supervised learning, reinforcement learning,
dimensionality reduction.
▪ DL: deep neural networks, deep belief networks,
deep reinforcement learning, recurrent neural
networks and convolutional neural networks.
13. Traditional AI
1) Traditional computer vision method: background subtraction (a minimal sketch follows the list below)
Disadvantage:
- Can't handle some lighting changes, including when it's dark
- Often merges two or more cars into a single detection
- Slow and less accurate than machine learning and deep learning
- Can't recognize the type and color of the car
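An illustrative background-subtraction sketch in OpenCV 4. The MOG2 subtractor, the contour-area threshold, and the video path are assumptions for illustration; the slide does not specify which subtraction method was used.

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")                          # placeholder video path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)                          # moving pixels
    fg_mask = cv2.medianBlur(fg_mask, 5)                       # suppress noise
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:                           # ignore small blobs (assumed threshold)
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(30) == 27:                                  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```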
14. Machine Learning
2) Machine learning: Haar cascade (a minimal sketch follows below)
Advantage:
- Faster computation time
Disadvantage:
- Can't handle dark illumination
- Less accurate than deep learning
- Can't recognize the type and color of the car
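An illustrative Haar-cascade detection sketch in OpenCV. The cascade file "cars.xml" and the image path are placeholders; any trained Haar cascade would be loaded the same way.

```python
import cv2

cascade = cv2.CascadeClassifier("cars.xml")      # placeholder for a trained cascade file
frame = cv2.imread("road.jpg")                   # placeholder image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                    # helps with mild lighting changes

# Sliding-window detection over multiple scales
cars = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3, minSize=(40, 40))
for (x, y, w, h) in cars:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
```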
15. Deep Learning
3) Deep learning: YOLOv3 (an inference sketch follows below)
Advantage:
- Robust to a wide range of illumination conditions
- Very accurate
- Can recognize the type and color of the car
Disadvantage:
- Requires heavy computation
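An illustrative YOLOv3 inference sketch using OpenCV's DNN module; this is one common way to run YOLOv3, not necessarily the setup used for the slide. "yolov3.cfg", "yolov3.weights", and "coco.names" are the standard Darknet release files, and the thresholds are typical defaults.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()
classes = open("coco.names").read().splitlines()

img = cv2.imread("road.jpg")                         # placeholder image
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

for out in outputs:
    for det in out:                                  # det = [cx, cy, bw, bh, objectness, class scores...]
        scores = det[5:]
        cls = int(np.argmax(scores))
        conf = scores[cls]
        if conf > 0.5:                               # assumed confidence threshold
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            cv2.rectangle(img, (x, y), (x + int(bw), y + int(bh)), (0, 0, 255), 2)
            cv2.putText(img, classes[cls], (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
```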
18. Research 1: Moving Object Detection
Contribution
- Real-time computing performance for the detection and
recognition of multiple moving objects
- Accurate detection of moving objects using a UAV in complex
backgrounds
Rahmaniar, W.; Wang, W.-J.; Chen, H.-C. “Real-time detection and recognition of
multiple moving objects for aerial surveillance”, Electronics, vol. 8, no. 12, pp.
1373-1390, 2019. (SCI/IE, Q2, IF: 2.397)
System overview of moving object detection and recognition
UAV movement modelling
19. Research 1: Object Detection
Cascade classifier for object detection and recognition
The angular value of the motion vector:
θ(t) = tan⁻¹( y(t) / x(t) ) × 180/π

Optical flow estimation. (a) Original image. (b) Motion vectors.
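A minimal sketch of computing motion-vector angles from sparse optical flow, matching the formula above. Lucas-Kanade flow and the frame file names are assumptions for illustration; the paper's exact flow estimation may differ.

```python
import cv2
import numpy as np

prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)     # placeholder frames
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Track corner features between the two frames
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.3, minDistance=7)
p1, st, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

good_new = p1[st == 1]
good_old = p0[st == 1]
vectors = good_new - good_old                                # (x(t), y(t)) per tracked feature
angles = np.degrees(np.arctan2(vectors[:, 1], vectors[:, 0]))  # tan^-1(y/x) * 180/pi, in degrees
```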
20. Research 1: Result
a. Motion vector results without image stabilization
b. Motion vector results with image stabilization
Image stabilization result
Method            Computation time (fps)   Accuracy (%)
                                           PR    Recall   F-Measure
Proposed          47.08                    94    91       92
Yazid et al.      1                        70    76       72
Saif et al.       -                        66    86       74
Maier et al.      -                        94    89       91
Kalantar et al.   1.6                      -     -        73
Cai et al.        5                        -     -        76
Comparison of performance results
22. Research 2: Bi-directional People Counting
System overview of the proposed people counting
Contribution
- Fast and accurate object detection in the depth images
- Map and register objects in the RGB images
- Fast and accurate tracking and counting objects
- Apply the algorithm to the embedded system
- Detect and count objects for “in” and “out” directions simultaneously
Rahmaniar, W.; Wang, W.-J.; Chiu, C.-W.; Hakim, N. L. “Real-time
bi-directional people counting using an RGB-D camera”, Sensor
Review, vol. 41, no. 4, pp. 341-349, 2021. (SCI/IE, Q2, IF: 1.442)
23. Research 2: Object Detection
Foreground segmentation (an illustrative thresholding sketch follows the caption).
(a) Background removal.
(b) Closer object segmentation.
(c) Farther object segmentation.
(d) Binary image of the closer object.
(e) Binary image of the farther object.
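A rough illustration of the idea behind steps (a)-(e): remove the background depth, then split the remaining pixels into closer and farther objects and binarize them. All depth values and thresholds here are hypothetical, not taken from the paper.

```python
import cv2
import numpy as np

depth = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)              # placeholder depth frame (mm)
background = cv2.imread("background_depth.png", cv2.IMREAD_UNCHANGED)  # empty-scene depth

# (a) Background removal: keep pixels clearly closer than the empty-scene depth
foreground = np.where(background.astype(np.int32) - depth.astype(np.int32) > 100,
                      depth, 0).astype(depth.dtype)

# (b)-(e) Split the foreground into closer and farther objects and binarize
closer  = np.logical_and(foreground > 0, foreground < 1500)        # closer than 1.5 m (assumed)
farther = foreground >= 1500                                        # 1.5 m or farther (assumed)
closer_mask  = (closer * 255).astype(np.uint8)
farther_mask = (farther * 255).astype(np.uint8)
```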
24. Research 2: Object Registration
• The registered object’s heads are mapped to an RGB image (a small coordinate-mapping sketch follows the figure):
head_RGB(n, t) = (head(n, t) × 2) + R_shift(x, y)

Object registration.
(a) Object detected in the depth image. (b) Object registered in the RGB image.
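A small sketch of this mapping step. The scale factor of 2 and the shift values are read off the slide's formula and used as placeholders; in practice they come from the depth/RGB sensor resolutions and calibration.

```python
import numpy as np

R_shift = np.array([12, -8])     # hypothetical (x, y) offset between the depth and RGB sensors

def register_head(head_depth_xy, scale=2, shift=R_shift):
    """Map a head centroid from depth-image coordinates to RGB-image coordinates."""
    return np.asarray(head_depth_xy) * scale + shift

head_rgb = register_head((160, 120))   # e.g. depth pixel (160, 120) mapped into the RGB frame
```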
25. Research 2: Result
Hardware setup
➢ Current precision rate: 99%.
➢ Average computation time: 30~40 fps.
➢ Detect and track up to 6 people in a frame.
Comparison of performance results
Method          Computation time (fps)      Accuracy
                PC      Embedded system     PR      Recall   F-Measure
Proposed        35      18                  0.99    0.96     0.97
Cho et al.      20      4                   0.96    0.98     0.97
Tokta et al.    10      -                   0.87    -        -
Garcia et al.   28      -                   0.96    -        -
Bondi et al.    20      -                   0.97    0.89     0.93
27. Research 3: Indoor UAV distance measurement
System overview
Contribution
- Fast and accurate distance measurement of the UAV inside buildings
- UAV positioning in unknown environments
- Camera stabilization and positioning during UAV movement
Rahmaniar, W.; Wang, W.-J.; Caesarendra, W.; Glowacz, A.; Oprzędkiewicz, K.; Sulowicz, M.; Irfan, M. “Distance Measurement of Unmanned Aerial Vehicles Using Vision-based Systems in Unknown Environments”, Electronics, vol. 10, no. 14, pp. 1647-1660, 2021. (SCI/IE, Q2, IF: 2.397)
Indoor segmentation
29. Research Example
▪ Land Use and Land Cover Mapping
▪ Object Detection in Optical, Radar and Lidar Images
▪ Object Detection from Aerial and Space Platforms
▪ Moving Object Detection and Control
▪ Robotics and AI for Infrastructure Inspection and Monitoring
▪ Marine, Ocean and Climate Change Monitoring
▪ Plant Disease and Tree Health Detection
31. Research 1: Small Object Detection
▪ To improve detection accuracy on remote sensing images, researchers have used deep convolutional
neural network (CNN)-based super-resolution (SR) techniques to generate artificial images and then
detect objects
▪ Generative Adversarial Network (GAN)-based methods such as super-resolution GAN (SRGAN) and
enhanced super-resolution GAN (ESRGAN) showed remarkable performance in enhancing low-resolution
(LR) images with and without noise
▪ These models have two subnetworks: a generator and a discriminator. Both subnetworks consist of deep
CNNs.
▪ Datasets containing HR and LR image pairs are used for training and testing the models. The generator generates HR images from LR input images, and the discriminator predicts whether a generated image is a real HR image or an upscaled LR image.
▪ After sufficient training, the generator produces HR images that are similar to the ground-truth HR images, and the discriminator can no longer correctly discriminate between real and fake images (a simplified training-loop sketch follows).
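A highly simplified PyTorch sketch of this training scheme. The tiny generator/discriminator, the dummy LR/HR batch, and the loss weighting are stand-ins chosen only to keep the example runnable; real SRGAN/ESRGAN models are much deeper and also use perceptual (VGG-based) losses.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the deep CNN subnetworks (illustrative only)
generator = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.Upsample(scale_factor=4), nn.Conv2d(64, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

# Dummy paired LR (32x32) / HR (128x128) batch standing in for a real dataset loader
dataloader = [(torch.rand(2, 3, 32, 32), torch.rand(2, 3, 128, 128))]

for lr_img, hr_img in dataloader:
    # Discriminator step: real HR vs. generated SR
    sr_img = generator(lr_img).detach()
    real_logits, fake_logits = discriminator(hr_img), discriminator(sr_img)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: stay close to the HR target while fooling the discriminator
    sr_img = generator(lr_img)
    fake_logits = discriminator(sr_img)
    g_loss = l1(sr_img, hr_img) + 1e-3 * bce(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```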
32. Research 1: Small Object Detection
▪ Detection on LR (low-resolution) images (60cm/pixel)
is shown in (I); in (II), we show the detection on
generated SR (super resolution) images (15 cm/pixel).
▪ The first row of this figure represents the COWC (car
overhead with context) dataset, and the second row
represents the OGST (oil and gas storage tank) dataset.
▪ An FRCNN (faster region-based CNN) detector is used for detection on the LR images.
▪ The combined EESRGAN (edge-enhanced SRGAN) and FRCNN architecture (EESRGAN-FRCNN) is then used to generate SR images and simultaneously detect objects from them.
▪ The red bounding boxes represent true positives, and
yellow bounding boxes represent false negatives.
33. Research 1: Small Object Detection
▪ The generator G generates intermediate super-
resolution (ISR) images, and then final SR images are
generated after applying the EEN network.
▪ The discriminator (DRa) discriminates between ground
truth (GT) HR images and ISR. The inverted gradients of
DRa are backpropagated into the generator G in order to
create SR images allowing for accurate object detection.
▪ Edge information is extracted from the ISR with the Laplacian operator, and the EEN network enhances these edges. The enhanced edges are then added back to the ISR after the original extracted edges have been subtracted, producing the output SR images with enhanced edges (see the sketch after this list).
▪ Finally, objects are detected from the SR images using
the detector network.
▪ Two different loss functions are used for EEN: one
compares the difference between SR and ground truth
images, and the other compares the difference between
the extracted edge from ISR and ground truth.
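A minimal sketch of the edge-enhancement composition described above: extract Laplacian edges from the ISR, enhance them, and add them back after subtracting the original edges. The small convolution standing in for the EEN is an assumption; only the wiring follows the slide.

```python
import torch
import torch.nn.functional as F

def laplacian_edges(x):
    """Extract edges from an image tensor (N, C, H, W) with a 3x3 Laplacian kernel."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    k = k.view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

def compose_sr(isr, een):
    """SR = ISR - original edges + EEN-enhanced edges."""
    edges = laplacian_edges(isr)          # edges extracted from the intermediate SR image
    enhanced = een(edges)                 # the EEN enhances the extracted edges
    return isr - edges + enhanced

een = torch.nn.Conv2d(3, 3, 3, padding=1)        # stand-in for the real EEN sub-network
sr = compose_sr(torch.rand(1, 3, 64, 64), een)   # dummy ISR batch
```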
36. Research 2: Building Segmentation
▪ A deep multi-task learning (MTL) framework is proposed to simultaneously address building footprint segmentation as the main task, with building boundary extraction and image reconstruction as auxiliary tasks, for very high-resolution RS images.
▪ The proposed framework takes the input image, the ground-truth map, and the boundary-extracted ground-truth map as inputs, and outputs (1) a segmentation map, (2) a boundary segmentation map, and (3) a reconstructed input image.
37. Research 2: Building Segmentation
▪ The objectives optimized in this paper are cross-entropy for the semantic segmentation and boundary extraction tasks, and mean absolute error (MAE, or L1 loss) for the reconstruction task (a minimal loss sketch follows this list).
▪ The Canny edge detection algorithm, with a radius of 3 and followed by dilation, is used to extract the boundaries from the ground-truth masks.
▪ During experiments, the input images and corresponding masks are randomly cropped into 512 x 512 patches as network inputs. The dataset is partitioned with a 7:2:1 ratio into train, validation, and test sets, respectively.
▪ Data augmentation is applied to increase the dataset size artificially by deploying the following
transformations: horizontal-vertical flip, additive Gaussian noise, perspective, random brightness, gamma,
contrast, blur.
▪ ResNet101 is used as a backbone for the feature encoder.
▪ The encoder-decoder network architecture is based on U-Net, with five encoder blocks and corresponding decoder blocks, using filter sizes of (256, 128, 64, 32, 16).
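A minimal sketch of the combined multi-task loss: cross-entropy for the segmentation and boundary heads plus L1 for the reconstruction head. The task weights and the dummy tensor shapes are assumptions; the paper's exact weighting is not given on the slide.

```python
import torch
import torch.nn as nn

ce, mae = nn.CrossEntropyLoss(), nn.L1Loss()

def mtl_loss(seg_logits, boundary_logits, recon, seg_gt, boundary_gt, image,
             w_seg=1.0, w_bnd=1.0, w_rec=1.0):
    """Weighted sum of the three task losses (weights are assumed, not from the paper)."""
    return (w_seg * ce(seg_logits, seg_gt) +
            w_bnd * ce(boundary_logits, boundary_gt) +
            w_rec * mae(recon, image))

# Dummy shapes: batch of 2, two classes (building / background), 512 x 512 patches
loss = mtl_loss(torch.randn(2, 2, 512, 512), torch.randn(2, 2, 512, 512),
                torch.rand(2, 3, 512, 512),
                torch.randint(0, 2, (2, 512, 512)), torch.randint(0, 2, (2, 512, 512)),
                torch.rand(2, 3, 512, 512))
```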