Training deep learning models to count using synthetic images

6 Sept. 2019
DR. ANDREAS KAMILARIS
@DL-UAV19, PROC. OF CAIP 2019

Problem
• Difficult to create ground truth data
• UCSD Pedestrian Database
• Video of pedestrians on UCSD walkways, taken
from a stationary camera

Problem
• Difficult to create ground truth data
• UCF CC 50 dataset
• Counts of persons range between 94 and 4543, with an average
of 1280 individuals per image

Motivation
• Generate data by simulation, which are then easy to label.
• Create synthetic ground truth data
Rahnemoonfar, M. and Sheppard, C., 2017. Deep count: fruit counting based on deep
simulated learning. Sensors, 17(4), p.905.

First try
• Generated simulated data in Python
• Goal: Detection of fires in forests
• 160 images: 80 of forest, 80 of fire
• 80% training, 20% testing
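The slide does not say how the 80/20 split was made. A common choice, assumed here, is a stratified random split, so both classes keep the same proportion in training and testing; with 80 images per class this gives 64 + 64 training and 16 + 16 testing images. The file names are hypothetical placeholders.

```python
import random


def stratified_split(items_by_class, train_fraction=0.8, seed=0):
    """Shuffle and split each class separately so train/test keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(item, label) for item in shuffled[:cut]]
        test += [(item, label) for item in shuffled[cut:]]
    return train, test


# Hypothetical file names standing in for the 80 forest and 80 fire images.
images = {
    "forest": [f"forest_{i:03d}.png" for i in range(80)],
    "fire": [f"fire_{i:03d}.png" for i in range(80)],
}
train, test = stratified_split(images)
print(len(train), len(test))  # 128 32
```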

First try
• Model: Inception-v3 vs. Custom

First try
• False negatives where there was smoke AND fire!
Data                          Custom   Inception-v3
Only generative               0.65     0.62
Only real                     0.8      0.69
Combined, augmented dataset   0.9      0.71

Second try
• Better representation of fire

Second try
• Improvement of results!
• Very few false negatives
Data                          Custom   Inception-v3
Only generative               0.95     0.72
Only real                     0.8      0.69
Combined, augmented dataset   1.0      0.79

Generative data case
• Training based on 2,000 synthetic images (labelled as fire or forest)
• Testing based on 100 real-world aerial photos (50 images of forest and 50 images of fire)

Hypothesis 1
Generating synthetic data can help to train deep learning models,
without the need to create expensive (in terms of time and effort)
ground truth data!

Hypothesis 2
Generating synthetic data can help to train deep learning models
not only to classify, but also to count!
… not only simple problems, but also more advanced ones…

Application: Counting houses from aerial photos
▪ 60 photos taken from satellite images in urban areas of Tanzania
▪ Manually counted the number of houses to create our testing/validation dataset.
▪ Each photo contains between 0 and 38 houses.

▪ Created synthetic training data, with the number of houses in each image counted automatically
▪ First naïve try: only squares!
MSE = 41

▪ Second try: added trees and small shadows
MSE = 29

▪ Third try: added grass, fences and different house orientations. Also added images without any houses
MSE = 20
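The appeal of the approach is that labels come for free: the generator knows how many houses it drew. A minimal, stdlib-only sketch of such a generator is below. It assumes an image is a 2D grid of class values, houses are axis-aligned squares (as in the first try), trees are smaller squares that are never counted, and about 10% of images contain no houses (as in the third try). Sizes, probabilities and function names are hypothetical, not the author's generator.

```python
import random

BACKGROUND, HOUSE, TREE = 0, 1, 2


def _free(grid, top, left, size, margin=1):
    """True if the square (plus a margin around it) contains only background."""
    n = len(grid)
    return all(grid[r][c] == BACKGROUND
               for r in range(max(0, top - margin), min(n, top + size + margin))
               for c in range(max(0, left - margin), min(n, left + size + margin)))


def _stamp(grid, top, left, size, value):
    """Fill a size x size square with the given class value."""
    for r in range(top, top + size):
        for c in range(left, left + size):
            grid[r][c] = value


def make_sample(rng, size=64, max_houses=38, max_trees=10, p_empty=0.1):
    """Return (grid, count): a toy top-down image and its exact house count."""
    grid = [[BACKGROUND] * size for _ in range(size)]
    target = 0 if rng.random() < p_empty else rng.randint(1, max_houses)
    count = 0
    for _ in range(target * 20):                 # bounded placement attempts
        if count == target:
            break
        side = rng.randint(3, 6)                 # house side length in pixels
        top, left = rng.randrange(size - side), rng.randrange(size - side)
        if _free(grid, top, left, side):
            _stamp(grid, top, left, side, HOUSE)
            count += 1
    for _ in range(rng.randint(0, max_trees)):   # distractors, never counted
        top, left = rng.randrange(size - 2), rng.randrange(size - 2)
        if _free(grid, top, left, 2):
            _stamp(grid, top, left, 2, TREE)
    return grid, count


rng = random.Random(0)
dataset = [make_sample(rng) for _ in range(100)]   # (image, label) pairs
```

Because every house keeps a one-pixel gap to its neighbours and the label is the number of houses actually placed, the label always matches the number of separate house blobs, even when not all requested houses fit.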

▪ Many decisions to take along the way…
• Dropout rate (35-50% works well)
• Stride (2 is small, 10 is too big)
• Convolutions (7x7 at the input layer seems a good option)
• Pre-training (ImageNet pre-training does not help much)
• Max-pooling better than average pooling
• Dense layers at the end of the network, with ReLU activation
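One way to see why a very large first-layer stride hurts is to compute the feature-map size, floor((n + 2p - k) / s) + 1, and how many input columns a 7x7 filter ever touches. The sketch below assumes a 299x299 input with no padding (the default input size of Keras' InceptionResNetV2); the slides do not state the input size actually used.

```python
def conv_output_size(n, kernel=7, stride=2, padding=0):
    """Side length after a convolution: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def covered_fraction(n, kernel=7, stride=2):
    """Fraction of input columns touched by at least one filter position (no padding)."""
    covered = set()
    for start in range(0, n - kernel + 1, stride):
        covered.update(range(start, start + kernel))
    return len(covered) / n


# Hypothetical 299x299 input, 7x7 first-layer filter, no padding.
for stride in (2, 4, 6, 8, 10):
    side = conv_output_size(299, kernel=7, stride=stride)
    print(f"stride {stride:2d}: {side}x{side} map, "
          f"{covered_fraction(299, 7, stride):.0%} of columns seen")
```

With stride 10 the filter steps over 3 of every 10 columns, so about 30% of the input is never seen by the first layer; with stride 2 every pixel is covered, at the cost of a much larger (147x147) feature map.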

[Figure: adapted, custom Inception-ResNet topology, with a 7x7 input filter with large stride and a dense fully-connected layer with ReLU]

[Figure: training vs. testing MSE]
The model can predict the number of houses with an error of 4.47 houses. For example, for a photo with 20 houses, the model would predict a count in the range [16, 24].
▪ Training based on 10,000 synthetic images (labelled with the exact number of houses)
▪ Testing based on 60 real-world aerial photos (labelled with the exact number of houses)
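The house-counting results are reported as MSE (in houses squared), while the error quoted above is in houses. It matches the square root of the third try's MSE of 20 (sqrt(20) ≈ 4.47), i.e. a root-mean-square error. Assuming that is how the figure was obtained, the sketch below shows the calculation and the resulting range for a photo with 20 houses; the helper names are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between true and predicted counts (houses^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error, back in units of houses."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error, also in units of houses."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumption: the quoted error is the square root of the final test MSE of 20.
typical_error = math.sqrt(20)
print(round(typical_error, 2))        # 4.47

true_count = 20
print(round(true_count - typical_error), round(true_count + typical_error))  # 16 24
```

RMSE penalises a few large miscounts more heavily than MAE and is always at least as large as MAE on the same predictions, so an RMSE and an MAE from different papers are not directly comparable.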

[Figure: best vs. worst predictions]

▪ Next steps:
• Crop houses from the training dataset and reuse them in random combinations in semi-synthetic images
• More realistic generation of data
• (GANs for counting?)
• Accountability
• Other domains:
o Agriculture (counting animals on farms)
o Energy (renewable energy on roofs)
o Environment and Climate (counting trees, plants, endangered species of animals etc.)
o Microbiology (blood test analysis etc.)
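The first next step, reusing cropped houses in random combinations, could look roughly like the sketch below. It assumes images are 2D lists of pixel values, that crops are pasted without overlapping each other, and that the label is simply the number of crops pasted; `composite` and the toy crops are hypothetical, and a real pipeline would work on actual image arrays and blend the crop edges.

```python
import random


def composite(background, crops, n_objects, rng, max_attempts=1000):
    """Paste randomly chosen house crops onto a copy of a background patch.

    Returns (image, count, boxes). count is the number of crops actually
    pasted, so the label stays exact even if not all of them fit.
    """
    image = [row[:] for row in background]       # never modify the background
    h, w = len(image), len(image[0])
    boxes = []                                    # (top, left, bottom, right)
    for _ in range(max_attempts):
        if len(boxes) == n_objects:
            break
        crop = rng.choice(crops)
        ch, cw = len(crop), len(crop[0])
        if ch > h or cw > w:
            continue                              # crop larger than the image
        top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
        box = (top, left, top + ch, left + cw)
        if any(box[0] < b[2] and b[0] < box[2] and box[1] < b[3] and b[1] < box[3]
               for b in boxes):
            continue                              # overlaps an existing house
        for r in range(ch):
            image[top + r][left:left + cw] = crop[r]
        boxes.append(box)
    return image, len(boxes), boxes


rng = random.Random(7)
background = [[0] * 32 for _ in range(32)]                          # toy empty terrain
crops = [[[1] * 4 for _ in range(4)], [[2] * 3 for _ in range(5)]]  # toy house crops
image, count, boxes = composite(background, crops, n_objects=6, rng=rng)
```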

Future Work and Research Direction
▪ State of the art (published in 2019)
Kar, A., Prakash, A., Liu, M.Y., Cameracci, E., Yuan, J., Rusiniak, M., Acuna, D., Torralba, A. and Fidler, S., 2019. Meta-Sim: Learning to Generate Synthetic Datasets. arXiv preprint arXiv:1904.11621.

Future Work and Research Direction
▪ State of the art (published in 2019)
▪ Combining a Counting CNN model with the ResNeXt architecture
Tian, M. et al., 2019. Automated pig counting using deep learning. Computers and Electronics in Agriculture, 163, pp. 1-10.
[Figure: comparison, MAE = 4.47 vs. MAE = 2.77]

Conclusion
• Synthetic data can be used for training DL models
• Can be applied in UAV-related applications (both classification and counting problems)
• More advanced techniques are required to improve performance (e.g. probabilistic scene graph generation, density maps)
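Density maps are a common alternative to regressing a single number: each annotated object becomes a small Gaussian, the network learns to predict the map, and the count is the sum over the map. A minimal sketch of building such a ground-truth map, assuming point annotations at object centres and a fixed, hypothetical sigma of 2 pixels:

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Ground-truth density map: one Gaussian per annotated object centre.

    Each Gaussian is renormalised over the image grid, so the whole map sums
    to the number of objects even when a kernel is cut off at the border.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for (py, px) in points:
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


# Hypothetical house centres in a 32x32 image (one close to the border).
houses = [(5, 5), (16, 20), (30, 1)]
dmap = density_map(houses, 32, 32)
print(round(sum(map(sum, dmap)), 6))  # 3.0
```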

THANKS FOR YOUR ATTENTION!
DR. ANDREAS KAMILARIS
EMAIL: A.KAMILARIS@UTWENTE.NL
