Convolutional Neural Network for pixel-wise skyline detection
1. Convolutional Neural Network for pixel-wise skyline detection
Darian Frajberg
Piero Fraternali
Rocio Nahime Torres
Department of Electronics, Information and Bioengineering, Politecnico di Milano
September 15, 2017
26th International Conference on Artificial Neural Networks
2. Deep learning has outperformed previous techniques in a wide variety of applications (e.g., computer vision, speech recognition, NLP)
Augmented Reality (AR) applications are an emerging class of software that is attracting massive attention (e.g., PokemonGo), and their market is projected to be huge
Integrating Artificial Intelligence into Augmented Reality applications can lead to successful results, capable of attracting people to voluntarily carry out diverse tasks
Goals to accomplish
• High accuracy
• Support for low-power devices
• High real-time performance
• Acceptable memory usage
• Acceptable battery consumption
Introduction and motivation
3. Use case
– Convolutional Neural Network (CNN) for mountain skyline detection
– Integration of the CNN into an AR mobile app for mountain peak identification
Mountain skyline detection
– Simple scenarios
– Complex scenarios
4. Mountain skyline detection for simple scenarios
– Comprises clear sky and continuous skylines
(Input) (Output)
5. Mountain skyline detection for complex scenarios
– May comprise fuzzy or interrupted skylines with obstacles (e.g., clouds, trees, houses, cables, people)
(Input) (Output)
6. Heuristic methods for skyline detection
– Edge-based
– Dynamic programming
– Solves simple scenarios
– Does not solve complex scenarios
Image-level CNN methods for skyline detection
– Semantic segmentation (seen as foreground-background problem)
– Solves simple scenarios
– Solving complex scenarios would require ground truth that is extremely difficult to generate
Related work
7. Successful pixel-level CNN methods for other purposes
– Detection of cancer in biomedical images
– Edge extraction
Our approach
– Use pixel-wise CNN for mountain skyline detection
12. Model architecture
Skyline extraction with CNN
Layer   Type          Input          Kernel  Stride  Pad  Output
Layer1  Conv          29 x 29 x 3    6       1       0    24 x 24 x 20
Layer2  Pool (max)    24 x 24 x 20   2       2       0    12 x 12 x 20
Layer3  Conv          12 x 12 x 20   5       1       0    8 x 8 x 50
Layer4  Pool (max)    8 x 8 x 50     2       2       0    4 x 4 x 50
Layer5  Conv          4 x 4 x 50     4       1       0    1 x 1 x 500
Layer6  ReLU          1 x 1 x 500    -       1       0    1 x 1 x 500
Layer7  Conv          1 x 1 x 500    1       1       0    1 x 1 x 2
Layer8  Softmax loss  1 x 1 x 2      -       1       0    1 x 1 x 2
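The spatial sizes in the table follow the usual no-padding rule, output = (input - kernel) / stride + 1, and the layer shapes also fix the number of learned weights. The Python sketch below walks the table to check both. The `LAYERS` list and the helper names are illustrative, not taken from the slides, and the sketch assumes each convolution has one bias per output channel.

```python
# (name, type, kernel, stride, output channels), copied from the table.
# ReLU and the softmax loss are omitted: they change neither shape nor weights.
LAYERS = [
    ("Layer1", "conv", 6, 1, 20),
    ("Layer2", "pool", 2, 2, None),
    ("Layer3", "conv", 5, 1, 50),
    ("Layer4", "pool", 2, 2, None),
    ("Layer5", "conv", 4, 1, 500),
    ("Layer7", "conv", 1, 1, 2),
]


def out_size(size, kernel, stride):
    """Spatial output size of a conv/pool layer with zero padding."""
    return (size - kernel) // stride + 1


def trace(size=29, channels=3):
    """Return the per-layer output shapes and the total parameter count."""
    shapes, params = [], 0
    for name, kind, kernel, stride, out_ch in LAYERS:
        size = out_size(size, kernel, stride)
        if kind == "conv":
            # weights + biases (assumption: one bias per output channel)
            params += kernel * kernel * channels * out_ch + out_ch
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes, params


shapes, total = trace()
for shape in shapes:
    print(shape)
print("learned parameters:", total)
```

Under these assumptions the total comes to 428732, which agrees with the parameter count reported on the training slide.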
13. Training
– Caffe framework
– Workstation with NVIDIA GeForce GTX 1080
– Training time: 61 minutes
– 428,732 learned parameters
14. Deployment of Fully Convolutional Network
– Input: Image
– Output: Spatial map in which each pixel is assigned a probability of being positive, i.e. a skyline pixel (0..255 range)
(Input) (Output)
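As a minimal sketch of how a two-class score can become a value in the 0..255 range, the Python snippet below applies a softmax to a pair of class scores and scales the positive-class probability to a byte. The function names, the channel order (second score = skyline), and the rounding are assumptions made for illustration; the slides do not describe how the authors' deployment performs this step.

```python
import math


def skyline_byte(score_no_skyline, score_skyline):
    """Two-class softmax, then scale the skyline probability to 0..255."""
    m = max(score_no_skyline, score_skyline)  # subtract the max for stability
    e0 = math.exp(score_no_skyline - m)
    e1 = math.exp(score_skyline - m)
    probability = e1 / (e0 + e1)
    return round(probability * 255)


def to_byte_map(score_map):
    """score_map: rows of (score_no_skyline, score_skyline) pairs."""
    return [[skyline_byte(a, b) for a, b in row] for row in score_map]


print(to_byte_map([[(0.0, 100.0), (100.0, 0.0)]]))
```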
17. Accuracy evaluated at image level with test dataset
– Average Skyline Accuracy (ASA)
– Average No Skyline Accuracy (ANSA)
– Average Accuracy (AA)
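The slides name these metrics without defining them at this point, so the following Python sketch is only one plausible column-wise reading, stated as an assumption: a column that has a ground-truth skyline pixel counts as correct when the predicted pixel matches it, a column without one counts as correct when nothing is predicted, and the per-image values are then averaged over the dataset. The function names and the `tolerance` parameter are hypothetical.

```python
def image_accuracy(gt_rows, pred_rows, tolerance=0):
    """Per-image (skyline, no-skyline, overall) accuracy.

    gt_rows / pred_rows hold, for each image column, the row index of the
    skyline pixel, or None when the column has no skyline pixel.
    Assumes at least one column.
    """
    sky_ok = sky_n = nosky_ok = nosky_n = 0
    for gt, pred in zip(gt_rows, pred_rows):
        if gt is None:
            nosky_n += 1
            nosky_ok += pred is None
        else:
            sky_n += 1
            sky_ok += pred is not None and abs(pred - gt) <= tolerance
    skyline_acc = sky_ok / sky_n if sky_n else None
    no_skyline_acc = nosky_ok / nosky_n if nosky_n else None
    overall_acc = (sky_ok + nosky_ok) / (sky_n + nosky_n)
    return skyline_acc, no_skyline_acc, overall_acc


def average(values):
    """Mean over the images for which the metric is defined."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


print(image_accuracy([10, 11, None, 12], [10, 13, None, 12]))
```

Returning None when an image has no columns of a given kind keeps such images out of the corresponding average.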
Evaluation
19. Evaluation example
– Average Skyline Accuracy: 98%
– Average No Skyline Accuracy: 73%
– Average Accuracy: 94%
Ground truth annotation pixel
Correctly predicted skyline pixel
Incorrectly predicted skyline pixel
(Annotation) (Evaluation)
20. Accuracy on test dataset images
Images                                       Pixels per column  Threshold  ASA     ANSA    AA
Continuous skyline images from test dataset  1                  0          94.45%  -       94.45%
Complete test dataset images                 1                  100        92.45%  20.14%  86.87%
(ASA: Average Skyline Accuracy; ANSA: Average No Skyline Accuracy; AA: Average Accuracy)
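The "pixels per column" and "threshold" settings suggest a post-processing step that keeps at most one candidate pixel in each column of the 0..255 output map. A minimal Python sketch of such a step follows; the function name and the exact rule (take the strongest pixel in the column, keep it only if it reaches the threshold) are assumptions, not the authors' documented procedure.

```python
def select_skyline(byte_map, threshold=100):
    """Pick one skyline row per column from a 0..255 map (list of rows).

    Returns, for each column, the row index of its strongest pixel, or
    None when that pixel is below the threshold (no skyline in the column).
    """
    rows, cols = len(byte_map), len(byte_map[0])
    skyline = []
    for x in range(cols):
        best_row = max(range(rows), key=lambda y: byte_map[y][x])
        strong_enough = byte_map[best_row][x] >= threshold
        skyline.append(best_row if strong_enough else None)
    return skyline


example = [
    [0, 200, 10],
    [255, 50, 20],
    [30, 40, 90],
]
print(select_skyline(example, threshold=100))
print(select_skyline(example, threshold=0))
```

With a threshold of 0 every column yields a pixel, which fits images whose skyline is continuous; a higher threshold lets columns hidden by obstacles stay empty.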
21. Runtime performance
The frame dimensions affect:
– Accuracy
– Memory consumption
– Execution time
Good balance
– 321 x 241 px
We built our own library in native code to deploy the CNN on mobile devices
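To illustrate why the frame size drives memory use and execution time, the Python sketch below propagates a frame through the convolution and pooling layers of the model and counts multiply-accumulate operations and the largest activation tensor. It assumes the network is applied to the whole frame without padding and ignores the ReLU cost; the authors' native library may work differently, and `frame_cost` is a hypothetical helper.

```python
# (type, kernel, stride, output channels) for the conv/pool layers of the model.
LAYERS = [
    ("conv", 6, 1, 20),
    ("pool", 2, 2, None),
    ("conv", 5, 1, 50),
    ("pool", 2, 2, None),
    ("conv", 4, 1, 500),
    ("conv", 1, 1, 2),
]


def frame_cost(width, height, channels=3):
    """Return (output map size, multiply-accumulates, peak activation count)."""
    macs = 0
    peak = width * height * channels
    for kind, kernel, stride, out_ch in LAYERS:
        width = (width - kernel) // stride + 1
        height = (height - kernel) // stride + 1
        if kind == "conv":
            macs += width * height * out_ch * kernel * kernel * channels
            channels = out_ch
        peak = max(peak, width * height * channels)
    return (width, height), macs, peak


for size in [(161, 121), (321, 241), (641, 481)]:
    print(size, frame_cost(*size))
```

Under these assumptions, doubling both frame dimensions roughly quadruples the operation count and the activation memory.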
23. PeakLens is an outdoor AR mobile application that identifies mountain peaks and overlays them in real time on the view.
It extracts the mountain skyline with a CNN and aligns it with the terrain skyline of the user’s current location.
Usage experience
100k installs on Android
25. Concept
– CNN model for mountain skyline extraction trained with a large set of annotated images taken in uncontrolled conditions
– Definition of metrics to evaluate the quality of the resulting skyline
– Support for its deployment on low-end mobile devices
– Integration of the module into an AR mobile app
Future work
– Optimization of the CNN model to achieve a faster execution time
– Improved handling of obstacles
– Improved pre-processing and post-processing steps
– Runtime performance comparison vs. Caffe2 and TensorFlow with MobileNets (both released after ICANN’s submission deadline)
Conclusions
26. Thanks For Your Attention!
darian.frajberg@polimi.it | piero.fraternali@polimi.it | rocionahime.torres@polimi.it