Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens... (inside-BigData.com)
In this deck from the 2018 Swiss HPC Conference, Gilles Fourestey from EPFL presents: Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lensing Software.
"LENSTOOL is a gravitational lensing software that models the mass distribution of galaxies and clusters. It was developed by Prof. Kneib, head of the LASTRO lab at EPFL, and collaborators, starting in 1996. It is used to obtain sub-percent precision measurements of the total mass in galaxy clusters and to constrain the dark matter self-interaction cross-section, a crucial ingredient to understanding its nature.
However, LENSTOOL lacks efficient vectorization and uses only OpenMP, which limits its execution to a single node and can lead to run times that exceed several months. The LASTRO lab and the EPFL HPC group therefore decided to rewrite the code from scratch; to minimize risk and maximize performance, they used a bottom-up approach that focuses on exposing parallelism at the hardware and instruction levels. The result is a high-performance code, fully vectorized on Xeon, Xeon Phi and GPUs, that currently scales up to hundreds of nodes on CSCS’ Piz Daint, one of the fastest supercomputers in the world."
Watch the video: https://wp.me/p3RLHQ-ili
Learn more: https://infoscience.epfl.ch/record/234382/files/EPFL_TH8338.pdf?subformat=pdfa
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/introduction-to-simultaneous-localization-and-mapping-slam-a-presentation-from-gareth-cross/
Independent game developer (and former technical lead of state estimation at Skydio) Gareth Cross presents the “Introduction to Simultaneous Localization and Mapping (SLAM)” tutorial at the May 2021 Embedded Vision Summit.
This talk provides an introduction to the fundamentals of simultaneous localization and mapping (SLAM). Cross aims to provide foundational knowledge, and viewers are not expected to have any prerequisite experience in the field.
The talk consists of an introduction to the concept of SLAM, as well as practical design considerations in formulating SLAM problems. Visual inertial odometry is introduced as a motivating example of SLAM, and Cross explains how this problem is structured and solved.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/efficient-deep-learning-for-3d-point-cloud-understanding-a-presentation-from-facebook/
Bichen Wu, Research Scientist at Facebook Reality Labs, presents the “Efficient Deep Learning for 3D Point Cloud Understanding” tutorial at the May 2021 Embedded Vision Summit.
Understanding the 3D environment is a crucial computer vision capability required by a growing set of applications such as autonomous driving, AR/VR and AIoT. 3D visual information, captured by LiDAR and other sensors, is typically represented by a point cloud consisting of thousands of unstructured points.
Developing computer vision solutions to understand 3D point clouds requires addressing several challenges, including how to efficiently represent and process 3D point clouds, how to design efficient on-device neural networks to process 3D point clouds, and how to easily obtain data to train 3D models and improve data efficiency. In this talk, Wu shows how his company addresses these challenges as part of its “SqueezeSeg” research and presents a highly efficient, accurate, and data-efficient solution for on-device 3D point-cloud understanding.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-benosman
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Ryad B. Benosman, Professor at the University of Pittsburgh Medical Center, Carnegie Mellon University and Sorbonne Université, presents the "What is Neuromorphic Event-based Computer Vision? Sensors, Theory and Applications" tutorial at the May 2018 Embedded Vision Summit.
In this presentation, Benosman introduces neuromorphic, event-based approaches for image sensing and processing. State-of-the-art image sensors suffer from severe limitations imposed by their very principle of operation. These sensors acquire the visual information as a series of “snapshots” recorded at discrete points in time, hence time-quantized at a predetermined frame rate, resulting in limited temporal resolution, low dynamic range and a high degree of redundancy in the acquired data. Nature suggests a different approach: Biological vision systems are driven and controlled by events happening within the scene in view, and not, like conventional image sensors, by artificially created timing and control signals that have no relation to the source of the visual information.
Translating the frameless paradigm of biological vision to artificial imaging systems implies that control over the acquisition of visual information is no longer imposed externally on an array of pixels but rather the decision making is transferred to each individual pixel, which handles its own information individually. Benosman introduces the fundamentals underlying such bio-inspired, event-based image sensing and processing approaches, and explores their strengths and weaknesses. He shows that bio-inspired vision systems have the potential to outperform conventional, frame-based vision acquisition and processing systems and to establish new benchmarks in terms of data compression, dynamic range, temporal resolution and power efficiency in applications such as 3D vision, object tracking, motor control and visual feedback loops, in real-time.
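The per-pixel event-generation principle described above can be sketched for a single pixel: an event fires whenever the log intensity drifts by more than a fixed contrast threshold since the last event. This is a minimal illustration, not the circuit of any particular sensor; the function name and threshold value are invented for the example.

```python
import math

def generate_events(intensities, threshold=0.2):
    """Emit (time_index, polarity) events for one pixel whenever its
    log-intensity changes by more than `threshold` since the last event."""
    events = []
    ref = math.log(intensities[0])
    for t, value in enumerate(intensities[1:], start=1):
        delta = math.log(value) - ref
        if abs(delta) >= threshold:
            events.append((t, 1 if delta > 0 else -1))
            ref = math.log(value)  # reset the reference level after firing
    return events
```

Note that the pixel is silent while the scene is static, which is the source of the data-compression and temporal-resolution advantages discussed above.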
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ... (Tomohiro Fukuda)
This document describes a proposed method for estimating sky view factor (SVF) using semantic segmentation with deep learning networks. Specifically:
- It develops a system using SegNet and U-Net deep learning models to perform pixel-wise semantic segmentation of sky and non-sky areas from images to calculate SVF ratios.
- The system was trained on 300 manually segmented images and tested on 100 fisheye photographs, achieving 98% accuracy in estimating SVF under different sky conditions.
- Future work is needed to apply the system to live video streams rather than static images. The method provides an efficient, high-precision way to estimate important urban environmental metrics like SVF.
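Once the network has labelled each pixel as sky or non-sky, the SVF estimate described above reduces to a pixel ratio. A minimal sketch of that idea follows; note that some SVF formulations additionally weight fisheye pixels by solid angle, which this simple ratio omits, and the mask convention (1 = sky, 0 = non-sky, -1 = outside the image circle) is an assumption for the example.

```python
def sky_view_factor(mask):
    """Estimate SVF as the fraction of sky pixels (value 1) among all
    valid pixels (0 or 1); -1 marks pixels outside the fisheye circle."""
    sky = total = 0
    for row in mask:
        for v in row:
            if v < 0:
                continue  # ignore pixels outside the image circle
            total += 1
            sky += v
    return sky / total if total else 0.0
```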
This document summarizes research on using multi-sensor image fusion over networked decision support systems to improve target recognition in complex environments. The research involved simulating a wireless mesh network to disseminate imagery data and evaluate quality of service. Experiments analyzed fused visual and thermal images from different sensors and their classification performance using computer vision and machine learning techniques. Results found that fused multi-spectral images had the best classification accuracy compared to single sensor images. The conclusions recommend further evaluating the mesh network, expanding experiments to more sensors and fusion techniques, and testing capabilities in tactical field environments.
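The summary does not specify which fusion operator was used; a common baseline for fusing registered visual and thermal imagery is a pixel-wise weighted average, sketched below. The function name and the single-channel float representation are assumptions for illustration.

```python
def fuse(visual, thermal, alpha=0.5):
    """Pixel-wise weighted fusion of two registered single-channel images;
    alpha controls the contribution of the visual band."""
    return [
        [alpha * v + (1.0 - alpha) * t for v, t in zip(vrow, trow)]
        for vrow, trow in zip(visual, thermal)
    ]
```

More sophisticated schemes (e.g. multi-resolution or PCA-based fusion) replace the fixed weight with spatially varying ones, but the registered-then-combine structure is the same.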
Critical Infrastructure Monitoring Using UAV Imagery (aditess)
The use of two rapidly evolving approaches, Unmanned Aerial Vehicle (UAV) and Dense Image Matching (DIM) techniques, is an attractive solution for extracting high-quality photogrammetric products such as 3D point clouds and orthoimages.
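One standard way to make thousands of unstructured points tractable, as the efficiency challenge above requires, is voxel-grid downsampling: collapse all points falling in the same voxel to their centroid. This is a generic preprocessing sketch, not the specific representation used in the SqueezeSeg work.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Collapse an unstructured point cloud to one centroid per voxel,
    a common first step when preparing LiDAR data for a network."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(c) / len(pts) for c in zip(*pts))
        for pts in buckets.values()
    ]
```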
This document discusses using deep learning for seismic tomography. It begins with an overview of seismic tomography and the forward and inverse problems. It then discusses using deep learning approaches like empirical risk minimization with neural networks to solve the inverse problem. Several deep learning architectures are evaluated including those using semblance cubes, spectrograms of raw seismic data, and raw seismic data directly as input. Recurrent neural networks with LSTM and GRU cells are also explored for image reconstruction. The document concludes that while performance is good on simple models, more data and increased network capacity is needed for complex geology. It also lists several related publications.
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 2) (Matthew O'Toole)
Recent advances in both computational photography and displays have given rise to a new generation of computational devices. Computational cameras and displays provide a visual experience that goes beyond the capabilities of traditional systems by adding computational power to optics, lights, and sensors. These devices are breaking new ground in the consumer market, including lightfield cameras that redefine our understanding of pictures (Lytro), displays for visualizing 3D/4D content without special eyewear (Nintendo 3DS), motion-sensing devices that use light coded in space or time to detect motion and position (Kinect, Leap Motion), and a movement toward ubiquitous computing with wearable cameras and displays (Google Glass).
This short (1.5 hour) course serves as an introduction to the key ideas and an overview of the latest work in computational cameras, displays, and light transport.
The document describes a study that combined optical camera images and synthetic aperture radar (SAR) data to monitor glacier flow using remote and proximal sensing techniques. Fast correlation algorithms were used to calculate glacier displacement from both data sources. The results were then fused to derive 3D displacement vectors, highlighting both successes and challenges of the multi-sensor approach. Computation times were significantly reduced through algorithm optimization and parallelization.
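The "fast correlation" displacement measurement mentioned above boils down to finding the lag that best aligns two image profiles. A minimal 1-D sketch of that idea (the study itself works on 2-D image patches and uses optimized, parallelized implementations):

```python
def estimate_shift(reference, shifted, max_lag):
    """Return the integer lag that maximizes the correlation between two
    1-D profiles -- the core of correlation-based displacement tracking."""
    best_lag, best_score = 0, float("-inf")
    n = len(reference)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            reference[i] * shifted[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

In practice this search is done in the frequency domain (FFT-based cross-correlation) with sub-pixel refinement, which is where the reported speedups come from.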
A Three-Dimensional Representation method for Noisy Point Clouds based on Gro... (Sergio Orts-Escolano)
Slides used for the thesis defense of the PhD candidate Sergio Orts-Escolano.
The research described in this thesis was motivated by the need for a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered, as these sensors can provide a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model; in particular, the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been computed primarily offline, and their application to 3D data has mainly focused on noise-free models, without considering time constraints. A hardware implementation is proposed that leverages the computing power of modern GPUs, taking advantage of the paradigm of General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problems and applications in computer vision, such as recognition and localization of objects, visual surveillance and 3D reconstruction.
This document provides an overview and summary of a presentation on Simultaneous Localization and Mapping (SLAM). It introduces the speaker, Dong-Won Shin, and his background and research in SLAM. The contents of the presentation are then outlined, including an introduction to SLAM, traditional SLAM approaches like Extended Kalman Filter SLAM and FastSLAM, efforts towards large-scale mapping like graph-based SLAM and loop closure detection, modern state-of-the-art systems like ORB SLAM, KinectFusion and Lidar SLAM, and applications of SLAM. Key algorithms in visual odometry, backend optimization, and loop closure detection are also summarized.
Towards Exascale Simulations for Regional-Scale Earthquake Hazard and Riskinside-BigData.com
The document discusses the goals and progress of the Department of Energy's Exascale Computing Project (ECP) to develop exascale simulations for regional-scale earthquake hazard and risk assessments. The ECP aims to (1) develop computational frameworks coupling geophysics and infrastructure modeling codes, (2) increase frequency resolution and reduce runtimes through advances in hardware, software, and algorithms, and (3) establish performance benchmarks to track progress towards exascale capabilities. Initial regional demonstrations in 2017 showed promising realism in simulated ground motions and infrastructure response. Further work includes waveform inversions, GPU optimizations, and assessing how far simulations can augment probabilistic hazard assessments.
This document analyzes KinectFusion, a real-time 3D reconstruction system using a moving depth camera. It introduces SLAMBench, a benchmarking framework for KinectFusion. The document describes the KinectFusion pipeline including preprocessing, tracking, integration and raycasting steps. It evaluates several RGB-D datasets and identifies the Washington RGB-D Scenes dataset as most suitable. It notes drawbacks in KinectFusion like noisy trajectories and inconsistent models. Future work proposed is reducing tracking noise using a Kalman filter.
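The integration step of the pipeline above fuses each new depth observation into the voxel grid as a weighted running average of truncated signed distances. A minimal per-voxel sketch, with the weight cap chosen arbitrarily for the example:

```python
def integrate(tsdf, weight, distance, new_weight=1.0, max_weight=64.0):
    """Fuse one new truncated signed-distance observation into a voxel
    via the weighted running average used in KinectFusion-style systems."""
    fused = (tsdf * weight + distance * new_weight) / (weight + new_weight)
    return fused, min(weight + new_weight, max_weight)
```

Capping the weight keeps the map responsive to change; averaging is also what smooths out per-frame depth noise in the reconstruction.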
Real-time large scale dense RGB-D SLAM with volumetric fusion extends KinectFusion to larger scales. It represents the volumetric reconstruction as a rolling buffer that translates as the camera moves. It estimates camera pose through combined geometric and photometric constraints. It closes loops by non-rigidly deforming the map with constraints from loop closures and jointly optimizes the camera poses and map. Evaluation shows it produces large, globally consistent, real-time dense reconstructions.
FastCampus 2018 SLAM Workshop
You can find the code diagrams via the link below.
https://www.dropbox.com/sh/u76i5hzdecd4ey7/AADgs9XzXt6k1j971vyBrFTea?dl=0
1. The document describes a mobile image recognition system using a CNN model called Network-in-Network. It was implemented as iOS and Android apps that can recognize food images without needing an online server.
2. The system achieves high accuracy of 78.8% for top-1 and 95.2% for top-5 recognition of food images from the UECFOOD100 dataset, with processing speeds of 55.7ms. It uses techniques like batch normalization and multi-threading to optimize performance on mobile.
3. The architecture was modified from the original Network-in-Network by adding batch normalization, reducing layers and kernels, and using multiple image sizes to balance recognition accuracy and speed. Global average pooling
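Global average pooling, mentioned above, replaces the fully connected layers at the end of a network by averaging each feature map down to a single scalar, which cuts parameters sharply on mobile. A minimal sketch (list-of-lists feature maps, one 2-D map per channel, are an assumption of the example):

```python
def global_average_pooling(feature_maps):
    """Reduce each H x W feature map to one scalar by averaging,
    yielding a vector with one entry per channel."""
    pooled = []
    for fmap in feature_maps:  # one 2-D map per channel
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled
```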
The document presents a vision-based traffic surveillance system that uses digital image processing techniques. The system improves image quality by enhancing contrast and removing noise and blur. It then uses edge detection and morphological processing to segment vehicles. The number of vehicles in each lane is counted and used to determine the time allotted to that lane, achieving 90% accuracy compared to existing systems.
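After segmentation, the vehicle count per lane reduces to counting connected foreground blobs in the binary image. A minimal flood-fill sketch (the summary does not state the labelling method actually used):

```python
def count_blobs(binary):
    """Count 4-connected foreground blobs in a binary image -- a stand-in
    for counting segmented vehicles after edge detection and morphology."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1
                stack = [(r, c)]  # flood-fill this blob
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols \
                            and binary[y][x] and not seen[y][x]:
                        seen[y][x] = True
                        stack.extend([(y + 1, x), (y - 1, x),
                                      (y, x + 1), (y, x - 1)])
    return count
```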
The document describes a process for analyzing drone images to generate geospatial data including extracting EXIF metadata from images, using overlapping images to generate a 3D point cloud, texturing the point cloud to create a mesh, deriving a digital elevation model from the mesh, and orthorectifying and georeferencing images.
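The EXIF metadata step above typically includes recovering the camera's GPS position for georeferencing; EXIF stores latitude and longitude as degree/minute/second rationals plus a hemisphere reference. A small conversion sketch (the function name is invented for the example):

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style GPS degrees/minutes/seconds to decimal degrees;
    south ("S") and west ("W") references negate the result."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    return -decimal if ref in ("S", "W") else decimal
```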
June 13, 2019, SSII2019 Organized Session: Multimodal 4D sensing. The current state of SLAM technology for end users. Speaker: Tomoyuki Mukasa (Research Scientist, Rakuten Institute of Technology)
https://confit.atlas.jp/guide/event/ssii2019/static/organized#OS2
This document summarizes research conducted using supercomputers to enable artificial intelligence applications for analyzing large earth science data. It discusses two major functions: designing efficient simulations and developing intelligent data mining methods. Specific projects are described, including simulating climate models on the Sunway TaihuLight, simulating earthquakes, and using remote sensing data and deep learning to map land cover more accurately, detect oil palm trees, and create more accurate urban land use maps. The research enables digital earth modeling to simulate, analyze, understand, predict, and mitigate earth science issues.
An Open Source solution for Three-Dimensional documentation: archaeological a... (Giulio Bigliardi)
The modern techniques of Structure from Motion (SfM) and Image-Based Modelling (IBM) open new perspectives in the field of archaeological documentation, providing a simple and accurate way to record three-dimensional data.
The software Python Photogrammetry Toolbox (PPT) is an Open Source solution that implements a pipeline to perform 3D reconstruction from a set of pictures. It takes pictures as input and automatically performs 3D reconstruction for the images for which 3D registration is possible. It is composed of Python scripts that automate the different steps of the workflow. The entire process is reduced to two commands: calibration and dense reconstruction. The user can run it from a graphical interface or from the terminal. Calibration is performed with Bundler, while dense reconstruction is done through CMVS/PMVS.
Despite the automation, the user can control the final result by choosing two initial parameters: the image size and the feature detector. Acting on the first parameter reduces the computation time and decreases the density of the point cloud. Acting on the feature detector influences the final result: PPT can work both with SIFT (patented by the University of British Columbia, freely usable only for research purposes) and with VLFEAT (released under the GPL v.2 license). The use of VLFEAT ensures a more accurate result, though it increases the calculation time.
Python Photogrammetry Toolbox, released under the GPL v.3 license, is a classical example of a FLOSS project in which instruments and knowledge are shared. The community works on the development of the software, sharing code modifications, feedback and bug-checking.
This document lists MATLAB project titles from 2009-2014 related to various IEEE transactions and conferences. It includes over 50 projects covering topics like image processing, signal processing, power electronics, renewable energy, and more. Contact information is provided for Triple Tech Soft to inquire about these MATLAB projects.
Introduction to computer vision with Convoluted Neural Networks (MarcinJedyk)
Introduction to computer vision with convolutional neural networks: going over the history of CNNs, describing basic concepts such as convolution, and discussing applications of computer vision and image recognition technologies.
This document provides an introduction to computer vision with convolutional neural networks. It discusses what computer vision aims to address and provides a brief overview of neural networks and their basic building blocks. It then covers the history and evolution of convolutional neural networks, how and why they work on digital images, their limitations, and applications such as object detection. Examples are given of early CNNs from the 1980s and 1990s and of advances through the 2010s that improved accuracy, including deeper networks, inception modules, residual connections, and efficiency-oriented designs such as MobileNets. Training deep CNNs requires large datasets and may take weeks, but pre-trained networks can be fine-tuned for new tasks.
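The convolution operation at the heart of these networks slides a small kernel over the image and sums elementwise products at each position. A minimal "valid"-mode sketch (as in most deep learning frameworks, the kernel is not flipped, so this is strictly cross-correlation):

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers: slide the kernel
    over the image and sum elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out
```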
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNZihao(Gerald) Zhang
The document describes a method to automatically detect window regions in 3D point cloud data of indoor environments collected using a backpack sensor system. The method is based on R-CNN and uses MCG to generate region proposals, extracts features from proposals using a CNN, and classifies proposals as windows or non-windows using a random forest. Experiments on a dataset of 400 images achieved an F1 score of 89.79% and mAP of 96.64% for window detection, outperforming an existing method. Adding a small amount of manually labeled data further improved results.
This document discusses using deep learning for seismic tomography. It begins with an overview of seismic tomography and the forward and inverse problems. It then discusses using deep learning approaches like empirical risk minimization with neural networks to solve the inverse problem. Several deep learning architectures are evaluated including those using semblance cubes, spectrograms of raw seismic data, and raw seismic data directly as input. Recurrent neural networks with LSTM and GRU cells are also explored for image reconstruction. The document concludes that while performance is good on simple models, more data and increased network capacity is needed for complex geology. It also lists several related publications.
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 2)Matthew O'Toole
Recent advances in both computational photography and displays have given rise to a new generation of computational devices. Computational cameras and displays provide a visual experience that goes beyond the capabilities of traditional systems by adding computational power to optics, lights, and sensors. These devices are breaking new ground in the consumer market, including lightfield cameras that redefine our understanding of pictures (Lytro), displays for visualizing 3D/4D content without special eyewear (Nintendo 3DS), motion-sensing devices that use light coded in space or time to detect motion and position (Kinect, Leap Motion), and a movement toward ubiquitous computing with wearable cameras and displays (Google Glass).
This short (1.5 hour) course serves as an introduction to the key ideas and an overview of the latest work in computational cameras, displays, and light transport.
The document describes a study that combined optical camera images and synthetic aperture radar (SAR) data to monitor glacier flow using remote and proximal sensing techniques. Fast correlation algorithms were used to calculate glacier displacement from both data sources. The results were then fused to derive 3D displacement vectors, highlighting both successes and challenges of the multi-sensor approach. Computation times were significantly reduced through algorithm optimization and parallelization.
A Three-Dimensional Representation method for Noisy Point Clouds based on Gro...Sergio Orts-Escolano
Slides used for the thesis defense of the PhD candidate Sergio Orts-Escolano.
The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time.This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problems and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.
This document provides an overview and summary of a presentation on Simultaneous Localization and Mapping (SLAM). It introduces the speaker, Dong-Won Shin, and his background and research in SLAM. The contents of the presentation are then outlined, including an introduction to SLAM, traditional SLAM approaches like Extended Kalman Filter SLAM and FastSLAM, efforts towards large-scale mapping like graph-based SLAM and loop closure detection, modern state-of-the-art systems like ORB SLAM, KinectFusion and Lidar SLAM, and applications of SLAM. Key algorithms in visual odometry, backend optimization, and loop closure detection are also summarized.
Towards Exascale Simulations for Regional-Scale Earthquake Hazard and Risk - inside-BigData.com
The document discusses the goals and progress of the Department of Energy's Exascale Computing Project (ECP) to develop exascale simulations for regional-scale earthquake hazard and risk assessments. The ECP aims to (1) develop computational frameworks coupling geophysics and infrastructure modeling codes, (2) increase frequency resolution and reduce runtimes through advances in hardware, software, and algorithms, and (3) establish performance benchmarks to track progress towards exascale capabilities. Initial regional demonstrations in 2017 showed promising realism in simulated ground motions and infrastructure response. Further work includes waveform inversions, GPU optimizations, and assessing how far simulations can augment probabilistic hazard assessments.
This document analyzes KinectFusion, a real-time 3D reconstruction system using a moving depth camera. It introduces SLAMBench, a benchmarking framework for KinectFusion. The document describes the KinectFusion pipeline including preprocessing, tracking, integration and raycasting steps. It evaluates several RGB-D datasets and identifies the Washington RGB-D Scenes dataset as most suitable. It notes drawbacks in KinectFusion like noisy trajectories and inconsistent models. Future work proposed is reducing tracking noise using a Kalman filter.
Real-time large scale dense RGB-D SLAM with volumetric fusion extends KinectFusion to larger scales. It represents the volumetric reconstruction as a rolling buffer that translates as the camera moves. It estimates camera pose through combined geometric and photometric constraints. It closes loops by non-rigidly deforming the map with constraints from loop closures and jointly optimizes the camera poses and map. Evaluation shows it produces large, globally consistent, real-time dense reconstructions.
FastCampus 2018 SLAM Workshop
You can find the code diagrams via the link below.
https://www.dropbox.com/sh/u76i5hzdecd4ey7/AADgs9XzXt6k1j971vyBrFTea?dl=0
1. The document describes a mobile image recognition system using a CNN model called Network-in-Network. It was implemented as iOS and Android apps that can recognize food images without needing an online server.
2. The system achieves high accuracy of 78.8% top-1 and 95.2% top-5 recognition of food images from the UECFOOD100 dataset, with a processing time of 55.7 ms per image. It uses techniques like batch normalization and multi-threading to optimize performance on mobile.
3. The architecture was modified from the original Network-in-Network by adding batch normalization, reducing layers and kernels, and using multiple image sizes to balance recognition accuracy and speed. Global average pooling replaces the fully connected layers.
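Global average pooling collapses each feature map to a single value, which is what lets Network-in-Network style models drop their fully connected layers. A minimal sketch (plain Python on nested lists, not the app's actual implementation):

```python
def global_average_pool(feature_maps):
    """Collapse each H x W channel of a C x H x W feature map to its mean,
    yielding one scalar per channel (used in place of fully connected layers)."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]

# Two 2x2 channels -> one value per channel
maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
print(global_average_pool(maps))  # [2.5, 2.0]
```

Because the pooled vector length equals the channel count, the last convolutional layer can output one channel per class and feed the result directly into a softmax.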
The document presents a vision-based traffic surveillance system that uses digital image processing techniques. The system works to improve image quality by enhancing contrast and removing noise and blurring. It then uses edge detection and morphological processing to segment vehicles. The number of vehicles in each lane is counted and used to determine the time allotted for that lane, with accuracy of 90% compared to existing systems.
The document describes a process for analyzing drone images to generate geospatial data including extracting EXIF metadata from images, using overlapping images to generate a 3D point cloud, texturing the point cloud to create a mesh, deriving a digital elevation model from the mesh, and orthorectifying and georeferencing images.
June 13, 2019, SSII2019 Organized Session: Multimodal 4D sensing. The current state of SLAM technology for end users. Speaker: 武笠 知幸 (Research Scientist, Rakuten Institute of Technology)
https://confit.atlas.jp/guide/event/ssii2019/static/organized#OS2
This document summarizes research conducted using supercomputers to enable artificial intelligence applications for analyzing large earth science data. It discusses two major functions: designing efficient simulations and developing intelligent data mining methods. Specific projects are described, including simulating climate models on the Sunway TaihuLight, simulating earthquakes, and using remote sensing data and deep learning to map land cover more accurately, detect oil palm trees, and create more accurate urban land use maps. The research enables digital earth modeling to simulate, analyze, understand, predict, and mitigate earth science issues.
An Open Source solution for Three-Dimensional documentation: archaeological a... - Giulio Bigliardi
The modern techniques of Structure from Motion (SfM) and Image-Based Modelling
(IBM) open new perspectives in the field of archaeological documentation, providing
a simple and accurate way to record three dimensional data.
The software Python Photogrammetry Toolbox (PPT) is an Open Source solution that
implements a pipeline to perform 3D reconstruction from a set of pictures. It takes
pictures as input and automatically performs the 3D reconstruction for the images for
which 3D registration is possible.
It is composed of python scripts that automate the different steps of the workflow.
The entire process is reduced to two commands: calibration and dense reconstruction.
The user can run it from a graphical interface or from terminal command. Calibration
is performed with Bundler while dense reconstruction is done through CMVS/PMVS.
Despite the automation, the user can control the final result by choosing two initial
parameters: the image size and the feature detector. Reducing the image size
lowers the computation time but also decreases the density of the point
cloud. The choice of feature detector influences the final result: PPT can work both
with SIFT (patented by the University of British Columbia and freely usable only for
research purposes) and with VLFEAT (released under the GPL v.2 license). Using
VLFEAT yields a more accurate result, though it increases the calculation time.
Python Photogrammetry Toolbox, released under the GPL v.3 license, is a classic
example of a FLOSS project in which instruments and knowledge are shared: the community works on the development of the software, sharing code modifications,
feedback and bug reports.
This document lists MATLAB project titles from 2009-2014 related to various IEEE transactions and conferences. It includes over 50 projects covering topics like image processing, signal processing, power electronics, renewable energy, and more. Contact information is provided for Triple Tech Soft to inquire about these MATLAB projects.
Introduction to computer vision with Convoluted Neural NetworksMarcinJedyk
Introduction to computer vision with Convoluted Neural Networks - going over history of CNNs, describing basic concepts such as convolution and discussing applications of computer vision and image recognition technologies
This document provides an introduction to computer vision with convoluted neural networks. It discusses what computer vision aims to address, provides a brief overview of neural networks and their basic building blocks. It then covers the history and evolution of convolutional neural networks, how and why they work on digital images, their limitations, and applications like object detection. Examples are provided of early CNNs from the 1980s and 1990s and recent advancements through the 2010s that improved accuracy, including deeper networks, inception modules, residual connections, and efforts to increase performance like MobileNets. Training deep CNNs requires large datasets and may take weeks, but pre-trained networks can be fine-tuned for new tasks.
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNZihao(Gerald) Zhang
The document describes a method to automatically detect window regions in 3D point cloud data of indoor environments collected using a backpack sensor system. The method is based on R-CNN and uses MCG to generate region proposals, extracts features from proposals using a CNN, and classifies proposals as windows or non-windows using a random forest. Experiments on a dataset of 400 images achieved an F1 score of 89.79% and mAP of 96.64% for window detection, outperforming an existing method. Adding a small amount of manually labeled data further improved results.
Developing and Deploying Deep Learning Based Computer Vision Systems - Alka N... - CodeOps Technologies LLP
Deep Learning is enabling a wide range of computer vision applications from advanced driver assistance systems to sophisticated medical diagnostic devices. However, designing and deploying these applications involve a lot of challenges like handling large datasets, developing optimized models, effectively performing GPU computing and efficiently deploying deep learning models to embedded boards like NVIDIA Jetson. This session illustrates how MATLAB supports all phases of this workflow starting with algorithm design to automatically generating portable and optimized CUDA code helping engineers and scientists address the commonly observed challenges in deep learning workflow
SDVIs and In-Situ Visualization on TACC's Stampede - Intel® Software
Speaker: Paul Navrátil, Texas Advanced Computing Center (TACC)
The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization).
This talk presents recent work in high-fidelity visualization using the OSPRay ray tracing framework on TACC’s local and remote visualization systems. We present work using OSPRay within ParaView Catalyst in situ framework from Kitware, including capitalizing on opportunities to reduce data costs migrating through VTK filters for visualization. We highlight the performance opportunities and advantages of Intel® Advanced Vector Extensions 512, the memory system improvements possible with Intel® Xeon Phi™ processor multi-channel DRAM (MCDRAM) and the Intel® Omni-Path Architecture interconnect.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-mangen
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Michael Mangen, Product Manager for Camera and Computer Vision at Qualcomm, presents the "High-resolution 3D Reconstruction on a Mobile Processor" tutorial at the May 2016 Embedded Vision Summit.
Computer vision has come a long way. Use cases that were previously not possible in mass-market devices are now more accessible thanks to advances in depth sensors and mobile processors. In this presentation, Mangen provides an overview of how we are able to implement high-resolution 3D reconstruction – a capability typically requiring cloud/server processing – on a mobile processor. This is an exciting example of how new sensor technology and advanced mobile processors are bringing computer vision capabilities to broader markets.
Real-time image processing applied to a traffic queue detection algorithm - ajayrampelli
This document describes a real-time image processing algorithm for detecting traffic queues. The algorithm uses two operations: motion detection and vehicle detection. Motion detection involves differencing consecutive image frames and comparing the difference to a threshold. Vehicle detection applies edge detection techniques to image profiles. The algorithm aims to measure queue parameters like length, occurrence period, and slope in real-time using low-cost systems. It processes image sub-profiles to reduce computation time. Experimental results found the algorithm could measure queue length with 95% accuracy.
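The motion-detection step described above, differencing consecutive frames and comparing against a threshold, can be sketched in a few lines. The threshold value and helper names below are illustrative assumptions, not taken from the paper:

```python
THRESHOLD = 30  # assumed intensity-difference threshold, tuned per camera

def motion_mask(prev_frame, curr_frame, threshold=THRESHOLD):
    """Flag pixels whose grayscale intensity (0-255) changed by more than
    `threshold` between two consecutive frames, given as flat lists."""
    return [abs(c - p) > threshold for p, c in zip(prev_frame, curr_frame)]

def motion_detected(prev_frame, curr_frame, min_changed=1):
    """Declare motion in a queue profile if enough pixels changed."""
    return sum(motion_mask(prev_frame, curr_frame)) >= min_changed

prev = [10, 10, 10, 200]
curr = [12, 10, 95, 200]   # one pixel changed by 85
print(motion_detected(prev, curr))  # True
```

Running this on small sub-profiles rather than whole frames is what keeps the computation cheap, as the summary notes.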
A leading water utility company in the USA was facing the challenge of improving its pipeline inspection process to reduce human errors and manual inspection time. Pipeline Anomaly Detection automates the identification of defects in pipeline videos: a camera records the observations, and the system generates a report.
1. The document discusses using deep learning techniques for surface defect detection, focusing on strategies for dealing with imbalanced training data.
2. It proposes using generative adversarial networks (GANs) to generate synthetic defect samples in order to address the class imbalance problem. Convolutional neural networks (CNNs) are then used for classification.
3. Autoencoding models like convolutional autoencoders (CAE) and variational autoencoders (VAE) can also be used for unsupervised defect detection based on image reconstruction.
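The autoencoder-based detection in point 3 reduces to thresholding a reconstruction error: a model trained only on defect-free samples reconstructs normal images well, so a large error suggests a defect. A minimal sketch with an assumed threshold (the real models are CAEs/VAEs; here the reconstruction is just given as input):

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input image and its autoencoder output,
    both given as flat lists of pixel values."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def is_defective(original, reconstructed, threshold=0.05):
    """Flag a sample as defective when the autoencoder fails to reconstruct it.
    `threshold` is a hypothetical value that would be tuned on validation data."""
    return reconstruction_error(original, reconstructed) > threshold

normal = [0.2, 0.4, 0.6]
print(is_defective(normal, [0.21, 0.39, 0.61]))  # False: near-perfect reconstruction
print(is_defective(normal, [0.9, 0.1, 0.0]))     # True: poor reconstruction
```

This is why the approach is unsupervised: no defect labels are needed at training time, only normal samples.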
The document discusses several projects and implementations done by Karishma Jain related to computer vision and deep learning. These include visual question answering using CNNs and RNNs, parallelizing an ADABOOST classifier on different platforms, designing a lane departure warning system using monocular camera, and implementing various CNN architectures for MNIST classification achieving up to 97.74% accuracy.
This document discusses high-speed, high-resolution inspection of flat panel displays. It introduces a distributed image sensor computing system (DISCS) that uses GPUs for parallel processing to enable fast, in-line inspection. The DISCS uses dark-field illumination and line scan cameras for 2D defect detection. Algorithms like binarization and edge detection are implemented on the GPUs. Experimental results on touch panels and glass show inspection of megapixel images in under 3 seconds. Stereoscopic line scanning and moire topography techniques are discussed for 3D surface profiling with nanometer resolution and micrometer depth detection. Phase shifting interferometry is used to extract height maps. The system is designed for industrial inspection and could integrate
Urban Object Detection in UAV Images Using ResNet - balajimankena
This document proposes developing a ResNet neural network for object detection in urban areas using UAV images. It discusses limitations in existing methods and the need for an effective deep learning model specifically designed for UAV data. The proposed method uses a ResNet architecture combined with YOLO for real-time object detection. The network is trained on COCO and PASCAL VOC datasets. Evaluation shows the ResNet model achieves 95% accuracy on a test dataset. Future work involves classifying satellite images to improve accuracy over traditional methods.
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
This document outlines Anusua Trivedi's talk on transfer learning and fine-tuning deep neural networks. The talk covers traditional machine learning versus deep learning, using deep convolutional neural networks (DCNNs) for image analysis, transfer learning and fine-tuning DCNNs, recurrent neural networks (RNNs), and case studies applying these techniques to diabetic retinopathy prediction and fashion image caption generation.
A ROS Implementation of the Mono-SLAM Algorithm - csandit
Computer vision approaches are increasingly used in mobile robotic systems, since they
make it possible to obtain a very good representation of the environment using low-power, cheap sensors.
In particular it has been shown that they can compete with standard solutions based on laser
range scanners when dealing with the problem of simultaneous localization and mapping
(SLAM), where the robot has to explore an unknown environment while building a map of it and
localizing in the same map. We present a package for simultaneous localization and mapping in
ROS (Robot Operating System) using a monocular camera sensor only. Experimental results in
real scenarios as well as on standard datasets show that the algorithm is able to track the
trajectory of the robot and build a consistent map of small environments, while running in near
real-time on a standard PC.
Transformer Architectures in Vision
[2018 ICML] Image Transformer
[2019 CVPR] Video Action Transformer Network
[2020 ECCV] End-to-End Object Detection with Transformers
[2021 ICLR] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Weapon Detection using Machine Learning and Deep Learning
Technologies used: SSD, Faster R-CNN and YOLO algorithms.
● Automatic weapon detection using Convolutional Neural Network (CNN) based SSD and Faster R-CNN algorithms.
● The primary goal of this project is to enhance security and public safety.
● Weapons are detected in real time or through the analysis of recorded data, such as video feeds or images.
This document describes a proposed method for real-time object detection using Single Shot Multi-Box Detection (SSD) with the MobileNet model. SSD is a single, unified network for object detection that eliminates feature resampling and combines predictions. MobileNet is used to create a lightweight network by employing depthwise separable convolutions, which significantly reduces model size compared to regular convolutions. The proposed SSD with MobileNet model achieved improved accuracy in identifying real-time household objects while maintaining the detection speed of SSD.
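The size reduction from depthwise separable convolutions mentioned above is easy to quantify: a regular convolution learns k x k x C_in weights per output channel, while the separable version splits this into a per-channel k x k depthwise filter plus a 1x1 pointwise convolution. A quick arithmetic check for an illustrative layer (3x3 kernel, 32 in / 64 out channels; biases omitted for simplicity):

```python
def regular_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (one k x k filter per input channel) plus 1x1 pointwise."""
    return k * k * c_in + c_in * c_out

reg = regular_conv_params(3, 32, 64)          # 18432 weights
sep = depthwise_separable_params(3, 32, 64)   # 288 + 2048 = 2336 weights
print(reg, sep, round(reg / sep, 1))          # roughly 7.9x fewer parameters
```

For a 3x3 kernel the saving approaches a factor of 9 as the output channel count grows, which is where MobileNet's small model size comes from.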
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
How to Get CNIC Information System with Paksim Ga - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Programming Foundation Models with DSPy - Meetup Slides - Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests, and test automation can be used to speed up testing.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Observability Concepts EVERY Developer Should Know - DeveloperWeek Europe - Paige Cruz
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share foundational concepts to build on.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain: retrieving information from biomedical knowledge graphs with LLMs to increase the accuracy and performance of generated answers.
Building Production Ready Search Pipelines with Spark and Milvus - Zilliz
Spark is a widely used ETL tool for processing, indexing and ingesting data into serving stacks for search. Milvus is a production-ready open-source vector database. In this talk we show how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Convolutional Neural Network for pixel-wise skyline detection
1. Convolutional Neural Network for pixel-wise skyline detection
Darian Frajberg
Piero Fraternali
Rocio Nahime Torres
Department of Electronics, Information and Bioengineering, Politecnico di Milano
September 15, 2017
26th International Conference
on Artificial Neural Networks
2. Deep learning is a hot topic and has achieved outstanding results, outperforming previous techniques
in a very wide variety of applications (e.g., computer vision, speech recognition, NLP)
Augmented Reality (AR) applications are an emerging class of software that is receiving massive attention
(e.g., Pokemon Go), and the AR market is projected to be huge
The integration of Artificial Intelligence and Augmented Reality can lead to very
successful results, capable of attracting people to voluntarily carry out diverse tasks
Goals to accomplish
• High accuracy
• Low power devices support
• High real-time performance
• Acceptable memory usage
• Acceptable battery consumption
2
Introduction and motivation
3. Use case
– Convolutional Neural Network (CNN) for mountain skyline
detection
– Integration of the CNN into an AR mobile app for mountain peak identification
Mountain skyline detection
– Simple scenarios
– Complex scenarios
3
Introduction and motivation
4. Mountain skyline detection for simple scenarios
– Comprises clear sky and continuous skylines
4
Introduction and motivation
(Input) (Output)
5. Mountain skyline detection for complex scenarios
– May comprise fuzzy or interrupted skylines with obstacles
(e.g., clouds, trees, houses, cables, people, etc.)
5
Introduction and motivation
(Input) (Output)
6. Heuristic methods for skyline detection
– Edge-based
– Dynamic programming
– Solves simple scenarios
– Does not solve complex scenarios
Image-level CNN methods for skyline detection
– Semantic segmentation (seen as foreground-background problem)
– Solves simple scenarios
– Solving complex scenarios would require ground truth that is extremely
difficult to generate
6
Related work
7. Successful pixel-level CNN methods for other purposes
– Detection of cancer in biomedical images
– Edge extraction
Our approach
– Use pixel-wise CNN for mountain skyline detection
7
Related work
12. Model architecture
12
Skyline extraction with CNN
Layer    Type          Input          Kernel  Stride  Pad  Output
Layer1   Conv          29 x 29 x 3    6       1       0    24 x 24 x 20
Layer2   Pool (max)    24 x 24 x 20   2       2       0    12 x 12 x 20
Layer3   Conv          12 x 12 x 20   5       1       0    8 x 8 x 50
Layer4   Pool (max)    8 x 8 x 50     2       2       0    4 x 4 x 50
Layer5   Conv          4 x 4 x 50     4       1       0    1 x 1 x 500
Layer6   ReLU          1 x 1 x 500    -       1       0    1 x 1 x 500
Layer7   Conv          1 x 1 x 500    1       1       0    1 x 1 x 2
Layer8   Softmax loss  1 x 1 x 2      -       1       0    1 x 1 x 2
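The output sizes in the table follow the standard convolution/pooling formula out = (in + 2·pad − kernel) / stride + 1. A quick sanity check of the shapes and the parameter count (a sketch, not the authors' code):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Standard convolution/pooling output size: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Walk the table, starting from a 29 x 29 x 3 input patch
s = conv_out(29, 6)            # Layer1 Conv 6x6      -> 24
assert s == 24
s = conv_out(s, 2, stride=2)   # Layer2 Max-pool 2x2  -> 12
s = conv_out(s, 5)             # Layer3 Conv 5x5      -> 8
s = conv_out(s, 2, stride=2)   # Layer4 Max-pool 2x2  -> 4
s = conv_out(s, 4)             # Layer5 Conv 4x4      -> 1
assert s == 1

# Learned parameters: (k*k*in_channels + 1 bias) * out_channels per conv layer
params = (6*6*3 + 1)*20 + (5*5*20 + 1)*50 + (4*4*50 + 1)*500 + (1*1*500 + 1)*2
assert params == 428_732       # matches the count reported on the next slide
```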
13. Training
– Caffe framework
– Workstation with NVIDIA GeForce GTX 1080
– Training time: 61 minutes
– 428,732 learned parameters
14. Deployment of Fully Convolutional Network
– Input: Image
– Output: spatial map in which each pixel is assigned a probability
of belonging to the skyline, scaled to the 0..255 range
(Input) (Output)
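Converting per-pixel probabilities into the 0..255 output map, and binarizing it for downstream alignment, takes only a few lines. A minimal sketch; the function names and the 128 threshold are assumptions for illustration, not values from the talk:

```python
def prob_map_to_gray(probs):
    """Scale per-pixel skyline probabilities in [0, 1] to the 0..255 range."""
    return [[round(p * 255) for p in row] for row in probs]

def threshold_mask(gray, thr=128):
    """Binarize the 0..255 map into a skyline mask (hypothetical threshold)."""
    return [[1 if v >= thr else 0 for v in row] for row in gray]
```

For example, `prob_map_to_gray([[0.0, 1.0, 0.2]])` yields `[[0, 255, 51]]`, and thresholding that row at 128 keeps only the confident skyline pixel.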
17. Accuracy evaluated at image level on the test dataset
– Average Skyline Accuracy (ASA)
– Average No Skyline Accuracy (ANSA)
– Average Accuracy (AA)
Evaluation
19. Evaluation example
– Average Skyline Accuracy: 98%
– Average No Skyline Accuracy: 73%
– Average Accuracy: 94%
Ground truth annotation pixel
Correctly predicted skyline pixel
Incorrectly predicted skyline pixel
(Annotation) (Evaluation)
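The exact metric definitions are given in the paper; a hypothetical per-column formulation consistent with the threshold and pixels-per-column parameters used in the accuracy table (names and definitions are assumptions, not the authors' code):

```python
def accuracy_metrics(pred, gt, threshold=0):
    """Hypothetical per-column accuracy metrics (assumed definitions).

    pred, gt: one skyline row index per column, or None where the
    column has no visible skyline (e.g., fully occluded).
    ASA:  fraction of ground-truth skyline columns predicted within
          `threshold` rows of the annotation.
    ANSA: fraction of no-skyline columns where no skyline was predicted.
    AA:   overall fraction of correct columns.
    """
    sky_hits = sky_total = nosky_hits = nosky_total = 0
    for p, g in zip(pred, gt):
        if g is None:
            nosky_total += 1
            nosky_hits += p is None
        else:
            sky_total += 1
            sky_hits += p is not None and abs(p - g) <= threshold
    asa = sky_hits / sky_total if sky_total else None
    ansa = nosky_hits / nosky_total if nosky_total else None
    aa = (sky_hits + nosky_hits) / len(gt)
    return asa, ansa, aa
```

With this formulation a prediction can score high ASA while missing no-skyline columns, which is why ASA and ANSA are reported separately from the overall average.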
20. Accuracy on test dataset images
Images                                       Pixels per column  Threshold  Avg. Skyline Accuracy  Avg. No Skyline Accuracy  Avg. Accuracy
Continuous skyline images from test dataset  1                  0          94.45%                 -                         94.45%
Complete test dataset images                 1                  100        92.45%                 20.14%                    86.87%
21. Runtime performance
The dimensions of a frame image impact:
– Accuracy
– Memory consumption
– Execution time
Good balance
– 321 x 241 px
We built our own native-code library to deploy the CNN on mobile
devices
23. PeakLens is an outdoor AR mobile application that identifies
mountain peaks and overlays them in real time on the camera view.
It extracts the mountain skyline with the CNN and aligns it with
the terrain skyline of the user's current location.
Usage experience
100k installs on Android
25. Concept
– CNN model for mountain skyline extraction trained with a large set of
annotated images taken in uncontrolled conditions
– Definition of metrics to evaluate the quality of the resulting skyline
– Support for its deployment on low-end mobile devices
– Integration of the module on an AR mobile app
Future work
– Optimization of the CNN model to achieve a faster execution time
– Improvement of obstacle handling
– Improvement of pre-processing and post-processing steps
– Runtime performance comparison vs. Caffe2 and TensorFlow with
MobileNets (both released after ICANN’s submission deadline)
Conclusions
26. Thanks for your attention!
Convolutional Neural Network
for pixel-wise skyline detection
Darian Frajberg
Piero Fraternali
Rocio Nahime Torres
{darian.frajberg, piero.fraternali, rocionahime.torres}@polimi.it