Practical steps for non-machine learners on how to prepare your medical image dataset for deep learning modelling.
Here we use a fundus image dataset as an example, which might contain controls (healthy eyes) and glaucomatous fundus images of three different severities. In glaucoma the optic disc is of special interest, so we want to annotate it in the images with a bounding box to help the deep learning training.
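As a concrete illustration of this preparation step, here is a minimal sketch of one possible layout: class folders per split plus a CSV of optic-disc bounding boxes. The folder names, CSV columns and example coordinates are hypothetical placeholders, not a required format.

    # Hypothetical dataset layout and optic-disc bounding-box annotations (Python)
    import csv
    from pathlib import Path

    root = Path("fundus_dataset")
    classes = ["control", "glaucoma_mild", "glaucoma_moderate", "glaucoma_severe"]
    for split in ("train", "val", "test"):
        for label in classes:
            (root / split / label).mkdir(parents=True, exist_ok=True)

    # One bounding box per image for the optic disc, in pixel coordinates
    with open(root / "optic_disc_boxes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label", "x_min", "y_min", "x_max", "y_max"])
        writer.writerow(["train/glaucoma_mild/img_0001.png", "glaucoma_mild", 512, 340, 780, 610])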
Journal club done with Vid Stojevic for PointNet:
https://arxiv.org/abs/1612.00593
https://github.com/charlesq34/pointnet
http://stanford.edu/~rqi/pointnet/
Deep learning for indoor point cloud processing. PointNet provides a unified architecture that operates directly on unordered point clouds without voxelisation, for applications ranging from object classification and part segmentation to scene semantic parsing.
Alternative download link:
https://www.dropbox.com/s/ziyhgi627vg9lyi/3D_v2017_initReport.pdf?dl=0
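To make the "unordered point cloud" idea concrete, below is a minimal PointNet-style classifier sketch: a shared per-point MLP followed by symmetric max pooling, which makes the output invariant to point order. This assumes PyTorch, and the layer widths and class count are illustrative rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class TinyPointNet(nn.Module):
        def __init__(self, num_classes=4):
            super().__init__()
            # Shared per-point MLP: the same weights applied to every point, (N, 3) -> (N, 1024)
            self.point_mlp = nn.Sequential(
                nn.Linear(3, 64), nn.ReLU(),
                nn.Linear(64, 128), nn.ReLU(),
                nn.Linear(128, 1024), nn.ReLU())
            # Classifier on the global feature obtained by order-invariant max pooling
            self.head = nn.Sequential(
                nn.Linear(1024, 256), nn.ReLU(),
                nn.Linear(256, num_classes))

        def forward(self, points):                   # points: (batch, num_points, 3)
            per_point = self.point_mlp(points)       # (batch, num_points, 1024)
            global_feat, _ = per_point.max(dim=1)    # symmetric function -> permutation invariance
            return self.head(global_feat)            # (batch, num_classes)

    logits = TinyPointNet()(torch.randn(2, 1024, 3))  # e.g. 1024 unordered points per cloud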
Falling costs with rising quality via hardware innovations and deep learning.
Technical introduction for scanning technologies from Structure-from-Motion (SfM), Range sensing (e.g. Kinect and Matterport) to Laser scanning (e.g. LiDAR), and the associated traditional and deep learning-based processing techniques.
Note! Due to the small font size and poor rendering by SlideShare, it is better to download the slides locally to your device.
Alternative download link for the PDF:
https://www.dropbox.com/s/eclyy45k3gz66ve/proptech_emergingScanningTech.pdf?dl=0
Image restoration techniques covered include denoising, deblurring and super-resolution for 3D images and models.
From classical computer vision techniques to contemporary deep learning based processing for both ordered and unordered point clouds, depth maps and meshes.
With a focus on hardware-centric deep learning and end-to-end deep learning pipelines for diagnosis, including imaging optimization.
Alternative download link:
https://www.dropbox.com/s/bmdg2vzp6k9p9pe/portable_medicalDiagnostics_embeddedComputing.pdf?dl=0
Shallow introduction for Deep Learning Retinal Image Analysis (PetteriTeikariPhD)
Overview of retinal imaging techniques such as fundus photography, optical coherence tomography (OCT) along with future upgrades such as multispectral imaging, OCT angiography, adaptive optics imaging and polarization-sensitive OCT. This is followed by an overview of deep learning image analysis methods suitable to be used with retinal imaging techniques.
Alternative download link: https://www.dropbox.com/s/n01w02cjaf68vbo/retina_deepLearning_pipeline.pdf?dl=0
Deconstructing SfM-Net architecture and beyond
"SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels and back-propagates."
Alternative download:
https://www.dropbox.com/s/aezl7ro8sy2xq7j/sfm_net_v2.pdf?dl=0
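The differentiable warping mentioned in the quote can be sketched with a standard bilinear sampler. Below is a hedged illustration assuming PyTorch, where a dense flow field stands in for the motion field that SfM-Net composes from depth, camera motion and object motions; a photometric loss against the reference frame would drive training.

    import torch
    import torch.nn.functional as F

    def warp_with_flow(frame_t1, flow):
        # Warp the frame at t+1 back to t using a dense flow field (B, 2, H, W) in pixels
        b, _, h, w = frame_t1.shape
        # Base sampling grid, later normalized to [-1, 1] as expected by grid_sample
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(b, -1, -1, -1)
        target = grid + flow                                   # where each pixel samples from
        target_x = 2.0 * target[:, 0] / (w - 1) - 1.0
        target_y = 2.0 * target[:, 1] / (h - 1) - 1.0
        norm_grid = torch.stack((target_x, target_y), dim=-1)  # (B, H, W, 2)
        return F.grid_sample(frame_t1, norm_grid, align_corners=True)

    frame_t1 = torch.rand(1, 3, 128, 416)
    flow = torch.zeros(1, 2, 128, 416)        # zero flow -> identity warp
    warped = warp_with_flow(frame_t1, flow)   # compare against frame_t for the photometric loss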
Using Mask R-CNN to Isolate PV Panels from Background Object in Images (ijtsrd)
Identifying foreground objects in an image is one of the most common operations used in image processing. In this work, the Mask R-CNN algorithm is used to identify solar photovoltaic (PV) panels in aerial images and create a mask that can be used to remove the background from the images. This allows processing the PV panels separately. Using ML to solve this problem can generate more accurate results in comparison to more traditional image processing techniques like edge detection or Gaussian filtering, especially in images where the view might not be easily separable from the objects of interest. The trained model was found to be successful in detecting the PV panels and selecting the pixels that belong to them while ignoring the background pixels. This kind of work can be useful in collecting information about PV installations present in aerial or satellite imagery, or in analyzing the health and integrity of PV modules in large-scale installations, e.g. in a solar power plant. The results show that this method is effective with a high potential for improved results if the model is trained using larger and more diverse datasets. Muhammet Sait | Atilla Erguzen | Erdal Erdal "Using Mask R-CNN to Isolate PV Panels from Background Object in Images" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1, December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd38173.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/38173/using-mask-rcnn-to-isolate-pv-panels-from-background-object-in-images/muhammet-sait
Development and Hardware Implementation of an Efficient Algorithm for Cloud D... (sipij)
Detecting clouds in satellite imagery is becoming more important with the increasing availability of data generated by earth-observing satellites. Hence, intelligent processing of the enormous amount of data received by hundreds of earth receiving stations, with approaches specific to satellite images, presents itself as a pressing need. One of the most important steps in the early stages of satellite image processing is cloud detection. While there are many approaches that deal with different semantic meanings, there are rarely approaches that deal specifically with cloud and cloud cover detection. In this paper, the technique presented is scene-based adaptive cloud and cloud-cover detection, which also finds the cloud position under the assumption that sun reflection, background variation and scattering are constant. The capability of the developed system was tested using dedicated satellite images and assessed in terms of cloud percentage coverage. The system used for this process comprises an Intel(R) Xeon(R) CPU E31245 @ 3.30GHz processor with MATLAB 13 software and a DSP C6713 processor with Code Composer Studio 3.1.
Using physics-based OCT Monte Carlo simulation and wave optics models for synthesising new OCT volumes for ophthalmic deep learning.
Alternative download link:
https://www.dropbox.com/s/ax15qy47yi76eex/OCT_MonteCarlo.pdf?dl=0
Interpretable AI: Not Just For Regulators (Databricks)
Machine learning systems are used today to make life-altering decisions about employment, bail, parole, and lending. Moreover, the scope of decisions delegated to machine learning systems seems likely only to expand in the future. Unfortunately serious discrimination, privacy, and even accuracy concerns can be raised about these systems. Many researchers and practitioners are tackling disparate impact, inaccuracy, privacy issues, and security problems with a number of brilliant, but often siloed, approaches. This presentation illustrates how to combine innovations from several sub-disciplines of machine learning research to train explainable, fair, trustable, and accurate predictive modeling systems. Together these techniques create a new and truly human-centered type of machine learning suitable for use in business- and life-critical decision support.
Author: Patrick Hall
- How to tackle an object detection competition
- Schwert's 6th-place solution on Open Images Challenge 2019
- presented at the lunch workshop of the 26th Symposium on Sensing via Image Information (2020).
Talk at the ACM SF Bay Area Chapter on deep learning for the medical imaging space.
The talk covers use cases, special challenges and solutions for Deep Learning for Medical Image Analysis using Tensorflow+Keras. You will learn about:
- Use cases for Deep Learning in Medical Image Analysis
- Different DNN architectures used for Medical Image Analysis
- Special purpose compute / accelerators for Deep Learning (in the Cloud / On-prem)
- How to parallelize your models for faster training and serving for inferencing.
- Optimization techniques to get the best performance from your cluster (like Kubernetes/ Apache Mesos / Spark)
- How to build an efficient Data Pipeline for Medical Image Analysis using Deep Learning
- Resources to jump start your journey - like public data sets, common models used in Medical Image Analysis
A Novel Visual Cryptographic Scheme Using Floyd-Steinberg Halftoning and Block Replacement Algorithms
Nisha Menon K, PG Scholar; Minu Kuriakose, Assistant Professor
Department of Electronics and Communication, Federal Institute of Science and Technology, Ernakulam, India
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI (Seth Grimes)
Dan Lee from Dentuit AI presented an Intro to Deep Learning for Medical Image Analysis at the Maryland AI meetup (https://www.meetup.com/Maryland-AI), May 27, 2020. Visit https://www.youtube.com/watch?v=xl8i7CGDQi0 for video.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-mangen
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Michael Mangen, Product Manager for Camera and Computer Vision at Qualcomm, presents the "High-resolution 3D Reconstruction on a Mobile Processor" tutorial at the May 2016 Embedded Vision Summit.
Computer vision has come a long way. Use cases that were previously not possible in mass-market devices are now more accessible thanks to advances in depth sensors and mobile processors. In this presentation, Mangen provides an overview of how we are able to implement high-resolution 3D reconstruction – a capability typically requiring cloud/server processing – on a mobile processor. This is an exciting example of how new sensor technology and advanced mobile processors are bringing computer vision capabilities to broader markets.
Wireless network implementation is a viable option for building network infrastructure in rural communities. Rural people lack network infrastructure for information services and socio-economic development. The aim of this study was to develop a wireless network infrastructure architecture for network services to rural dwellers. A user-centered approach was applied in the study, and a wireless network infrastructure was designed and deployed to cover five rural locations. Data was collected and analyzed to assess the performance of the network facilities. The results show that the system had been performing adequately without any downtime, with an average of 200 users per month, and the quality of service has remained high. The transmit/receive rate of 300 Mbps was thrice as fast as the normal Ethernet transmit/receive specification, with an average throughput of 1 Mbps. The multiple-input/multiple-output (MIMO) point-to-multipoint network design increased the network throughput and the quality of service experienced by the users.
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED... (ijcsit)
3D reconstruction is a technique used in computer vision which has a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, performing a 3D reconstruction required specialized hardware; such systems were often very expensive and only available for industrial or research purposes. With the rise of the availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal of this work also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software. This means the user can perform a full 3D scan with only the press of a few buttons in our dedicated graphical user interface. Three main steps were followed to go from acquisition of point clouds to the finished reconstructed 3D model. First, our system acquires point cloud data of a person/object using an inexpensive camera sensor. Second, it aligns and converts the acquired point cloud data into a watertight mesh of good quality. Third, it exports the reconstructed model to a 3D printer to obtain a proper 3D print of the model.
How can you handle defects? If you are in a factory, production can produce objects with defects. Or values from sensors can tell you over time that some values are not "normal". What can you do as a developer (not a data scientist) with .NET or Azure to detect these anomalies? Let's see how in this session.
Scene recognition using Convolutional Neural Network (DhirajGidde)
Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
We develop custom image recognition systems for aerospace and defence applications, using algorithms like deep convolutional neural networks and regional convolutional neural networks.
Our algorithms for Target Recognition and Tracking are designed from the beginning to be run on embedded systems. We target both GPU and FPGA devices.
To Train and Validate our algorithms we developed a process to generate photorealistic 3D environments.
Those 3D Environments are used to produce realistic video streams of the targets in different environmental conditions (lighting, adverse meteorological conditions, camouflage, point-of-view).
The same technology can be used to Train and Test Automotive Vision Systems.
A Literature Survey: Neural Networks for Object Detection (vivatechijri)
Humans have a great capability to distinguish objects by their vision, but for machines object detection is an issue. Thus, neural networks have been introduced in the field of computer science. Neural networks are also called 'Artificial Neural Networks' [13]. Artificial neural networks are computational models of the brain which help in object detection and recognition. This paper describes and demonstrates different types of neural networks such as ANN, KNN, Faster R-CNN, 3D-CNN, RNN etc. with their accuracies. From the study of various research papers, the accuracies of different neural networks are discussed and compared, and it can be concluded that in the given test cases, the ANN gives the best accuracy for object detection.
From lung/heart/ambient source separation to clinical unimodal classification
Alternative download link:
https://www.dropbox.com/scl/fi/8s7uq4h0fi8lgqbzqwg83/wearableMic_signal.pdf?rlkey=l2tqg5yffd4e0w224g3cs6pfl&dl=0
Next Gen Ophthalmic Imaging for Neurodegenerative Diseases and Oculomics (PetteriTeikariPhD)
Shallow literature analysis on recent trends in (multimodal) ophthalmic imaging with focus on neurodegenerative disease imaging / oculomics. Open-ended literature review on what you could be building next.
#1/2: Hardware
#2/2: Computational imaging (coming)
Alternative download link:
https://www.dropbox.com/scl/fi/ebp5xkhm3ngfu80hw0lvo/retina_imaging_2024.pdf?rlkey=eeikf3ewxdb481v06wxm34mqu&dl=0
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ... (PetteriTeikariPhD)
Shallow literature analysis on recent trends in computational ophthalmic imaging with focus on neurodegenerative disease imaging / oculomics.
Open-ended literature review on what you could be building next.
#1/2: Hardware
#2/2: Computational imaging
Alternative download link:
https://www.dropbox.com/scl/fi/d34pgi3xopfjbrcqj2lvi/retina_imaging_2024_computational.pdf?rlkey=xnt1dbe8rafyowocl9cbgjh3p&dl=0
Skin temperature as a proxy for core body temperature (CBT) and circadian phase (PetteriTeikariPhD)
Using distal temperature (wrist temperature with a smartwatch / finger temperature with a smart ring such as Oura) to estimate core body temperature (CBT).
We can then use the wrist temperature shifts as circadian phase shift estimates in circadian phase management, for example when prescribing melatonin and/or light exposure to mitigate the effects of jet lag.
Alternative download link:
https://www.dropbox.com/scl/fi/es7174291yws262rhr568/cbt_estimation.pdf?rlkey=846yeed1wrqsjgkx7kp8ccc2y&dl=0
Summary of "Precision strength training: The future of strength training with...PetteriTeikariPhD
Short visual summary of the preprint:
Petteri Teikari and Aleksandra Pietrusz (2021) “Precision Strength Training: Data-driven Artificial Intelligence Approach to Strength and Conditioning.” SportRxiv. May 20. https://doi.org/10.31236/osf.io/w734a
Precision strength training: The future of strength training with data-driven... (PetteriTeikariPhD)
Visual presentation of the preprint:
Petteri Teikari and Aleksandra Pietrusz (2021) “Precision Strength Training: Data-driven Artificial Intelligence Approach to Strength and Conditioning.” SportRxiv. May 20. https://doi.org/10.31236/osf.io/w734a
Alternative download link:
https://www.dropbox.com/scl/fi/47nqp579t1b4m1zs0irhw/precision_strength_training.pdf?rlkey=05mzzw2ep8id71mq86936hvfi&dl=0
Intracerebral Hemorrhage (ICH): Understanding the CT imaging features (PetteriTeikariPhD)
Overview of CT basics and deep learning literature mostly focused on the analysis of ICH.
Intracerebral hemorrhage (ICH), also known as cerebral bleed, is a type of intracranial bleed that occurs within the brain tissue or ventricles. Intracerebral bleeds are the second most common cause of stroke, accounting for 10% of hospital admissions for stroke.
For spontaneous ICH seen on CT scan, the death rate (mortality) is 34–50% by 30 days after the insult, and half of the deaths occur in the first 2 days. Even though the majority of deaths occur in the first days after ICH, survivors have a long-term excess mortality of 27% compared to the general population.
Deep learning and computational steps can roughly be categorized into 1) Preprocessing, 2) Image Restoration (denoising, deblurring, inpainting, reconstruction), 3) Diffeomorphic registration for spatial normalization, 4) Hand-crafted radiomics and texture analysis, and 5) Hemorrhage segmentation, among other relevant head CT issues.
Alternative download link: https://www.dropbox.com/s/8l2h93cl2pmle4g/CT_hemorrhage.pdf?dl=0
Clinical applications with a focus on rheumatoid arthritis (RA) management. Quick overview of hand pose tracking for managing rheumatoid arthritis.
For best clinical outcome, you might want to think how to integrate additional modalities like surface electromyography (sEMG) and hand function assessments (like hand grip strength, and finger extension strength) to the clinical prognostics model.
Alternative download link:
https://www.dropbox.com/s/rexzt3d5tsm1vgc/hand_tracking_arthritis_management.pdf?dl=0
Hardware landscape from computer vision to wearable sensors, and a light intro for UX requirements to ensure adherence and engagement.
At the intersection of new sensors, big data, deep learning, gamification, behavioral medicine and human factors.
Applications benefiting from "quantitative sensorimotor training", "precision exercise", "precision physiotherapy" or whatever you are calling this, include weight and strength training, powerlifting, bodybuilding, martial arts, yoga, dance, musical instrument training, post-surgery rehabilitation for ACL tears, etc.
Alternative download link:
https://www.dropbox.com/s/wcfrzdjkn58xjdq/physio_pipeline_hw.pdf?dl=0
Multimodal RGB-D+RF-based sensing for human movement analysis (PetteriTeikariPhD)
Combining RGB-D based computer vision with commodity Wifi for pose estimation and human movement analysis for action recognition.
Think of applications especially in healthcare settings, where WiFi access points already exist and adding USB WiFi dongles to a Raspberry Pi (or dedicated chips) is a very easy way to create "operational awareness" of all your patients.
Alternative download link:
https://www.dropbox.com/s/awkqqfhibesjcb9/multimodal_remote_MovementSensing.pdf?dl=0
Creativity as Science: What designers can learn from science and technology (PetteriTeikariPhD)
What personality traits do creative people share? Is creativity skill like any other? Is creativity suppressed in our world, is creativity misunderstood by "dinosaur companies" stuck with their legacy systems? Are "creatives" actually that creative in the end? Can fashion design exist in some romantic old school silo where no tech understanding is needed?
Alternative download link:
https://www.dropbox.com/s/ghiyeo3nyrtutzt/RCA_creativity.pdf?dl=0
High-level concepts for applications such as:
1) Myopia, 2) Jetlag, 3) Seasonal Affective Disorder (SAD)
If you want to add some tech to eyewear / glasses / sunglasses design projects, this slideshow serves as a high-level introduction to the technical details.
Alternative download link:
https://www.dropbox.com/s/qe8dpji6gwh1s8v/lightTreatmentGlasses_concepts.pdf?dl=0
Deep Learning for Biomedical Unstructured Time Series (PetteriTeikariPhD)
1D convolutional neural networks (CNNs) for time series analysis, with inspiration from beyond the biomedical field. Short intro to the various steps involved in time series analysis, including outlier detection, imputation, denoising, segmentation, classification and forecasting.
Available also from:
https://www.dropbox.com/s/cql2jhrt5mdyxne/timeSeries_deepLearning.pdf?dl=0
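A minimal sketch of the kind of 1D CNN referred to above, assuming PyTorch; the channel counts, kernel sizes and number of classes are placeholders that would depend on the biomedical signal at hand.

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv1d(in_channels=1, out_channels=16, kernel_size=7, padding=3), nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1),          # global pooling handles variable-length inputs
        nn.Flatten(),
        nn.Linear(32, 3))                 # e.g. 3 output classes

    x = torch.randn(8, 1, 1000)            # batch of 8 single-channel signals, 1000 samples each
    logits = model(x)                       # (8, 3)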
Short intro for some design considerations around hyperspectral retinal imaging. Both for research-grade desktop setups built around supercontinuum laser and AOTF tunable filter, and for mobile low-cost retinal imagers.
Available also from:
https://www.dropbox.com/s/5brchl9ntqno0i9/hyperspectral_retinal_imaging.pdf?dl=0
Design to accommodate “intelligent adaptive experiments” with future-proof hardware for deep learning-enabled imaging and neuroscience.
In other words, how to design future-proof measurement systems that are both easy to setup and are scalable for more advanced measurement paradigms of the future. And how you would like to think of structuring your data acquisition to be used efficiently with deep learning in neuroscience.
Alternative download link:
https://www.dropbox.com/s/j5r8vifvh6e7bfp/animal_instrumentation.pdf?dl=0
Novel deep learning-powered diagnostics hardware for assessing retinal health.
The impact of deep learning and artificial intelligence on the design practice itself is covered better at https://algorithms.design/; the focus of this presentation is on visual function diagnostics.
How is the future looking for your high-street optician's (e.g. Specsavers, Boots) vision exam, going beyond simple refraction correction, and how might AR glasses in the future allow the design of "smarter" everyday eyewear, also for health monitoring?
Talk given for “Future of Eyecare: How we see and how we want to be seen” organized by Flora McLean.
Royal College of Art - London UK
From traditional desktop to novel optical designs in small form factors. Towards portable low-cost fundus imaging designs with computational imaging techniques for image quality improvement.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
PHP Frameworks: I want to break free (IPC Berlin 2024), Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...) - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
2. Purpose of this presentation
● This is the more ‘pragmatic’ set accompanying the slideset analyzing the SfM-Net architecture from Google.
● The main idea in the dataset creation is to have multiple sensor quality levels in the same rig, so that we obtain good-quality reference data (ground truth, gold standard) with a terrestrial laser scanner that can be used for image restoration deep learning networks
  - In order to get more out of the inference of lower-quality sensors as well. Think of Google Tango, iPhone 8 with depth sensing, Kinect, etc.
● The presentation tries to address the typical problem of finding the relevant “seed literature” for a new topic, helping fresh grad students, postdocs, software engineers and startup founders.
  - An answer to “Do you know if someone has done some work on the various steps involved in SfM?”, to identify which wheels do not need to be re-invented.
● Some of the RGB image enhancement/styling slides are not the most relevant when designing the hardware pipe per se, but are there to highlight the need for a systems engineering approach to the design of the whole pipe, rather than just obtaining the data somewhere and expecting the deep learning software to do all the magic for you.
Deep Learning for Structure-from-Motion (SfM)
https://www.slideshare.net/PetteriTeikariPhD/deconstructing-sfmnet
4. Pipeline Dataset creation #1A
The Indoor Multi-sensor Acquisition System (IMAS) presented in this paper consists of a wheeled platform equipped with two 2D laser heads, RGB cameras, thermographic camera, thermohygrometer, and luxmeter.
http://dx.doi.org/10.3390/s16060785
Inspired by the system of Armesto et al., one could have a custom rig with:
● a high-quality laser scanner giving the “gold standard” for depth,
● accompanied with smartphone-quality RGB and depth sensing,
● accompanied by a DSLR gold standard for RGB,
● and some mid-level structured-light scanner?
The rig configuration would allow multiframe exposure techniques to be used more easily than a handheld system (see next slide).
We saw previously that the brightness constancy assumption might be tricky with some materials; polarization measurement, for example, can help distinguish materials (dielectric materials polarize light, whereas conductive ones do not), or there may be some other way of estimating the Bidirectional Reflectance Distribution Function (BRDF).
Multicamera rig calibration by double-sided thick checkerboard. Marco Marcon, Augusto Sarti, Stefano Tubaro. IET Computer Vision 2017.
http://dx.doi.org/10.1049/iet-cvi.2016.0193
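For a rig like this, each camera would typically be calibrated from checkerboard views before any cross-sensor registration. Below is a standard OpenCV single-camera calibration sketch; the 9x6 inner-corner board with 25 mm squares and the calib_*.png file names are assumptions for illustration.

    import glob
    import cv2
    import numpy as np

    pattern = (9, 6)                                   # inner corners per row, column
    square = 0.025                                     # square size in metres
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_points, img_points = [], []
    for path in glob.glob("calib_*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)

    # Intrinsics K and distortion coefficients for this camera of the rig
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("reprojection RMS:", rms)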
5. Pipeline Dataset creation #2a: Multiframe Techniques
Note! In deep learning, the term super-resolution refers to “statistical upsampling”, whereas in optical imaging super-resolution typically refers to imaging techniques. Note 2: nothing should stop someone from marrying the two, though.
In practice anyone can play with super-resolution at home by putting a camera on a tripod, taking multiple shots of the same static scene, and post-processing them through super-resolution, which can improve the modulation transfer function (MTF) for RGB images, and improve depth resolution and reduce noise for laser scans and depth sensing, e.g. with Kinect.
https://doi.org/10.2312/SPBG/SPBG06/009-015 (cited by 47 articles). Figure: (a) one scan; (b) final super-resolved surface from 100 scans.
“PhotoAcute software processes sets of photographs taken in continuous mode. It utilizes superresolution algorithms to convert a sequence of images into a single high-resolution and low-noise picture, that could only be taken with much better camera.”
Depth looks a lot nicer when reconstructed using 50 consecutive Kinect v1 frames in comparison to just one frame [data from Petteri Teikari].
Kinect multiframe reconstruction with SiftFu [Xiao et al. (2013)]:
https://github.com/jianxiongxiao/ProfXkit
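In the simplest case (static scene, fixed camera) the multiframe idea amounts to averaging valid depth returns per pixel. The sketch below assumes depth frames stacked in a NumPy array with 0 marking missing returns; a real pipeline would additionally align the frames (e.g. KinectFusion-style) before fusing.

    import numpy as np

    def fuse_depth_frames(frames):
        # frames: (num_frames, H, W) depth in millimetres, 0 where invalid
        stack = np.asarray(frames, dtype=np.float32)
        valid = stack > 0
        counts = valid.sum(axis=0)
        fused = np.where(counts > 0, stack.sum(axis=0) / np.maximum(counts, 1), 0.0)
        coverage = counts / stack.shape[0]     # how often each pixel had a valid return
        return fused, coverage

    frames = np.random.randint(0, 4000, size=(50, 480, 640)).astype(np.float32)
    fused, coverage = fuse_depth_frames(frames)   # e.g. 50 consecutive Kinect v1 frames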
6. Pipeline Dataset creation #2b: Multiframe Techniques
It is boring to take manually e.g. 100 shots of the same scene, possibly involving even a 360° rotation of the imaging devices; in practice this would need to be automated in some way with a stepper motor driven by an Arduino, if good commercial systems are not available.
Multiframe techniques would allow another level of “nesting” of ground truths for a joint image enhancement block along with the proposed structure and motion network.
● The reconstructed laser scan / depth image / RGB from 100 images would be the target, and the single-frame version the input that needs to be enhanced.
Meinhardt et al. (2017)
Diamond et al. (2017)
7. Pipeline Dataset creation #3
A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake
Submitted on 15 Jul 2017, last revised 25 Jul 2017
https://arxiv.org/abs/1707.04796
In this paper we develop a pipeline to rapidly generate high quality RGBD data with pixelwise labels and object poses. We use an RGBD camera to collect video of a scene from multiple viewpoints and leverage existing reconstruction techniques to produce a 3D dense reconstruction. We label the 3D reconstruction using a human-assisted ICP-fitting of object meshes. By reprojecting the results of labeling the 3D scene we can produce labels for each RGBD image of the scene. This pipeline enabled us to collect over 1,000,000 labeled object instances in just a few days.
We use this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture.
Overview of the data generation pipeline: (a) Xtion RGBD sensor mounted on a Kuka IIWA arm for raw data collection. (b) RGBD data processed by ElasticFusion into a reconstructed pointcloud. (c) User annotation tool that allows for easy alignment using 3 clicks; user clicks are shown as red and blue spheres, and the transform mapping the red spheres to the green spheres is then the user-specified guess. (d) The cropped pointcloud coming from the user-specified pose estimate is shown in green; the mesh model shown in grey is then finely aligned using ICP on the cropped pointcloud, starting from the user-provided guess. (e) All the aligned meshes shown in the reconstructed pointcloud. (f) The aligned meshes are rendered as masks in the RGB image, producing pixelwise labeled RGBD images for each view.
Increasing the variety of backgrounds in the training data for single-object scenes also improved generalization performance for new backgrounds, with approximately 50 different backgrounds breaking above 50% IoU on entirely novel scenes. Our recommendation is to focus on multi-object data collection in a variety of backgrounds for the most gains in generalization performance.
We hope that our pipeline lowers the barrier to entry for using deep learning approaches for perception in support of robotic manipulation tasks by reducing the amount of human time needed to generate vast quantities of labeled data for your specific environment and set of objects. It is also our hope that our analysis of segmentation network performance provides guidance on the type and quantity of data that needs to be collected to achieve desired levels of generalization performance.
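The reprojection step described above, labeling every view from a single 3D annotation, boils down to projecting labeled 3D points into each camera. The sketch below assumes a pinhole model with known intrinsics and a world-to-camera pose per view, and splats points instead of rendering the aligned meshes with z-buffering as the actual pipeline does.

    import numpy as np

    def project_labels(points_world, labels, K, R, t, height, width):
        # points_world: (N, 3); labels: (N,) ints; returns an (H, W) label mask
        cam = (R @ points_world.T + t.reshape(3, 1)).T        # world -> camera coordinates
        in_front = cam[:, 2] > 0
        uv = (K @ cam[in_front].T).T
        uv = uv[:, :2] / uv[:, 2:3]                           # perspective divide
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        mask = np.zeros((height, width), dtype=np.int32)      # 0 = background
        keep = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        mask[v[keep], u[keep]] = labels[in_front][keep]
        return mask

    K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])
    mask = project_labels(np.random.rand(1000, 3) + [0.0, 0.0, 2.0],
                          np.ones(1000, dtype=np.int32), K,
                          np.eye(3), np.zeros(3), 480, 640)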
8. Pipeline Dataset creation #4
A Novel Benchmark RGBD Dataset for Dormant Apple Trees and Its Application to Automatic Pruning
Shayan A. Akbar, Somrita Chattopadhyay, Noha M. Elfiky, Avinash Kak
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016
https://doi.org/10.1109/CVPRW.2016.50
Extending of the Kinect device functionality and the corresponding database
Libor Bolecek, Pavel Němec, Jan Kufa, Vaclav Ricny
Radioelektronika (RADIOELEKTRONIKA), 2017
https://doi.org/10.1109/RADIOELEK.2017.7937594
One of the possible research directions is the use of an infrared version of the investigated scene for improvement of the depth map. However, databases of Kinect data which would contain the corresponding infrared images do not exist; therefore, our aim was to create such a database. We want to increase the usability of the database by adding stereo images. Moreover, the same scenes were captured by Kinect v2, and the impact of simultaneous use of Kinect v1 and Kinect v2 to improve the depth map of the investigated scene was also investigated. The database contains sequences of objects on a turntable and simple scenes containing several objects.
Figures: the depth map of the scene obtained by (a) Kinect v1, (b) Kinect v2; the comparison of one row of the depth map obtained by (a) Kinect v1 and (b) Kinect v2 with the true depth map; and the Kinect infrared image after changing the dynamics of brightness.
9. Pipeline Multiframe Pipe #1
Diagram: frames 1, 2, 3, ..., 100 from each sensor (depth image, e.g. Kinect; laser scan, e.g. Velodyne; RGB image) feed a multiframe reconstruction enhancement block whose output serves as the target, and the network learns to improve image quality from a single image when the system is deployed. Reconstruction could be done using traditional algorithms (e.g. OpenCV) to start with; you then need to save all individual frames so that, when reconstruction algorithms improve, all blocks can be iterated ad infinitum. Mix different image qualities and sensor qualities in the training set to build invariance to scan quality.
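One hedged way to turn such recordings into supervised training pairs: every raw single frame becomes an input, and the multi-frame reconstruction of its scene becomes the shared target. The per-scene folder layout and the fused.npy / frame_*.npy names are assumptions for illustration.

    import random
    from pathlib import Path
    import numpy as np

    def make_pairs(dataset_root):
        # Build (single noisy frame, multi-frame reconstruction) pairs per scene
        pairs = []
        for scene in Path(dataset_root).glob("scene_*"):
            target = np.load(scene / "fused.npy")              # reconstruction from ~100 frames
            for frame_path in sorted(scene.glob("frame_*.npy")):
                pairs.append((np.load(frame_path), target))    # every raw frame maps to the same target
        random.shuffle(pairs)         # mix scenes and sensor qualities within each batch
        return pairs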
10. Pipeline Multiframe Pipe #2
You could cascade different levels of quality if you want to make things complex, in a deeply supervised fashion. Diagram: numbered steps 1-6 run from the LOWEST QUALITY (just with RGB) to the HIGHEST QUALITY (depth map with a professional laser scanner). The following step in the cascade is closer in quality to the previous one, and one could assume that this enhancement would be easier to learn; the pipeline would output the enhanced quality as a “side effect”, which is good for visualization purposes.
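A deeply supervised cascade of this kind could look roughly like the sketch below, assuming PyTorch: each stage refines the previous estimate towards the next quality level and is supervised against its own ground-truth level, ending at the laser-scanner depth map. The stage architecture and loss are illustrative only.

    import torch
    import torch.nn as nn

    class QualityCascade(nn.Module):
        def __init__(self, num_stages=5):
            super().__init__()
            self.stages = nn.ModuleList([
                nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(16, 1, 3, padding=1))
                for _ in range(num_stages)])

        def forward(self, x):                 # x: lowest-quality input, e.g. (B, 1, H, W)
            outputs = []
            for stage in self.stages:
                x = x + stage(x)              # each stage nudges the estimate one quality level up
                outputs.append(x)             # intermediate outputs double as visualizations
            return outputs

    def cascade_loss(outputs, gt_levels):
        # Deep supervision: every intermediate output is penalized against its own target level
        return sum(nn.functional.l1_loss(o, gt) for o, gt in zip(outputs, gt_levels))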
11. Pipeline acquisition example with Kinect
https://arxiv.org/abs/1704.07632
KinectFusion (Newcombe et al. 2011), one of the pioneering works, showed that a real-world object as well as an
indoor scene canbe reconstructed in real-time with GPU acceleration. It exploits the iterative closest point (ICP)
algorithm (Besl and McKay 1992) to track 6-DoF poses and the volumetric surface representation scheme with
signed distance functions (Curless and Levoy, 1996) to fuse 3D measurements. A number of following studies (e.g.
Choi et al. 2015) have tackled the limitation of KinectFusion; as the scale ofa scene increases, it is hard to
completely reconstruct thescene due to the drift problem of the ICP algorithm as wellas the large memory
consumption of volumetric integration.
To scale up the KinectFusion algorithm, Whelan et al. (2012) presented a spatially extended KinectFusion,
named Kintinuous, by incrementally adding KinectFusion results in the form of triangular meshes.
Whelan et al. (2015) also proposed ElasticFusion to tackle similar problems as well as to overcome the
problem of pose graph optimization by using surface loop closure optimization and a surfel-based
representation. Moreover, to decrease the space complexity, ElasticFusion deallocates invisible surfels from
memory; invisible surfels are allocated in memory again only if they are likely to be visible in the near
future.
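For reference, the volumetric fusion step referred to above (signed distance functions, Curless and Levoy 1996) boils down to a truncated-SDF weighted running average per voxel. Below is a rough numpy sketch of that update for a single depth frame; the function name, grid layout and truncation value are our own assumptions, and a real KinectFusion implementation runs this on the GPU.

```python
import numpy as np

def integrate_depth(tsdf, weights, depth, K, cam_pose, vol_origin, voxel_size, trunc=0.05):
    """Fuse one depth frame into a TSDF volume with a weighted running average
    (Curless & Levoy style), as used by KinectFusion-like pipelines.
    tsdf, weights: (X, Y, Z) float arrays; depth: (H, W) in metres;
    K: 3x3 intrinsics; cam_pose: 4x4 camera-to-world transform."""
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    pts_w = vol_origin + voxel_size * np.stack([ii, jj, kk], axis=-1).reshape(-1, 3)

    w2c = np.linalg.inv(cam_pose)                       # world -> camera
    pts_c = (w2c[:3, :3] @ pts_w.T + w2c[:3, 3:4]).T
    z = pts_c[:, 2]
    z_safe = np.where(z > 1e-6, z, 1.0)                 # avoid division by zero
    uv = (K @ pts_c.T).T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)

    H, W = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]

    sdf = d - z                                         # signed distance along the ray
    keep = valid & (d > 0) & (sdf > -trunc)
    tsdf_obs = np.clip(sdf / trunc, -1.0, 1.0)

    t, w = tsdf.reshape(-1), weights.reshape(-1)
    t[keep] = (w[keep] * t[keep] + tsdf_obs[keep]) / (w[keep] + 1.0)  # running average
    w[keep] += 1.0
    return t.reshape(tsdf.shape), w.reshape(weights.shape)
```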
13. Pipeline Multiframe Pipe Quality simulation
Simulated Imagery Rendering Workflow for UAS-
Based Photogrammetric 3D Reconstruction
Accuracy Assessments
Richard K. Slocum and Christopher E. Parrish
Remote Sensing 2017, 9(4), 396; doi:10.3390/rs9040396
“Here, we present a workflow to render computer generated imagery using a virtual environment which can
mimic the independent variables that would be experienced in a real-world UAS imagery acquisition
scenario. The resultant modular workflow utilizes Blender Python API, an open source computer graphics
software, for the generation of photogrammetrically-accurate imagery suitable for SfM processing, with
explicit control of camera interior orientation, exterior orientation, texture of objects in the scene, placement
of objects in the scene, and ground control point (GCP) accuracy.”
Pictorial representation of the simUAS (simulated UAS)
imagery rendering workflow. Note: The SfM-MVS step is
shown as a “black box” to highlight the fact that the procedure
can be implemented using any SfM-MVS software, including
proprietary commercial software.
The imagery from Blender, rendered using a pinhole camera
model, is postprocessed to introduce lens and camera effects.
The magnitudes of the postprocessing effects are set high in this example to clearly demonstrate the effect of each. The full-size image (left) and a close-up image (right) are both shown in order to depict both the large- and small-scale effects.
A 50 cm wide section of
the point cloud containing
a box (3 m cube) is shown
with the dense
reconstruction point
clouds overlaid to
demonstrate the effect of
point cloud dense
reconstruction quality on
accuracy near sharp
edges.
The points along the side of a vertical plane on a box were isolated and the error perpendicular to the plane of the box was visualized for each dense reconstruction setting, with white regions indicating no point cloud data. Notice that the region with data gaps in the point cloud from the ultra-high setting corresponds to the region of the plane with low image texture, as shown in the lower right plot.
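To make the simUAS idea above more tangible, here is a small Blender Python (bpy) sketch of setting the camera interior and exterior orientation and rendering one exposure station (run inside Blender); the object name "UAS_Camera", the pose values and the output path are hypothetical, and lens/camera effects would still be added in postprocessing, as the paper describes.

```python
import bpy
from math import radians

scene = bpy.context.scene
cam = bpy.data.objects["UAS_Camera"]        # hypothetical camera object name

# interior orientation (pinhole model; lens effects are added in postprocessing)
cam.data.lens = 8.8                         # focal length in mm (illustrative)
cam.data.sensor_width = 13.2                # sensor width in mm (illustrative)
scene.render.resolution_x = 4000
scene.render.resolution_y = 3000

# exterior orientation for this exposure station (illustrative values)
cam.location = (100.0, 50.0, 120.0)                        # metres in scene frame
cam.rotation_euler = (radians(0), radians(0), radians(90))

scene.camera = cam
scene.render.filepath = "//renders/img_0001.png"
bpy.ops.render.render(write_still=True)
```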
15. Pipeline data Fusion / Registration #1
“Rough estimates for 3D structure obtained using structure
from motion (SfM) on the uncalibrated images are first co-
registered with the lidar scan and then a precise alignment
between the datasets is estimated by identifying
correspondences between the captured images and
reprojected images for individual cameras from the 3D lidar
point clouds. The precise alignment is used to update both the
camera geometry parameters for the images and the individual
camera radial distortion estimates, thereby providing a 3D-to-
2D transformation that accurately maps the 3D lidar scan
onto the 2D image planes. The 3D to 2D map is then utilized to
estimate a dense depth map for each image. Experimental
results on two datasets that include independently acquired
high-resolution color images and 3D point cloud datasets
indicate the utility of the framework. The proposed approach
offers significant improvements on results obtained with
SfM alone.”
Fusing structure from motion and lidar for
dense accurate depth map estimation
Li Ding ; Gaurav Sharma
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on
https://doi.org/10.1109/ICASSP.2017.7952363
https://arxiv.org/abs/1707.03167
“In this paper, we present RegNet, the first deep convolutional neural
network (CNN) to infer a 6 degrees of freedom (DOF) extrinsic
calibration between multimodal sensors, exemplified using a
scanning LiDAR and a monocular camera. Compared to existing
approaches, RegNet casts all three conventional calibration steps
(feature extraction, feature matching and global regression) into a single
real-time capable CNN.”
Development of the mean absolute error (MAE) of the
rotational components over training iteration for different
output representations: Euler angles are represented in red,
quaternions in brown and dual quaternions in blue. Both
quaternion representations outperform the Euler angles
representation.
“Our method yields a mean calibration error of 6 cm for translation and 0.28° for rotation with decalibration magnitudes of up to 1.5 m and 20°, which competes with state-of-the-art online and offline methods.”
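Regardless of which output representation the network regresses, such rotation errors are usually reported as a geodesic angle. The SciPy sketch below (assuming a reasonably recent scipy.spatial.transform) compares an Euler-angle prediction and a quaternion prediction against the same ground-truth decalibration; the numbers are made up for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def rotation_error_deg(r_pred, r_true):
    """Geodesic rotation error in degrees between two scipy Rotations."""
    return np.degrees((r_pred.inv() * r_true).magnitude())

# hypothetical decalibration and network outputs, to show the two parametrisations
r_true = R.from_euler("xyz", [2.0, -1.0, 5.0], degrees=True)

r_euler = R.from_euler("xyz", [1.7, -0.8, 5.3], degrees=True)   # Euler-angle output
q = np.array([0.01, -0.008, 0.045, 0.999])                      # quaternion output
r_quat = R.from_quat(q / np.linalg.norm(q))

print(rotation_error_deg(r_euler, r_true))
print(rotation_error_deg(r_quat, r_true))
```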
16. Pipeline data Fusion / Registration #2
Depth refinement for binocular Kinect RGB-D cameras
Jinghui Bai ; Jingyu Yang ; Xinchen Ye ; Chunping Hou
Visual Communications and Image Processing (VCIP), 2016
https://doi.org/10.1109/VCIP.2016.7805545
17. Pipeline data Fusion / Registration #3
Used Kinects are inexpensive:
~£29.95 (eBay)
Use multiple Kinects at once
for better occlusion handling
Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar
IEEE Sensors Journal ( Volume: 14, Issue: 6, June 2014 )
https://doi.org/10.1109/JSEN.2014.2309987
Characterization of Different Microsoft Kinect Sensor Models
IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015)
https://doi.org/10.1109/JSEN.2015.2422611
An ANOVA analysis was performed to determine if the model of the Kinect, the operating temperature,
or their interaction were significant factors in the Kinect's ability to determine the distance to the target.
Different sized gauge blocks were also used to test how well a Kinect could reconstruct precise objects.
Machinist blocks were used to examine how well the Kinect could reconstruct objects set up at an angle
and determine the location of the center of a hole. All the Kinect models were able to determine the
location of a target with a low standard deviation (<2 mm). At close distances, the resolutions of all the
Kinect models were 1 mm. Through the ANOVA analysis, the best performing Kinect at close distances
was the Kinect model 1414, and at farther distances was the Kinect model 1473. The internal
temperature of the Kinect sensor had an effect on the distance reported by the sensor. Using different
correction factors, the Kinect was able to determine the volume of a gauge block and the angles the
machinist blocks were set up at, with under 10% error.
18. Pipeline data Fusion / Registration #4
A Generic Approach for Error Estimation of Depth
Data from (Stereo and RGB-D) 3D Sensors
Luis Fernandez, Viviana Avila and Luiz Gonçalves
Preprints | Posted: 23 May 2017 |
http://dx.doi.org/10.20944/preprints201705.0170.v1
“We propose an approach for estimating
the error in depth data provided by
generic 3D sensors, which are modern
devices capable of generating an image
(RGB data) and a depth map (distance)
or other similar 2.5D structure (e.g.
stereo disparity) of the scene.
We come up with a multi-platform
system and its verification and
evaluation has been done, using the
development kit of the board NVIDIA
Jetson TK1 with the MS Kinects v1/v2
and the Stereolabs ZED camera. So the
main contribution is the error
determination procedure that does
not need any data set or benchmark,
thus relying only on data acquired on-
the-fly. With a simple checkerboard, our
approach is able to determine the error
for any device”
In the article of Yang [16], an MS Kinect v2 structure is proposed to improve the accuracy of the
sensors and the depth of capture of objects that are placed more than four meters apart. It has
been concluded that an object covered with light-absorbing materials, may cause less reflected IR
light back to the MS Kinect and therefore erroneous depth data. Other factors, such as power
consumption, complex wiring and the high requirements placed on the laptop computer, also limit the
use of the sensor.
The characteristics of MS Kinect stochastic errors are presented for each axis direction in the work by Choo [17].
The depth error is measured using a 3D chessboard, similar to the one used in our approach. The results show that, for all
three axes, the error should be considered independently. In the work of Song [18], an approach is proposed to
generate a per-pixel confidence measure for each depth map captured by MS Kinect in indoor scenes through
supervised learning and the use of artificial intelligence.
Detection (a) and ordering (b) of corners in the three planes
of the pattern.
It would make sense to combine versions 1 and 2 on the same rig, as Kinect v1 is
more accurate at close distances and Kinect v2 more accurate at far distances.
19. Pipeline data Fusion / Registration #5
Precise 3D/2D calibration between a RGB-D
sensor and a C-arm fluoroscope
International Journal of Computer Assisted Radiology and Surgery
August 2016, Volume 11, Issue 8, pp 1385–1395
https://doi.org/10.1007/s11548-015-1347-2
“A RMS reprojection error of 0.5 mm is achieved using
our calibration method which is promising for surgical
applications. Our calibration method is more accurate
when compared to Tsai’s method. Lastly, the simulation
result shows that using a projection matrix has a lower
error than using intrinsic and extrinsic parameters in
the rotation estimation.”
While the color camera has a relatively high resolution (1920 px ×
1080 px for Kinect 2.0), the depth camera is mid-resolution (512 px
× 424 px for Kinect 2.0) and highly noisy. Furthermore, RGB-D
sensors have a minimal distance to the scene from which they can
estimate the depth. For instance, the minimum optimal distance of
Kinect 2.0 is 50 cm.
On the other hand, C-arm fluoroscopes have a short focus, which is
typically 40 cm, and a much narrower field of view than the RGB-D
sensor with also a mid-resolution image (ours is 640 px × 480 px).
All these factors lead to a high disparity in the field of view
between the C-arm and the RGB-D sensor if the two were to be
integrated in a single system. This means that the calibration
process is crucial. We need to achieve high accuracy for the
localization of 3D points using RGB-D sensors, and we require a
calibration phantom which can be clearly imaged by both devices.
Workflow of the calibration process between the RGB-D sensor
and a C-arm. The input data include a sequence of infrared,
depth, and color images from the RGB-D sensor and X-ray images
from the C-arm. The output of the calibration pipeline is the
projection matrix, which is calculated by the 3D/2D
correspondences detected from the input data
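The projection matrix mentioned above can be estimated from detected 3D/2D correspondences with the classical direct linear transform (DLT); the sketch below is a generic numpy version of that step, together with the RMS reprojection error, not the authors' actual calibration code.

```python
import numpy as np

def dlt_projection_matrix(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from >= 6 3D/2D correspondences
    using the direct linear transform (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)          # right singular vector of smallest value

def rms_reprojection_error(P, pts3d, pts2d):
    """RMS distance (in pixels) between measured and reprojected 2D points."""
    Xh = np.hstack([np.asarray(pts3d), np.ones((len(pts3d), 1))])
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.sqrt(np.mean(np.sum((proj - np.asarray(pts2d)) ** 2, axis=1)))
```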
20. Pipeline data Fusion / Registration #6
Fusing Depth and Silhouette for Scanning Transparent
Object with RGB-D Sensor
Yijun Ji, Qing Xia, and Zhijiang Zhang
System overview; TSDF: truncated signed distance function; SFS: shape from silhouette.
Results on noise region. (a) Color images captured by
stationary camera with a rotating platform. (b) The noisy voxels
detected by multiple depth images are in red. (c) and (d) show
the experimental results done by a moving Kinect; the
background is changing in these two cases.
21. Pipeline data Fusion / Registration #7
Intensity Video Guided 4D Fusion for Improved
Highly Dynamic 3D Reconstruction
Jie Zhang, Christos Maniatis, Luis Horna, Robert B. Fisher
(Submitted on 6 Aug 2017)
https://arxiv.org/abs/1708.01946
Temporal tracking of intensity image points (of moving and
deforming objects) allows registration of the corresponding 3D
data points, whose 3D noise and fluctuations are then reduced
by spatio-temporal multi-frame 4D fusion. The results
demonstrate that the proposed algorithm is effective at
reducing 3D noise and is robust against intensity noise. It
outperforms existing algorithms with good scalability on both
stationary and dynamic objects.
The system framework (using 3 consecutive
frames as an example)
Static Plane (first row): (a) mean roughness;
(b) std of roughness vs. number of frames
fused. Falling ball (second row): (c) mean
roughness; (d) std of roughness vs.
number of frames fused
Texture-related
3D noise on a static
plane: (a) 3D frame;
(b) 3D frame with
textures. The 3D
noise is closely
related to the
textures in the
intensity image.
Illustration of 3D noise
reduction on the ball.
Spatial-temporal divisive
normalized bilateral filter (DNBF)
22. Pipeline data Fusion / Registration #8
Utilization of a Terrestrial Laser Scanner for
the Calibration of Mobile Mapping Systems
Seunghwan Hong, Ilsuk Park, Jisang Lee, Kwangyong Lim, Yoonjo
Choi and Hong-Gyoo Sohn
Sensors 2017, 17(3), 474; doi:10.3390/s17030474
Configuration of mobile mapping system: network video cameras (F:
front, L: left, R: right), mobile laser scanner, and Global Navigation
Satellite System (GNSS)/Inertial Navigation System (INS).
To integrate the datasets captured by each sensor mounted on the
Mobile Mapping System (MMS) into the unified single coordinate
system, the calibration, which is the process to estimate the orientation
(boresight) and position (lever-arm) parameters, is required with the
reference datasets [Schwarz and El-Sheimy 2004, Habib et al. 2010,
Chan et al. 2010].
When the boresight and lever-arm parameters defining the geometric relationship
between each sensing data and GNSS/INS data are determined, georeferenced data
can be generated. However, even after precise calibration, the boresight and lever-
arm parameters of an MMS can be shaken and the errors that deteriorate the
accuracy of the georeferenced data might accumulate. Accordingly, for the stable
operation of multiple sensors, precise calibration must be conducted periodically.
(a) Sphere target used for registration
of terrestrial laser scanning data; (b)
sphere target detected in a point cloud
(the green sphere is a fitted sphere
model).
Network video camera: AXIS F1005-E
GNSS/INS unit: OxTS Survey+
Terrestrial laser scanner (TLS): Faro Focus 3D
Mobile laser scanner: Velodyne HDL 32-E
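Sphere targets like the one shown above are usually located by fitting a sphere model to a point-cloud patch. A linear least-squares sphere fit takes only a few lines of numpy; this is a generic sketch, not the registration code used in the paper.

```python
import numpy as np

def fit_sphere(points):
    """Linear least-squares sphere fit to an (N, 3) point-cloud patch,
    e.g. a detected sphere registration target.
    Uses |p|^2 = 2 c.p + (r^2 - |c|^2), linear in c and (r^2 - |c|^2)."""
    points = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = sol[:3]
    radius = np.sqrt(sol[3] + centre @ centre)
    return centre, radius
```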
23. Pipeline data Fusion / Registration #9
Dense Semantic Labeling of Very-High-Resolution Aerial
Imagery and LiDAR with Fully-Convolutional Neural
Networks and Higher-Order CRFs
Yansong Liu, Sankaranarayanan Piramanayagam, Sildomar T. Monteiro, Eli Saber
http://openaccess.thecvf.com/content_cvpr_2017_workshops/w18/papers/Liu_Dense_Semantic_Labeling_CVPR_2017_paper.pdf
Our proposed decision-level fusion scheme: training one fully-convolutional
neural network on the color-infrared image (CIR) and one logistic regression using
hand-crafted features. The two probabilistic results, P_FCN and P_LR, are then combined in a
higher-order CRF framework.
Main original contributions of our work are: 1) the use of energy based CRFs for efficient decision-
level multisensor data fusion for the task of dense semantic labeling. 2) the use of higher-order CRFs
for generating labeling outputs with accurate object boundaries. 3) the proposed fusion scheme has
a simpler architecture than training two separate neural networks, yet it still yields the state-of-the-
art dense semantic labeling results.
Guiding multimodal registration with learned
optimization updates
Gutierrez-Becker B, Mateus D, Peter L, Navab N
Medical Image Analysis Volume 41, October 2017, Pages 2-17
https://doi.org/10.1016/j.media.2017.05.002
Training stage (left): A set of aligned multimodal images is used to generate a training set of images with known
transformations. From this training set we train an ensemble of trees mapping the joint appearance of the images to
displacement vectors. Testing stage (right): We register a pair of multimodal images by predicting with our trained
ensemble the required displacements δ for alignment at different locations z. The predicted displacements are then
used to devise the updates of the transformation parameters to be applied to the moving image. The procedure is
repeated until convergence is achieved.
Corresponding CT (left) and MR-T1
(middle) images of the brain obtained
from the RIRE dataset. The
highlighted regions are corresponding
areas between both images (right).
Some multimodal similarity metrics
rely on structural similarities between
images obtained using different
modalities, like the ones inside the
blue boxes. However in many cases
structures which are clearly visible in
one imaging modality correspond to
regions with homogeneous voxel
values in the other modality (red and
green boxes).
25. Pipeline RGB image Restoration #1
https://arxiv.org/abs/1704.02738
Our method includes a sub-pixel motion compensation (SPMC) layer that can better handle inter-frame motion for this task. Our detail fusion (DF) network can effectively fuse image details from multiple images after SPMC alignment.
“Hardware super-resolution”: of course, all of this can also be done via deep learning
https://petapixel.com/2015/02/21/a-practical-guide-to-creating-superresolution-photos-with-photoshop/
26. Pipeline RGB image Restoration #2A
“Data-driven super-resolution”: what super-resolution typically means in the deep learning space
Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution”
External Prior Guided Internal Prior
Learning for Real Noisy Image Denoising
Jun Xu, Lei Zhang, David Zhang
(Submitted on 12 May 2017)
https://arxiv.org/abs/1705.04505
Denoised images of a region cropped from the real noisy
image from DSLR “Nikon D800 ISO 3200 A3”,
Nam et al. 2016 (+video) by different methods. The
scene was shot 500 times with the same camera and
camera setting. The mean image of the 500 shots is
roughly taken as the “ground truth”, with which the
PSNR index can be computed. The images are better
viewed by zooming in on screen
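The PSNR against such a mean-of-many-shots pseudo ground truth is straightforward to compute; a small numpy sketch follows (the 500-shot averaging is taken from the description above, and an 8-bit peak value is assumed).

```python
import numpy as np

def psnr(denoised, reference, peak=255.0):
    """PSNR in dB against a pseudo ground truth, e.g. the mean of many
    shots of the same static scene."""
    mse = np.mean((denoised.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# pseudo_gt = np.mean(np.stack(shots, axis=0), axis=0)   # 500 shots -> "ground truth"
# print(psnr(denoised_image, pseudo_gt))
```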
Benchmarking Denoising Algorithms
with Real Photographs
Tobias Plötz, Stefan Roth
(Submitted on 5 Jul 2017)
https://arxiv.org/abs/1707.01313
“We then capture a novel benchmark dataset, the
Darmstadt Noise Dataset (DND), with consumer
cameras of differing sensor sizes. One interesting
finding is that various recent techniques that
perform well on synthetic noise are clearly
outperformed by BM3D on photographs with real
noise. Our benchmark delineates realistic evaluation
scenarios that deviate strongly from those
commonly used in the scientific literature.”
Image formation process underlying the observed low-ISO image x_r and high-ISO image x_n. They are generated from latent noise-free images y_r and y_n, respectively, which in turn are related by a linear scaling of image intensities (LS), a small camera translation (T), and a residual low-frequency pattern (LF). To obtain the denoising ground truth y_p, we apply post-processing to x_r aiming at undoing these undesirable transformations.
Mean PSNR (in dB) of the denoising methods tested on our DND benchmark. We
apply denoising either on linear raw intensities, after a variance stabilizing transformation
(VST, Anscombe), or after conversion to the sRGB space. Likewise, we evaluate the result
either in linear raw space or in sRGB space. The noisy images have a PSNR of 39.39 dB
(linear raw) and 29.98 dB (sRGB).
Difference between blue channels of low- and high-ISO images from Fig. 1 after various post-
processing stages. Images are smoothed for display to highlight structured residuals,
attenuating the noise.
27. Pipeline RGB image Restoration #2b
“Data-driven super-resolution”: what super-resolution typically means in the deep learning space
MemNet: A Persistent Memory
Network for Image Restoration
Ying Tai, Jian Yang, Xiaoming Liu, Chunyan Xu
(Submitted on 7 Aug 2017)
https://arxiv.org/abs/1708.02209
https://github.com/tyshiwo/MemNet.
Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution”
The same MemNet structure achieves the state-of-the-art performance in image denoising, super-resolution and
JPEG deblocking. Due to the strong learning ability, our MemNet can be trained to handle different levels of
corruption even using a single model.
Training Setting: Following the method of Mao et al. (2016), for
image denoising, the grayscale image is used; while for SISR and
JPEG deblocking, the luminance component is fed into the
model.
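For completeness, extracting the luminance channel that such models are fed can be done directly with Pillow; the helper below is a generic sketch, not the authors' preprocessing code.

```python
import numpy as np
from PIL import Image

def luminance_channel(path):
    """Return the Y (luminance) channel as float32 in [0, 1]; this is the
    single channel typically fed to SISR / JPEG-deblocking models, while the
    denoising protocol above uses the grayscale image directly."""
    y, cb, cr = Image.open(path).convert("YCbCr").split()
    return np.asarray(y, dtype=np.float32) / 255.0
```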
Deep Generative Adversarial
Compression Artifact Removal
Leonardo Galteri, Lorenzo Seidenari, Marco Bertini,
Alberto Del Bimbo
(Submitted on 8 Apr 2017)
https://arxiv.org/abs/1704.02518
In this work we address the problem of artifact removal using convolutional neural networks. The proposed
approach can be used as a post-processing technique applied to decompressed images, and thus can be
applied to different compression algorithms (typically applied in the YCrCb color space) such as JPEG, intra-frame
coding of H.264/AVC and H.265/HEVC. Compared to super-resolution techniques, working on compressed
images instead of down-sampled ones is more practical, since it does not require changing the compression
pipeline, which is typically hardware-based, to subsample the image before its coding; moreover, camera
resolutions have increased in recent years, a trend that we can expect to continue.
28. Pipeline RGB image Restoration #3
An attempt to improve smartphone camera quality with DSLR high quality image as the ‘gold standard’ with deep learning
https://arxiv.org/abs/1704.02470
Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool
Computer Vision Laboratory, ETH Zurich, Switzerland
“Quality transfer”
30. Pipeline image enhancement #1
Aesthetics enhancement: “AI-driven Interior Design”
“Re-colorization” of scanned indoor scenes or intrinsic decomposition based editing
Limitations. We have to manually correct inaccurate segmentations, though these are seldom encountered during experiments; this is a limitation of our method. Since our method is object-based, our segmentation method does not consider the color patterns among similar components of an image object.
Currently, our system is not capable of segmenting
the mesh according to the colored components
with similar geometry for this kind of objects. This
is another limitation of our method.
An intrinsic image decomposition method could be helpful to our image database, for extracting lighting-free textures to be further used in rendering colorized scenes. However, such methods are not robust enough to be applied directly to the varied images in a large image database. On the other hand, intrinsic image decomposition is not essential for achieving good results in our experiments, so we did not incorporate it in our work, but we will study it further to improve our database.
31. Pipeline image enhancement #2
“Auto-adjust” RGB texture maps for indoor scans with user interaction
We use the CIELab color space for both the input and
output images. We can use 3-channel Lab color as the
color features. However, it generates color variations in
smooth regions since each color is processed
independently. To alleviate this issue, we add the local
neighborhood information by concatenating the Lab color
and the L2 normalized first-layer convolutional feature
maps of ResNet-50.
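A rough PyTorch sketch of that feature construction, assuming torchvision's ResNet-50 and skimage for the Lab conversion (the pretrained-weights argument may differ between torchvision versions); upsampling the conv1 features back to image resolution is our own simplification.

```python
import torch
import torch.nn.functional as F
import torchvision
from skimage import color

def lab_plus_resnet_features(rgb_uint8):
    """Per-pixel CIELab values concatenated with L2-normalised first-layer
    (conv1) ResNet-50 feature maps, upsampled back to image resolution."""
    lab = torch.from_numpy(color.rgb2lab(rgb_uint8)).permute(2, 0, 1).float()  # (3, H, W)

    resnet = torchvision.models.resnet50(pretrained=True).eval()
    x = torch.from_numpy(rgb_uint8).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        feat = resnet.conv1(x)                                    # (1, 64, H/2, W/2)
    feat = F.normalize(feat, p=2, dim=1)                          # L2-normalise channels
    feat = F.interpolate(feat, size=lab.shape[1:], mode="bilinear", align_corners=False)

    return torch.cat([lab.unsqueeze(0), feat], dim=1)             # (1, 3 + 64, H, W)
```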
Although the proposed method provides the users with
automatically adjusted photos, some users may want their photos
to be retouched by their own preference. In the first row of Fig. 2 for
example, a user may want only the color of the people to be changed.
For such situations, we provide a way for the users to give their own
adjustment maps to the system. Figure 4 shows some examples of
the personalization. When the input image is forwarded, we
substitute the extracted semantic adjustment map with the new
adjustment map from the user. As shown in the figure, the proposed
method effectively creates the personalized images adjusted by
user’s own style.
Deep Semantics-Aware Photo Adjustment
Seonghyeon Nam, Seon Joo Kim (Submitted on 26 Jun 2017) https://arxiv.org/abs/1706.08260
32. Pipeline image enhancement #3
Aesthetic-Driven Image Enhancement
by Adversarial Learning
Yubin Deng, Chen Change Loy, Xiaoou Tang
(Submitted on 17 Jul 2017)
https://arxiv.org/abs/1707.05251
Examples of image enhancement given the original input (a); the remaining panels are labelled GAN and Pro in the original figure.
The architecture of our proposed EnhanceGAN framework. A ResNet module is the feature extractor (for the image in CIELab color space); in this work, we use ResNet-101 with the last average pooling layer and the final fc layer removed. The switch icons in the discriminator network represent zero-masking during stage-wise training.
“Auto-adjust” RGB texture maps for indoor scans with GANs
33. Pipeline image enhancement #4
“Auto-adjust” RGB texture maps for indoor scans with GANs for “auto-matting”
Creatism: A deep-learning photographer
capable of creating professional work
Hui Fang, Meng Zhang (Submitted on 11 Jul 2017)
https://arxiv.org/abs/1707.03491
https://google.github.io/creatism/
Datasets were created that contain ratings of
photographs based on aesthetic quality
[Murray et al., 2012] [Kong et al., 2016] [Lu et al., 2015].
Using our system, we mimic the workflow of a
landscape photographer, from framing for the best
composition to carrying out various post-processing
operations. The environment for our virtual
photographer is simulated by a collection of panorama
images from Google Street View. We design a "Turing-
test"-like experiment to objectively measure quality of
its creations, where professional photographers rate a
mixture of photographs from different sources blindly.
We work with professional photographers to empirically define 4 levels of aesthetic quality:
● 1: Point-and-shoot photos taken without consideration.
● 2: Good photos from the majority of the population without an art background. Nothing artistic stands out.
● 3: Semi-pro. Great photos showing clear artistic aspects. The photographer is on the right track to becoming a professional.
● 4: Pro-work. Clearly each professional has his/her unique taste that needs calibration.
We use the AVA dataset [Murray et al., 2012] to bootstrap a consensus among them.
Assume there exists a universal aesthetics metric, Φ. By definition, Φ needs to incorporate all aesthetic aspects, such as saturation, detail level, composition... To define Φ with examples, the number of images needs to grow exponentially to cover more aspects [Jaroensri et al., 2015]. To make things worse, unlike traditional problems such as object recognition, what we need are not only natural images but also pro-level photos, which are much fewer in quantity.
34. Pipeline image enhancement #5
“Auto-adjust” images based on different user groups (or personalizing for different markets for indoor scan products)
Multimodal Prediction and Personalization of
Photo Edits with Deep Generative Models
Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi,
Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams
CSAIL, MI; Adobe Research; Media Lab, MIT; Harvard and Google Brain
(Submitted on 17 Apr 2017) https://arxiv.org/abs/1704.04997
The main goals of our proposed models: (a) Multimodal photo
edits: For a given photo, there may be multiple valid aesthetic
choices that are quite different from one another. (b) User
categorization: A synthetic example where different user clusters
tend to prefer different slider values. Group 1 users prefer to
increase the exposure and temperature for the baby images;
group 2 users reduce clarity and saturation for similar images.
Predictive log-likelihood for users in the test set of different datasets. For each user in the test set, we compute the predictive log-likelihood of 20 images, given 0 to
30 images and their corresponding sliders from the same user. 30 sample trajectories and the overall average ± s.e. is shown for casual, frequent and expert users.
The figure shows that knowing more about the user (up to around 10 images) can increase the predictive log-likelihood. The log-likelihood is normalized by
subtracting off the predictive log-likelihood computed given zero images. Note the different y-axis in the plots. The rightmost plot is provided for comparing the
average predictive log-likelihood across datasets.
35. Pipeline image enhancement #6
Combining semantic segmentation for higher quality “Instagram filters”
Exemplar-Based Image and Video Stylization
Using Fully Convolutional Semantic Features
Feida Zhu ; Zhicheng Yan ; Jiajun Bu ; Yizhou Yu
IEEE Transactions on Image Processing ( Volume: 26, Issue: 7, July 2017 )
https://doi.org/10.1109/TIP.2017.2703099
Color and tone stylization in images and videos strives to enhance unique themes with artistic color
and tone adjustments. It has a broad range of applications, from professional image post-processing
to photo sharing over social networks. Mainstream photo enhancement software, such as Adobe
Lightroom and Instagram, provides users with predefined styles, which are often hand-crafted
through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of
image contents, and the resulting global color transform limits the range of artistic styles they can
represent. On the other hand, stylistic enhancement needs to apply distinct adjustments to various
semantic regions. Such an ability enables a broader range of visual styles.
Traditional professional video editing software (Adobe After Effects, Nuke, etc.) offers a suite of
predefined operations with tunable parameters that apply common global adjustments
(exposure/color correction, white balancing, sharpening, denoising, etc.). Local adjustments within
specific spatiotemporal regions are usually accomplished with masking layers created with intensive
user interaction. Both parameter tuning and masking-layer creation are labor-intensive processes.
An example of learning semantics-aware photo adjustment styles. Left: Input image. Middle: Manually enhanced by
photographer. Distinct adjustments are applied to different semantic regions. Right: Automatically enhanced by our
deep learning model trained from image exemplars. (a) Input image. (b) Ground truth. (c) Our result.
Given a set of exemplar image pairs, each representing a photo before and
after pixel-level color (in CIELab space) and tone adjustments following a
particular style, we wish to learn a computational model that can automatically
adjust a novel input photo in the same style. We still cast this learning task as
a regression problem as in Yan et al. (2016). For completeness, let us first
review their problem definition and then present our new deep learning based
architecture and solution.
36. Pipeline image enhancement #7A
Combining semantic segmentation for higher quality “Instagram filters”
Deep Bilateral Learning for Real-Time Image Enhancement
Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Samuel W. Hasinoff, Frédo Durand MIT CSAIL, Google Research, MIT CSAIL / Inria, Université Côte d’Azur
(Submitted on 10 Jul 2017)
https://arxiv.org/abs/1707.02880 | https://github.com/mgharbi/hdrnet | https://groups.csail.mit.edu/graphics/hdrnet/
https://youtu.be/GAe0qKKQY_I
Our novel neural network architecture can reproduce sophisticated image enhancements with inference running in real
time at full HD resolution on mobile devices. It can not only be used to dramatically accelerate reference
implementations, but can also learn subjective effects from human retouching (“copycat” filter).
By performing most of its computation within a bilateral grid and by predicting local affine color transforms, our model
is able to strike the right balance between expressivity and speed. To build this model we have introduced two new
layers: a data-dependent lookup that enables slicing into the bilateral grid, and a multiplicative operation for affine
transformation. By training in an end-to-end fashion and optimizing our loss function at full resolution (despite most of
our network being at a heavily reduced resolution), our model is capable of learning full-resolution and non-scale-
invariant effects.
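The output side of such models is a grid of local affine color transforms applied per pixel. The numpy sketch below applies such a coefficient grid using plain bilinear upsampling instead of the learned, data-dependent bilateral-grid slicing, so it only illustrates the output format, not the HDRNet layer itself.

```python
import numpy as np
from scipy.ndimage import zoom

def apply_affine_grid(image, coeffs):
    """Apply per-pixel local affine color transforms.
    image: (H, W, 3) in [0, 1]; coeffs: low-resolution (h, w, 3, 4) grid of
    affine matrices, here upsampled spatially with plain bilinear interpolation."""
    H, W, _ = image.shape
    h, w = coeffs.shape[:2]
    grid = zoom(coeffs, (H / h, W / w, 1, 1), order=1)            # (H, W, 3, 4)
    rgb1 = np.concatenate([image, np.ones((H, W, 1))], axis=-1)   # homogeneous colors
    return np.einsum("hwij,hwj->hwi", grid, rgb1)                 # per-pixel A @ [r,g,b,1]
```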
37. Pipeline image enhancement #8
Blind Image Quality assessment e.g. for quantifying RGB scan quality real-time
RankIQA: Learning from Rankings for No-
reference Image Quality Assessment
Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov
(Submitted on 26 Jul 2017)
https://arxiv.org/abs/1707.08347
The classical approach trains a
deep CNN regressor directly on
the ground-truth. Our approach
trains a network from an image
ranking dataset. These ranked
images can be easily generated
by applying distortions of varying
intensities. The network
parameters are then transferred
to the regression network for
finetuning. This allows for the
training of deeper and wider
networks.
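Generating the ranked training images is simple because the distortion level is known by construction; a small Pillow sketch (JPEG quality and Gaussian blur as example distortions, with made-up level spacing) is shown below.

```python
import io
from PIL import Image, ImageFilter

def ranked_distortions(img, kind="jpeg", levels=6):
    """Generate a series of increasingly distorted versions of a PIL image;
    quality decreases monotonically with the level, so (less distorted,
    more distorted) pairs can train a ranking (Siamese) network without
    human quality labels."""
    out = []
    for i in range(1, levels + 1):
        if kind == "jpeg":
            buf = io.BytesIO()
            img.save(buf, format="JPEG", quality=max(5, 60 - 10 * i))
            buf.seek(0)
            out.append(Image.open(buf).convert("RGB"))
        elif kind == "blur":
            out.append(img.filter(ImageFilter.GaussianBlur(radius=i)))
    return out  # out[k] is assumed to be worse than out[k-1]
```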
Siamese network output for JPEG distortion considering 6 levels. These graphs illustrate
the fact that the Siamese network successfully manages to separate the different
distortion levels.
Blind Deep S3D Image Quality Evaluation via Local to
Global Feature Aggregation
Heeseok Oh ; Sewoong Ahn ; Jongyoo Kim ; Sanghoon Lee
IEEE Transactions on Image Processing ( Volume: 26, Issue: 10, Oct. 2017 )
https://doi.org/10.1109/TIP.2017.2725584
39. Pipeline image Styling #1
Aesthetics enhancement: High Dynamic Range from SfM
Large scale structure-from-motion (SfM) algorithms have recently
enabled the reconstruction of highly detailed 3-D models of our surroundings
simply by taking photographs. In this paper, we propose to leverage these
reconstruction techniques to automatically estimate the outdoor
illumination conditions for each image in a SfM photo collection. We
introduce a novel dataset of outdoor photo collections, where the ground
truth lighting conditions are known at each image. We also present an
inverse rendering approach that recovers a high dynamic range
estimate of the lighting conditions for each low dynamic range input image.
Our novel database is used to quantitatively evaluate the performance of our
algorithm. Results show that physically plausible lighting estimates can
faithfully be recovered, both in terms of light direction and intensity.
Lighting Estimation in Outdoor Image Collections
Jean-Francois Lalonde (Laval University); Iain Matthews (Disney Research)
3D Vision (3DV), 2014 2nd International Conference on
https://www.disneyresearch.com/publication/lighting-estimation-in-outdoor-image-collections/
https://doi.org/10.1109/3DV.2014.112
The main limitation of our approach is that it can recover precise lighting parameters only when lighting actually creates strongly visible
effects on the image, such as cast shadows or shading differences amongst surfaces of different orientations. When the camera does not
observe significant lighting variations, for example when the sun is shining on a part of the building that the camera does not observe, or when the
camera only sees a very small fraction of the landmark with little geometric detail, our approach recovers only a coarse estimate of the full lighting
conditions. In addition, our approach is sensitive to errors in geometry estimation, and to the presence of unobserved, nearby objects.
Because it does not know about these objects, our method tries to explain their cast shadows with the available geometry, which may result in
errors. Our approach is also sensitive to inter-reflections. Incorporating more sophisticated image formation models such as radiosity could
help alleviate this problem, at the expense of significantly more computation. Finally, our approach relies on knowledge of the camera
exposure and white balance settings, which might be less applicable to the case of images downloaded from the Internet. We plan to explore
these issues in future work.
Exploring material recognition for estimating
reflectance and illumination from a single image
Michael Weinmann; Reinhard Klein
MAM '16 Proceedings of the Eurographics 2016 Workshop on Material Appearance Modeling
https://doi.org/10.2312/mam.20161253
We demonstrate that reflectance and illumination can be estimated
reliably for several materials that are beyond simple Lambertian
surface reflectance behavior because of exhibiting mesoscopic
effects such as interreflections and shadows.
Shading Annotations in the Wild
Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala
(Submitted on 2 May 2017)
https://arxiv.org/abs/1705.01156
http://opensurfaces.cs.cornell.edu/saw/
We use this data to train a
convolutional neural network
to predict per-pixel shading
information in an image. We
demonstrate the value of our
data and network in an
application to intrinsic
images, where we can reduce
decomposition artifacts
produced by existing
algorithms.
40. Pipeline image Styling #2A
Aesthetics enhancement: High Dynamic Range #1
Learning High Dynamic Range from
Outdoor Panoramas
Jinsong Zhang, Jean-François Lalonde
(Submitted on 29 Mar 2017 (v1), last revised 8 Aug 2017 (this version, v2))
https://arxiv.org/abs/1703.10200
http://www.jflalonde.ca/projects/learningHDR
Qualitative results on the synthetic dataset.
Top row: the ground truth HDR panorama, middle row: the LDR panorama, and
bottom row: the predicted HDR panorama obtained with our method.
To illustrate dynamic range, each panorama is shown at two exposures, with a factor of 16
between the two. For each example, we show the panorama itself (left column), and the
rendering of a 3D object lit with the panorama (right column). The object is a “spiky
sphere” on a ground plane, seen from above. Our method accurately predicts the extremely
high dynamic range of outdoor lighting in a wide variety of lighting conditions. A tonemapping
of γ = 2.2 is used for display purposes.
Real cameras have non-linear response functions. To simulate this, we randomly sample real camera
response functions from the Database of Response Functions (DoRF) [Grossberg and Nayar, 2003],
and apply them to the linear synthetic data before training.
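Where such a sampled response function would be applied is easy to show; the sketch below uses a plain gamma-style curve as a stand-in for a measured DoRF response, purely for illustration.

```python
import numpy as np

def apply_crf(linear, gamma=1.0 / 2.2):
    """Apply a simple parametric camera response function to linear irradiance
    in [0, 1]. The DoRF database contains measured CRFs of real cameras; this
    gamma curve is only a placeholder showing where a sampled response would
    be applied to the linear synthetic data before training."""
    return np.clip(np.clip(linear, 0.0, 1.0) ** gamma, 0.0, 1.0)

# ldr = apply_crf(render_linear)   # linear synthetic panorama -> LDR-like input
```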
Examples from our real dataset. For each case, we show the LDR panorama
captured by the Ricoh Theta S camera, a consumer grade point-and-shoot 360º
camera (left), and the corresponding HDR panorama captured by the Canon 5D
Mark III DSLR mounted on a tripod, equipped with a Sigma 8mm fisheye lens
(right, shown at a different exposure to illustrate the high dynamic range).
We present a full end-to-end learning approach to estimate the extremely high
dynamic range of outdoor lighting from a single, LDR 360º panorama. Our main
insight is to exploit a large dataset of synthetic data composed of a realistic virtual
city model, lit with real world HDR sky light probes [Lalonde et al. 2016
http://www.hdrdb.com/] to train a deep convolutional autoencoder
41. Pipeline image Styling #2b
High Dynamic Range #2: Learn illumination for relighting purposes
Learning to Predict Indoor Illumination from a Single Image
Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde
(Submitted on 1 Apr 2017 (v1), last revised 25 May 2017 (this version, v2))
https://arxiv.org/abs/1704.00090
42. Pipeline image Styling #3a
Improving photocompositing and relighting of RGB textures
Deep Image Harmonization
Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu,
Ming-Hsuan Yang
(Submitted on 28 Feb 2017)
https://arxiv.org/abs/1703.00069
Our method can adjust the appearances of the composite
foreground to make it compatible with the background
region. Given a composite image, we show the harmonized
images generated by Xue et al. (2012), Zhu et al. (2015) and
our deep harmonization network.
The overview of the proposed joint network architecture. Given a composite image and a provided foreground mask, we first pass the input through an encoder
for learning feature representations. The encoder is then connected to two decoders, including a harmonization decoder for reconstructing the harmonized output
and a scene parsing decoder to predict pixel-wise semantic labels. In order to use the learned semantics and improve harmonization results, we concatenate the
feature maps from the scene parsing decoder to the harmonization decoder (denoted as dot-orange lines). In addition, we add skip links (denoted as blue-dot
lines) between the encoder and decoders for retaining image details and textures. Note that, to keep the figure clean, we only depict the links for the harmonization
decoder, while the scene parsing decoder has the same skip links connected to the encoder.
Given an input image (a), our network
can adjust the foreground region
according to the provided mask (b)
and produce the output (c). In this
example, we invert the mask from the
one in the first row to the one in the
second row, and generate
harmonization results that account for
different context and semantic
information.
43. Pipeline image Styling #3b
Sky is not the limit: semantic-aware sky replacement
YH Tsai, X Shen, Z Lin, K Sunkavalli; Ming-Hsuan Yang
ACM Transactions on Graphics (TOG) - Volume 35 Issue 4, July 2016
https://doi.org/10.1145/2897824.2925942
In order to find proper skies for replacement, we propose a data-driven sky search
scheme based on semantic layout of the input image. Finally, to re-compose the
stylized sky with the original foreground naturally, an appearance transfer method is
developed to match statistics locally and semantically.
Sample sky segmentation results. Given an input image, the FCN generates results that localize the sky well but
contain inaccurate boundaries and noisy segments. The proposed online model refines segmentations that are
complete and accurate, especially around the boundaries (best viewed in color with enlarged images).
Overview of the proposed algorithm. Given an input image, we first utilize the FCN to obtain scene parsing results
and semantic response for each category. A coarse-to-fine strategy is adopted to segment sky regions (illustrated
as the red mask). To find reference images for sky replacement, we develop a method to search images with
similar semantic layout. After re-composing images with the found skies, we transfer visual semantics to match
foreground statistics between the input image and the reference image. Finally, a set of composite images with
different stylized skies are generated automatically.
GP-GAN: Towards Realistic High-Resolution Image Blending
Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
(Submitted on 21 Mar 2017 (v1), last revised 25 Mar 2017 (this version, v2))
https://arxiv.org/abs/1703.07195
Qualitative illustration of high-resolution
image blending. a) shows the composited
copy-and-paste image where the inserted
object is circled out by red lines. Users usually
expect image blending algorithms to make this
image more natural. b) represents the result
based on Modified Poisson image editing [32]. c)
indicates the result from Multi-splines approach.
d) is the result of our method Gaussian-Poisson
GAN (GP-GAN). Our approach produces better
quality images than that from the alternatives in
terms of illumination, spatial, and color
consistencies.
We advanced the state-of-the-art in conditional image generation by combining the ideas from the generative
model GAN, Laplacian Pyramid, and Gauss-Poisson Equation. This combination is the first time a generative
model could produce realistic images in arbitrary resolution. In spite of the effectiveness, our algorithm fails to
generate realistic images when the composited images are far away from the distribution of the training
dataset. We aim to address this issue in future work.
Improving photocompositing and relighting of RGB textures
44. Pipeline image Styling #3c
Live User-Guided Intrinsic Video for Static Scenes
Abhimitra Meka ; Gereon Fox ; Michael Zollhofer ; Christian Richardt ; Christian Theobalt
IEEE Transactions on Visualization and Computer Graphics ( Volume: PP, Issue: 99 )
https://doi.org/10.1109/TVCG.2017.2734425
Improving photocompositing and relighting of RGB textures
User constraints, in the form of constant shading and reflectance strokes,
can be placed directly on the real-world geometry using an intuitive touch-
based interaction metaphor, or using interactive mouse strokes. Fusing the
decomposition results and constraints in three-dimensional space allows for
robust propagation of this information to novel views by re-projection.
We propose a novel approach for live, user-guided intrinsic video decomposition. We
first obtain a dense volumetric reconstruction of the scene using a commodity RGB-D
sensor. The reconstruction is leveraged to store reflectance estimates and user-provided
constraints in 3D space to inform the ill-posed intrinsic video decomposition problem. Our
approach runs at real-time frame rates, and we apply it to applications such as relighting,
recoloring and material editing.
Our novel user-guided intrinsic video approach enables real-time applications such
as recoloring, relighting and material editing.
Constant reflectance strokes improve the decomposition by moving the high-frequency shading of the cloth to the shading layer.
Comparison to state-of-the-art intrinsic video decomposition techniques on the ‘girl’ dataset. Our approach matches the real-time
performance of Meka et al. (2016), while achieving the same quality as previous off-line techniques
45. Pipeline image Styling #4
Beyond low-level style transfer for high-level manipulation
Generative Semantic Manipulation
with Contrasting GAN
Xiaodan Liang, Hao Zhang, Eric P. Xing
(Submitted on 1 Aug 2017)
https://arxiv.org/abs/1708.00315
Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired
image-to-image translation, such as photo→sketch and artist painting style transfer. However, existing models
are only capable of transferring low-level information (e.g. color or texture changes), but fail to edit high-level
semantic meanings (e.g., geometric structure or content) of objects.
Some example semantic manipulation results by our model, which takes one image
and a desired object category (e.g. cat, dog) as inputs and then learns to
automatically change the object semantics by modifying their appearance or
geometric structure. We show the original image (left) and manipulated result (right)
in each pair.
Although our method can achieve compelling results in many semantic manipulation tasks, it shows
little success for some cases which require very large geometric changes, such as car↔truck and
car↔bus. Integrating spatial transformation layers for explicitly learning pixel-wise offsets may help
resolve very large geometric changes. To be more general, our model can be extended to replace the
mask annotations with predicted object masks or automatically learned attentive regions via
attention modeling. This paper pushes forward research on the unsupervised setting by
demonstrating the possibility of manipulating high-level object semantics rather than the low-level
color and texture changes as previous works did. In addition, it would be interesting to develop
techniques that are able to manipulate object interactions and activities in images/videos in
future work.
46. Pipeline image Styling #5A
Aesthetics enhancement: Style Transfer | Introduction #1
Neural Style Transfer: A Review
Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song
(Submitted on 11 May 2017)
https://arxiv.org/abs/1705.04058
A list of mentioned papers in this review, with corresponding code and pre-trained models, is
publicly available at: https://github.com/ycjing/Neural-Style-Transfer-Papers
One of the reasons why Neural Style Transfer has caught the eye of both academia and industry is its
popularity on social networking sites (e.g., Twitter and Facebook). The mobile application
Prisma [36] is one of the first industrial applications that provides the Neural Style Transfer
algorithm as a service. Before Prisma, the general public could hardly have imagined that one day
they would be able to turn their photos into art paintings in only a few minutes. Due to its high
quality, Prisma achieved great success and is becoming popular around the world.
Another use of Neural Style Transfer is to act as a user-assisted creation tool.
Although, to the best of our knowledge, there are no popular applications that
apply the Neural Style Transfer technique in creation tools, we believe that it will
be a promising use in the future. Neural Style Transfer is capable of
acting as a creation tool for painters and designers. It makes
it more convenient for a painter to create an artifact of a specific style, especially
when creating computer-made fine art images. Moreover, with Neural Style Transfer
algorithms it is trivial to produce stylized fashion elements for fashion designers and
stylized CAD drawings for architects in a variety of styles, which would be costly to
produce by hand.
47. Pipeline image Styling #5b
Aesthetics enhancement: Style Transfer | Introduction #2
Neural Style Transfer: A Review
Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song
(Submitted on 11 May 2017)
https://arxiv.org/abs/1705.04058
A list of mentioned papers in this review, with corresponding code and pre-trained models, is
publicly available at: https://github.com/ycjing/Neural-Style-Transfer-Papers
Promising directions for future research in Neural Style Transfer mainly focus on two
aspects. The first aspect is to solve the existing aforementioned challenges for current
algorithms, i.e., problem of parameter tuning, problem of stroke orientation control and
problem existing in “Fast” and “Faster” Neural Style Transfer algorithms. The second aspect of
promising directions is to focus on new extensions to Neural Style Transfer (e.g., Fashion
Style Transfer and Character Style Transfer). There are already some preliminary work related
with this direction, such as the recent work of Yang et al. (2016) on Text Effects Transfer.
These interesting extensions may become trending topics in the future and related new areas
may be created subsequently.
48. Pipeline image Styling #5C
Aesthetics enhancement: Video Style Transfer
DeepMovie: Using Optical Flow and
Deep Neural Networks to Stylize Movies
Alexander G. Anderson, Cory P. Berg, Daniel P. Mossing,
Bruno A. Olshausen (Submitted on 26 May 2016)
https://arxiv.org/abs/1605.08153
https://github.com/anishathalye/neural-style
Coherent Online Video Style Transfer
Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua
(Submitted on 27 Mar 2017 (v1), last revised 28 Mar 2017 (this version, v2))
https://arxiv.org/abs/1703.09211
The main contribution of this paper is to use optical flow to initialize the
style transfer optimization so that the texture features move with the
objects in the video. Finally, we suggest a method to incorporate optical
flow explicitly into the cost function.
Overview of Our Approach: We begin by applying the style transfer algorithm to the first frame of the
movie using the content image as the initialization. Next, we calculate the optical flow field that takes the
first frame of the movie to the second frame. We apply this flow-field to the rendered version of the first
frame and use that as the initialization for the style transfer optimization for the next frame. Note, for
instance, that a blue pixel in the flow field image means that the underlying object in the video at that pixel
moved to the left from frame one to frame two. Intuitively, in order to apply the flow field to the styled
image, you move the parts of the image that have a blue pixel in the flow field to the left.
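An OpenCV sketch of that flow-based initialization is shown below; Farneback flow stands in for the optical flow estimator used in the paper, and the flow is computed from the next frame to the previous one so that a simple backward warp with cv2.remap is exact.

```python
import cv2
import numpy as np

def flow_init_next_frame(prev_stylized, prev_gray, next_gray):
    """Warp the previous stylized frame into the next frame's coordinates,
    to use as the initialization of the next frame's style-transfer
    optimization."""
    # flow[y, x] tells, for each pixel of the next frame, where it was in prev
    flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = next_gray.shape
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = gx + flow[..., 0]
    map_y = gy + flow[..., 1]
    return cv2.remap(prev_stylized, map_x, map_y, cv2.INTER_LINEAR)
```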
We propose the first end-to-end network for online video style transfer, which generates temporally coherent
stylized video sequences in near real-time. Two key ideas include an efficient network by incorporating short-
term coherence, and propagating short-term coherence to long-term, which ensures the consistency over
larger period of time. Our network can incorporate different image stylization networks. We show that the
proposed method clearly outperforms the per-frame baseline both qualitatively and quantitatively. Moreover, it
can achieve visually comparable coherence to optimization-based video style transfer, but is three orders of
magnitude faster in runtime.
There are still some limitations in our method. For instance, limited by the accuracy of the ground-truth optical flow (given by DeepFlow2 [Weinzaepfel et al. 2013]), our results may suffer from some incoherence where the motion is too large for the flow to track. And after propagation over a long period, small flow errors may accumulate, causing blurriness. These open questions are interesting for further exploration in future work.
49. Pipeline image Styling #6A
Aesthetics enhancement: Texture synthesis and upsampling
TextureGAN: Controlling Deep Image
Synthesis with Texture Patches
Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu,
James Hays
(Submitted on 9 Jun 2017)
https://arxiv.org/abs/1706.02823
TextureGAN pipeline. A feed-forward generative network is trained end-to-end to directly transform a 4-channel input into a high-res photo with realistic textural details.
Photo-realistic Facial Texture Transfer
Parneet Kaur, Hang Zhang, Kristin J. Dana
(Submitted on 14 Jun 2017)
https://arxiv.org/abs/1706.04306
Overview of our method. Facial identity is preserved
using Facial Semantic Regularization which
regularizes the update of meso-structures using a
facial prior and facial semantic structural loss.
Texture loss regularizes the update of local textures
from the style image. The output image is initialized
with the content image and updated at each iteration
by back-propagating the error gradients for the
combined losses. Content/style photos: Martin
Scheoller/Art+Commerce.
Identity-preserving Facial Texture Transfer (FaceTex).
The textural details are transferred from style image to
content image while preserving its identity. FaceTex
outperforms existing methods perceptually as well as
quantitatively. Column 3 uses input 1 as the style image
and input 2 as the content. Column 4 uses input 1 as
the content image and input 2 as the style image.
Figure 3 shows more examples and comparison with
existing methods. Input photos: Martin
Scheoller/Art+Commerce.
50. Pipeline image Styling #6B
Aesthetics enhancement: Texture synthesis with style transfer
Stable and Controllable Neural Texture Synthesis
and Style Transfer Using Histogram Losses
Eric Risser, Pierre Wilmot, Connelly Barnes
Artomatix, University of Virginia
(Submitted on 31 Jan 2017 (v1), last revised 1 Feb 2017 (this version, v2))
https://arxiv.org/abs/1701.08893
Our style transfer and texture synthesis results. The input styles are
shown in (a), and style transfer results are in (b, c). Note that the angular
shapes of the Picasso painting are successfully transferred on the top row,
and that the more subtle brush strokes are transferred on the bottom row.
The original content images are inset in the upper right corner. Unless
otherwise noted, our algorithm is always run with default parameters (we do
not manually tune parameters). Input textures are shown in (d) and texture
synthesis results are in (e). For the texture synthesis, note that the algorithm
synthesizes creative new patterns and connectivities in the output.
Different statistics that can be used for neural network texture synthesis (separate figure).
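The histogram losses build on classical histogram matching; below is a generic per-channel numpy sketch of that operation on float images, not the authors' loss implementation (which matches feature-map histograms inside the network).

```python
import numpy as np

def match_histogram(source, reference):
    """Per-channel histogram matching: remap `source` values so their CDF
    matches that of `reference`. Both are float arrays of shape (..., C)."""
    out = np.empty_like(source)
    for c in range(source.shape[-1]):
        s = source[..., c].ravel()
        r = reference[..., c].ravel()
        order = np.argsort(s)
        # k-th smallest source value receives the k-th quantile of the reference
        matched = np.empty_like(s)
        matched[order] = np.interp(np.linspace(0, 1, len(s)),
                                   np.linspace(0, 1, len(r)),
                                   np.sort(r))
        out[..., c] = matched.reshape(source[..., c].shape)
    return out
```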
51. Pipeline image Styling #6C
Aesthetics enhancement: Enhancing texture maps
Depth Texture Synthesis for Realistic
Architectural Modeling
Félix Labrie-Larrivée ; Denis Laurendeau ; Jean-François Lalonde
Computer and Robot Vision (CRV), 2016 13th Conference on
https://doi.org/10.1109/CRV.2016.77
In this paper, we present a novel approach that
improves the resolution and geometry of 3D
meshes of large scenes with such repeating
elements. By leveraging structure from motion
reconstruction and an off-the-shelf depth sensor,
our approach captures a small sample of the scene
in high resolution and automatically extends that
information to similar regions of the scene.
Using RGB and SfM depth information as a guide
and simple geometric primitives as canvas, our
approach extends the high resolution mesh by
exploiting powerful, image-based texture synthesis
approaches. The final result improves on
standard SfM reconstruction with higher detail.
Our approach benefits from reduced manual
labor as opposed to full RGBD reconstruction, and
can be done much more cheaply than with LiDAR-
based solutions.
In the future, we plan to work on a more
generalized 3D texture synthesis
procedure capable of synthesizing a more
varied set of objects, and able to
reconstruct multiple parts of the scene by
exploiting several high resolution scan
samples at once in an effort to address the
tradeoff mentioned above. We also plan to
improve the robustness of the approach
to a more varied set of large scale scenes,
irrespective of the lighting conditions,
material colors, and geometric
configurations. Finally, we plan to evaluate
how our approach compares to SfM on a
more quantitative level by leveraging
LiDAR data as ground truth.
Overview of the data collection and alignment procedure. Top row: a collection of photos of the scene is acquired with a typical camera, and used to generate a
point cloud via SfM [Agarwal et al. 2009] and dense multi-view stereo (MVS) [Furukawa and Ponce, 2012]. Bottom row: a repeating feature of the scene (in this
example, the left-most window) is recorded with a Kinect sensor, and reconstructed into a high resolution mesh via the RGB-D SLAM technique KinectFusion
[Newcombe et al. 2011]. The mesh is then automatically aligned to the SfM reconstruction using bundle adjustment and our automatic scale adaptation
technique (see Sec. III-C). Right: the high resolution Kinect mesh is correctly aligned to the low resolution SfM point cloud.
52. Pipeline Image Styling #6D
Aesthetics enhancement: Towards photorealism with good maps
One Ph.D. position (supervision by Profs Niessner and Rüdiger
Westermann) is available at our chair in the area of photorealistic rendering
for deep learning and online reconstruction
Research in this project includes the development of photorealistic realtime rendering
algorithms that can be used in deep learning applications for scene understanding, and for
high-quality scalable rendering of point scans from depth sensors and RGB stereo image
reconstruction. If you are interested in applying, you should have a strong background in
computer science, i.e., efficient algorithms and data structures, and GPU programming,
have experience implementing C/C++ algorithms, and you should be excited to work on
state-of-the-art research in 3D computer graphics.
https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-learning-and-online-reconstruction.html
Ph.D. Position – Photorealistic Rendering for
Deep Learning and Online Reconstruction
Photorealism Explained
Blender Guru Published on May 25, 2016
http://www.blenderguru.com/tutorials/photorealism-explained/
https://youtu.be/R1-Ef54uTeU
Stop wasting time creating
texture maps by hand. All
materials on Poliigon come
with the relevant normal,
displacement, reflection and
gloss maps included. Just
plug them into your
software, and your material
is ready to render.
https://www.poliigon.com/
How to Make Photorealistic PBR Materials - Part 1
Blender Guru Published on Jun 28, 2016
http://www.blenderguru.com/tutorials/pbr-shader-tutorial-pt1/
https://youtu.be/V3wghbZ-Vh4?t=24m5s
Physically Based Rendering (PBR)
53. Pipeline Image Styling #7
Styling line graphics (e.g. floorplans, 2D CADs) and monochrome images e.g. for desired visual identity
Real-Time User-Guided Image Colorization with
Learned Deep Priors
Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
(Submitted on 8 May 2017)
https://arxiv.org/abs/1705.02999
Our proposed method colorizes a grayscale image (left), guided by sparse user inputs (second), in real-time,
providing the capability for quickly generating multiple plausible colorizations (middle to right). Photograph of
Migrant Mother by Dorothea Lange, 1936 (Public Domain).
Network architecture We train two variants of the user interaction colorization network. Both variants use the blue layers for predicting
a colorization. The Local Hints Network also uses red layers to (a) incorporate user points Ul and (b) predict a color distribution Ẑ. The
Global Hints Network uses the green layers, which transform the global input Ug by 1 × 1 conv layers and add the result into the main
colorization network. Each box represents a conv layer, with vertical dimension indicating feature map spatial resolution, and horizontal
dimension indicating number of channels. Changes in resolution are achieved through subsampling and upsampling operations. In the
main network, when resolution is decreased, the number of feature channels are doubled. Shortcut connections are added to
upsampling convolution layers.
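To make the hint-conditioning idea concrete, here is a minimal, hypothetical sketch (not the authors' released architecture or code): the grayscale L channel is concatenated with the sparse user ab hints and a binary hint mask, and a small convolutional network maps them to a dense ab prediction.

import torch
import torch.nn as nn

class LocalHintsColorizer(nn.Module):
    # Toy stand-in for the Local Hints idea: input = L channel (1) + user ab hints (2)
    # + binary hint mask (1); output = dense ab prediction. The real network is a much
    # deeper U-Net-style model with subsampling, channel doubling and shortcuts.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 2 + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 1),
        )

    def forward(self, L, hints_ab, hint_mask):
        x = torch.cat([L, hints_ab * hint_mask, hint_mask], dim=1)
        return self.net(x)

model = LocalHintsColorizer()
L = torch.rand(1, 1, 256, 256)                 # grayscale input
hints = torch.zeros(1, 2, 256, 256)            # sparse user ab points
mask = torch.zeros(1, 1, 256, 256)
hints[:, :, 100, 100] = torch.tensor([0.3, -0.2])
mask[:, :, 100, 100] = 1.0
ab_pred = model(L, hints, mask)                # (1, 2, 256, 256) dense colorization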
Style Transfer for Anime Sketches with
Enhanced Residual U-net and Auxiliary
Classifier GAN
Lvmin Zhang, Yi Ji, Xin Lin
(Submitted on 11 Jun 2017 (v1), last revised 13 Jun 2017 (this version, v2))
https://arxiv.org/abs/1706.03319
Examples of combination results on sketch images (top-left) and style images
(bottom-left). Our approach automatically applies the semantic features of an existing
painting to an unfinished sketch. Our network has learned to classify the hair, eyes,
skin and clothes, and has the ability to paint these features according to a sketch.
In this paper, we integrated residual U-net to apply the style to the grayscale sketch
with auxiliary classifier generative adversarial network (AC-GAN, Odena et al. 2016).
The whole process is automatic and fast, and the results are creditable in the quality
of art style as well as colorization
Limitation: the pretrained VGG is for ImageNet photograph classification, but not for
paintings. In the future, we will train a classification network only for paintings to
achieve better results. Furthermore, due to the large quantity of layers in our residual
network, the batch size during training is limited to no more than 4. It remains for
future study to reach a balance between the batch size and quantity of layers.
55. Pipeline Depth image enhancement #1a
Image Formation #1
Pinhole Camera Model: ideal projection of a 3D
object on a 2D image. Fernandez et al. (2017)
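As a quick reference, the ideal pinhole model reduces to u = fx·X/Z + cx and v = fy·Y/Z + cy; a minimal sketch with assumed Kinect-like intrinsics (not values from the cited papers):

import numpy as np

def project_pinhole(points_cam, fx, fy, cx, cy):
    # Ideal pinhole projection (no lens distortion) of camera-frame points (N, 3)
    # to pixel coordinates (N, 2): u = fx*X/Z + cx, v = fy*Y/Z + cy.
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([fx * X / Z + cx, fy * Y / Z + cy], axis=1)

pts = np.array([[0.0, 0.0, 2.0], [0.1, -0.05, 1.5]])      # metres, camera frame
print(project_pinhole(pts, fx=525.0, fy=525.0, cx=319.5, cy=239.5))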
Dot patterns of a Kinect for Windows (a) and two Kinects for Xbox (b) and (c)
are projected on a flat wall from a distance of 1000 mm. Note that the
projection of each pattern is similar, and related by a 3-D rotation depending
on the orientation of the Kinect diffuser installation. The installation variability
can clearly be observed from differences in the bright dot locations (yellow
stars), which differ by an average distance of 10 pixels. Also displayed in (d) is
the idealized binary replication of the Kinect dot pattern [Kinect Pattern Uncovered], which
was used in this project to simulate IR images. - Landau et al. (2016)
56. Pipeline Depth image enhancement #1b
Image Formation #2
Characterizations of Noise in Kinect
Depth Images: A Review
Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar
IEEE Sensors Journal ( Volume: 14, Issue: 6, June 2014 )
https://doi.org/10.1109/JSEN.2014.2309987
Kinect outputs for a scene. (a) RGB Image. (b) Depth data rendered as an 8-
bit gray-scale image with nearer depth values mapped to lower intensities.
Invalid depth values are set to 0. Note the fixed band of invalid (black) pixels
on left. (c) Depth image showing too near depths in blue, too far depths in red
and unknown depths due to highly specular objects in green. Often these are
all taken as invalid zero depth.
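A minimal sketch of that depth-to-grayscale rendering convention (the working range below is an assumption): nearer depths map to lower intensities and invalid zero depths stay black.

import numpy as np

def depth_to_gray8(depth_mm, d_min=800.0, d_max=4000.0):
    # Render a depth map as an 8-bit grayscale image: nearer depths -> lower
    # intensities, invalid (zero) depth values stay 0, as described above.
    gray = np.zeros(depth_mm.shape, dtype=np.uint8)
    valid = depth_mm > 0
    d = np.clip(depth_mm[valid].astype(np.float64), d_min, d_max)
    gray[valid] = np.round(255.0 * (d - d_min) / (d_max - d_min)).astype(np.uint8)
    return gray

depth = np.zeros((480, 640), dtype=np.uint16)
depth[100:300, 200:400] = 1200          # a surface at 1.2 m; the rest stays invalid
img = depth_to_gray8(depth)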
Shadow is created in a depth image (Yu et al. 2013) when the incident IR
from the emitter gets obstructed by an object and no depth can be
estimated.
PROPERTIES OF IR LIGHT [Rose]
57. Pipeline Depth image enhancement #1c
Image Formation #3
Authors’ experiments on structural noise using a plane in 400 frames.
(a) Error at 1.2m. (b) Error at 1.6m. (c) Error at 1.8m.
Smisek et al. (2013) calibrate a Kinect against a stereo-rig
(comprising two Nikon D60 DSLR cameras) to estimate and
improve its overall accuracy. They have taken images and
fitted planar objects at 18 different distances (from 0.7 to 1.3
meters) to estimate the error between the depths measured
by the two sensors. The experiments corroborate that the
accuracy varies inversely with the square of depth [2].
However, even after the calibration of Kinect, the procedure
still exhibits relatively complex residual errors (Fig. 8).
Fig. 8. Residual noise of a plane. (a) Plane at 86cm. (b) Plane
at 104cm.
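The "inversely with the square of depth" behaviour is the standard triangulation relation: with Z = f·b/d, a constant disparity noise σd maps to a depth noise of roughly σZ ≈ Z²·σd/(f·b). A back-of-the-envelope sketch with assumed Kinect-like numbers (not values from the paper):

import numpy as np

f_px, baseline_m, sigma_d_px = 570.0, 0.075, 0.1    # assumed Kinect-like values
for z in np.array([0.7, 1.0, 1.3, 2.0, 4.0]):       # depth in metres
    sigma_z = z**2 * sigma_d_px / (f_px * baseline_m)
    print(f"Z = {z:.1f} m -> sigma_Z ~ {sigma_z * 1000:.1f} mm")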
Authors’ experiments on temporal noise. Entropy and SD
of each pixel in a depth frame over 400 frames for a
stationary wall at 1.6m. (a) Entropy image. (b) SD image.
Authors’ experiments with vibrating noise showing
ZD samples as white dots. A pixel is taken as noise if
it is zero in frame i and nonzero in frames i±1. Note
that noise follows depth edges and shadow. (a)
Frame (i−1). (b) Frame i. (c) Frame (i+1). (d) Noise for
frame i.
58. Pipeline Depth image enhancement #1d
Image Formation #4
The filtered intensity samples generated from unsaturated IR dots (blue dots) were used to fit the intensity model (red line), which follows an inverse-square model for the distance between the sensor and the surface point. Landau et al. (2016)
(a) Multiplicative speckle distribution is unitless, and can be represented as a gamma distribution Γ(4.54, 0.196). (b) Additive detector noise distribution can be represented as a normal distribution N(−0.126, 10.4), and has units of 10-bit intensity.
Landau et al. (2016)
The standard error in depth estimation (mm)
as a function of radial distance (pix) is plotted
for the (a) experimental and (b) simulated
data sets of flat walls at various depths (mm).
The experimental standard depth error
increases faster with an increase in radial
distance due to lens distortion.
Landau et al. (2016)
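Putting the quoted distributions together, here is a rough, hypothetical simulation of per-dot IR intensity in the spirit of Landau et al. (2016): inverse-square falloff, multiplicative gamma speckle Γ(4.54, 0.196) and additive Gaussian detector noise N(−0.126, 10.4) in 10-bit counts. The source constant i0 and the clipping range are assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulate_ir_intensity(distance_mm, i0=2.5e8):
    # i0 is a made-up source constant chosen so a wall at 1 m lands mid-range.
    ideal = i0 / np.asarray(distance_mm, dtype=np.float64)**2     # inverse-square law
    speckle = rng.gamma(shape=4.54, scale=0.196, size=ideal.shape)
    detector = rng.normal(loc=-0.126, scale=10.4, size=ideal.shape)
    return np.clip(ideal * speckle + detector, 0, 1023)           # 10-bit sensor range

d = np.full((480, 640), 1000.0)        # flat wall at 1 m
ir = simulate_ir_intensity(d)
print(ir.mean(), ir.std())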
59. Pipeline Depth image enhancement #2A
Metrological Calibration #1
A New Calibration Method for
Commercial RGB-D Sensors
Walid Darwish, Shenjun Tang, Wenbin Li and Wu Chen
Sensors 2017, 17(6), 1204; doi:10.3390/s17061204
Based on these calibration algorithms, different
calibration methods have been implemented and tested.
Methods include the use of 1D [Liu et al. 2012], 2D [Shibo and Qing 2012],
and 3D [Gui et al. 2014] calibration objects that work with the
depth images directly; calibration of the manufacture
parameters of the IR camera and projector [Herrera et al. 2012];
or photogrammetric bundle adjustments used to model
the systematic errors of the IR sensors
[Davoodianidaliki and Saadatseresht 2013; Chow and Lichti 2013]. To enhance the
depth precision, additional depth error models are
added to the calibration procedure [7,8,21,22,23].
All of these error models are used to compensate only
for the distortion effect of the IR projector and camera.
Other research works have been conducted to obtain
the relative calibration between an RGB camera and an
IR camera by accessing the IR camera [24,25,26]. This
can achieve relatively high accuracy calibration
parameters for a baseline between IR and RGB cameras,
while the remaining limitation is that the distortion
parameters for the IR camera cannot represent the full
distortion effect for the depth sensor.
This study addressed these issues using a two-step
calibration procedure to calibrate all of the geometric
parameters of RGB-D sensors. The first step was related
to the joint calibration between the RGB and IR cameras,
which was achieved by adopting the procedure
discussed in [27] to compute the external baseline
between the cameras and the distortion parameters of
the RGB camera. The second step focused on the depth
sensor calibration.
Point cloud of two perpendicular planes (blue
color: default depth; red color: modeled depth):
highlighted black dashed circles shows the
significant impact of the calibration method on the
point cloud quality.
The main difference between both sensors is the
baseline between the IR camera and projector. The
longer the sensor’s baseline, the longer the working
distance that can be achieved. The working range of
Kinect v1 is 0.80 m to 4.0 m, while it is 0.35 m to 3.5
m for Structure Sensor.
60. Pipeline Depth image enhancement #2A
Metrological Calibration #2
Photogrammetric Bundle
Adjustment With Self-
Calibration of the PrimeSense
3D Camera Technology:
Microsoft Kinect
IEEE Access ( Volume: 1 ) 2013
https://doi.org/10.1109/ACCESS.2013.2271860
(Top) Roughness of point cloud before calibration. (Bottom) Roughness of point
cloud after calibration. The colours indicate the roughness as measured
by the normalized smallest eigenvalue.
Estimated Standard Deviation of the Observation Residuals
To quantify the external accuracy of the Kinect and the benefit of the proposed calibration,
a target board located at 1.5–1.8 m away with 20 signalized targets was imaged using an in-
house program based on the Microsoft Kinect SDK and with RGBDemo. Spatial distances
between the targets were known from surveying using the FARO Focus3D terrestrial laser
scanner with a standard deviation of 0.7 mm. By comparing the 10 independent spatial
distances measured by the Kinect to those made by the Focus3D, the RMSE was 7.8 mm
using RGBDemo and 3.7 mm using the calibrated Kinect results; showing a 53%
improvement to the accuracy. This accuracy check assesses the quality of all the imaging
sensors and not just the IR camera-projector pair alone.
The results show improvements in geometric accuracy up to 53%
compared with uncalibrated point clouds captured using the popular
software RGBDemo. Systematic depth discontinuities were also
reduced and in the check-plane analysis the noise of the Kinect point
cloud was reduced by 17%.
61. Pipeline Depth image enhancement #2B
Metrological Calibration #3
Evaluating and Improving the Depth Accuracy of
Kinect for Windows v2
Lin Yang ; Longyu Zhang ; Haiwei Dong ; Abdulhameed
Alelaiwi ; Abdulmotaleb El Saddik
IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015)
https://doi.org/10.1109/JSEN.2015.2416651
Illustration of accuracy assessment of Kinect v2. (a) Depth accuracy. (b) Depth
resolution. (c) Depth entropy. (d) Edge noise. (e) Structural noise. The target plates in (a-
c) and (d-e) are parallel and perpendicular with the depth axis, respectively.
Accuracy error distribution
of Kinect for Windows v2.
62. Pipeline Depth image enhancement #2c
A Comparative Error Analysis of Current
Time-of-Flight Sensors
IEEE Transactions on Computational Imaging (Volume: 2, Issue: 1, March 2016)
https://doi.org/10.1109/TCI.2015.2510506
For evaluating the presence of wiggling, ground truth distance
information is required. We calculate the true distance by setting
up a stereo camera system. This system consists of the ToF camera
to be evaluated and a high resolution monochrome camera (IDS
UI-1241LE) which we call the reference camera.
The cameras are calibrated with Zhang (2000)’s algorithm with
point correspondences computed with ROCHADE (
Placht et al. 2014). Ground truth is calculated by intersecting the
rays of all ToF camera pixels with the 3D plane of the
checkerboard. For higher accuracy, we compute this plane from
corners detected in the reference image and transform the plane
into the coordinate system of the ToF camera.
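A minimal sketch of that ground-truth construction (hypothetical intrinsics; in practice the plane n·X = d would come from checkerboard corners detected in the reference camera and transformed into the ToF frame):

import numpy as np

def ray_plane_depth(K, plane_n, plane_d, shape):
    # Intersect the back-projected ray of every ToF pixel with the plane
    # n . X = d (camera coordinates) and return the metric depth per pixel.
    h, w = shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix            # ray directions, one per pixel
    t = plane_d / (plane_n @ rays)           # ray-plane intersection scale
    return (t * rays[2]).reshape(h, w)       # z-component = depth

K = np.array([[570.0, 0.0, 320.0], [0.0, 570.0, 240.0], [0.0, 0.0, 1.0]])
n = np.array([0.0, 0.0, 1.0])                # plane facing the camera
depth_gt = ray_plane_depth(K, n, 1.5, (480, 640))   # plane 1.5 m in front
print(depth_gt[240, 320])                    # ~1.5 at the principal point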
This experiment aims to quantify the so-
called amplitude-related distance error and
also to show that this effect is not related to
scattering. This effect can be observed when
looking at a planar surface with high
reflectivity variations. With some sensors the
distance measurements for pixels with
different amplitudes do not lie on the same
plane, even though they should.
To the best of our knowledge no evaluation
setup has been presented for this error
source so far. In the past this error has been
typically observed with images of
checkerboards or other high contrast
patterns. However, the analysis of single
images allows no differentiation between
amplitude-related errors and internal
scattering.
Metrological Calibration #4
63. Pipeline Depth image enhancement #2c
Metrological Calibration #5
Low-Cost Reflectance-Based Method for the
Radiometric Calibration of Kinect 2
IEEE Sensors Journal ( Volume: 16, Issue: 7, April 1, 2016 )
https://doi.org/10.1109/JSEN.2015.2508802
In this paper, a reflectance-based radiometric
method for the second generation of gaming
sensors, Kinect 2, is presented and discussed. In
particular, a repeatable methodology generalizable
to different gaming sensors by means of a calibrated
reference panel with Lambertian behavior is
developed.
The relationship between the received power and
the final digital level is obtained by means of a
combination of linear sensor relationship and
signal attenuation, into a least squares adjustment
with an outlier detector. The results confirm that
the quality of the method (standard deviation better
than 2% in laboratory conditions and discrepancies
lower than 7%) is valid for exploiting the
radiometric possibilities of this low-cost sensor,
whose applications range from pathological analysis
(moisture, crusts, etc.) to agricultural and forest
resource evaluation.
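A hedged sketch of such a fit (the paper's exact model and data differ): assume a linear sensor response to received power with inverse-square attenuation, DN ≈ a·reflectance/range² + b, estimated by least squares with a crude residual-based outlier detector on synthetic data.

import numpy as np

rng = np.random.default_rng(1)
ranges_m = rng.uniform(0.7, 2.0, 60)
reflectance = rng.choice([0.99, 0.50, 0.25], 60)     # calibrated panel patches
x = reflectance / ranges_m**2
dn = 400.0 * x + 10.0 + rng.normal(0.0, 5.0, 60)     # synthetic digital levels
dn[::15] += 200.0                                    # inject a few gross outliers

def fit(x, y):
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

coef, resid = fit(x, dn)
keep = np.abs(resid) < 3.0 * resid.std()             # simple outlier detector
coef, _ = fit(x[keep], dn[keep])
print("estimated gain and offset:", coef)            # should be close to (400, 10)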
3D data acquired with Kinect 2 (left) and digital
number (DN) distribution (right) for the reference
panel at 0.7 m (units: counts).
Visible-RGB view of the brick wall (a), intensity-IR
digital levels (DN) (b-d) and calibrated reflectance
values (e-g) for the three acquisition distances
The objective of this paper was to develop a radiometric calibration equation of an IR projector-
camera for the second generation of gaming sensors, Kinect 2, to convert the recorded
digital levels into physical values (reflectance). By the proposed equation, the reflectance
properties of the IR projector-camera set of Kinect 2 were obtained. This new equation will
increase the number of application fields of gaming sensors, favored by the possibility of
working outdoors.
The process of radiometric calibration should be incorporated as part of an integral process
where the geometry obtained is also corrected (i.e., lens distortion, mapping function, depth
errors, etc.). As future perspectives, the effects of the diffuse radiance, which does not belong to
the sensor footprint and contaminate the received signal, will be evaluated to determine the
error budget in the active sensor.
64. Pipeline Depth image enhancement #3
‘Old-school’ depth refining techniques
Depth enhancement with improved exemplar-based
inpainting and joint trilateral guided filtering
Liang Zhang ; Peiyi Shen ; Shu'e Zhang ; Juan Song ; Guangming Zhu
Image Processing (ICIP), 2016 IEEE International Conference on
https://doi.org/10.1109/ICIP.2016.7533131
In this paper, a novel depth enhancement algorithm with improved
exemplar-based inpainting and joint trilateral guided filtering is
proposed. The improved exemplar-based inpainting method is
applied to fill the holes in the depth images, in which the level set
distance component is introduced in the priority evaluation function.
Then a joint trilateral guided filter is adopted to denoise and smooth
the inpainted results. Experimental results reveal that the proposed
algorithm can achieve better enhancement results compared with the
existing methods in terms of subjective and objective quality
measurements.
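The building block behind these 'old-school' methods is the joint (cross) bilateral filter; below is a minimal, unoptimised sketch (not the papers' exact trilateral algorithms) that smooths depth with weights from spatial distance and an intensity guide, and fills zero-depth holes from valid neighbours.

import numpy as np

def joint_bilateral_depth(depth, guide, radius=5, sigma_s=3.0, sigma_r=0.1):
    # Joint bilateral filtering of a depth map with an intensity/RGB guide:
    # weight = spatial Gaussian * guide-range Gaussian; invalid (zero) depth
    # pixels get zero data weight, so holes are filled from valid neighbours.
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))
    pad_d = np.pad(depth.astype(np.float64), radius)
    pad_g = np.pad(guide.astype(np.float64), radius)
    for y in range(h):
        for x in range(w):
            nd = pad_d[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            ng = pad_g[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            w_range = np.exp(-((ng - pad_g[y + radius, x + radius])**2) / (2.0 * sigma_r**2))
            weight = spatial * w_range * (nd > 0)
            out[y, x] = (weight * nd).sum() / max(weight.sum(), 1e-8)
    return out

depth = np.random.default_rng(0).uniform(800.0, 1200.0, (60, 80))
depth[20:30, 30:40] = 0.0                 # a hole to fill
guide = np.random.default_rng(1).uniform(0.0, 1.0, (60, 80))
filled = joint_bilateral_depth(depth, guide)

The pixel-wise Python loop is only for illustration; real implementations vectorise the filter or use optimised library versions.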
Robust depth enhancement and optimization based on
advanced multilateral filters
Ting-An Chang, Yang-Ting Chou, Jar-Ferr Yang
EURASIP Journal on Advances in Signal Processing December 2017, 2017:51
https://doi.org/10.1186/s13634-017-0487-7
Results of the depth enhancement coupled with hole filling obtained by (a) a noisy depth map, (b) joint bilateral filter (JBF) [16],
(c) intensity guided depth superresolution (IGDS) [39], (d) compressive sensing based depth upsampling (CSDU) [40], (e) adaptive
joint trilateral filter (AJTF) [18], and (f) the proposed AMF for Art, Books, Doily, Moebius, RGBD_1, and RGBD_2.
65. Pipeline Depth image enhancement #4A
Deep learning-based depth refining techniques
DepthComp : real-time depth image completion based on
prior semantic scene segmentation
Atapour-Abarghouei, A. and Breckon, T.P.
28th British Machine Vision Conference (BMVC) 2017 London, UK, 4-7 September 2017.
http://dro.dur.ac.uk/22375/
Exemplar results on the KITTI dataset. S denotes the segmented images [3] and D the original (unfilled) disparity maps.
Results are compared with [1, 2, 29, 35, 45]. Results of cubic and linear interpolations are omitted due to space.
Comparison of the proposed method using different initial segmentation techniques on the KITTI dataset [27].
Original color and disparity image (top-left), results with manual labels (top-right), results with SegNet [3] (bottom-left)
and results with mean-shift [26] (bottom-right).
Fast depth image denoising and enhancement using a deep
convolutional network
Xin Zhang and Ruiyuan Wu
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE
https://doi.org/10.1109/ICASSP.2016.7472127
66. Pipeline Depth image enhancement #4b
Deep learning-based depth refining techniques
Guided deep network for depth map super-resolution: How
much can color help?
Wentian Zhou ; Xin Li ; Daryl Reynolds
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE
https://doi.org/10.1109/ICASSP.2017.7952398
https://anvoy.github.io/publication.html
Depth map upsampling using joint edge-guided
convolutional neural network for virtual view synthesizing
Yan Dong; Chunyu Lin; Yao Zhao; Chao Yao
Journal of Electronic Imaging Volume 26, Issue 4
http://dx.doi.org/10.1117/1.JEI.26.4.043004
Depth map upsampling. Input: (a) low-resolution depth map and (b) the corresponding color image.
Output: (c) recovered high-resolution depth map.
When the depth edges become unreliable, our network
tends to rely on color-based prediction network (CBPN) for
restoring more accurate depth edges. Therefore,
contribution of color image increases when the reliability of
the LR depth map decreases (e.g., as noise gets stronger).
We adopt the popular deep CNN to learn non-linear
mapping between LR and HR depth maps. Furthermore, a
novel color-based prediction network is proposed to
properly exploit supplementary color information in
addition to the depth enhancement network.
In our experiments, we have shown that deep neural
network based approach is superior to several existing
state-of-the-art methods. Further comparisons are
reported to confirm our analysis that the contributions of
color image vary significantly depending on the reliability
of LR depth maps.
68. Pipeline Laser Range Finding #1a
Versatile Approach to Probabilistic
Modeling of Hokuyo UTM-30LX
IEEE Sensors Journal ( Volume: 16, Issue: 6, March 15, 2016 )
https://doi.org/10.1109/JSEN.2015.2506403
When working with Laser Range Finding (LRF), it is
necessary to know the sensor’s measurement
principle and its properties. There are several
measurement principles used in LRFs
[Nejad and Olyaee 2006], [Łabęcki et al. 2012], [Adams 1999]:
● Triangulation
● Time of flight (TOF)
● Frequency modulation continuous wave (FMCW)
● Phase shift measurement (PSM)
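For the two most common principles listed above, the textbook range equations are d = c·t/2 for pulsed time of flight and d = c·Δφ/(4π·f_mod) for phase-shift measurement, unambiguous up to c/(2·f_mod). A quick sketch, not specific to the Hokuyo UTM-30LX:

import math

C = 299_792_458.0                           # speed of light, m/s

def distance_tof(round_trip_time_s):
    # Pulsed time of flight: d = c * t / 2
    return C * round_trip_time_s / 2.0

def distance_phase_shift(phase_rad, modulation_hz):
    # Phase-shift measurement: d = c * phi / (4 * pi * f_mod),
    # unambiguous only up to c / (2 * f_mod)
    return C * phase_rad / (4.0 * math.pi * modulation_hz)

print(distance_tof(66.7e-9))                # ~10 m for a ~67 ns round trip
print(distance_phase_shift(2.0, 10e6))      # ~4.8 m at 10 MHz modulation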
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Distance measurement principle of time-of-flight laser
scanners (top) and phase based laser scanners
(bottom).
Laser Range Finding : Image formation #1
69. Pipeline Laser Range Finding #1b
Laser Range Finding : Image formation #2
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Two-way link budget between the receiver (Rx) and
the transmitter (Tx) in a Free Space Path (FSP)
propagation model.
Schematic representation of the signal propagation from
the transmitter to the receiver.
Effect of increasing incidence angle and range on the signal
deterioration. (Left) Plot of the signal deterioration due
to increasing incidence angle α; (right) plot of the signal
deterioration due to increasing range ρ, with ρmin = 0 m
and ρmax = 100 m.
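The dominant trend behind these plots is a received signal that scales roughly with cos α / ρ²; a toy sketch of that relation (the thesis derives the full link-budget model):

import numpy as np

def relative_signal(incidence_deg, range_m, range_ref_m=1.0):
    # Toy model: received power falls with the cosine of the incidence angle
    # and with the square of the range (normalised to range_ref_m at 0 deg).
    return np.cos(np.radians(incidence_deg)) * (range_ref_m / range_m) ** 2

for alpha in [0, 30, 60, 80]:
    print(alpha, "deg:", relative_signal(alpha, 10.0))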
Relationship between scan angle and normal
vector orientation used for the segmentation of a
point cloud with respect to planar features. A
point P = [θ, φ, ρ] is measured on the plane with
the normal parameters N = [α, β, γ]. Different
angles used for the range image gradients are
plotted.
Theoretical number of points. Practical
example of a plate of 1×1 m placed at 3 m, oriented
at 0º and being rotated at 60º.
Theoretical number of points. (left) Number of
points with respect to the orientation of the patch
and the distance.
Reference plate
measurement set-up. A white
coated plywood board is
mounted on a tripod via a
screw clamp mechanism
provided with a 2º precision
goniometer.
70. Pipeline Laser Range Finding #1c
Laser Range Finding : Image formation #3
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Terrestrial Laser Scanning (TLS) good practice of survey planning
Future directions At the time this research
started, terrestrial laser scanners were mainly
being used by research institutes and
manufacturers. However, nowadays, terrestrial
laser scanners are present in almost every field
of work, e.g. forensics, architecture, civil
engineering, gaming industry, movie industry.
Mobile mapping systems, such as scanners
capturing a scene while driving a car, or
scanners mounted on drones are currently
making use of the same range determination
techniques used in terrestrial laser scanners.
The number of applications that make use of
3D point clouds is rapidly growing. The need
for a sound quality product is even more
significant as it impacts the quality of a huge
panel of end-products.
71. Pipeline Laser Range Finding #1D
Laser Range Finding : Image formation #4
Ray-Tracing Method for Deriving
Terrestrial Laser Scanner Systematic
Errors
Derek D. Lichti, Ph.D., P.Eng.
Journal of Surveying Engineering | Volume 143 Issue 2 - May 2017
https://www.doi.org/10.1061/(ASCE)SU.1943-5428.0000213
Error model of direct georeferencing
procedure of terrestrial laser scanning
Pandžić, Jelena; Pejić, Marko; Božić, Branko; Erić, Verica
Automation in Construction Volume 78, June 2017, Pages 13-23
https://doi.org/10.1016/j.autcon.2017.01.003
72. Pipeline Laser Range Finding #2A
Calibration #1
Statistical Calibration Algorithms for Lidars
Anas Alhashimi, Luleå University of Technology, Control Engineering
Licentiate thesis (2016), ORCID iD: 0000-0001-6868-2210
A rigorous cylinder-based self-calibration approach for
terrestrial laser scanners
Ting On Chan; Derek D. Lichti; David Belton
ISPRS Journal of Photogrammetry and Remote Sensing; Volume 99, January 2015
https://doi.org/10.1016/j.isprsjprs.2014.11.003
The proposed method and its variants were first applied to two simulated datasets, to compare
their effectiveness, and then to three real datasets captured by three different types of scanners:
a Faro Focus 3D (a phase-based panoramic scanner); a Velodyne HDL-32E (a
pulse-based multi spinning beam scanner); and a Leica ScanStation C10 (a dual operating-mode
scanner).
In situ self-calibration is essential for
terrestrial laser scanners (TLSs) to
maintain high accuracy for many
applications such as structural
deformation monitoring (Lindenbergh, 2010). This
is particularly true for aged TLSs and
instruments being operated for long hours
outdoors with varying environmental
conditions.
Although the plane-based methods are now widely adopted for TLS
calibration, they also suffer from the problem of high parameter correlation
when there is a low diversity in the plane orientations (Chow et al., 2013). In
practice, not all locations possess large and smooth planar features that can be
used to perform a calibration. Even though planar features are available, their
planarity is not always guaranteed. Because of the drawbacks to the point-
based and plane-based calibrations, an alternative geometric feature, namely
circular cylindrical features (e.g. Rabbani et al., 2007), should be considered
and incorporated into the self-calibration procedure.
Estimating d without being aware of the mode hopping,
i.e., assuming a certain λ0 without actually knowing that
the average λ jumps between different lasing modes,
thus results in a multimodal measurement of d.
Potential temperature-bias dependencies for the
polynomial model.
The plot explains the cavity modes, gain profile and lasing
modes for a typical laser diode. The upper drawing shows
the wavelength ν1 as the dominant lasing mode, while the
lower drawing shows how both wavelengths ν1 and ν2 are
competing; this latter case is responsible for the
mode-hopping effects.
73. Pipeline Laser Range Finding #2b
Calibration #2
Calibration of a multi-beam Laser System
by using a TLS-generated Reference
Gordon, M.; Meidow, J.
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial
Information Sciences, Volume II-5/W2, 2013, pp.85-90
http://dx.doi.org/10.5194/isprsannals-II-5-W2-85-2013
Extrinsic calibration of a multi-beam LiDAR
system with improved intrinsic laser parameters
using v-shaped planes and infrared images
Po-Sen Huang ; Wen-Bin Hong ; Hsiang-Jen Chien ; Chia-Yen Chen
IVMSP Workshop, 2013 IEEE 11th
https://doi.org/10.1109/IVMSPW.2013.6611921
Velodyne HDL-64E S2, the
LiDAR system studied in this
work, for example, is a mobile
scanner consisting of 64 laser
emitter-receiver pairs which are
rigidly attached to a rotating
motor, and provides real-time
panoramic range data with
measurement errors of around
2.5 mm.
In this paper we propose a
method to use IR images as
feedback in finding
optimized intrinsic and
extrinsic parameters of the
LiDAR-vision scanner.
First, we apply the IR-based calibration technique to a LiDAR system that
fires multiple beams, which significantly increases the problem's
complexity and difficulty. Second, the adjustment of parameters is applied
to not only the extrinsic parameters, but also the laser parameters as well
as the intrinsic parameters of the camera. Third, we use two different
objective functions to avoid generalization failure of the optimized
parameters.
It is assumed that the accuracy of this point cloud is considerably higher than that from the multi-
beam LIDAR and that the data represent faces of man-made objects at different distances. We
inspect the Velodyne HDL-64E S2 system as the best-known representative of this kind of sensor
system, while the Z+F Imager 5010 serves as reference data. Besides the improvement of the point
accuracy by considering the calibration results, we test the significance of the parameters related to
the sensor model and consider the uncertainty of measurements w.r.t. the measured distances.
Standard deviation of the planar misclosure is
nearly halved from 3.2 cm to 1.7 cm. The variance
component estimation as well as the standard
deviation of the range residuals reveal that the
manufacturer’s standard deviation of the distance
accuracy of 2 cm is a bit too optimistic.
The histograms of the planar misclosures and the residuals reveal that these
quantities are not normally distributed. Our investigation of the distance-
dependent change of the misclosure variance is one reason. Other sources were
investigated by Glennie and Lichti (2010): the incidence angle and the vertical
angle. A further possibility is the focal distance, which is different for each
laser; the average is at 8 m for the lower block and at 15 m for the
upper block. This may introduce a distance-dependent, but nonlinear,
variance change. Further research is needed to find the sources of these
observations.