Dataset creation for Deep Learning-based Geometric
Computer Vision problems
Purpose of this presentation
● This is the more ‘pragmatic’ set accompanying the slideset
analyzing the SfM-Net architecture from Google.
● The main idea in the dataset creation is to have multiple sensor
quality levels in the same rig, in order to obtain good-quality
reference data (ground truth, gold standard) with a terrestrial laser
scanner that can be used to train image restoration deep learning
networks
- so that more can be squeezed out of the inference from lower-quality
sensors as well. Think of Google Tango, iPhone 8 with depth sensing, Kinect,
etc.
● The presentation tries to address the typical problem of finding the
relevant “seed literature” for a new topic, helping fresh grad
students, postdocs, software engineers and startup founders.
- An answer to “Do you know if someone has done work on the
various steps involved in SfM?”, to identify which wheels do not need to
be re-invented
● Some of the RGB image enhancement/styling slides are not the most relevant when designing the hardware pipe per se,
but they are included to highlight the need for a systems engineering approach to the design of the whole pipeline, rather than
just obtaining the data somewhere and expecting the deep learning software to do all the magic for you.
Deep Learning for Structure-from-Motion (SfM)
https://www.slideshare.net/PetteriTeikariPhD/deconstructing-sfmnet
Future Hardware and dataset creation
Pipeline Dataset creation #1A
The Indoor Multi-sensor Acquisition System (IMAS)
presented in this paper consists of a wheeled platform
equipped with two 2D laser heads, RGB cameras,
thermographic camera, thermohygrometer, and luxmeter.
http://dx.doi.org/10.3390/s16060785
Inspired by the system of Armesto et al., one could have
a custom rig with:
● high-quality laser scanner giving the “gold standard”
for depth,
● accompanied with smart phone quality RGB and depth
sensing,
● accompanied by DSLR gold standard for RGB
● and some mid-level structured light scanner?
The rig configuration would allow multiframe exposure techniques
to be used more easily than with a handheld system (see next slide)
We saw previously that the brightness constancy assumption might be
tricky with some materials, and polarization measurement for example
can help distinguish materials (dielectric materials polarize reflected light,
whereas conductive ones do not), or there may be some other way of
estimating the Bidirectional Reflectance Distribution Function (BRDF)
Multicamera rig calibration by double-sided
thick checkerboard
Marco Marcon; Augusto Sarti; Stefano Tubaro
IET Computer Vision 2017
http://dx.doi.org/10.1049/iet-cvi.2016.0193
Pipeline Dataset creation #2a: Multiframe Techniques
Note! In deep learning, the term super-resolution refers to “statistical upsampling”, whereas in optical imaging super-resolution
typically refers to imaging techniques that resolve detail beyond the diffraction limit. Note 2! Nothing should stop someone from marrying the two, though
In practice anyone can play with super-resolution at home by putting a camera on a tripod and then taking multiple shots of the
same static scene, and post-processing them with super-resolution, which can improve the modulation transfer function (MTF) of
RGB images, and can improve depth resolution and reduce noise for laser scans and depth sensing, e.g. with Kinect.
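To make the tripod experiment concrete, the minimal sketch below aligns a burst of shots of a static scene with OpenCV’s ECC registration and simply averages them; the file-name pattern is hypothetical, and a real super-resolution pipeline would additionally upsample onto a finer grid and sharpen.

```python
# Minimal sketch: burst-average a static tripod scene to reduce noise.
# Frame file names are hypothetical; real super-resolution would also
# upsample onto a finer grid and recover MTF via deconvolution.
import glob
import cv2
import numpy as np

def align_and_average(pattern="frame_*.png"):
    paths = sorted(glob.glob(pattern))
    ref = cv2.imread(paths[0]).astype(np.float32)
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    acc, n = ref.copy(), 1
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    for path in paths[1:]:
        img = cv2.imread(path).astype(np.float32)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        warp = np.eye(2, 3, dtype=np.float32)
        # ECC estimates the small residual shift between tripod shots
        _, warp = cv2.findTransformECC(ref_gray, gray, warp,
                                       cv2.MOTION_TRANSLATION, criteria)
        aligned = cv2.warpAffine(img, warp, (ref.shape[1], ref.shape[0]),
                                 flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
        acc += aligned
        n += 1
    return (acc / n).clip(0, 255).astype(np.uint8)

cv2.imwrite("averaged.png", align_and_average())
```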
https://doi.org/10.2312/SPBG/SPBG06/009-015
Cited by 47 articles
(a) One scan. (b) Final super-resolved surface from 100 scans.
“PhotoAcute software processes sets of photographs taken in continuous
mode. It utilizes superresolution algorithms to convert a sequence of images
into a single high-resolution and low-noise picture, that could only be taken with
much better camera.”
Depth looks a lot nicer when reconstructed using
50 consecutive Kinect v1 frames in comparison to
just one frame. [Data from Petteri Teikari]
Kinect multiframe reconstruction with
SiftFu [Xiao et al. (2013)]
https://github.com/jianxiongxiao/ProfXkit
Pipeline Dataset creation #2b: Multiframe Techniques
It is tedious to manually take e.g. 100 shots of the same scene, possibly involving even a 360° rotation of the imaging devices; in practice this would
need to be automated in some way, e.g. with a stepper motor driven by an Arduino, if no good commercial systems are available.
Multiframe techniques would allow another level of “nesting” of ground truths for a joint image enhancement block alongside the
proposed structure and motion network.
● The reconstructed laser scan / depth image / RGB from 100 images would be the target, and the single-frame version the input that
needs to be enhanced
Meinhardt et al. (2017)
Diamond et al. (2017)
Pipeline Dataset creation #3
A Pipeline for Generating Ground
Truth Labels for Real RGBD Data
of Cluttered Scenes
Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake
Submitted on 15 Jul 2017, last revised 25 Jul 2017
https://arxiv.org/abs/1707.04796
In this paper we develop a pipeline to rapidly
generate high quality RGBD data with
pixelwise labels and object poses. We use an
RGBD camera to collect video of a scene from
multiple viewpoints and leverage existing
reconstruction techniques to produce a 3D
dense reconstruction. We label the 3D
reconstruction using a human assisted ICP-
fitting of object meshes. By reprojecting the
results of labeling the 3D scene we can
produce labels for each RGBD image of the
scene. This pipeline enabled us to collect over
1,000,000 labeled object instances in just a
few days.
We use this dataset to answer
questions related to how much
training data is required, and of what
quality the data must be, to achieve
high performance from a DNN
architecture.
Overview of the data generation pipeline. (a) Xtion RGBD sensor
mounted on Kuka IIWA arm for raw data collection. (b) RGBD data
processed by ElasticFusion into reconstructed pointcloud. (c) User
annotation tool that allows for easy alignment using 3 clicks. User
clicks are shown as red and blue spheres. The transform mapping the
red spheres to the green spheres is then the user specified guess. (d)
Cropped pointcloud coming from user specified pose estimate is
shown in green. The mesh model shown in grey is then finely aligned
using ICP on the cropped pointcloud and starting from the user
provided guess. (e) All the aligned meshes shown in reconstructed
pointcloud. (f) The aligned meshes are rendered as masks in the RGB
image, producing pixelwise labeled RGBD images for each view.
Increasing the variety of backgrounds in the
training data for single-object scenes also
improved generalization performance for new
backgrounds, with approximately 50 different
backgrounds breaking into above-50% IoU on
entirely novel scenes. Our recommendation is to
focus on multi-object data collection in a variety
of backgrounds for the most gains in
generalization performance.
We hope that our pipeline lowers the barrier to
entry for using deep learning approaches for
perception in support of robotic manipulation
tasks by reducing the amount of human time
needed to generate vast quantities of labeled
data for your specific environment and set of
objects. It is also our hope that our analysis of
segmentation network performance provides
guidance on the type and quantity of data that
needs to be collected to achieve desired levels of
generalization performance.
Pipeline Dataset creation #4
A Novel Benchmark RGBD Dataset for Dormant Apple
Trees and Its Application to Automatic Pruning
Shayan A. Akbar, Somrita Chattopadhyay, Noha M. Elfiky, Avinash Kak;
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016
https://doi.org/10.1109/CVPRW.2016.50
Extending of the Kinect device functionality and the
corresponding database
Libor Bolecek ; Pavel Němec ; Jan Kufa ; Vaclav Ricny
Radioelektronika (RADIOELEKTRONIKA), 2017
https://doi.org/10.1109/RADIOELEK.2017.7937594
One of the possible research directions is use of infrared version of the investigated scene for improvement of the depth
map. However, the databases of the Kinect data which would contain the corresponding infrared images do not exist.
Therefore, our aim was to create such database. We want to increase the usability of the database by adding stereo
images. Moreover, the same scenes were captured by Kinect v2. The impact of the simultaneous use of
Kinect v1 and Kinect v2 to improve the depth map of the scene was also investigated. The database contains sequences of objects
on a turntable and simple scenes containing several objects.
The depth map of the scene
obtained by a) Kinect v1, b)
Kinect v2.
The comparison of the one row
of the depth map obtained by a)
Kinect v1 b) Kinect v2 with true
depth map.
Kinect infrared image after a change
of the brightness dynamics
Pipeline Multiframe Pipe #1
[Diagram: frames 1, 2, 3, …, 100 of a depth image (e.g. Kinect), a laser scan (e.g. Velodyne) and an RGB image feed into a multiframe reconstruction enhancement block, whose output serves as the target.]
Learn to improve image quality from a single image when the system is deployed.
Reconstruction could be done using traditional algorithms (e.g. OpenCV) to start with; all individual frames then need to be saved so that, when the reconstruction algorithms improve, all blocks can be iterated ad infinitum.
Then mix different image qualities and sensor
qualities in the training set to build
invariance to scan quality
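One way this quality mixing could be realized is sketched below, assuming all individual frames of each scene have been saved: each training pair uses a reconstruction from a randomly sized subset of frames as the input and the full-stack reconstruction as the target, with plain averaging standing in for the traditional reconstruction block; paths and frame counts are hypothetical.

```python
# Sketch of building quality-mixed training pairs from saved frame stacks.
# Simple averaging stands in for the "traditional reconstruction" block;
# the directory layout and frame counts here are hypothetical.
import random
import numpy as np

def load_stack(scene_dir):
    """Load all saved frames of one scene as an (N, H, W) float array."""
    return np.load(f"{scene_dir}/frames.npy")        # hypothetical file layout

def make_pair(stack, frame_counts=(1, 5, 10, 25, 50)):
    """Input = reconstruction from a random subset, target = full stack."""
    k = random.choice(frame_counts)
    subset = stack[np.random.choice(len(stack), size=k, replace=False)]
    degraded_input = subset.mean(axis=0)   # lower-quality reconstruction
    target = stack.mean(axis=0)            # best reconstruction available
    return degraded_input, target

def training_pairs(scene_dirs, pairs_per_scene=8):
    """Yield quality-mixed (input, target) pairs across all scenes."""
    for scene_dir in scene_dirs:
        stack = load_stack(scene_dir)
        for _ in range(pairs_per_scene):
            yield make_pair(stack)
```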
Pipeline Multiframe Pipe #2
You could cascade different levels of quality, if you want to make things complex, in a deeply supervised fashion
[Diagram: quality levels 1–6, from LOWEST QUALITY (just RGB) to HIGHEST QUALITY (depth map with a professional laser scanner).]
Each following step in the cascade is closer in quality to the previous one, and one could assume that this enhancement
would be easier to learn; the pipeline would also output the enhanced quality as a “side effect”, which is useful for
visualization purposes.
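A minimal PyTorch-style sketch of such a deeply supervised cascade is given below: each stage residually refines the previous output and is supervised against the next quality level; the layer sizes and plain MSE loss are arbitrary choices, not a reference design.

```python
# Sketch of a deeply supervised quality cascade (stage i is supervised
# against quality level i+1). Architecture sizes are arbitrary placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        return x + self.net(x)              # residual refinement

class QualityCascade(nn.Module):
    def __init__(self, n_stages=5):
        super().__init__()
        self.stages = nn.ModuleList(RefineStage() for _ in range(n_stages))

    def forward(self, x):
        outputs = []
        for stage in self.stages:
            x = stage(x)
            outputs.append(x)               # one output per quality level
        return outputs

def cascade_loss(outputs, targets):
    """targets[i] is the ground truth at quality level i+2 (levels 2..6)."""
    return sum(F.mse_loss(o, t) for o, t in zip(outputs, targets))
```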
Pipeline acquisition example with Kinect
https://arxiv.org/abs/1704.07632
KinectFusion (Newcombe et al. 2011), one of the pioneering works, showed that a real-world object as well as an
indoor scene can be reconstructed in real-time with GPU acceleration. It exploits the iterative closest point (ICP)
algorithm (Besl and McKay 1992) to track 6-DoF poses and the volumetric surface representation scheme with
signed distance functions (Curless and Levoy, 1996) to fuse 3D measurements. A number of following studies (e.g.
Choi et al. 2015) have tackled the limitation of KinectFusion; as the scale of a scene increases, it is hard to
completely reconstruct the scene due to the drift problem of the ICP algorithm as well as the large memory
consumption of volumetric integration.
To scale up the KinectFusion algorithm, Whelan et al. (2012) presented a spatially extended KinectFusion,
named Kintinuous, by incrementally adding KinectFusion results in the form of triangular meshes.
Whelan et al. (2015) also proposed ElasticFusion to tackle similar problems as well as to overcome the
problem of pose graph optimization by using surface loop closure optimization and a surfel-based
representation. Moreover, to decrease the space complexity, ElasticFusion deallocates invisible surfels from
memory; invisible surfels are allocated in memory again only if they are likely to be visible in the near
future.
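For intuition, a bare-bones sketch of the TSDF-style volumetric integration used by these systems is shown below: voxel centres are projected into each depth frame and a truncated signed distance is averaged in with a running weight; camera intrinsics and poses are assumed given, and ICP tracking, raycasting and meshing are omitted.

```python
# Bare-bones TSDF fusion of depth frames into a voxel grid (no ICP tracking,
# no raycasting or meshing). Intrinsics K and camera-to-world poses are
# assumed to come from an external tracker.
import numpy as np

def fuse_tsdf(depth_frames, poses, K, origin, voxel_size, dims, trunc=0.04):
    """depth_frames: HxW depth maps (metres); poses: camera-to-world 4x4."""
    tsdf = np.ones(dims, dtype=np.float32)       # truncated signed distances
    weight = np.zeros(dims, dtype=np.float32)

    # World coordinates of all voxel centres, shape (N, 3)
    idx = np.indices(dims).reshape(3, -1).T
    world = origin + (idx + 0.5) * voxel_size

    for depth, T_cam2world in zip(depth_frames, poses):
        T = np.linalg.inv(T_cam2world)           # world -> camera
        cam = (T[:3, :3] @ world.T + T[:3, 3:4]).T
        z = cam[:, 2]
        z_safe = np.maximum(z, 1e-6)
        u = np.round(K[0, 0] * cam[:, 0] / z_safe + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * cam[:, 1] / z_safe + K[1, 2]).astype(int)
        ok = (z > 0) & (u >= 0) & (u < depth.shape[1]) \
                     & (v >= 0) & (v < depth.shape[0])
        d = np.zeros_like(z)
        d[ok] = depth[v[ok], u[ok]]
        ok &= d > 0
        sdf = d - z                              # + in front of surface, - behind
        ok &= sdf > -trunc                       # discard voxels far behind surface
        new = np.minimum(1.0, sdf[ok] / trunc)
        vox = (idx[ok, 0], idx[ok, 1], idx[ok, 2])
        w = weight[vox]
        tsdf[vox] = (w * tsdf[vox] + new) / (w + 1.0)   # running weighted average
        weight[vox] = w + 1.0
    return tsdf, weight
```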
Pipeline Multiframe Pipe into SfM-Net
Pipeline Multiframe Pipe Quality simulation
Simulated Imagery Rendering Workflow for UAS-
Based Photogrammetric 3D Reconstruction
Accuracy Assessments
Richard K. Slocum and Christopher E. Parrish
Remote Sensing 2017, 9(4), 396; doi:10.3390/rs9040396
“Here, we present a workflow to render computer generated imagery using a virtual environment which can
mimic the independent variables that would be experienced in a real-world UAS imagery acquisition
scenario. The resultant modular workflow utilizes Blender Python API, an open source computer graphics
software, for the generation of photogrammetrically-accurate imagery suitable for SfM processing, with
explicit control of camera interior orientation, exterior orientation, texture of objects in the scene, placement
of objects in the scene, and ground control point (GCP) accuracy.”
Pictorial representation of the simUAS (simulated UAS)
imagery rendering workflow. Note: The SfM-MVS step is
shown as a “black box” to highlight the fact that the procedure
can be implemented using any SfM-MVS software, including
proprietary commercial software.
The imagery from Blender, rendered using a pinhole camera
model, is postprocessed to introduce lens and camera effects.
The magnitudes of the postprocessing effects are set high in this
example to clearly demonstrate the effect of each. The fullsize
image (left) and a close up image (right) are both shown in order to
depict both the large and small scale effects.
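The sketch below illustrates the same idea of degrading an ideal pinhole render with lens and camera effects (blur, vignetting, shot and read noise, 8-bit quantization); the magnitudes are arbitrary and this is not the simUAS post-processing code.

```python
# Sketch: degrade an ideal pinhole render with simple lens/camera effects.
# Effect magnitudes are arbitrary; this is not the simUAS implementation.
import cv2
import numpy as np

def degrade_render(img_uint8, blur_sigma=1.2, vignette_strength=0.35,
                   read_noise=2.0, photons_per_dn=40.0, seed=0):
    """img_uint8: H x W x 3 rendered image (8-bit)."""
    rng = np.random.default_rng(seed)
    img = img_uint8.astype(np.float32)

    # 1. Lens blur (stand-in for a measured point spread function)
    img = cv2.GaussianBlur(img, (0, 0), blur_sigma)

    # 2. Radial vignetting, strongest in the corners
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(xx - w / 2, yy - h / 2) / np.hypot(w / 2, h / 2)
    img *= (1.0 - vignette_strength * r ** 2)[..., None]

    # 3. Poisson shot noise + Gaussian read noise
    photons = rng.poisson(np.clip(img, 0, None) * photons_per_dn)
    img = photons / photons_per_dn + rng.normal(0, read_noise, img.shape)

    # 4. Quantize back to 8 bits
    return np.clip(img, 0, 255).astype(np.uint8)
```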
A 50 cm wide section of
the point cloud containing
a box (3 m cube) is shown
with the dense
reconstruction point
clouds overlaid to
demonstrate the effect of
point cloud dense
reconstruction quality on
accuracy near sharp
edges.
The points along the side of a
vertical plane on a box were
isolated and the error
perpendicular to the plane of
the box was visualized for
each dense reconstruction
setting, with white regions
indicating no point cloud
data. Notice that the region
with data gaps in the point
cloud from the ultra-high
setting corresponds to the
region of the plane with low
image texture, as shown in
the lower right plot.
Data fusion combining multimodal data
Pipeline data Fusion / Registration #1
“Rough estimates for 3D structure obtained using structure
from motion (SfM) on the uncalibrated images are first co-
registered with the lidar scan and then a precise alignment
between the datasets is estimated by identifying
correspondences between the captured images and
reprojected images for individual cameras from the 3D lidar
point clouds. The precise alignment is used to update both the
camera geometry parameters for the images and the individual
camera radial distortion estimates, thereby providing a 3D-to-
2D transformation that accurately maps the 3D lidar scan
onto the 2D image planes. The 3D to 2D map is then utilized to
estimate a dense depth map for each image. Experimental
results on two datasets that include independently acquired
high-resolution color images and 3D point cloud datasets
indicate the utility of the framework. The proposed approach
offers significant improvements on results obtained with
SfM alone.”
Fusing structure from motion and lidar for
dense accurate depth map estimation
Li Ding ; Gaurav Sharma
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on
https://doi.org/10.1109/ICASSP.2017.7952363
https://arxiv.org/abs/1707.03167
“In this paper, we present RegNet, the first deep convolutional neural
network (CNN) to infer a 6 degrees of freedom (DOF) extrinsic
calibration between multimodal sensors, exemplified using a
scanning LiDAR and a monocular camera. Compared to existing
approaches, RegNet casts all three conventional calibration steps
(feature extraction, feature matching and global regression) into a single
real-time capable CNN.”
Development of the mean absolute error (MAE) of the
rotational components over training iteration for different
output representations: Euler angles are represented in red,
quaternions in brown and dual quaternions in blue. Both
quaternion representations outperform the Euler angles
representation.
“Our method yields a mean calibration error of 6 cm for translation and 0.28° for rotation, with decalibration
magnitudes of up to 1.5 m and 20°, which competes with state-of-the-art online and offline methods.”
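For reference, the translation and rotation errors quoted above are the standard metrics sketched below (Euclidean distance for translation, geodesic angle for rotation); this is not RegNet’s evaluation code.

```python
# Sketch: evaluate an estimated LiDAR-camera extrinsic against ground truth,
# reporting translation error (cm) and geodesic rotation error (degrees).
import numpy as np
from scipy.spatial.transform import Rotation as R

def calibration_errors(T_est, T_gt):
    """T_est, T_gt: 4x4 homogeneous sensor-to-sensor transforms."""
    t_err_cm = 100.0 * np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    R_delta = R.from_matrix(T_est[:3, :3]) * R.from_matrix(T_gt[:3, :3]).inv()
    rot_err_deg = np.degrees(R_delta.magnitude())   # geodesic angle
    return t_err_cm, rot_err_deg

# Example: a decalibration of 5 cm / 2 degrees about the z-axis
T_gt = np.eye(4)
T_est = np.eye(4)
T_est[:3, :3] = R.from_euler("z", 2.0, degrees=True).as_matrix()
T_est[0, 3] = 0.05
print(calibration_errors(T_est, T_gt))   # ~ (5.0, 2.0)
```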
Pipeline data Fusion / Registration #2
Depth refinement for binocular Kinect RGB-D cameras
Jinghui Bai ; Jingyu Yang ; Xinchen Ye ; Chunping Hou
Visual Communications and Image Processing (VCIP), 2016
https://doi.org/10.1109/VCIP.2016.7805545
Pipeline data Fusion / Registration #3
Used Kinects are
inexpensive:
~£29.95 (eBay)
Use multiple Kinects at once
for better occlusion handling
Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar
IEEE Sensors Journal ( Volume: 14, Issue: 6, June 2014 )
https://doi.org/10.1109/JSEN.2014.2309987
Characterization of Different Microsoft Kinect Sensor Models
IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015)
https://doi.org/10.1109/JSEN.2015.2422611
An ANOVA analysis was performed to determine if the model of the Kinect, the operating temperature,
or their interaction were significant factors in the Kinect's ability to determine the distance to the target.
Different sized gauge blocks were also used to test how well a Kinect could reconstruct precise objects.
Machinist blocks were used to examine how well the Kinect could reconstruct objects setup on an angle
and determine the location of the center of a hole. All the Kinect models were able to determine the
location of a target with a low standard deviation (<2 mm). At close distances, the resolutions of all the
Kinect models were 1 mm. Through the ANOVA analysis, the best performing Kinect at close distances
was the Kinect model 1414, and at farther distances was the Kinect model 1473. The internal
temperature of the Kinect sensor had an effect on the distance reported by the sensor. Using different
correction factors, the Kinect was able to determine the volume of a gauge block and the angles machinist
blocks were setup at, with under a 10% error.
Pipeline data Fusion / Registration #4
A Generic Approach for Error Estimation of Depth
Data from (Stereo and RGB-D) 3D Sensors
Luis Fernandez, Viviana Avila and Luiz Gonçalves
Preprints | Posted: 23 May 2017 |
http://dx.doi.org/10.20944/preprints201705.0170.v1
“We propose an approach for estimating
the error in depth data provided by
generic 3D sensors, which are modern
devices capable of generating an image
(RGB data) and a depth map (distance)
or other similar 2.5D structure (e.g.
stereo disparity) of the scene.
We come up with a multi-platform
system and its verification and
evaluation has been done, using the
development kit of the board NVIDIA
Jetson TK1 with the MS Kinects v1/v2
and the Stereolabs ZED camera. So the
main contribution is the error
determination procedure that does
not need any data set or benchmark,
thus relying only on data acquired on-
the-fly. With a simple checkerboard, our
approach is able to determine the error
for any device”
In the article of Yang [16], an MS Kinect v2 structure is proposed to improve the accuracy of the
sensors and the depth of capture of objects that are placed more than four meters apart. It has
been concluded that an object covered with light-absorbing materials may cause less IR light to be reflected
back to the MS Kinect and therefore erroneous depth data. Other factors, such as power
consumption, complex wiring and high requirement for laptop computer also limits the use of the
sensor.
The characteristics of MS Kinect stochastic errors are presented for each direction of the axis in the work by Choo [17].
The depth error is measured using a 3D chessboard, similar to the one used in our approach. The results show that, for all
three axes, the error should be considered independently. In the work of Song [18], an approach is proposed to
generate a per-pixel confidence measure for each depth map captured by MS Kinect in indoor scenes through
supervised learning and the use of artificial intelligence.
Detection (a) and ordering (b) of corners in the three planes
of the pattern.
It would make sense to combine versions 1 and 2 for the same rig as Kinect v1 is
more accurate for close distances, and Kinect v2 more accurate for far distances
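A sketch of how such a v1/v2 combination could work once both depth maps are registered into a common viewpoint: a smooth distance-dependent blend that trusts v1 up close and v2 farther away; the crossover distance and blend width below are illustrative guesses rather than calibrated values.

```python
# Sketch: blend registered Kinect v1 and v2 depth maps with a smooth
# distance-dependent weight (v1 trusted near, v2 trusted far). The crossover
# distance and blend width are illustrative guesses, not calibrated values.
import numpy as np

def fuse_depth(d_v1, d_v2, crossover_m=2.0, width_m=0.5):
    """d_v1, d_v2: depth maps in metres, 0 = invalid, same size/viewpoint."""
    d_ref = np.where(d_v2 > 0, d_v2, d_v1)                    # rough distance estimate
    w_far = 1.0 / (1.0 + np.exp(-(d_ref - crossover_m) / width_m))  # sigmoid weight
    fused = (1.0 - w_far) * d_v1 + w_far * d_v2
    # Fall back to whichever sensor has data where the other is invalid
    fused = np.where(d_v1 == 0, d_v2, fused)
    fused = np.where(d_v2 == 0, np.where(d_v1 > 0, d_v1, 0.0), fused)
    return fused
```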
Pipeline data Fusion / Registration #5
Precise 3D/2D calibration between a RGB-D
sensor and a C-arm fluoroscope
International Journal of Computer Assisted Radiology and Surgery
August 2016, Volume 11, Issue 8, pp 1385–1395
https://doi.org/10.1007/s11548-015-1347-2
“A RMS reprojection error of 0.5 mm is achieved using
our calibration method which is promising for surgical
applications. Our calibration method is more accurate
when compared to Tsai’s method. Lastly, the simulation
result shows that using a projection matrix has a lower
error than using intrinsic and extrinsic parameters in
the rotation estimation.”
While the color camera has a relatively high resolution (1920 px ×
1080 px for Kinect 2.0), the depth camera is mid-resolution (512 px
× 424 px for Kinect 2.0) and highly noisy. Furthermore, RGB-D
sensors have a minimal distance to the scene from which they can
estimate the depth. For instance, the minimum optimal distance of
Kinect 2.0 is 50 cm.
On the other hand, C-arm fluoroscopes have a short focus, which is
typically 40 cm, and a much narrower field of view than the RGB-D
sensor with also a mid-resolution image (ours is 640 px × 480 px).
All these factors lead to a high disparity in the field of view
between the C-arm and the RGB-D sensor if the two were to be
integrated in a single system. This means that the calibration
process is crucial. We need to achieve high accuracy for the
localization of 3D points using RGB-D sensors, and we require a
calibration phantom which can be clearly imaged by both devices.
Workflow of the calibration process between the RGB-D sensor
and a C-arm. The input data include a sequence of infrared,
depth, and color images from the RGB-D sensor and X-ray images
from the C-arm. The output of the calibration pipeline is the
projection matrix, which is calculated by the 3D/2D
correspondences detected from the input data
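At its core, estimating a projection matrix from 3D/2D correspondences is the classic direct linear transform (DLT); a minimal sketch with an RMS reprojection error check is given below, leaving out the phantom detection pipeline described above.

```python
# Minimal DLT sketch: estimate a 3x4 projection matrix P from 3D/2D
# correspondences and report the RMS reprojection error (in pixels).
import numpy as np

def estimate_projection(X, x):
    """X: (N,3) 3D points, x: (N,2) 2D points, N >= 6."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        A.append([Xw, Yw, Zw, 1, 0, 0, 0, 0, -u*Xw, -u*Yw, -u*Zw, -u])
        A.append([0, 0, 0, 0, Xw, Yw, Zw, 1, -v*Xw, -v*Yw, -v*Zw, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    return Vt[-1].reshape(3, 4)          # null-space vector = flattened P

def rms_reprojection_error(P, X, x):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.sqrt(np.mean(np.sum((proj - x) ** 2, axis=1)))
```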
Pipeline data Fusion / Registration #6
Fusing Depth and Silhouette for Scanning Transparent
Object with RGB-D Sensor
Yijun Ji, Qing Xia, and Zhijiang Zhang
System overview; TSDF: truncated signed distance function; SFS: shape from silhouette.
Results on noise region. (a) Color images captured by
stationary camera with a rotating platform. (b) The noisy voxels
detected by multiple depth images are in red. (c) and (d) show
the experimental results done by a moving Kinect; the
background is changing in these two cases.
Pipeline data Fusion / Registration #7
Intensity Video Guided 4D Fusion for Improved
Highly Dynamic 3D Reconstruction
Jie Zhang, Christos Maniatis, Luis Horna, Robert B. Fisher
(Submitted on 6 Aug 2017)
https://arxiv.org/abs/1708.01946
Temporal tracking of intensity image points (of moving and
deforming objects) allows registration of the corresponding 3D
data points, whose 3D noise and fluctuations are then reduced
by spatio-temporal multi-frame 4D fusion. The results
demonstrate that the proposed algorithm is effective at
reducing 3D noise and is robust against intensity noise. It
outperforms existing algorithms with good scalability on both
stationary and dynamic objects.
The system framework (using 3 consecutive
frames as an example)
Static Plane (first row): (a) mean roughness;
(b) std of roughness vs. number of frames
fused. Falling ball (second row): (c) mean
roughness; (d) std of roughness vs.
number of frames fused
Texture-related
3D noise on a static
plane: (a) 3D frame;
(b) 3D frame with
textures. The 3D
noise is closely
related to the
textures in the
intensity image.
Illustration of 3D noise
reduction on the ball.
Spatial-temporal divisive
normalized bilateral filter (DNBF)
Pipeline data Fusion / Registration #8
Utilization of a Terrestrial Laser Scanner for
the Calibration of Mobile Mapping Systems
Seunghwan Hong, Ilsuk Park, Jisang Lee, Kwangyong Lim, Yoonjo
Choi and Hong-Gyoo Sohn
Sensors 2017, 17(3), 474; doi:10.3390/s17030474
Configuration of mobile mapping system: network video cameras (F:
front, L: left, R: right), mobile laser scanner, and Global Navigation
Satellite System (GNSS)/Inertial Navigation System (INS).
To integrate the datasets captured by each sensor mounted on the
Mobile Mapping System (MMS) into the unified single coordinate
system, the calibration, which is the process to estimate the orientation
(boresight) and position (lever-arm) parameters, is required with the
reference datasets [Schwarz and El-Sheimy 2004, Habib et al. 2010,
Chan et al. 2010].
When the boresight and lever-arm parameters defining the geometric relationship
between each sensing data and GNSS/INS data are determined, georeferenced data
can be generated. However, even after precise calibration, the boresight and lever-
arm parameters of an MMS can be shaken and the errors that deteriorate the
accuracy of the georeferenced data might accumulate. Accordingly, for the stable
operation of multiple sensors, precise calibration must be conducted periodically.
(a) Sphere target used for registration
of terrestrial laser scanning data; (b)
sphere target detected in a point cloud
(the green sphere is a fitted sphere
model).
Network video camera: AXIS F1005-E
GNSS/INS unit: OxTS Survey+
Terrestrial laser scanner (TLS): Faro Focus 3D
Mobile laser scanner: Velodyne HDL 32-E
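The sphere-target detection shown above boils down to fitting a sphere to a segmented point cluster; a linear least-squares sketch is given below, assuming the cluster has already been extracted (e.g. by RANSAC or region growing).

```python
# Sketch: least-squares sphere fit to a point cluster, as used when detecting
# sphere registration targets in terrestrial laser scans. Cluster extraction
# (e.g. RANSAC / region growing) is assumed to have happened already.
import numpy as np

def fit_sphere(points):
    """points: (N,3) array. Returns (center, radius)."""
    # ||p - c||^2 = r^2  ->  2 p.c + (r^2 - ||c||^2) = ||p||^2  (linear in c, k)
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, k = sol[:3], sol[3]
    radius = np.sqrt(k + center @ center)
    return center, radius

# Example: noisy points on a synthetic sphere of radius 0.145 m at (1, 2, 0.5)
rng = np.random.default_rng(0)
d = rng.normal(size=(500, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
pts = np.array([1.0, 2.0, 0.5]) + 0.145 * d + rng.normal(0, 0.002, (500, 3))
print(fit_sphere(pts))
```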
Pipeline data Fusion / Registration #9
Dense Semantic Labeling of Very-High-Resolution Aerial
Imagery and LiDAR with Fully-Convolutional Neural
Networks and Higher-Order CRFs
Yansong Liu, Sankaranarayanan Piramanayagam, Sildomar T. Monteiro, Eli Saber
http://openaccess.thecvf.com/content_cvpr_2017_workshops/w18/papers/Liu_Dense_Semantic_Labeling_CVPR_2017_paper.pdf
Our proposed decision-level fusion scheme: training one fully-convolutional
neural network on the color-infrared image (CIR) and one logistic regression using
hand-crafted features. The two probabilistic results, P_FCN and P_LR, are then combined in a
higher-order CRF framework
Main original contributions of our work are: 1) the use of energy based CRFs for efficient decision-
level multisensor data fusion for the task of dense semantic labeling. 2) the use of higher-order CRFs
for generating labeling outputs with accurate object boundaries. 3) the proposed fusion scheme has
a simpler architecture than training two separate neural networks, yet it still yields the state-of-the-
art dense semantic labeling results.
Guiding multimodal registration with learned
optimization updates
Gutierrez-Becker B, Mateus D, Peter L, Navab N
Medical Image Analysis Volume 41, October 2017, Pages 2-17
https://doi.org/10.1016/j.media.2017.05.002
Training stage (left): A set of aligned multimodal images is used to generate a training set of images with known
transformations. From this training set we train an ensemble of trees mapping the joint appearance of the images to
displacement vectors. Testing stage (right): We register a pair of multimodal images by predicting with our trained
ensemble the required displacements δ for alignment at different locations z. The predicted displacements are then
used to devise the updates of the transformation parameters to be applied to the moving image. The procedure is
repeated until convergence is achieved.
Corresponding CT (left) and MR-T1
(middle) images of the brain obtained
from the RIRE dataset. The
highlighted regions are corresponding
areas between both images (right).
Some multimodal similarity metrics
rely on structural similarities between
images obtained using different
modalities, like the ones inside the
blue boxes. However in many cases
structures which are clearly visible in
one imaging modality correspond to
regions with homogeneous voxel
values in the other modality (red and
green boxes).
Future Image restoration Natural Images (RGB)
Pipeline RGB image Restoration #1
https://arxiv.org/abs/1704.02738
Our method includes a sub-pixel motion compensation (SPMC) layer that can better handle inter-frame motion for this task, and a detail fusion (DF) network that can effectively fuse image details from multiple images after SPMC alignment.
“Hardware Super-resolution” of course all via deep learning too
https://petapixel.com/2015/02/21/a-practical-guide-to-creating-superresolution-photos-with-photoshop/
Pipeline RGB image Restoration #2A
“Data-driven Super-resolution” what super-resolution typically means in the deep learning space
Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution”
External Prior Guided Internal Prior
Learning for Real Noisy Image Denoising
Jun Xu, Lei Zhang, David Zhang
(Submitted on 12 May 2017)
https://arxiv.org/abs/1705.04505
Denoised images of a region cropped from the real noisy
image from DSLR “Nikon D800 ISO 3200 A3”,
Nam et al. 2016 (+video) by different methods. The
scene was shot 500 times with the same camera and
camera setting. The mean image of the 500 shots is
roughly taken as the “ground truth”, with which the
PSNR index can be computed. The images are better
viewed by zooming in on screen
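A small sketch of this evaluation protocol: the mean of many repeated shots of a static scene serves as an approximate ground truth, and a denoised single shot is scored by PSNR against it.

```python
# Sketch: use the mean of many shots of a static scene as approximate ground
# truth and score a denoised single shot by PSNR against it.
import numpy as np

def psnr(estimate, reference, max_val=255.0):
    mse = np.mean((estimate.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def evaluate_denoiser(shots, denoise_fn):
    """shots: (N, H, W[, 3]) stack of repeated captures of a static scene."""
    ground_truth = shots.mean(axis=0)     # e.g. mean of the 500 shots
    noisy_single = shots[0]
    return psnr(denoise_fn(noisy_single), ground_truth)
```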
Benchmarking Denoising Algorithms
with Real Photographs
Tobias Plötz, Stefan Roth
(Submitted on 5 Jul 2017)
https://arxiv.org/abs/1707.01313
“We then capture a novel benchmark dataset, the
Darmstadt Noise Dataset (DND), with consumer
cameras of differing sensor sizes. One interesting
finding is that various recent techniques that
perform well on synthetic noise are clearly
outperformed by BM3D on photographs with real
noise. Our benchmark delineates realistic evaluation
scenarios that deviate strongly from those
commonly used in the scientific literature.”
Image formation process underlying the observed low-ISO image x_r and high-ISO image x_n. They are generated from latent noise-free images y_r and y_n, respectively, which in turn are related by a linear scaling of image intensities (LS), a small camera translation (T), and a residual low-frequency pattern (LF). To obtain the denoising ground truth y_p, we apply post-processing to x_r aiming at undoing these undesirable transformations.
Mean PSNR (in dB) of the denoising methods tested on our DND benchmark. We
apply denoising either on linear raw intensities, after a variance stabilizing transformation
(VST, Anscombe), or after conversion to the sRGB space. Likewise, we evaluate the result
either in linear raw space or in sRGB space. The noisy images have a PSNR of 39.39 dB
(linear raw) and 29.98 dB (sRGB).
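For reference, the variance stabilizing transformation mentioned in the table is the Anscombe transform, which maps approximately Poisson-distributed raw intensities to roughly unit-variance Gaussian data; a minimal sketch with the simple algebraic inverse is below (an unbiased inverse is usually preferred in practice).

```python
# The Anscombe variance stabilizing transform and its simple algebraic
# inverse (in practice an unbiased inverse is usually preferred).
import numpy as np

def anscombe(x):
    """Poisson-ish counts -> approximately unit-variance Gaussian data."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    return (y / 2.0) ** 2 - 3.0 / 8.0

# Usage: denoise in the stabilized domain, then map back
# y = anscombe(raw); y_hat = denoiser(y); raw_hat = inverse_anscombe(y_hat)
```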
Difference between blue channels of low- and high-ISO images from Fig. 1 after various post-
processing stages. Images are smoothed for display to highlight structured residuals,
attenuating the noise.
Pipeline RGB image Restoration #2b
“Data-driven Super-resolution” what super-resolution typically means in the deep learning space
MemNet: A Persistent Memory
Network for Image Restoration
Ying Tai, Jian Yang, Xiaoming Liu, Chunyan Xu
(Submitted on 7 Aug 2017)
https://arxiv.org/abs/1708.02209
https://github.com/tyshiwo/MemNet.
Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution”
The same MemNet structure achieves the state-of-the-art performance in image denoising, super-resolution and
JPEG deblocking. Due to the strong learning ability, our MemNet can be trained to handle different levels of
corruption even using a single model.
Training Setting: Following the method of Mao et al. (2016), for
image denoising, the grayscale image is used; while for SISR and
JPEG deblocking, the luminance component is fed into the
model.
Deep Generative Adversarial
Compression Artifact Removal
Leonardo Galteri, Lorenzo Seidenari, Marco Bertini,
Alberto Del Bimbo
(Submitted on 8 Apr 2017)
https://arxiv.org/abs/1704.02518
In this work we address the problem of artifact removal using convolutional neural networks. The proposed
approach can be used as a post-processing technique applied to decompressed images, and thus can be
applied to different compression algorithms (typically applied in YCrCb color space) such as JPEG, intra-frame
coding of H.264/AVC and H.265/HEVC. Compared to super resolution techniques, working on compressed
images instead of down-sampled ones, is more practical, since it does not require to change the compression
pipeline, that is typically hardware based, to subsample the image before its coding; moreover, camera
resolutions have increased during the latest years, a trend that we can expect to continue.
Pipeline RGB image Restoration #3
An attempt to improve smartphone camera quality with a high-quality DSLR image as the ‘gold standard’, using deep learning
https://arxiv.org/abs/1704.02470
Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool
Computer Vision Laboratory, ETH Zurich, Switzerland
“Quality transfer”
Future Image Enhancement
Pipeline image enhancement #1
Aesthetics enhancement: “AI-driven Interior Design”
“Re-colorization” of scanned indoor scenes or intrinsic decomposition based editing
Limitations. We have to manually correct
inaccurate segmentation, though seldom
encountered. This is a limitation of our method.
However, segmentation errors are seldom
encountered during experiments. Since our
method is object-based, our segmentation method
does not consider the color patterns among similar
components of an image object.
Currently, our system is not capable of segmenting
the mesh according to the colored components
with similar geometry for this kind of objects. This
is another limitation of our method.
An intrinsic image decomposition method could
be helpful to our image database, for extracting
lighting-free textures to be further used in
rendering colorized scenes. However, such
methods are not so robust that can be directly
applied to various images in a large image
database. On the other hand, intrinsic image
decomposition is not essential to achieve good
results in our experiments. So we did not
incorporate it in our work, but we will further study
it to improve our database.
Pipeline image enhancement #2
“Auto-adjust” RGB texture maps for indoor scans with user interaction
We use the CIELab color space for both the input and
output images. We can use 3-channel Lab color as the
color features. However, it generates color variations in
smooth regions since each color is processed
independently. To alleviate this issue, we add the local
neighborhood information by concatenating the Lab color
and the L2 normalized first-layer convolutional feature
maps of ResNet-50.
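One possible reading of that feature construction is sketched below: per-pixel CIELab color concatenated with L2-normalized first-layer ResNet-50 feature maps, upsampled back to image resolution; this is an interpretation of the description (ImageNet input normalization omitted), not the authors’ code.

```python
# Sketch: per-pixel features = CIELab color + L2-normalized first-layer
# ResNet-50 feature maps, upsampled back to full resolution. An
# interpretation of the paper's description, not the authors' code.
import cv2
import numpy as np
import torch
import torch.nn.functional as F
import torchvision

def per_pixel_features(bgr_uint8):
    h, w = bgr_uint8.shape[:2]
    lab = cv2.cvtColor(bgr_uint8, cv2.COLOR_BGR2LAB).astype(np.float32)
    lab_t = torch.from_numpy(lab).permute(2, 0, 1).unsqueeze(0)      # 1 x 3 x H x W

    # Pretrained ResNet-50, first convolution only (torchvision >= 0.13 API)
    resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
    rgb = cv2.cvtColor(bgr_uint8, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        feat = resnet.conv1(x)                   # first conv layer, stride 2
    feat = F.normalize(feat, p=2, dim=1)         # L2 normalize over channels
    feat = F.interpolate(feat, size=(h, w), mode="bilinear", align_corners=False)

    return torch.cat([lab_t, feat], dim=1)       # 1 x (3 + 64) x H x W
```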
Although the proposed method provides the users with
automatically adjusted photos, some users may want their photos
to be retouched by their own preference. In the first row of Fig. 2 for
example, a user may want only the color of the people to be changed.
For such situations, we provide a way for the users to give their own
adjustment maps to the system. Figure 4 shows some examples of
the personalization. When the input image is forwarded, we
substitute the extracted semantic adjustment map with the new
adjustment map from the user. As shown in the figure, the proposed
method effectively creates the personalized images adjusted by
user’s own style.
Deep Semantics-Aware Photo Adjustment
Seonghyeon Nam, Seon Joo Kim (Submitted on 26 Jun 2017) https://arxiv.org/abs/1706.08260
Pipeline image enhancement #3
Aesthetic-Driven Image Enhancement
by Adversarial Learning
Yubin Deng, Chen Change Loy, Xiaoou Tang
(Submitted on 17 Jul 2017)
https://arxiv.org/abs/1707.05251
Examples of image enhancement given original input (a).
The architecture of our
proposed EnhanceGAN
framework. ResNet module is
the feature extractor (for image in
CIELab color space); in this work,
we use the ResNet-101 and
removed the last average pooling
layer and the final fc layer. The
switch icons in the discriminator
network represent zero-masking
during stage-wise training
“Auto-adjust” RGB texture maps for indoor scans with GANs
Pipeline image enhancement #4
“Auto-adjust” RGB texture maps for indoor scans with GANs for “auto-matting”
Creatism: A deep-learning photographer
capable of creating professional work
Hui Fang, Meng Zhang (Submitted on 11 Jul 2017)
https://arxiv.org/abs/1707.03491
https://google.github.io/creatism/
Datasets were created that contain ratings of
photographs based on aesthetic quality
[Murray et al., 2012] [Kong et al., 2016] [Lu et al., 2015].
Using our system, we mimic the workflow of a
landscape photographer, from framing for the best
composition to carrying out various post-processing
operations. The environment for our virtual
photographer is simulated by a collection of panorama
images from Google Street View. We design a "Turing-
test"-like experiment to objectively measure quality of
its creations, where professional photographers rate a
mixture of photographs from different sources blindly.
We work with professional photographers to empirically define 4 levels of aesthetic quality:
● 1: point-and-shoot photos without consideration.
● 2: Good photos from the majority of population without art background. Nothing artistic stands out.
● 3: Semi-pro. Great photos showing clear artistic aspects. The photographer is on the right track of
becoming a professional.
● 4: Pro-work. Clearly each professional has his/her unique taste that needs calibration.
We use the AVA dataset [Murray et al., 2012] to bootstrap a consensus among them.
Assume there exists a universal aesthetics metric, Φ. By
definition, Φ needs to incorporate all aesthetic aspects, such
as saturation, detail level, composition... To define Φ with
examples, the number of images needs to grow exponentially
to cover more aspects [Jaroensri et al., 2015]. To make things
worse, unlike traditional problems such as object recognition,
what we need are not only natural images, but also pro-
level photos, which are much less in quantity.
Pipeline image enhancement #5
“Auto-adjust” images based on different user groups (or personalizing for different markets for indoor scan products)
Multimodal Prediction and Personalization of
Photo Edits with Deep Generative Models
Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi,
Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams
CSAIL, MIT; Adobe Research; Media Lab, MIT; Harvard and Google Brain
(Submitted on 17 Apr 2017) https://arxiv.org/abs/1704.04997
The main goals of our proposed models: (a) Multimodal photo
edits: For a given photo, there may be multiple valid aesthetic
choices that are quite different from one another. (b) User
categorization: A synthetic example where different user clusters
tend to prefer different slider values. Group 1 users prefer to
increase the exposure and temperature for the baby images;
group 2 users reduce clarity and saturation for similar images.
Predictive log-likelihood for users in the test set of different datasets. For each user in the test set, we compute the predictive log-likelihood of 20 images, given 0 to
30 images and their corresponding sliders from the same user. 30 sample trajectories and the overall average ± s.e. is shown for casual, frequent and expert users.
The figure shows that knowing more about the user (up to around 10 images) can increase the predictive log-likelihood. The log-likelihood is normalized by
subtracting off the predictive log-likelihood computed given zero images. Note the different y-axis in the plots. The rightmost plot is provided for comparing the
average predictive log-likelihood across datasets.
Pipeline image enhancement #6
Combining semantic segmentation for higher quality “Instagram filters”
Exemplar-Based Image and Video Stylization
Using Fully Convolutional Semantic Features
Feida Zhu ; Zhicheng Yan ; Jiajun Bu ; Yizhou Yu
IEEE Transactions on Image Processing ( Volume: 26, Issue: 7, July 2017 )
https://doi.org/10.1109/TIP.2017.2703099
Color and tone stylization in images and videos strives to enhance unique themes with artistic color
and tone adjustments. It has a broad range of applications from professional image post-processing
to photo sharing over social networks. Mainstream photo enhancement softwares, such as Adobe
Lightroom and Instagram, provide users with predefined styles, which are often hand-crafted
through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of
image contents and the resulting global color transform limits the range of artistic styles it can
represent. On the other hand, stylistic enhancement needs to apply distinct adjustments to various
semantic regions. Such an ability enables a broader range of visual styles.
Traditional professional video editing softwares (Adobe After Effects, Nuke, etc.) offer a suite of
predefined operations with tunable parameters that apply common global adjustments
(exposure/color correction, white balancing, sharpening, denoising, etc). Local adjustments within
specific spatiotemporal regions are usually accomplished with masking layers created with intensive
user interaction. Both parameter tuning and masking layer creation are labor intensive processes.
An example of learning semantics-aware photo adjustment styles. Left: Input image. Middle: Manually enhanced by
photographer. Distinct adjustments are applied to different semantic regions. Right: Automatically enhanced by our
deep learning model trained from image exemplars. (a) Input image. (b) Ground truth. (c) Our result.
Given a set of exemplar image pairs, each representing a photo before and
after pixel-level color (in CIELab space) and tone adjustments following a
particular style, we wish to learn a computational model that can automatically
adjust a novel input photo in the same style. We still cast this learning task as
a regression problem as in Yan et al. (2016). For completeness, let us first
review their problem definition and then present our new deep learning based
architecture and solution.
Pipeline image enhancement #7A
Combining semantic segmentation for higher quality “Instagram filters”
Deep Bilateral Learning for Real-Time Image Enhancement
Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Samuel W. Hasinoff, Frédo Durand MIT CSAIL, Google Research, MIT CSAIL / Inria, Université Côte d’Azur
(Submitted on 10 Jul 2017)
https://arxiv.org/abs/1707.02880 | https://github.com/mgharbi/hdrnet | https://groups.csail.mit.edu/graphics/hdrnet/
https://youtu.be/GAe0qKKQY_I
Our novel neural network architecture can reproduce sophisticated image enhancements with inference running in real
time at full HD resolution on mobile devices. It can not only be used to dramatically accelerate reference
implementations, but can also learn subjective effects from human retouching (“copycat” filter).
By performing most of its computation within a bilateral grid and by predicting local affine color transforms, our model
is able to strike the right balance between expressivity and speed. To build this model we have introduced two new
layers: a data-dependent lookup that enables slicing into the bilateral grid, and a multiplicative operation for affine
transformation. By training in an end-to-end fashion and optimizing our loss function at full resolution (despite most of
our network being at a heavily reduced resolution), our model is capable of learning full-resolution and non-scale-
invariant effects.
Pipeline image enhancement #8
Blind image quality assessment, e.g. for quantifying RGB scan quality in real time
RankIQA: Learning from Rankings for No-
reference Image Quality Assessment
Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov
(Submitted on 26 Jul 2017)
https://arxiv.org/abs/1707.08347
The classical approach trains a
deep CNN regressor directly on
the ground-truth. Our approach
trains a network from an image
ranking dataset. These ranked
images can be easily generated
by applying distortions of varying
intensities. The network
parameters are then transferred
to the regression network for
finetuning. This allows for the
training of deeper and wider
networks.
Siamese network output for JPEG distortion considering 6 levels. These graphs illustrate
the fact that the Siamese network successfully manages to separate the different
distortion levels.
Blind Deep S3D Image Quality Evaluation via Local to
Global Feature Aggregation
Heeseok Oh ; Sewoong Ahn ; Jongyoo Kim ; Sanghoon Lee
IEEE Transactions on Image Processing ( Volume: 26, Issue: 10, Oct. 2017 )
https://doi.org/10.1109/TIP.2017.2725584
Future Image Styling
Pipeline image Styling #1
Aesthetics enhancement: High Dynamic Range from SfM
Large scale structure-from-motion (SfM) algorithms have recently
enabled the reconstruction of highly detailed 3-D models of our surroundings
simply by taking photographs. In this paper, we propose to leverage these
reconstruction techniques to automatically estimate the outdoor
illumination conditions for each image in a SfM photo collection. We
introduce a novel dataset of outdoor photo collections, where the ground
truth lighting conditions are known at each image. We also present an
inverse rendering approach that recovers a high dynamic range
estimate of the lighting conditions for each low dynamic range input image.
Our novel database is used to quantitatively evaluate the performance of our
algorithm. Results show that physically plausible lighting estimates can
faithfully be recovered, both in terms of light direction and intensity.
Lighting Estimation in Outdoor Image Collections
Jean-Francois Lalonde (Laval University); Iain Matthews (Disney Research)
3D Vision (3DV), 2014 2nd International Conference on
https://www.disneyresearch.com/publication/lighting-estimation-in-outdoor-image-collections/
https://doi.org/10.1109/3DV.2014.112
The main limitation of our approach is that it can recover precise lighting parameters only when lighting actually creates strongly visible
effects—such as cast shadows, shading differences amongst surfaces of different orientations—on the image. When the camera does not
observe significant lighting variations, for example when the sun is shining on a part of the building that the camera does not observe, or when the
camera only see a very small fraction of the landmark with little geometric details, our approach recovers a coarse estimate of the full lighting
conditions. In addition, our approach is sensitive to errors in geometry estimation, or to the presence of unobserved, nearby objects.
Because it does not know about these objects, our method tries to explain their cast shadows with the available geometry, which may result in
errors. Our approach is also sensitive to inter-reflections. Incorporating more sophisticated image formation models such as radiosity could
help alleviating this problem, at the expense of significantly more computation. Finally, our approach relies on knowledge of the camera
exposure and white balance settings, which might be less applicable to the case of images downloaded on the Internet. We plan to explore
these issues in future work.
Exploring material recognition for estimating
reflectance and illumination from a single image
Michael Weinmann; Reinhard Klein
MAM '16 Proceedings of the Eurographics 2016 Workshop on Material Appearance Modeling
https://doi.org/10.2312/mam.20161253
We demonstrate that reflectance and illumination can be estimated
reliably for several materials that are beyond simple Lambertian
surface reflectance behavior because of exhibiting mesoscopic
effects such as interreflections and shadows.
Shading Annotations in the Wild
Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala
(Submitted on 2 May 2017)
https://arxiv.org/abs/1705.01156
http://opensurfaces.cs.cornell.edu/saw/
We use this data to train a
convolutional neural network
to predict per-pixel shading
information in an image. We
demonstrate the value of our
data and network in an
application to intrinsic
images, where we can reduce
decomposition artifacts
produced by existing
algorithms.
Pipeline image Styling #2A
Aesthetics enhancement: High Dynamic Range #1
Learning High Dynamic Range from
Outdoor Panoramas
Jinsong Zhang, Jean-François Lalonde
(Submitted on 29 Mar 2017 (v1), last revised 8 Aug 2017 (this version, v2))
https://arxiv.org/abs/1703.10200
http://www.jflalonde.ca/projects/learningHDR
Qualitative results on the synthetic dataset.
Top row: the ground truth HDR panorama, middle row: the LDR panorama, and
bottom row: the predicted HDR panorama obtained with our method.
To illustrate dynamic range, each panorama is shown at two exposures, with a factor of 16
between the two. For each example, we show the panorama itself (left column), and the
rendering of a 3D object lit with the panorama (right column). The object is a “spiky
sphere” on a ground plane, seen from above. Our method accurately predicts the extremely
high dynamic range of outdoor lighting in a wide variety of lighting conditions. A tonemapping
of γ = 2.2 is used for display purposes.
Real cameras have non-linear response functions. To simulate this, we randomly sample real camera
response functions from the Database of Response Functions (DoRF) [Grossberg and Nayar, 2003],
and apply them to the linear synthetic data before training.
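The sketch below captures the spirit of that augmentation: a randomly sampled nonlinear response is applied to linear synthetic radiance before training; a parametric gamma-like curve with a small toe stands in for the tabulated DoRF curves.

```python
# Sketch: apply a randomly sampled nonlinear camera response to linear
# synthetic radiance before training. A parametric gamma-like curve stands in
# for the tabulated DoRF response functions used in the paper.
import numpy as np

def sample_crf(rng):
    gamma = rng.uniform(1.8, 2.6)          # display-like gamma range (assumed)
    toe = rng.uniform(0.0, 0.05)           # small dark-end offset (assumed)
    def crf(linear):
        x = np.clip(linear, 0.0, 1.0)
        return np.clip((x + toe) / (1.0 + toe), 0.0, 1.0) ** (1.0 / gamma)
    return crf

def augment_panorama(linear_pano, rng=np.random.default_rng()):
    """linear_pano: float array in [0, 1] (already exposure-normalized)."""
    return sample_crf(rng)(linear_pano)
```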
Examples from our real dataset. For each case, we show the LDR panorama
captured by the Ricoh Theta S camera, a consumer grade point-and-shoot 360º
camera (left), and the corresponding HDR panorama captured by the Canon 5D
Mark III DSLR mounted on a tripod, equipped with a Sigma 8mm fisheye lens
(right, shown at a different exposure to illustrate the high dynamic range).
We present a full end-to-end learning approach to estimate the extremely high
dynamic range of outdoor lighting from a single, LDR 360º panorama. Our main
insight is to exploit a large dataset of synthetic data composed of a realistic virtual
city model, lit with real world HDR sky light probes [Lalonde et al. 2016
http://www.hdrdb.com/] to train a deep convolutional autoencoder
Pipeline image Styling #2b
High Dynamic Range #2: Learn illumination for relighting purposes
Learning to Predict Indoor Illumination from a Single Image
Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde
(Submitted on 1 Apr 2017 (v1), last revised 25 May 2017 (this version, v2))
https://arxiv.org/abs/1704.00090
Pipeline image Styling #3a
Improving photocompositing and relighting of RGB textures
Deep Image Harmonization
Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu,
Ming-Hsuan Yang
(Submitted on 28 Feb 2017)
https://arxiv.org/abs/1703.00069
Our method can adjust the appearances of the composite
foreground to make it compatible with the background
region. Given a composite image, we show the harmonized
images generated by Xue et al. (2012), Zhu et al. (2015) and
our deep harmonization network.
The overview of the proposed joint network architecture. Given a composite image and a provided foreground mask, we first pass the input through an encoder
for learning feature representations. The encoder is then connected to two decoders, including a harmonization decoder for reconstructing the harmonized output
and a scene parsing decoder to predict pixel-wise semantic labels. In order to use the learned semantics and improve harmonization results, we concatenate the
feature maps from the scene parsing decoder to the harmonization decoder (denoted as dot-orange lines). In addition, we add skip links (denoted as blue-dot
lines) between the encoder and decoders for retaining image details and textures. Note that, to keep the figure clean, we only depict the links for the harmonization
decoder, while the scene parsing decoder has the same skip links connected to the encoder.
Given an input image (a), our network
can adjust the foreground region
according to the provided mask (b)
and produce the output (c). In this
example, we invert the mask from the
one in the first row to the one in the
second row, and generate
harmonization results that account for
different context and semantic
information.
Pipeline image Styling #3b
Sky is not the limit: semantic-aware sky replacement
YH Tsai, X Shen, Z Lin, K Sunkavalli; Ming-Hsuan Yang
ACM Transactions on Graphics (TOG) - Volume 35 Issue 4, July 2016
https://doi.org/10.1145/2897824.2925942
In order to find proper skies for replacement, we propose a data-driven sky search
scheme based on semantic layout of the input image. Finally, to re-compose the
stylized sky with the original foreground naturally, an appearance transfer method is
developed to match statistics locally and semantically.
Sample sky segmentation results. Given an input image, the FCN generates results that localize the sky well but
contain inaccurate boundaries and noisy segments. The proposed online model refines segmentations that are
complete and accurate, especially around the boundaries (best viewed in color with enlarged images).
Overview of the proposed algorithm. Given an input image, we first utilize the FCN to obtain scene parsing results
and semantic response for each category. A coarse-to-fine strategy is adopted to segment sky regions (illustrated
as the red mask). To find reference images for sky replacement, we develop a method to search images with
similar semantic layout. After re-composing images with the found skies, we transfer visual semantics to match
foreground statistics between the input image and the reference image. Finally, a set of composite images with
different stylized skies are generated automatically.
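The appearance-transfer step is, at its core, statistics matching; a global Reinhard-style mean/std transfer in Lab space is sketched below (the paper applies the matching locally and semantically rather than globally).

```python
# Sketch: Reinhard-style mean/std color transfer in Lab space, the basic
# statistics-matching operation behind appearance transfer for compositing.
# The paper applies this locally per semantic region; here it is global.
import cv2
import numpy as np

def match_statistics(source_bgr, reference_bgr):
    src = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    out = np.empty_like(src)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (src[..., c] - s_mean) / s_std * r_std + r_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```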
GP-GAN: Towards Realistic High-Resolution Image Blending
Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
(Submitted on 21 Mar 2017 (v1), last revised 25 Mar 2017 (this version, v2))
https://arxiv.org/abs/1703.07195
Qualitative illustration of high-resolution
image blending. a) shows the composited
copy-and-paste image where the inserted
object is circled out by red lines. Users usually
expect image blending algorithms to make this
image more natural. b) represents the result
based on Modified Poisson image editing [32]. c)
indicates the result from Multi-splines approach.
d) is the result of our method Gaussian-Poisson
GAN (GP-GAN). Our approach produces better
quality images than that from the alternatives in
terms of illumination, spatial, and color
consistencies.
We advanced the state-of-the-art in conditional image generation by combining the ideas from the generative
model GAN, Laplacian Pyramid, and Gauss-Poisson Equation. This combination is the first time a generative
model could produce realistic images in arbitrary resolution. In spite of the effectiveness, our algorithm fails to
generate realistic images when the composited images are far away from the distribution of the training
dataset. We aim to address this issue in future work.
Improving photocompositing and relighting of RGB textures
Pipeline image Styling #3c
Live User-Guided Intrinsic Video for Static Scenes
Abhimitra Meka ; Gereon Fox ; Michael Zollhofer ; Christian Richardt ; Christian Theobalt
IEEE Transactions on Visualization and Computer Graphics ( Volume: PP, Issue: 99 )
https://doi.org/10.1109/TVCG.2017.2734425
Improving photocompositing and relighting of RGB textures
User constraints, in the form of constant shading and reflectance strokes,
can be placed directly on the real-world geometry using an intuitive touch-
based interaction metaphor, or using interactive mouse strokes. Fusing the
decomposition results and constraints in three-dimensional space allows for
robust propagation of this information to novel views by re-projection.
We propose a novel approach for live, user-guided intrinsic video decomposition. We
first obtain a dense volumetric reconstruction of the scene using a commodity RGB-D
sensor. The reconstruction is leveraged to store reflectance estimates and user-provided
constraints in 3D space to inform the ill-posed intrinsic video decomposition problem. Our
approach runs at real-time frame rates, and we apply it to applications such as relighting,
recoloring and material editing.
Our novel user-guided intrinsic video approach enables real-time applications such
as recoloring, relighting and material editing.
Constant reflectance strokes improve the decomposition by moving the high-frequency shading of the cloth to the shading layer.
Comparison to state-of-the-art intrinsic video decomposition techniques on the ‘girl’ dataset. Our approach matches the real-time
performance of Meka et al. (2016), while achieving the same quality as previous off-line techniques
Pipeline image Styling #4
Beyond low-level style transfer for high-level manipulation
Generative Semantic Manipulation
with Contrasting GAN
Xiaodan Liang, Hao Zhang, Eric P. Xing
(Submitted on 1 Aug 2017)
https://arxiv.org/abs/1708.00315
Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired
image-to-image translation, such as photo→sketch and artist painting style transfer. However, existing models
are only capable of transferring low-level information (e.g. color or texture changes), but fail to edit high-
level semantic meanings (e.g., geometric structure or content) of objects.
Some example semantic manipulation results by our model, which takes one image
and a desired object category (e.g. cat, dog) as inputs and then learns to
automatically change the object semantics by modifying their appearance or
geometric structure. We show the original image (left) and manipulated result (right)
in each pair.
Although our method can achieve compelling results in many semantic manipulation tasks, it shows little success for cases that require very large geometric changes, such as car ↔ truck and car ↔ bus. Integrating spatial transformation layers for explicitly learning pixel-wise offsets may help resolve very large geometric changes. More generally, our model can be extended to replace the mask annotations with predicted object masks or with attentive regions learned automatically via attention modeling. This paper pushes forward research on the unsupervised setting by demonstrating the possibility of manipulating high-level object semantics rather than the low-level color and texture changes of previous works. In addition, it would be interesting future work to develop techniques that can manipulate object interactions and activities in images/videos.
Pipeline Image Styling #5A
Aesthetics enhancement: Style Transfer | Introduction #1
Neural Style Transfer: A Review
Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song
(Submitted on 11 May 2017)
https://arxiv.org/abs/1705.04058
A list of mentioned papers in this review, corresponding codes and pre-trained models are
publicly available at: https://github.com/ycjing/Neural-Style-Transfer-Papers
One of the reasons why Neural Style Transfer has caught the eye of both academia and industry is its popularity on some social networking sites (e.g., Twitter and Facebook). The mobile application Prisma [36] is one of the first industrial applications to provide the Neural Style Transfer algorithm as a service. Before Prisma, the general public could hardly have imagined that one day they would be able to turn their photos into art paintings in only a few minutes. Due to its high quality, Prisma achieved great success and became popular around the world.
Another use of Neural Style Transfer is to act as a user-assisted creation tool. Although, to the best of our knowledge, no popular applications have yet applied the Neural Style Transfer technique in creation tools, we believe this will be a promising use in the future. Neural Style Transfer is capable of acting as a creation tool for painters and designers, making it more convenient to create an artwork of a specific style, especially computer-made fine art images. Moreover, with Neural Style Transfer algorithms it is trivial to produce stylized fashion elements for fashion designers and stylized CAD drawings for architects in a variety of styles, which would be costly to produce by hand.
Pipeline Image Styling #5b
Aesthetics enhancement: Style Transfer | Introduction #2
Neural Style Transfer: A Review
Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song
(Submitted on 11 May 2017)
https://arxiv.org/abs/1705.04058
A list of mentioned papers in this review, corresponding codes and pre-trained models are
publicly available at: https://github.com/ycjing/Neural-Style-Transfer-Papers
Promising directions for future research in Neural Style Transfer mainly focus on two aspects. The first is to solve the existing challenges of current algorithms, i.e., the problems of parameter tuning, of stroke-orientation control, and the problems in “Fast” and “Faster” Neural Style Transfer algorithms. The second is to focus on new extensions to Neural Style Transfer (e.g., Fashion Style Transfer and Character Style Transfer). There is already some preliminary work in this direction, such as the recent work of Yang et al. (2016) on Text Effects Transfer. These interesting extensions may become trending topics in the future, and related new areas may be created subsequently.
Pipeline Image Styling #5C
Aesthetics enhancement: Video Style Transfer
DeepMovie: Using Optical Flow and
Deep Neural Networks to Stylize Movies
Alexander G. Anderson, Cory P. Berg, Daniel P. Mossing,
Bruno A. Olshausen (Submitted on 26 May 2016)
https://arxiv.org/abs/1605.08153
https://github.com/anishathalye/neural-style
Coherent Online Video Style Transfer
Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua
(Submitted on 27 Mar 2017 (v1), last revised 28 Mar 2017 (this version, v2))
https://arxiv.org/abs/1703.09211
The main contribution of this paper is to use optical flow to initialize the
style transfer optimization so that the texture features move with the
objects in the video. Finally, we suggest a method to incorporate optical
flow explicitly into the cost function.
Overview of Our Approach: We begin by applying the style transfer algorithm to the first frame of the
movie using the content image as the initialization. Next, we calculate the optical flow field that takes the
first frame of the movie to the second frame. We apply this flow-field to the rendered version of the first
frame and use that as the initialization for the style transfer optimization for the next frame. Note, for
instance, that a blue pixel in the flow field image means that the underlying object in the video at that pixel
moved to the left from frame one to frame two. Intuitively, in order to apply the flow field to the styled
image, you move the parts of the image that have a blue pixel in the flow field to the left.
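The flow-based initialization described above can be sketched with off-the-shelf tools: estimate dense flow between consecutive content frames and warp the previously stylized frame into the new frame's coordinates as the starting point of the next optimization. Farneback flow below merely stands in for the flow methods used in these papers, and the frames are random placeholders.

import cv2
import numpy as np

h, w = 240, 320
prev_gray = np.random.randint(0, 255, (h, w), np.uint8)        # content frame t-1 (placeholder)
next_gray = np.random.randint(0, 255, (h, w), np.uint8)        # content frame t   (placeholder)
prev_styled = np.random.randint(0, 255, (h, w, 3), np.uint8)   # stylized frame t-1

# Dense flow computed from frame t back to frame t-1
# (args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Backward warp: each pixel of the initialization samples the stylized frame t-1
# at the location the flow says its content came from.
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
init_for_frame_t = cv2.remap(prev_styled, map_x, map_y, cv2.INTER_LINEAR)
# init_for_frame_t would then seed the style-transfer optimization for frame t.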
We propose the first end-to-end network for online video style transfer, which generates temporally coherent stylized video sequences in near real time. Two key ideas are an efficient network that incorporates short-term coherence, and the propagation of short-term coherence to the long term, which ensures consistency over longer periods of time. Our network can incorporate different image stylization networks. We show that the proposed method clearly outperforms the per-frame baseline both qualitatively and quantitatively. Moreover, it achieves visually comparable coherence to optimization-based video style transfer while being three orders of magnitude faster at runtime.
There are still some limitations in our method. For instance, limited by the accuracy of the ground-truth optical flow (given by DeepFlow2 [Weinzaepfel et al. 2013]), our results may suffer from some incoherence where the motion is too large for the flow to track. And after propagation over a long period, small flow errors may accumulate, causing blurriness. These open questions are interesting for further exploration in future work.
Pipeline Image Styling #6A
Aesthetics enhancement: Texture synthesis and upsampling
TextureGAN: Controlling Deep Image
Synthesis with Texture Patches
Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu,
James Hays
(Submitted on 9 Jun 2017)
https://arxiv.org/abs/1706.02823
TextureGAN pipeline.
A feed-forward
generative network is
trained end-to-end to
directly transform a 4-
channel input to a high-
res photo with realistic
textural details.
Photo-realistic Facial Texture Transfer
Parneet Kaur, Hang Zhang, Kristin J. Dana
(Submitted on 14 Jun 2017)
https://arxiv.org/abs/1706.04306
Overview of our method. Facial identity is preserved
using Facial Semantic Regularization which
regularizes the update of meso-structures using a
facial prior and facial semantic structural loss.
Texture loss regularizes the update of local textures
from the style image. The output image is initialized
with the content image and updated at each iteration
by back-propagating the error gradients for the
combined losses. Content/style photos: Martin Schoeller/Art+Commerce.
Identity-preserving Facial Texture Transfer (FaceTex).
The textural details are transferred from style image to
content image while preserving its identity. FaceTex
outperforms existing methods perceptually as well as
quantitatively. Column 3 uses input 1 as the style image
and input 2 as the content. Column 4 uses input 1 as
the content image and input 2 as the style image.
Figure 3 shows more examples and comparison with
existing methods. Input photos: Martin Schoeller/Art+Commerce.
Pipeline Image Styling #6B
Aesthetics enhancement: Texture synthesis with style transfer
Stable and Controllable Neural Texture Synthesis
and Style Transfer Using Histogram Losses
Eric Risser, Pierre Wilmot, Connelly Barnes
Artomatix, University of Virginia
(Submitted on 31 Jan 2017 (v1), last revised 1 Feb 2017 (this version, v2))
https://arxiv.org/abs/1701.08893
Our style transfer and texture synthesis results. The input styles are
shown in (a), and style transfer results are in (b, c). Note that the angular
shapes of the Picasso painting are successfully transferred on the top row,
and that the more subtle brush strokes are transferred on the bottom row.
The original content images are inset in the upper right corner. Unless
otherwise noted, our algorithm is always run with default parameters (we do
not manually tune parameters). Input textures are shown in (d) and texture
synthesis results are in (e). For the texture synthesis, note that the algorithm
synthesizes creative new patterns and connectivities in the output.
Different statistics that can be used for neural network texture synthesis.
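The histogram losses mentioned above penalize mismatches between the value distributions of synthesized and style feature maps. The sketch below shows the underlying histogram-matching idea with the classic sorted-rank trick on plain NumPy arrays; Risser et al. formulate this as a differentiable loss on CNN activations, which is not reproduced here.

import numpy as np

def match_histogram(synth, style):
    """Remap synth so its value distribution matches style's (arrays of equal size)."""
    s_flat = synth.ravel()
    order = np.argsort(s_flat)                  # ranks of the synthesized values
    matched = np.empty_like(s_flat)
    matched[order] = np.sort(style.ravel())     # assign style values rank-by-rank
    return matched.reshape(synth.shape)

synth_feat = np.random.randn(64, 32, 32)        # placeholder activations
style_feat = 2.0 * np.random.randn(64, 32, 32) + 1.0
remapped = match_histogram(synth_feat, style_feat)
# A histogram loss can then penalize ||synth_feat - remapped||^2 during synthesis.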
Pipeline Image Styling #6C
Aesthetics enhancement: Enhancing texture maps
Depth Texture Synthesis for Realistic
Architectural Modeling
Félix Labrie-Larrivée ; Denis Laurendeau ; Jean-François Lalonde
Computer and Robot Vision (CRV), 2016 13th Conference on
https://doi.org/10.1109/CRV.2016.77
In this paper, we present a novel approach that
improves the resolution and geometry of 3D
meshes of large scenes with such repeating
elements. By leveraging structure from motion
reconstruction and an off-the-shelf depth sensor,
our approach captures a small sample of the scene
in high resolution and automatically extends that
information to similar regions of the scene.
Using RGB and SfM depth information as a guide
and simple geometric primitives as canvas, our
approach extends the high resolution mesh by
exploiting powerful, image-based texture synthesis
approaches. The final result improves on standard SfM reconstruction with higher detail.
Our approach benefits from reduced manual
labor as opposed to full RGBD reconstruction, and
can be done much more cheaply than with LiDAR-
based solutions.
In the future, we plan to work on a more
generalized 3D texture synthesis
procedure capable of synthesizing a more
varied set of objects, and able to
reconstruct multiple parts of the scene by
exploiting several high resolution scan
samples at once in an effort to address the
tradeoff mentioned above. We also plan to
improve the robustness of the approach
to a more varied set of large scale scenes,
irrespective of the lighting conditions,
material colors, and geometric
configurations. Finally, we plan to evaluate
how our approach compares to SfM on a
more quantitative level by leveraging
LiDAR data as ground truth.
Overview of the data collection and alignment procedure. Top row: a collection of photos of the scene is acquired with a typical camera, and used to generate a
point cloud via SfM [Agarwal et al. 2009] and dense multi-view stereo (MVS) [ Furukawa and Ponce, 2012]. Bottom row: a repeating feature of the scene (in this
example, the left-most window) is recorded with a Kinect sensor, and reconstructed into a high resolution mesh via the RGB-D SLAM technique KinectFusion [
Newcombe et al. 2011]. The mesh is then automatically aligned to the SfM reconstruction using bundle adjustment and our automatic scale adaptation
technique (see sec. III-C). Right: the high resolution Kinect mesh is correctly aligned to the low resolution SfM point cloud
Pipeline Image Styling #6D
Aesthetics enhancement: Towards photorealism with good maps
One Ph.D. position (supervision by Profs Niessner and Rüdiger
Westermann) is available at our chair in the area of photorealistic rendering
for deep learning and online reconstruction
Research in this project includes the development of photorealistic realtime rendering
algorithms that can be used in deep learning applications for scene understanding, and for
high-quality scalable rendering of point scans from depth sensors and RGB stereo image
reconstruction. If you are interested in applying, you should have a strong background in
computer science, i.e., efficient algorithms and data structures, and GPU programming,
have experience implementing C/C++ algorithms, and you should be excited to work on state-of-the-art research in 3D computer graphics.
https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-learning-and-online-reconstruction.html
Ph.D. Position – Photorealistic Rendering for
Deep Learning and Online Reconstruction
Photorealism Explained
Blender Guru Published on May 25, 2016
http://www.blenderguru.com/tutorials/photorealism-explained/
https://youtu.be/R1-Ef54uTeU
Stop wasting time creating
texture maps by hand. All
materials on Poliigon come
with the relevant normal,
displacement, reflection and
gloss maps included. Just
plug them into your
software, and your material
is ready to render.
https://www.poliigon.com/
How to Make
Photorealistic PBR
Materials - Part 1
Blender Guru Published
on Jun 28, 2016
http://www.blenderguru.com/tutorials/pbr-shader-tutorial-pt1/
https://youtu.be/V3wghbZ-Vh4?t=24m5s
Physically Based Rendering (PBR)
Pipeline Image Styling #7
Styling line graphics (e.g. floorplans, 2D CADs) and monochrome images e.g. for desired visual identity
Real-Time User-Guided Image Colorization with
Learned Deep Priors
Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
(Submitted on 8 May 2017)
https://arxiv.org/abs/1705.02999
Our proposed method colorizes a grayscale image (left), guided by sparse user inputs (second), in real-time,
providing the capability for quickly generating multiple plausible colorizations (middle to right). Photograph of
Migrant Mother by Dorothea Lange, 1936 (Public Domain).
Network architecture. We train two variants of the user-interaction colorization network. Both variants use the blue layers for predicting a colorization. The Local Hints Network also uses the red layers to (a) incorporate user points Ul and (b) predict a color distribution Ẑ. The Global Hints Network uses the green layers, which transform the global input Ug by 1 × 1 conv layers and add the result into the main colorization network. Each box represents a conv layer, with the vertical dimension indicating feature-map spatial resolution and the horizontal dimension indicating the number of channels. Changes in resolution are achieved through subsampling and upsampling operations. In the main network, when resolution is decreased, the number of feature channels is doubled. Shortcut connections are added to upsampling convolution layers.
Style Transfer for Anime Sketches with
Enhanced Residual U-net and Auxiliary
Classifier GAN
Lvmin Zhang, Yi Ji, Xin Lin
(Submitted on 11 Jun 2017 (v1), last revised 13 Jun 2017 (this version, v2))
https://arxiv.org/abs/1706.03319
Examples of combination results on sketch images (top-left) and style images
(bottom-left). Our approach automatically applies the semantic features of an existing
painting to an unfinished sketch. Our network has learned to classify the hair, eyes,
skin and clothes, and has the ability to paint these features according to a sketch.
In this paper, we integrated a residual U-net with an auxiliary classifier generative adversarial network (AC-GAN, Odena et al. 2016) to apply the style to the grayscale sketch. The whole process is automatic and fast, and the results are creditable in the quality of the art style as well as the colorization.
Limitation: the pretrained VGG is for ImageNet photograph classification, not for paintings. In the future, we will train a classification network only for paintings to achieve better results. Furthermore, due to the large number of layers in our residual network, the batch size during training is limited to no more than 4. It remains for future study to reach a balance between the batch size and the number of layers.
Future Image restoration Depth Images (Kinect, etc.)
Pipeline Depth image enhancement #1a
Image Formation #1
Pinhole Camera Model: ideal projection of a 3D
object on a 2D image. Fernandez et al. (2017)
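For reference, the pinhole model reduces to a single matrix multiply with the intrinsic matrix K followed by a perspective divide; the sketch below uses made-up Kinect-like intrinsics. The inverse mapping (pixel plus depth back to 3D) is what turns a depth image into a point cloud.

import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5      # illustrative Kinect-like intrinsics
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

X = np.array([0.2, -0.1, 1.5])                    # 3D point in camera coordinates (metres), Z forward
uvw = K @ X
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]           # perspective divide -> pixel coordinates
print(f"pixel: ({u:.1f}, {v:.1f})")

depth = X[2]                                      # inverse mapping: pixel + depth -> 3D point
X_back = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))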
Dot patterns of a Kinect for Windows (a) and two Kinects for Xbox (b) and (c)
are projected on a flat wall from a distance of 1000 mm. Note that the
projection of each pattern is similar, and related by a 3-D rotation depending
on the orientation of the Kinect diffuser installation. The installation variability
can clearly be observed from differences in the bright dot locations (yellow
stars), which differ by an average distance of 10 pixels. Also displayed in (d) is
the idealized binary replication of the Kinect dot pattern [Kinect Pattern Uncovered], which was used in this project to simulate IR images. - Landau et al. (2016)
Landau et al. (2016)
Pipeline Depth image enhancement #1b
Image Formation #2
Characterizations of Noise in Kinect
Depth Images: A Review
Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar
IEEE Sensors Journal ( Volume: 14, Issue: 6, June 2014 )
https://doi.org/10.1109/JSEN.2014.2309987
Kinect outputs for a scene. (a) RGB Image. (b) Depth data rendered as an 8-
bit gray-scale image with nearer depth values mapped to lower intensities.
Invalid depth values are set to 0. Note the fixed band of invalid (black) pixels
on left. (c) Depth image showing too near depths in blue, too far depths in red
and unknown depths due to highly specular objects in green. Often these are
all taken as invalid zero depth.
Shadow is created in a depth image (Yu et al. 2013) when the incident IR
from the emitter gets obstructed by an object and no depth can be
estimated.
PROPERTIES OF IR LIGHT [Rose]
Pipeline Depth image enhancement #1c
Image Formation #3
Authors’ experiments on structural noise using a plane in 400 frames.
(a) Error at 1.2m. (b) Error at 1.6m. (c) Error at 1.8m.
Smisek et al. (2013) calibrate a Kinect against a stereo-rig
(comprising two Nikon D60 DSLR cameras) to estimate and
improve its overall accuracy. They have taken images and
fitted planar objects at 18 different distances (from 0.7 to 1.3
meters) to estimate the error between the depths measured
by the two sensors. The experiments corroborate that the
accuracy varies inversely with the square of depth [2].
However, even after the calibration of Kinect, the procedure
still exhibits relatively complex residual errors (Fig. 8).
Fig. 8. Residual noise of a plane. (a) Plane at 86cm. (b) Plane
at 104cm.
Authors’ experiments on temporal noise. Entropy and SD
of each pixel in a depth frame over 400 frames for a
stationary wall at 1.6m. (a) Entropy image. (b) SD image.
Authors’ experiments with vibrating noise showing
ZD samples as white dots. A pixel is taken as noise if
it is zero in frame i and nonzero in frames i±1. Note
that noise follows depth edges and shadow. (a)
Frame (i−1). (b) Frame i. (c) Frame (i+1). (d) Noise for
frame i.
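The per-pixel temporal-noise statistics reported above (entropy and standard deviation over ~400 frames of a static scene) are straightforward to reproduce; the sketch below computes both on a small synthetic depth stack in place of real Kinect recordings.

import numpy as np

frames = (1000.0 + 5.0 * np.random.randn(400, 120, 160)).astype(np.float32)   # synthetic static-scene depth stack (mm)

sd_image = frames.std(axis=0)                     # per-pixel standard deviation over time

def pixel_entropy(stack, bins=16):
    """Per-pixel entropy of a coarse histogram of the observed depth values."""
    _, h, w = stack.shape
    ent = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            hist, _ = np.histogram(stack[:, i, j], bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

entropy_image = pixel_entropy(frames)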
Pipeline Depth image enhancement #1d
Image Formation #4
The filtered intensity samples generated from
unsaturated IR dots (blue dots) were used to fit the
intensity model (red line), which follows an inverse
square model for the distance between the sensor
and the surface point Landau et al. (2016)
(a) Multiplicative speckle distribution is unitless, and can be represented as a gamma distribution Γ(4.54, 0.196). (b) Additive detector noise distribution can be represented as a normal distribution N(−0.126, 10.4), and has units of 10-bit intensity.
Landau et al. (2016)
The standard error in depth estimation (mm)
as a function of radial distance (pix) is plotted
for the (a) experimental and (b) simulated
data sets of flat walls at various depths (mm).
The experimental standard depth error
increases faster with an increase in radial
distance due to lens distortion.
Landau et al. (2016)
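The noise model above (inverse-square intensity falloff, gamma-distributed multiplicative speckle, and additive Gaussian detector noise) is easy to simulate; the sketch below samples one IR dot intensity per call, using the distribution parameters quoted in the caption and a made-up source-strength constant.

import numpy as np

rng = np.random.default_rng(0)

def simulate_ir_intensity(distance_m, i0=1.0e3):
    """One noisy intensity sample for a projected dot at the given distance (i0 is illustrative)."""
    clean = i0 / distance_m**2                     # inverse-square intensity model
    speckle = rng.gamma(shape=4.54, scale=0.196)   # multiplicative speckle, Γ(4.54, 0.196)
    detector = rng.normal(loc=-0.126, scale=10.4)  # additive detector noise, N(−0.126, 10.4)
    return clean * speckle + detector

samples = [simulate_ir_intensity(1.5) for _ in range(1000)]
print(np.mean(samples), np.std(samples))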
Pipeline Depth image enhancement #2A
Metrological Calibration #1
A New Calibration Method for
Commercial RGB-D Sensors
Walid Darwish, Shenjun Tang, Wenbin Li and Wu Chen
Sensors 2017, 17(6), 1204; doi:10.3390/s17061204
Based on these calibration algorithms, different calibration methods have been implemented and tested. Methods include the use of 1D [Liu et al. 2012], 2D [Shibo and Qing 2012], and 3D [Gui et al. 2014] calibration objects that work with the depth images directly; calibration of the manufacturer parameters of the IR camera and projector [Herrera et al. 2012]; or photogrammetric bundle adjustments used to model the systematic errors of the IR sensors [Davoodianidaliki and Saadatseresht 2013; Chow and Lichti 2013]. To enhance the depth precision, additional depth error models are added to the calibration procedure [7,8,21,22,23].
All of these error models are used to compensate only
for the distortion effect of the IR projector and camera.
Other research works have been conducted to obtain
the relative calibration between an RGB camera and an
IR camera by accessing the IR camera [24,25,26]. This
can achieve relatively high accuracy calibration
parameters for a baseline between IR and RGB cameras,
while the remaining limitation is that the distortion
parameters for the IR camera cannot represent the full
distortion effect for the depth sensor.
This study addressed these issues using a two-step
calibration procedure to calibrate all of the geometric
parameters of RGB-D sensors. The first step was related
to the joint calibration between the RGB and IR cameras,
which was achieved by adopting the procedure
discussed in [27] to compute the external baseline
between the cameras and the distortion parameters of
the RGB camera. The second step focused on the depth
sensor calibration.
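The first, joint RGB-IR step builds on standard checkerboard calibration; a generic OpenCV sketch of that building block is shown below (individual intrinsics, then the rigid baseline via stereo calibration). It is not the paper's full procedure, which additionally models depth-sensor errors, and the file patterns and board geometry are placeholders.

import glob
import cv2
import numpy as np

board = (9, 6)                                   # inner checkerboard corners (placeholder)
square = 0.025                                   # square size in metres (placeholder)
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

obj_pts, rgb_pts, ir_pts = [], [], []
for rgb_file, ir_file in zip(sorted(glob.glob("rgb_*.png")), sorted(glob.glob("ir_*.png"))):
    rgb = cv2.imread(rgb_file, cv2.IMREAD_GRAYSCALE)
    ir = cv2.imread(ir_file, cv2.IMREAD_GRAYSCALE)
    ok1, c1 = cv2.findChessboardCorners(rgb, board)
    ok2, c2 = cv2.findChessboardCorners(ir, board)
    if ok1 and ok2:
        obj_pts.append(objp); rgb_pts.append(c1); ir_pts.append(c2)

if not obj_pts:
    raise SystemExit("no checkerboard pairs found - point the globs at real image pairs")

size = rgb.shape[::-1]                           # assumes both cameras share the image size (a simplification)
_, K_rgb, d_rgb, _, _ = cv2.calibrateCamera(obj_pts, rgb_pts, size, None, None)
_, K_ir, d_ir, _, _ = cv2.calibrateCamera(obj_pts, ir_pts, size, None, None)

# Rigid transform (R, T) taking points from the IR camera frame to the RGB camera frame = the baseline.
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, ir_pts, rgb_pts, K_ir, d_ir, K_rgb, d_rgb, size,
    flags=cv2.CALIB_FIX_INTRINSIC)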
Point cloud of two perpendicular planes (blue
color: default depth; red color: modeled depth):
highlighted black dashed circles shows the
significant impact of the calibration method on the
point cloud quality.
The main difference between the two sensors is the baseline between the IR camera and projector. The longer the sensor's baseline, the longer the working distance that can be achieved. The working range of Kinect v1 is 0.80 m to 4.0 m, while it is 0.35 m to 3.5 m for the Structure Sensor.
Pipeline Depth image enhancement #2A
Metrological Calibration #2
Photogrammetric Bundle
Adjustment With Self-
Calibration of the PrimeSense
3D Camera Technology:
Microsoft Kinect
IEEE Access ( Volume: 1 ) 2013
https://doi.org/10.1109/ACCESS.2013.2271860
Roughness of point cloud before calibration. (Bottom) Roughness of point
cloud after calibration. The colours indicate the roughness as measured
by the normalized smallest eigenvalue.
Estimated Standard Deviation of the Observation Residuals
To quantify the external accuracy of the Kinect and the benefit of the proposed calibration,
a target board located at 1.5–1.8 m away with 20 signalized targets was imaged using an in-
house program based on the Microsoft Kinect SDK and with RGBDemo. Spatial distances
between the targets were known from surveying using the FARO Focus3D terrestrial laser
scanner with a standard deviation of 0.7 mm. By comparing the 10 independent spatial
distances measured by the Kinect to those made by the Focus3D, the RMSE was 7.8 mm
using RGBDemo and 3.7 mm using the calibrated Kinect results, showing a 53% improvement in accuracy. This accuracy check assesses the quality of all the imaging sensors and not just the IR camera-projector pair alone.
The results show improvements in geometric accuracy up to 53%
compared with uncalibrated point clouds captured using the popular
software RGBDemo. Systematic depth discontinuities were also
reduced and in the check-plane analysis the noise of the Kinect point
cloud was reduced by 17%.
Pipeline Depth image enhancement #2B
Metrological Calibration #3
Evaluating and Improving the Depth Accuracy of
Kinect for Windows v2
Lin Yang ; Longyu Zhang ; Haiwei Dong ; Abdulhameed
Alelaiwi ; Abdulmotaleb El Saddik
IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015)
https://doi.org/10.1109/JSEN.2015.2416651
Illustration of accuracy assessment of Kinect v2. (a) Depth accuracy. (b) Depth
resolution. (c) Depth entropy. (d) Edge noise. (e) Structural noise. The target plates in (a-
c) and (d-e) are parallel and perpendicular with the depth axis, respectively.
Accuracy error distribution
of Kinect for Windows v2.
Pipeline Depth image enhancement #2c
A Comparative Error Analysis of Current
Time-of-Flight Sensors
IEEE Transactions on Computational Imaging (Volume: 2, Issue: 1, March 2016)
https://doi.org/10.1109/TCI.2015.2510506
For evaluating the presence of wiggling, ground truth distance
information is required. We calculate the true distance by setting
up a stereo camera system. This system consists of the ToF camera
to be evaluated and a high resolution monochrome camera (IDS
UI-1241LE7) which we call the reference camera.
The cameras are calibrated with Zhang (2000)’s algorithm with
point correspondences computed with ROCHADE (Placht et al. 2014). Ground truth is calculated by intersecting the rays of all ToF camera pixels with the 3D plane of the checkerboard. For higher accuracy, we compute this plane from corners detected in the reference image and transform the plane into the coordinate system of the ToF camera.
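The ground-truth construction described above amounts to back-projecting every ToF pixel through the intrinsics and intersecting the resulting ray with the estimated checkerboard plane. The sketch below does exactly that with made-up intrinsics and plane parameters.

import numpy as np

K = np.array([[360.0, 0.0, 160.0],
              [0.0, 360.0, 120.0],
              [0.0, 0.0, 1.0]])                  # placeholder ToF intrinsics
h, w = 240, 320

# Plane in ToF camera coordinates: n . X = d (unit normal n, offset d in metres).
n = np.array([0.0, 0.0, 1.0])
d = 1.5                                          # checkerboard plane 1.5 m in front of the camera

u, v = np.meshgrid(np.arange(w), np.arange(h))
pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # homogeneous pixel coordinates
rays = np.linalg.inv(K) @ pix                    # back-projected ray directions
t = d / (n @ rays)                               # ray parameter at the plane
points = (rays * t).T.reshape(h, w, 3)           # 3D intersection per pixel
gt_range = np.linalg.norm(points, axis=-1)       # ground-truth radial distance image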
This experiment aims to quantify the so-
called amplitude-related distance error and
also to show that this effect is not related to
scattering. This effect can be observed when
looking at a planar surface with high
reflectivity variations. With some sensors the
distance measurements for pixels with
different amplitudes do not lie on the same
plane, even though they should.
To the best of our knowledge no evaluation
setup has been presented for this error
source so far. In the past this error has been
typically observed with images of
checkerboards or other high contrast
patterns. However, the analysis of single images allows no differentiation between amplitude-related errors and internal scattering.
Metrological Calibration #4
Pipeline Depth image enhancement #2c
Metrological Calibration #5
Low-Cost Reflectance-Based Method for the
Radiometric Calibration of Kinect 2
IEEE Sensors Journal ( Volume: 16, Issue: 7, April1, 2016 )
https://doi.org/10.1109/JSEN.2015.2508802
In this paper, a reflectance-based radiometric
method for the second generation of gaming
sensors, Kinect 2, is presented and discussed. In
particular, a repeatable methodology generalizable
to different gaming sensors by means of a calibrated
reference panel with Lambertian behavior is
developed.
The relationship between the received power and
the final digital level is obtained by means of a
combination of linear sensor relationship and
signal attenuation, into a least squares adjustment
with an outlier detector. The results confirm that the quality of the method (standard deviation better than 2% in laboratory conditions and discrepancies lower than 7%) is valid for exploiting the radiometric possibilities of this low-cost sensor, whose applications range from pathological analysis (moisture, crusts, etc.) to agricultural and forest resource evaluation.
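The "least squares adjustment with an outlier detector" can be illustrated with a generic iteratively re-fit linear model that rejects 3-sigma residuals; the data below are synthetic and the paper's attenuation model is not reproduced.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 1.0, 200)                  # e.g. modelled received power (arbitrary units)
y = 800.0 * x + 30.0 + rng.normal(0, 5, 200)    # e.g. recorded digital numbers (DN)
y[::25] += 150.0                                # inject a few outliers

A = np.column_stack([x, np.ones_like(x)])
mask = np.ones_like(x, dtype=bool)
for _ in range(5):                              # iteratively re-fit after rejecting outliers
    coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
    resid = y - A @ coef
    sigma = resid[mask].std()
    mask = np.abs(resid) < 3.0 * sigma          # 3-sigma outlier detector

print(f"gain = {coef[0]:.1f} DN per unit power, offset = {coef[1]:.1f} DN")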
3D data acquired with Kinect 2 (left) and digital
number (DN) distribution (right) for the reference
panel at 0.7 m (units: counts).
Visible-RGB view of the brick wall (a), intensity-IR
digital levels (DN) (b-d) and calibrated reflectance
values (e-g) for the three acquisition distances
The objective of this paper was to develop a radiometric calibration equation of an IR projector-
camera for the second generation of gaming sensors, Kinect 2, to convert the recorded
digital levels into physical values (reflectance). By the proposed equation, the reflectance
properties of the IR projector-camera set of Kinect 2 were obtained. This new equation will
increase the number of application fields of gaming sensors, favored by the possibility of
working outdoors.
The process of radiometric calibration should be incorporated as part of an integral process
where the geometry obtained is also corrected (i.e., lens distortion, mapping function, depth
errors, etc.). As future perspectives, the effects of the diffuse radiance, which does not belong to the sensor footprint and contaminates the received signal, will be evaluated to determine the error budget in the active sensor.
Pipeline Depth image enhancement #3
‘Old-school’ depth refining techniques
Depth enhancement with improved exemplar-based
inpainting and joint trilateral guided filtering
Liang Zhang ; Peiyi Shen ; Shu'e Zhang ; Juan Song ; Guangming Zhu
Image Processing (ICIP), 2016 IEEE International Conference on
https://doi.org/10.1109/ICIP.2016.7533131
In this paper, a novel depth enhancement algorithm with improved
exemplar-based inpainting and joint trilateral guided filtering is
proposed. The improved exemplar-based inpainting method is
applied to fill the holes in the depth images, in which the level set
distance component is introduced in the priority evaluation function.
Then a joint trilateral guided filter is adopted to denoise and smooth
the inpainted results. Experimental results reveal that the proposed
algorithm can achieve better enhancement results compared with the
existing methods in terms of subjective and objective quality
measurements.
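A minimal 'old-school' baseline in the same spirit (hole filling followed by edge-preserving smoothing) can be put together from stock OpenCV calls, as sketched below on a synthetic depth frame. This is a generic stand-in, not the papers' exemplar-based inpainting or (joint) trilateral filters.

import cv2
import numpy as np

depth = (1000 + 30 * np.random.randn(240, 320)).astype(np.float32)   # fake depth map (mm)
depth[100:120, 150:180] = 0                                           # fake zero-depth holes

valid = depth > 0
hole_mask = (~valid).astype(np.uint8)

# Inpaint on an 8-bit proxy, then copy the filled values back into the float map.
d_min, d_max = depth[valid].min(), depth.max()
depth8 = np.zeros(depth.shape, np.uint8)
depth8[valid] = np.clip((depth[valid] - d_min) / (d_max - d_min) * 255, 0, 255).astype(np.uint8)
filled8 = cv2.inpaint(depth8, hole_mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
filled = depth.copy()
filled[~valid] = d_min + filled8[~valid].astype(np.float32) / 255.0 * (d_max - d_min)

# Edge-preserving denoising of the filled map (sigmaColor is in depth units, here mm).
smoothed = cv2.bilateralFilter(filled, d=5, sigmaColor=25, sigmaSpace=5)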
Robust depth enhancement and optimization based on
advanced multilateral filters
Ting-An Chang, Yang-Ting Chou, Jar-Ferr Yang
EURASIP Journal on Advances in Signal Processing December 2017, 2017:51
https://doi.org/10.1186/s13634-017-0487-7
Results of the depth enhancement coupled with hole filling obtained by (a) a noisy depth map, (b) joint bilateral filter (JBF) [16], (c) intensity-guided depth super-resolution (IGDS) [39], (d) compressive sensing based depth upsampling (CSDU) [40], (e) adaptive joint trilateral filter (AJTF) [18], and (f) the proposed AMF, for Art, Books, Doily, Moebius, RGBD_1, and RGBD_2.
Pipeline Depth image enhancement #4A
Deep learning-based depth refining techniques
DepthComp : real-time depth image completion based on
prior semantic scene segmentation
Atapour-Abarghouei, A. and Breckon, T.P.
28th British Machine Vision Conference (BMVC) 2017 London, UK, 4-7 September 2017.
http://dro.dur.ac.uk/22375/
Exemplar results on the KITTI dataset. S denotes the segmented images [3] and D the original (unfilled) disparity maps.
Results are compared with [1, 2, 29, 35, 45]. Results of cubic and linear interpolations are omitted due to space.
Comparison of the proposed method using different initial segmentation techniques on the KITTI dataset [27].
Original color and disparity image (top-left), results with manual labels (top-right), results with SegNet [3] (bottom-left)
and results with mean-shift [26] (bottom-right).
Fast depth image denoising and enhancement using a deep
convolutional network
Xin Zhang and Ruiyuan Wu
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on
https://doi.org/10.1109/ICASSP.2016.7472127
Pipeline Depth image enhancement #4b
Deep learning-based depth refining techniques
Guided deep network for depth map super-resolution: How
much can color help?
Wentian Zhou ; Xin Li ; Daryl Reynolds
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE
https://doi.org/10.1109/ICASSP.2017.7952398
https://anvoy.github.io/publication.html
Depth map upsampling using joint edge-guided
convolutional neural network for virtual view synthesizing
Yan Dong; Chunyu Lin; Yao Zhao; Chao Yao
Journal of Electronic Imaging Volume 26, Issue 4
http://dx.doi.org/10.1117/1.JEI.26.4.043004
Depth map upsampling. Input: (a) low-resolution depth map and (b) the corresponding color image.
Output: (c) recovered high-resolution depth map.
When the depth edges become unreliable, our network
tends to rely on color-based prediction network (CBPN) for
restoring more accurate depth edges. Therefore,
contribution of color image increases when the reliability of
the LR depth map decreases (e.g., as noise gets stronger).
We adopt the popular deep CNN to learn non-linear
mapping between LR and HR depth maps. Furthermore, a
novel color-based prediction network is proposed to
properly exploit supplementary color information in
addition to the depth enhancement network.
In our experiments, we have shown that the deep neural network-based approach is superior to several existing state-of-the-art methods. Further comparisons are reported to confirm our analysis that the contribution of the color image varies significantly depending on the reliability of the LR depth maps.
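The general recipe shared by these works (upsample the LR depth, extract guidance features from the registered color image, fuse, and predict a residual) can be sketched as a small two-branch network. The PyTorch layers below are illustrative only and do not reproduce the cited architectures.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDepthSR(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.depth_branch = nn.Sequential(          # features from the upsampled depth
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.color_branch = nn.Sequential(          # guidance features from the color image
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, lr_depth, hr_color):
        up = F.interpolate(lr_depth, scale_factor=self.scale,
                           mode="bicubic", align_corners=False)
        feats = torch.cat([self.depth_branch(up), self.color_branch(hr_color)], dim=1)
        return up + self.fuse(feats)                # residual prediction on top of bicubic upsampling

lr = torch.rand(1, 1, 60, 80)                       # low-resolution depth
rgb = torch.rand(1, 3, 240, 320)                    # registered high-resolution color
hr_pred = GuidedDepthSR()(lr, rgb)                  # -> (1, 1, 240, 320)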
Future Image restoration Depth Images (Laser scanning)
Pipeline Laser Range Finding #1a
Versatile Approach to Probabilistic
Modeling of Hokuyo UTM-30LX
IEEE Sensors Journal ( Volume: 16, Issue: 6, March15, 2016 )
https://doi.org/10.1109/JSEN.2015.2506403
When working with Laser Range Finding (LRF), it is necessary to know the sensor's measurement principle and its properties. There are several measurement principles used in LRFs [Nejad and Olyaee 2006; Łabęcki et al. 2012; Adams 1999]; see the toy distance calculations after the list below:
● Triangulation
● Time of flight (TOF)
● Frequency modulation continuous wave (FMCW)
● Phase shift measurement (PSM)
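The toy distance calculations below illustrate the two most common principles from the list above, direct time of flight and phase-shift measurement, with purely illustrative numbers.

import math

c = 299_792_458.0                    # speed of light (m/s)

# Time of flight: the pulse travels to the target and back.
t_round_trip = 20e-9                 # 20 ns echo delay (illustrative)
d_tof = c * t_round_trip / 2.0       # -> ~3.0 m

# Phase shift: distance from the phase delay of an amplitude-modulated signal.
f_mod = 10e6                         # 10 MHz modulation frequency (illustrative)
phi = math.pi / 2                    # measured phase shift (rad)
d_psm = c * phi / (4.0 * math.pi * f_mod)   # -> ~3.75 m, unambiguous only up to c / (2 * f_mod)

print(d_tof, d_psm)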
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Distance measurement principle of time-of-flight laser
scanners (top) and phase based laser scanners
(bottom).
Laser Range Finding : Image formation #1
Pipeline Laser Range Finding #1b
Laser Range Finding : Image formation #2
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Two ways link budget between the receiver (Rx) and
the transmitter (Tx) in a Free Space Path (FSP)
propagation model.
Schematic representation of the signal propagation from
the transmitter to the receiver.
Effect of increasing incidence angle and range on the signal deterioration. (left) Plot of the signal deterioration due to increasing incidence angle α; (right) plot of the signal deterioration due to increasing range ρ, with ρmin = 0 m and ρmax = 100 m.
Relationship between scan angle and normal vector orientation used for the segmentation of the point cloud with respect to planar features. A point P = [θ, φ, ρ] is measured on the plane with the normal parameters N = [α, β, γ]. The different angles used for the range image gradients are plotted.
Theoretical number of points. Practical
example of a plate of 1×1 m placed at 3 m, oriented
at 0º and being rotated at 60º.
Theoretical number of points. (left) Number of
points with respect to the orientation of the patch
and the distance.
Reference plate
measurement set-up. A white
coated plywood board is
mounted on a tripod via a
screw clamp mechanism
provided with a 2º precision
goniometer.
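As a first-order summary of the incidence-angle and range effects illustrated earlier in this slide, the returned signal from a Lambertian patch falls off roughly with cos(α) and 1/ρ²; the sketch below evaluates that simplified relation (with an arbitrary constant), leaving out the full link-budget terms of the thesis.

import numpy as np

def relative_return(incidence_angle_rad, range_m, k=1.0):
    """Relative received signal for a Lambertian patch (arbitrary units, first-order model)."""
    return k * np.cos(incidence_angle_rad) / range_m**2

angles = np.deg2rad([0, 30, 60, 80])
print(relative_return(angles, 10.0))                               # drop with incidence angle at 10 m
print(relative_return(0.0, np.array([1.0, 10.0, 50.0, 100.0])))    # drop with range at normal incidence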
Pipeline Laser Range Finding #1c
Laser Range Finding : Image formation #3
The geometry of terrestrial laser scanning; identification of
errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S.. TU Delft. Doctoral Thesis (2016)
http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Terrestrial Laser Scanning (TLS) good practice of survey planning
Future directions: At the time this research started, terrestrial laser scanners were mainly
being used by research institutes and
manufacturers. However, nowadays, terrestrial
laser scanners are present in almost every field
of work, e.g. forensics, architecture, civil
engineering, gaming industry, movie industry.
Mobile mapping systems, such as scanners
capturing a scene while driving a car, or
scanners mounted on drones are currently
making use of the same range determination
techniques used in terrestrial laser scanners.
The number of applications that make use of 3D point clouds is rapidly growing. The need for a sound-quality product is all the more significant as it impacts the quality of a wide range of end-products.
Pipeline Laser Range Finding #1D
Laser Range Finding : Image formation #4
Ray-Tracing Method for Deriving
Terrestrial Laser Scanner Systematic
Errors
Derek D. Lichti, Ph.D., P.Eng.
Journal of Surveying Engineering | Volume 143 Issue 2 - May 2017
https://www.doi.org/10.1061/(ASCE)SU.1943-5428.0000213
Error model of direct georeferencing
procedure of terrestrial laser scanning
Pandžić, Jelena; Pejić, Marko; Božić, Branko; Erić, Verica
Automation in Construction Volume 78, June 2017, Pages 13-23
https://doi.org/10.1016/j.autcon.2017.01.003
Pipeline Laser Range Finding #2A
Calibration #1
Statistical Calibration Algorithms for Lidars
Anas Alhashimi, Luleå University of Technology, Control Engineering
Licentiate thesis (2016), ORCID iD: 0000-0001-6868-2210
A rigorous cylinder-based self-calibration approach for
terrestrial laser scanners
Ting On Chan; Derek D. Licht; David Belton
ISPRS Journal of Photogrammetry and Remote Sensing; Volume 99, January 2015
https://doi.org/10.1016/j.isprsjprs.2014.11.003
The proposed method and its variants were first applied to two simulated datasets, to compare their effectiveness, and then to three real datasets captured by three different types of scanners: a Faro Focus 3D (a phase-based panoramic scanner); a Velodyne HDL-32E (a pulse-based multi-spinning-beam scanner); and a Leica ScanStation C10 (a dual operating-mode scanner).
In situ self-calibration is essential for
terrestrial laser scanners (TLSs) to
maintain high accuracy for many
applications such as structural
deformation monitoring (Lindenbergh, 2010). This is particularly true for aged TLSs and
instruments being operated for long hours
outdoors with varying environmental
conditions.
Although the plane-based methods are now widely adopted for TLS
calibration, they also suffer from the problem of high parameter correlation
when there is a low diversity in the plane orientations (Chow et al., 2013). In
practice, not all locations possess large and smooth planar features that can be
used to perform a calibration. Even though planar features are available, their
planarity is not always guaranteed. Because of the drawbacks to the point-
based and plane-based calibrations, an alternative geometric feature, namely circular cylindrical features (e.g. Rabbani et al., 2007), should be considered and incorporated into the self-calibration procedure.
Estimating d without being aware of the mode hopping, i.e., assuming a certain λ0 without actually knowing that the average λ jumps between different lasing modes, thus results in a multimodal measurement of d.
Potential temperature-bias dependencies for the
polynomial model.
The plot explains the cavity modes, gain profile and lasing modes for a typical laser diode. The upper drawing shows the wavelength v1 as the dominant lasing mode, while the lower drawing shows how both wavelengths v1 and v2 are competing; this latter case is responsible for the mode-hopping effects.
Pipeline Laser Range Finding #2b
Calibration #2
Calibration of a multi-beam Laser System
by using a TLS-generated Reference
Gordon, M.; Meidow, J.
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial
Information Sciences, Volume II-5/W2, 2013, pp.85-90
http://dx.doi.org/10.5194/isprsannals-II-5-W2-85-2013
Extrinsic calibration of a multi-beam LiDAR
system with improved intrinsic laser parameters
using v-shaped planes and infrared images
Po-Sen Huang ; Wen-Bin Hong ; Hsiang-Jen Chien ; Chia-Yen Chen
IVMSP Workshop, 2013 IEEE 11th
https://doi.org/10.1109/IVMSPW.2013.6611921
The Velodyne HDL-64E S2, the LiDAR system studied in this work, for example, is a mobile scanner consisting of 64 laser emitter-receiver pairs rigidly attached to a rotating motor, and it provides real-time panoramic range data with measurement errors of around 2.5 mm.
In this paper we propose a
method to use IR images as
feedbacks in finding
optimized intrinsic and
extrinsic parameters of the
LiDAR-vision scanner.
First, we apply the IR-based calibration technique to a LiDAR system that
fires multiple beams, which significantly increases the problem's
complexity and difficulty. Second, the adjustment of parameters is applied
to not only the extrinsic parameters, but also the laser parameters as well
as the intrinsic parameters of the camera. Third, we use two different
objective functions to avoid generalization failure of the optimized
parameters.
It is assumed that the accuracy of this point cloud is considerably higher than that from the multi-beam LIDAR and that the data represent faces of man-made objects at different distances. We inspect the Velodyne HDL-64E S2 system as the best-known representative of this kind of sensor system, while a Z+F Imager 5010 serves as the reference. Besides the improvement of the point accuracy by considering the calibration results, we test the significance of the parameters related to the sensor model and consider the uncertainty of measurements w.r.t. the measured distances.
The standard deviation of the planar misclosure is nearly halved, from 3.2 cm to 1.7 cm. The variance component estimation as well as the standard deviation of the range residuals reveal that the manufacturer's stated distance accuracy of 2 cm is a bit too optimistic.
The histograms of the planar misclosures and the residuals reveal that these quantities are not normally distributed. Our investigation of the distance-dependent change of the misclosure variance is one reason. Other sources were investigated by Glennie and Lichti (2010): the incidence angle and the vertical angle. A further possibility is the focal distance, which is different for each laser; the average is at 8 m for the lower block and at 15 m for the upper block. This may introduce a distance-dependent, but nonlinear, variance change. Further research is needed to find the sources of these observations.
PetteriTeikariPhD
 
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
PetteriTeikariPhD
 
Wearable Continuous Acoustic Lung Sensing
Wearable Continuous Acoustic Lung SensingWearable Continuous Acoustic Lung Sensing
Wearable Continuous Acoustic Lung Sensing
PetteriTeikariPhD
 
Precision Medicine for personalized treatment of asthma
Precision Medicine for personalized treatment of asthmaPrecision Medicine for personalized treatment of asthma
Precision Medicine for personalized treatment of asthma
PetteriTeikariPhD
 
Two-Photon Microscopy Vasculature Segmentation
Two-Photon Microscopy Vasculature SegmentationTwo-Photon Microscopy Vasculature Segmentation
Two-Photon Microscopy Vasculature Segmentation
PetteriTeikariPhD
 
Skin temperature as a proxy for core body temperature (CBT) and circadian phase
Skin temperature as a proxy for core body temperature (CBT) and circadian phaseSkin temperature as a proxy for core body temperature (CBT) and circadian phase
Skin temperature as a proxy for core body temperature (CBT) and circadian phase
PetteriTeikariPhD
 
Summary of "Precision strength training: The future of strength training with...
Summary of "Precision strength training: The future of strength training with...Summary of "Precision strength training: The future of strength training with...
Summary of "Precision strength training: The future of strength training with...
PetteriTeikariPhD
 
Precision strength training: The future of strength training with data-driven...
Precision strength training: The future of strength training with data-driven...Precision strength training: The future of strength training with data-driven...
Precision strength training: The future of strength training with data-driven...
PetteriTeikariPhD
 
Intracerebral Hemorrhage (ICH): Understanding the CT imaging features
Intracerebral Hemorrhage (ICH): Understanding the CT imaging featuresIntracerebral Hemorrhage (ICH): Understanding the CT imaging features
Intracerebral Hemorrhage (ICH): Understanding the CT imaging features
PetteriTeikariPhD
 
Hand Pose Tracking for Clinical Applications
Hand Pose Tracking for Clinical ApplicationsHand Pose Tracking for Clinical Applications
Hand Pose Tracking for Clinical Applications
PetteriTeikariPhD
 
Precision Physiotherapy & Sports Training: Part 1
Precision Physiotherapy & Sports Training: Part 1Precision Physiotherapy & Sports Training: Part 1
Precision Physiotherapy & Sports Training: Part 1
PetteriTeikariPhD
 
Multimodal RGB-D+RF-based sensing for human movement analysis
Multimodal RGB-D+RF-based sensing for human movement analysisMultimodal RGB-D+RF-based sensing for human movement analysis
Multimodal RGB-D+RF-based sensing for human movement analysis
PetteriTeikariPhD
 
Creativity as Science: What designers can learn from science and technology
Creativity as Science: What designers can learn from science and technologyCreativity as Science: What designers can learn from science and technology
Creativity as Science: What designers can learn from science and technology
PetteriTeikariPhD
 
Light Treatment Glasses
Light Treatment GlassesLight Treatment Glasses
Light Treatment Glasses
PetteriTeikariPhD
 
Deep Learning for Biomedical Unstructured Time Series
Deep Learning for Biomedical  Unstructured Time SeriesDeep Learning for Biomedical  Unstructured Time Series
Deep Learning for Biomedical Unstructured Time Series
PetteriTeikariPhD
 
Hyperspectral Retinal Imaging
Hyperspectral Retinal ImagingHyperspectral Retinal Imaging
Hyperspectral Retinal Imaging
PetteriTeikariPhD
 
Instrumentation for in vivo intravital microscopy
Instrumentation for in vivo intravital microscopyInstrumentation for in vivo intravital microscopy
Instrumentation for in vivo intravital microscopy
PetteriTeikariPhD
 
Future of Retinal Diagnostics
Future of Retinal DiagnosticsFuture of Retinal Diagnostics
Future of Retinal Diagnostics
PetteriTeikariPhD
 
Optical Designs for Fundus Cameras
Optical Designs for Fundus CamerasOptical Designs for Fundus Cameras
Optical Designs for Fundus Cameras
PetteriTeikariPhD
 

More from PetteriTeikariPhD (20)

ML and Signal Processing for Lung Sounds
ML and Signal Processing for Lung SoundsML and Signal Processing for Lung Sounds
ML and Signal Processing for Lung Sounds
 
Next Gen Ophthalmic Imaging for Neurodegenerative Diseases and Oculomics
Next Gen Ophthalmic Imaging for Neurodegenerative Diseases and OculomicsNext Gen Ophthalmic Imaging for Neurodegenerative Diseases and Oculomics
Next Gen Ophthalmic Imaging for Neurodegenerative Diseases and Oculomics
 
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
 
Wearable Continuous Acoustic Lung Sensing
Wearable Continuous Acoustic Lung SensingWearable Continuous Acoustic Lung Sensing
Wearable Continuous Acoustic Lung Sensing
 
Precision Medicine for personalized treatment of asthma
Precision Medicine for personalized treatment of asthmaPrecision Medicine for personalized treatment of asthma
Precision Medicine for personalized treatment of asthma
 
Two-Photon Microscopy Vasculature Segmentation
Two-Photon Microscopy Vasculature SegmentationTwo-Photon Microscopy Vasculature Segmentation
Two-Photon Microscopy Vasculature Segmentation
 
Skin temperature as a proxy for core body temperature (CBT) and circadian phase
Skin temperature as a proxy for core body temperature (CBT) and circadian phaseSkin temperature as a proxy for core body temperature (CBT) and circadian phase
Skin temperature as a proxy for core body temperature (CBT) and circadian phase
 
Summary of "Precision strength training: The future of strength training with...
Summary of "Precision strength training: The future of strength training with...Summary of "Precision strength training: The future of strength training with...
Summary of "Precision strength training: The future of strength training with...
 
Precision strength training: The future of strength training with data-driven...
Precision strength training: The future of strength training with data-driven...Precision strength training: The future of strength training with data-driven...
Precision strength training: The future of strength training with data-driven...
 
Intracerebral Hemorrhage (ICH): Understanding the CT imaging features
Intracerebral Hemorrhage (ICH): Understanding the CT imaging featuresIntracerebral Hemorrhage (ICH): Understanding the CT imaging features
Intracerebral Hemorrhage (ICH): Understanding the CT imaging features
 
Hand Pose Tracking for Clinical Applications
Hand Pose Tracking for Clinical ApplicationsHand Pose Tracking for Clinical Applications
Hand Pose Tracking for Clinical Applications
 
Precision Physiotherapy & Sports Training: Part 1
Precision Physiotherapy & Sports Training: Part 1Precision Physiotherapy & Sports Training: Part 1
Precision Physiotherapy & Sports Training: Part 1
 
Multimodal RGB-D+RF-based sensing for human movement analysis
Multimodal RGB-D+RF-based sensing for human movement analysisMultimodal RGB-D+RF-based sensing for human movement analysis
Multimodal RGB-D+RF-based sensing for human movement analysis
 
Creativity as Science: What designers can learn from science and technology
Creativity as Science: What designers can learn from science and technologyCreativity as Science: What designers can learn from science and technology
Creativity as Science: What designers can learn from science and technology
 
Light Treatment Glasses
Light Treatment GlassesLight Treatment Glasses
Light Treatment Glasses
 
Deep Learning for Biomedical Unstructured Time Series
Deep Learning for Biomedical  Unstructured Time SeriesDeep Learning for Biomedical  Unstructured Time Series
Deep Learning for Biomedical Unstructured Time Series
 
Hyperspectral Retinal Imaging
Hyperspectral Retinal ImagingHyperspectral Retinal Imaging
Hyperspectral Retinal Imaging
 
Instrumentation for in vivo intravital microscopy
Instrumentation for in vivo intravital microscopyInstrumentation for in vivo intravital microscopy
Instrumentation for in vivo intravital microscopy
 
Future of Retinal Diagnostics
Future of Retinal DiagnosticsFuture of Retinal Diagnostics
Future of Retinal Diagnostics
 
Optical Designs for Fundus Cameras
Optical Designs for Fundus CamerasOptical Designs for Fundus Cameras
Optical Designs for Fundus Cameras
 

Recently uploaded

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 

Recently uploaded (20)

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 

Dataset creation for Deep Learning-based Geometric Computer Vision problems

  • 5. Pipeline Dataset creation #2a : Multiframe Techniques Note! In deep learning, the term super-resolution refers to “statistical upsampling” whereas in optical imaging super-resolution typically refers to imaging techniques. Note2! Nothing should stop someone marrying them two though In practice anyone can play with super-resolution at home by putting a camera on a tripod and then taking multiple shots of the same static scene, and post-processing them through super-resolution that can improve modulation transfer function (MTF) for RGB images, improve depth resolution and reduce noise for laser scans and depth sensing e.g. with Kinect. https://doi.org/10.2312/SPBG/SPBG06/009-015 Cited by 47 articles (a) One scan. (b) Final super-resolved surface from 100 scans. “PhotoAcute software processes sets of photographs taken in continuous mode. It utilizes superresolution algorithms to convert a sequence of images into a single high-resolution and low-noise picture, that could only be taken with much better camera.” Depth looks a lot nicer when reconstructed using 50 consecutive Kinec v1 frames in comparison to just one frame. [Data from Petteri Teikari[ Kinect multiframe reconstruction with SiftFu [Xiao et al. (2013)] https://github.com/jianxiongxiao/ProfXkit
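As a toy illustration of the multiframe idea for depth (not the SiftFu pipeline itself), here is a minimal sketch that fuses a stack of already-registered depth frames of a static scene by masking out invalid readings and taking a per-pixel median; the array layout and variable names are assumptions for illustration only.

# Naive multi-frame depth fusion for a static scene: per-pixel median over
# a stack of already-registered depth frames, ignoring invalid (zero) readings.
import numpy as np

def fuse_depth_stack(depth_stack):
    """depth_stack: (N, H, W) array of depth frames in millimetres; 0 = no reading."""
    stack = depth_stack.astype(np.float64)
    stack[stack == 0] = np.nan                  # treat missing depth as NaN
    fused = np.nanmedian(stack, axis=0)         # robust per-pixel estimate
    coverage = np.isfinite(stack).sum(axis=0)   # how many frames saw each pixel
    fused[coverage == 0] = 0.0                  # still no data at these pixels
    return fused, coverage

# fused, coverage = fuse_depth_stack(np.stack(depth_frames))  # e.g. 50 Kinect frames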
  • 6. Pipeline Dataset creation #2b : Multiframe Techniques It is tedious to take e.g. 100 shots of the same scene manually, possibly involving a full 360° rotation of the imaging devices; in practice this would need to be automated in some way, for example with a stepper motor driven by an Arduino, if no suitable commercial system is available (see the capture-automation sketch below). Multiframe techniques would allow another level of “nesting” of ground truths for a joint image enhancement block along with the proposed structure and motion network. ● The reconstructed laser scan / depth image / RGB from 100 images would be the target, and the single-frame version the input that needs to be enhanced. Meinhardt et al. (2017) Diamond et al. (2017)
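A minimal sketch of how such capture automation could look from the host side, assuming a hypothetical Arduino sketch that advances a turntable by a fixed angle when it receives a "STEP" command over serial, and gphoto2 being used to trigger a tethered camera; the serial port, protocol and file names are illustrative assumptions, not part of the original slides.

# Hypothetical capture loop: trigger camera, rotate turntable, repeat.
# Assumes an Arduino listening on /dev/ttyACM0 that advances a stepper
# by one increment when it receives the line "STEP\n" (illustrative protocol).
import subprocess
import time
import serial  # pyserial

N_SHOTS = 100

def capture_sequence(port="/dev/ttyACM0", baud=115200, n_shots=N_SHOTS):
    with serial.Serial(port, baud, timeout=2) as turntable:
        time.sleep(2)  # let the Arduino reset after the port is opened
        for i in range(n_shots):
            # Trigger a tethered DSLR shot via gphoto2 (one file per frame).
            subprocess.run(
                ["gphoto2", "--capture-image-and-download",
                 "--filename", f"frame_{i:03d}.jpg"],
                check=True)
            turntable.write(b"STEP\n")      # advance one angular increment
            turntable.flush()
            time.sleep(1.0)                 # wait for vibrations to settle

if __name__ == "__main__":
    capture_sequence()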
  • 7. Pipeline Dataset creation #3 A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake Submitted on 15 Jul 2017, last revised 25 Jul 2017 https://arxiv.org/abs/1707.04796 In this paper we develop a pipeline to rapidly generate high quality RGBD data with pixelwise labels and object poses. We use an RGBD camera to collect video of a scene from multiple viewpoints and leverage existing reconstruction techniques to produce a 3D dense reconstruction. We label the 3D reconstruction using a human-assisted ICP-fitting of object meshes. By reprojecting the results of labeling the 3D scene we can produce labels for each RGBD image of the scene. This pipeline enabled us to collect over 1,000,000 labeled object instances in just a few days. We use this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture. Overview of the data generation pipeline. (a) Xtion RGBD sensor mounted on Kuka IIWA arm for raw data collection. (b) RGBD data processed by ElasticFusion into reconstructed pointcloud. (c) User annotation tool that allows for easy alignment using 3 clicks. User clicks are shown as red and blue spheres. The transform mapping the red spheres to the green spheres is then the user specified guess. (d) Cropped pointcloud coming from user specified pose estimate is shown in green. The mesh model shown in grey is then finely aligned using ICP on the cropped pointcloud and starting from the user provided guess. (e) All the aligned meshes shown in reconstructed pointcloud. (f) The aligned meshes are rendered as masks in the RGB image, producing pixelwise labeled RGBD images for each view. Increasing the variety of backgrounds in the training data for single-object scenes also improved generalization performance for new backgrounds, with approximately 50 different backgrounds breaking into above-50% IoU on entirely novel scenes. Our recommendation is to focus on multi-object data collection in a variety of backgrounds for the most gains in generalization performance. We hope that our pipeline lowers the barrier to entry for using deep learning approaches for perception in support of robotic manipulation tasks by reducing the amount of human time needed to generate vast quantities of labeled data for your specific environment and set of objects. It is also our hope that our analysis of segmentation network performance provides guidance on the type and quantity of data that needs to be collected to achieve desired levels of generalization performance.
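A minimal sketch of the "rough user guess, then ICP refinement" step described above, here using Open3D as a stand-in for the paper's own tooling; file paths, the sampling density and the initial pose are placeholders, and the 3-click annotation that produces the initial pose is not reproduced.

# Refine a rough object pose against a dense scene reconstruction with ICP.
import open3d as o3d

def refine_object_pose(scene_pcd_path, object_mesh_path, init_pose):
    scene = o3d.io.read_point_cloud(scene_pcd_path)       # dense reconstruction
    mesh = o3d.io.read_triangle_mesh(object_mesh_path)    # known object model
    model = mesh.sample_points_uniformly(number_of_points=20000)

    result = o3d.pipelines.registration.registration_icp(
        model, scene,
        max_correspondence_distance=0.02,                 # 2 cm search radius
        init=init_pose,                                   # rough 4x4 pose from user annotation
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        criteria=o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=100))
    return result.transformation                          # refined 4x4 object pose

# init_pose would come from the 3-click user annotation described above,
# e.g. an identity 4x4 matrix as a trivial placeholder.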
  • 8. Pipeline Dataset creation #4 A Novel Benchmark RGBD Dataset for Dormant Apple Trees and Its Application to Automatic Pruning Shayan A. Akbar, Somrita Chattopadhyay, Noha M. Elfiky, Avinash Kak; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016 https://doi.org/10.1109/CVPRW.2016.50 Extending of the Kinect device functionality and the corresponding database Libor Bolecek ; Pavel Němec ; Jan Kufa ; Vaclav Ricny Radioelektronika (RADIOELEKTRONIKA), 2017 https://doi.org/10.1109/RADIOELEK.2017.7937594 One possible research direction is the use of an infrared view of the investigated scene to improve the depth map. However, no Kinect database containing the corresponding infrared images existed, so the authors' aim was to create such a database, and its usability is increased by adding stereo images. Moreover, the same scenes were captured with Kinect v2, and the impact of using Kinect v1 and Kinect v2 simultaneously to improve the depth map of the investigated scene was also studied. The database contains sequences of objects on a turntable and simple scenes containing several objects. Figures: the depth map of the scene obtained by a) Kinect v1, b) Kinect v2; a comparison of one row of the depth map obtained by a) Kinect v1 and b) Kinect v2 with the true depth map; Kinect infrared image after adjusting the brightness dynamics.
  • 9. Pipeline Multiframe Pipe #1 [Diagram: frames 1, 2, 3, ..., 100 of the depth image (e.g. Kinect), laser scan (e.g. Velodyne) and RGB image feed a multiframe reconstruction enhancement block whose output is the target.] Learn to improve image quality from a single image when the system is deployed. Reconstruction could be done using traditional algorithms (e.g. OpenCV) to start with; all individual frames then need to be saved so that, when reconstruction algorithms improve, all blocks can be iterated ad infinitum. Mix different image and sensor qualities in the training set to build invariance to scan quality (see the training-pair sketch below).
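A minimal sketch of how such single-frame / multiframe-reconstruction training pairs could be wrapped for a deep learning framework, assuming the raw frames and their fused reconstructions have already been saved to disk as NumPy arrays under an illustrative naming convention; the directory layout and class name are assumptions.

# Hypothetical training-pair loader: input = one raw frame, target = the
# reconstruction fused from the whole 100-frame burst of the same scene.
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class MultiframePairs(Dataset):
    def __init__(self, root):
        # e.g. root/scene_0007/frame_042.npy and root/scene_0007/fused.npy
        self.frames = sorted(glob.glob(f"{root}/scene_*/frame_*.npy"))

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        frame_path = self.frames[idx]
        target_path = frame_path.rsplit("/", 1)[0] + "/fused.npy"
        x = np.load(frame_path).astype(np.float32)   # noisy single frame
        y = np.load(target_path).astype(np.float32)  # multiframe "ground truth"
        return torch.from_numpy(x), torch.from_numpy(y)

# loader = torch.utils.data.DataLoader(MultiframePairs("data"), batch_size=8, shuffle=True)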
  • 10. Pipeline Multiframe Pipe #2 You could cascade different levels of quality if you want to make things complex, in a deeply supervised fashion. [Diagram: six quality levels (1–6), from the lowest quality (RGB only) to the highest quality (depth map from a professional laser scanner).] Each step in the cascade is closer in quality to the previous one, so one could assume that this enhancement would be easier to learn, and the pipeline would output the intermediate enhanced qualities as a “side effect”, which is useful for visualization purposes (see the cascade-loss sketch below).
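A minimal sketch of what the deeply supervised cascade objective could look like, assuming a model that returns one intermediate prediction per quality level and that a target is available for each level; the loss choice, weighting and usage line are illustrative assumptions, not the slides' prescription.

# Hypothetical deeply supervised loss over a cascade of quality levels:
# each intermediate output is penalized against its corresponding target.
import torch
import torch.nn as nn

def cascade_loss(predictions, targets, weights=None):
    """predictions, targets: lists of tensors, ordered low -> high quality."""
    if weights is None:
        weights = [1.0] * len(predictions)
    l1 = nn.L1Loss()
    total = torch.zeros((), device=predictions[0].device)
    for pred, tgt, w in zip(predictions, targets, weights):
        total = total + w * l1(pred, tgt)   # supervision at every cascade stage
    return total

# Usage sketch: preds = model(rgb_only_input)  # list of increasingly refined outputs
#               loss = cascade_loss(preds, [q2_target, q3_target, q4_target, q5_target, q6_target])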
  • 11. Pipeline acquisition example with Kinect https://arxiv.org/abs/1704.07632 KinectFusion (Newcombe et al. 2011), one of the pioneering works, showed that a real-world object as well as an indoor scene can be reconstructed in real-time with GPU acceleration. It exploits the iterative closest point (ICP) algorithm (Besl and McKay 1992) to track 6-DoF poses and the volumetric surface representation scheme with signed distance functions (Curless and Levoy, 1996) to fuse 3D measurements. A number of following studies (e.g. Choi et al. 2015) have tackled the limitations of KinectFusion; as the scale of a scene increases, it is hard to completely reconstruct the scene due to the drift problem of the ICP algorithm as well as the large memory consumption of volumetric integration. To scale up the KinectFusion algorithm, Whelan et al. (2012) presented a spatially extended KinectFusion, named Kintinuous, which incrementally adds KinectFusion results in the form of triangular meshes. Whelan et al. (2015) also proposed ElasticFusion to tackle similar problems as well as to avoid pose graph optimization, by using surface loop closure optimization and a surfel-based representation. Moreover, to decrease the space complexity, ElasticFusion deallocates invisible surfels from memory; invisible surfels are allocated in memory again only if they are likely to be visible in the near future.
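For reference, a minimal sketch of the TSDF-style volumetric fusion described above, using Open3D's ScalableTSDFVolume as a stand-in for the KinectFusion-style integration (the cited systems use their own GPU implementations); camera poses are assumed to come from an external tracker, and the file lists, voxel size and truncation distance are placeholders.

# Fuse a short RGB-D sequence into a TSDF volume and extract a mesh.
import numpy as np
import open3d as o3d

def fuse_sequence(color_files, depth_files, poses):
    intrinsic = o3d.camera.PinholeCameraIntrinsic(
        o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)  # Kinect-like intrinsics
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.005,          # 5 mm voxels
        sdf_trunc=0.02,              # 2 cm truncation band
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

    for c_path, d_path, pose in zip(color_files, depth_files, poses):
        color = o3d.io.read_image(c_path)
        depth = o3d.io.read_image(d_path)
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            color, depth, depth_trunc=4.0, convert_rgb_to_intensity=False)
        # 'pose' is the 4x4 camera-to-world transform from an external tracker (e.g. ICP).
        volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

    return volume.extract_triangle_mesh()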
  • 12. Pipeline Multiframe Pipe into Sfm-Net
  • 13. Pipeline Multiframe Pipe Quality simulation Simulated Imagery Rendering Workflow for UAS- Based Photogrammetric 3D Reconstruction Accuracy Assessments Richard K. Slocum and Christopher E. Parrish Remote Sensing 2017, 9(4), 396; doi :10.3390/rs9040396 “Here, we present a workflow to render computer generated imagery using a virtual environment which can mimic the independent variables that would be experienced in a real-world UAS imagery acquisition scenario. The resultant modular workflow utilizes Blender Python API, an open source computer graphics software, for the generation of photogrammetrically-accurate imagery suitable for SfM processing, with explicit control of camera interior orientation, exterior orientation, texture of objects in the scene, placement of objects in the scene, and ground control point (GCP) accuracy.” Pictorial representation of the simUAS (simulated UAS) imagery rendering workflow. Note: The SfM-MVS step is shown as a “black box” to highlight the fact that the procedure can be implemented using any SfM-MVS software, including proprietary commercial software. The imagery from Blender, rendered using a pinhole camera model, is postprocessed to introduce lens and camera effects. The magnitudes of the postprocessing effects are set high in this example to clearly demonstrate the effect of each. The fullsize image (left) and a close up image (right) are both shown in order to depict both the large and small scale effects. A 50 cm wide section of the point cloud containing a box (3 m cube) is shown with the dense reconstruction point clouds overlaid to demonstrate the effect of point cloud dense reconstruction quality on accuracy near sharp edges. The points along the side of a vertical plane on a box were isolated and the error perpendicular to the plane of the box were visualized for each dense reconstruction setting, with white regions indicating no point cloud data. Notice that the region with data gaps in the point cloud from the ultra-high setting corresponds to the region of the plane with low image texture, as shown in the lower right plot.
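A minimal sketch of how one Blender-Python rendering step in such a simulated-imagery workflow might look, assuming a scene that already contains a camera object named "Camera"; the pose, focal length, resolution and output path are illustrative, and the simUAS post-processing (lens distortion, noise, etc.) is not reproduced here.

# Render one synthetic view with explicit camera interior/exterior orientation.
# Intended to be run inside Blender's bundled Python, e.g.:
#   blender scene.blend --background --python render_view.py
import math
import bpy

scene = bpy.context.scene
cam = bpy.data.objects["Camera"]

# Exterior orientation (pose) of the virtual camera.
cam.location = (10.0, -5.0, 30.0)                       # metres in the scene frame
cam.rotation_euler = (math.radians(60), 0.0, math.radians(45))

# Interior orientation: focal length and sensor size define the pinhole model.
cam.data.lens = 20.0                                    # mm
cam.data.sensor_width = 23.5                            # mm

scene.camera = cam
scene.render.resolution_x = 4000
scene.render.resolution_y = 3000
scene.render.filepath = "//renders/view_0001.png"       # relative to the .blend file
bpy.ops.render.render(write_still=True)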
  • 14. Data fusion combining multimodal data
  • 15. Pipeline data Fusion / Registration #1 “Rough estimates for 3D structure obtained using structure from motion (SfM) on the uncalibrated images are first co-registered with the lidar scan and then a precise alignment between the datasets is estimated by identifying correspondences between the captured images and reprojected images for individual cameras from the 3D lidar point clouds. The precise alignment is used to update both the camera geometry parameters for the images and the individual camera radial distortion estimates, thereby providing a 3D-to-2D transformation that accurately maps the 3D lidar scan onto the 2D image planes. The 3D to 2D map is then utilized to estimate a dense depth map for each image. Experimental results on two datasets that include independently acquired high-resolution color images and 3D point cloud datasets indicate the utility of the framework. The proposed approach offers significant improvements on results obtained with SfM alone.” Fusing structure from motion and lidar for dense accurate depth map estimation Li Ding ; Gaurav Sharma Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on https://doi.org/10.1109/ICASSP.2017.7952363 https://arxiv.org/abs/1707.03167 “In this paper, we present RegNet, the first deep convolutional neural network (CNN) to infer a 6 degrees of freedom (DOF) extrinsic calibration between multimodal sensors, exemplified using a scanning LiDAR and a monocular camera. Compared to existing approaches, RegNet casts all three conventional calibration steps (feature extraction, feature matching and global regression) into a single real-time capable CNN.” Development of the mean absolute error (MAE) of the rotational components over training iteration for different output representations: Euler angles are represented in red, quaternions in brown and dual quaternions in blue. Both quaternion representations outperform the Euler angles representation. “Our method yields a mean calibration error of 6 cm for translation and 0.28◦ for rotation with decalibration magnitudes of up to 1.5 m and 20◦, which competes with state-of-the-art online and offline methods.”
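A minimal sketch of the 3D-to-2D mapping step that both papers above rely on: projecting lidar points into an image plane with a pinhole model given intrinsics K and extrinsics (R, t). Lens distortion and the papers' alignment/learning machinery are omitted; the function and argument names are illustrative assumptions.

# Project 3D lidar points into a 2D image using a pinhole camera model.
import numpy as np

def project_lidar_to_image(points_xyz, K, R, t, image_shape):
    """points_xyz: (N, 3) lidar points; K: (3, 3) intrinsics;
    R, t: extrinsics mapping the lidar frame into the camera frame."""
    p_cam = points_xyz @ R.T + t          # (N, 3) points in the camera frame
    in_front = p_cam[:, 2] > 0.1          # keep points in front of the camera
    p_cam = p_cam[in_front]

    uv = p_cam @ K.T                      # homogeneous pixel coordinates
    uv = uv[:, :2] / uv[:, 2:3]           # perspective division

    h, w = image_shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[valid], p_cam[valid, 2]     # pixel positions and their depths

# The returned (pixel, depth) pairs give a sparse depth map that a dense
# depth-estimation step can then interpolate, as in the fusion paper above.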
  • 16. Pipeline data Fusion / Registration #2 Depth refinement for binocular Kinect RGB-D cameras Jinghui Bai ; Jingyu Yang ; Xinchen Ye ; Chunping Hou Visual Communications and Image Processing (VCIP), 2016 https://doi.org/10.1109/VCIP.2016.7805545
  • 17. Pipeline data Fusion / Registration #3 Used Kinects are inexpensive (~£29.95 on eBay); multiple Kinects can be used at once for better occlusion handling. Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar IEEE Sensors Journal (Volume: 14, Issue: 6, June 2014) https://doi.org/10.1109/JSEN.2014.2309987 Characterization of Different Microsoft Kinect Sensor Models IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015) https://doi.org/10.1109/JSEN.2015.2422611 An ANOVA analysis was performed to determine if the model of the Kinect, the operating temperature, or their interaction were significant factors in the Kinect's ability to determine the distance to the target. Different sized gauge blocks were also used to test how well a Kinect could reconstruct precise objects. Machinist blocks were used to examine how well the Kinect could reconstruct objects set up at an angle and determine the location of the center of a hole. All the Kinect models were able to determine the location of a target with a low standard deviation (<2 mm). At close distances, the resolution of all the Kinect models was 1 mm. Through the ANOVA analysis, the best performing Kinect at close distances was the Kinect model 1414, and at farther distances the Kinect model 1473. The internal temperature of the Kinect sensor had an effect on the distance reported by the sensor. Using different correction factors, the Kinect was able to determine the volume of a gauge block and the angles machinist blocks were set up at, with under 10% error.
  • 18. Pipeline data Fusion / Registration #4 A Generic Approach for Error Estimation of Depth Data from (Stereo and RGB-D) 3D Sensors Luis Fernandez, Viviana Avila and Luiz Gonçalves Preprints | Posted: 23 May 2017 | http://dx.doi.org/10.20944/preprints201705.0170.v1 “We propose an approach for estimating the error in depth data provided by generic 3D sensors, which are modern devices capable of generating an image (RGB data) and a depth map (distance) or other similar 2.5D structure (e.g. stereo disparity) of the scene. We come up with a multi-platform system and its verification and evaluation has been done, using the development kit of the board NVIDIA Jetson TK1 with the MS Kinects v1/v2 and the Stereolabs ZED camera. So the main contribution is the error determination procedure that does not need any data set or benchmark, thus relying only on data acquired on-the-fly. With a simple checkerboard, our approach is able to determine the error for any device.” In the article of Yang [16], an MS Kinect v2 structure is proposed to improve the accuracy of the sensors and the depth capture of objects placed more than four meters away. It was concluded that an object covered with light-absorbing materials may reflect less IR light back to the MS Kinect and therefore yield erroneous depth data. Other factors, such as power consumption, complex wiring and high requirements for the laptop computer, also limit the use of the sensor. The characteristics of MS Kinect stochastic errors are presented for each axis direction in the work by Choo [17]. The depth error is measured using a 3D chessboard, similar to the one used in our approach. The results show that, for all three axes, the error should be considered independently. The work of Song [18] proposes an approach to generate a per-pixel confidence measure for each depth map captured by MS Kinect in indoor scenes through supervised learning and the use of artificial intelligence. Figure: detection (a) and ordering (b) of corners in the three planes of the pattern. It would make sense to combine versions 1 and 2 in the same rig, as Kinect v1 is more accurate at close distances and Kinect v2 at far distances (see the corner-detection sketch below).
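A minimal OpenCV sketch of the checkerboard corner detection underlying such error-estimation and calibration procedures; the pattern size and image path are placeholders, and the cited papers use their own three-plane 3D targets rather than a single flat board.

# Detect and refine checkerboard corners in one image with OpenCV.
import cv2

def find_corners(image_path, pattern_size=(9, 6)):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        return None
    # Sub-pixel refinement around the initial corner estimates.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    return corners.reshape(-1, 2)   # (N, 2) pixel coordinates, ordered along the pattern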
  • 19. Pipeline data Fusion / Registration #5 Precise 3D/2D calibration between a RGB-D sensor and a C-arm fluoroscope International Journal of Computer Assisted Radiology and Surgery August 2016, Volume 11, Issue 8, pp 1385–1395 https://doi.org/10.1007/s11548-015-1347-2 “A RMS reprojection error of 0.5 mm is achieved using our calibration method which is promising for surgical applications. Our calibration method is more accurate when compared to Tsai’s method. Lastly, the simulation result shows that using a projection matrix has a lower error than using intrinsic and extrinsic parameters in the rotation estimation.” While the color camera has a relative high resolution (1920 px × 1080 px for Kinect 2.0), the depth camera is mid-resolution (512 px × 424 px for Kinect 2.0) and highly noisy. Furthermore, RGB-D sensors have a minimal distance to the scene from which they can estimate the depth. For instance, the minimum optimal distance of Kinect 2.0 is 50 cm. On the other hand, C-arm fluoroscopes have a short focus, which is typically 40 cm, and a much narrower field of view than the RGB-D sensor with also a mid-resolution image (ours is 640 px × 480 px). All these factors lead to a high disparity in the field of view between the C-arm and the RGB-D sensor if the two were to be integrated in a single system. This means that the calibration process is crucial. We need to achieve high accuracy for the localization of 3D points using RGB-D sensors, and we require a calibration phantom which can be clearly imaged by both devices. Workflow of the calibration process between the RGB-D sensor and a C-arm. The input data include a sequence of infrared, depth, and color images from the RGB-D sensor and X-ray images from the C-arm. The output of the calibration pipeline is the projection matrix, which is calculated by the 3D/2D correspondences detected from the input data
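A minimal sketch of how a 3x4 projection matrix can be estimated from 3D/2D correspondences with the direct linear transform (DLT), the same kind of output the calibration pipeline above produces; real calibration pipelines add coordinate normalization, outlier rejection and nonlinear refinement, which are omitted here, and the function name is an assumption.

# Estimate a 3x4 projection matrix P from >= 6 known 3D-2D correspondences (DLT).
import numpy as np

def dlt_projection_matrix(points_3d, points_2d):
    """points_3d: (N, 3) world points; points_2d: (N, 2) pixel points; N >= 6."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    return P / P[2, 3]   # fix the arbitrary scale for readability

# Reprojection check: x_hom = P @ [X, Y, Z, 1]; pixel = x_hom[:2] / x_hom[2]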
  • 20. Pipeline data Fusion / Registration #6 Fusing Depth and Silhouette for Scanning Transparent Object with RGB-D Sensor Yijun Ji, Qing Xia, and Zhijiang Zhang System overview; TSDF: truncated signed distance function; SFS: shape from silhouette. Results on noise region. (a) Color images captured by stationary camera with a rotating platform. (b) The noisy voxels detected by multiple depth images are in red. (c) and (d) show the experimental results done by a moving Kinect; the background is changing in these two cases.
  • 21. Pipeline data Fusion / Registration #7 Intensity Video Guided 4D Fusion for Improved Highly Dynamic 3D Reconstruction Jie Zhang, Christos Maniatis, Luis Horna, Robert B. Fisher (Submitted on 6 Aug 2017) https://arxiv.org/abs/1708.01946 Temporal tracking of intensity image points (of moving and deforming objects) allows registration of the corresponding 3D data points, whose 3D noise and fluctuations are then reduced by spatio-temporal multi-frame 4D fusion. The results demonstrate that the proposed algorithm is effective at reducing 3D noise and is robust against intensity noise. It outperforms existing algorithms with good scalability on both stationary and dynamic objects. The system framework (using 3 consecutive frames as an example) Static Plane (first row): (a) mean roughness; (b) std of roughness vs. number of frames fused. Falling ball (second row): (c) mean roughness; (d) std of roughness vs. number of frames fused Texture-related 3D noise on a static plane: (a) 3D frame; (b) 3D frame with textures. The 3D noise is closely related to the textures in the intensity image. Illustration of 3D noise reduction on the ball. Spatial-temporal divisive normalized bilateral filter (DNBF)
  • 22. Pipeline data Fusion / Registration #8 Utilization of a Terrestrial Laser Scanner for the Calibration of Mobile Mapping Systems Seunghwan Hong, Ilsuk Park, Jisang Lee, Kwangyong Lim, Yoonjo Choi and Hong-Gyoo Sohn Sensors 2017, 17(3), 474; doi:10.3390/s17030474 Configuration of mobile mapping system: network video cameras (F: front, L: left, R: right), mobile laser scanner, and Global Navigation Satellite System (GNSS)/Inertial Navigation System (INS). To integrate the datasets captured by each sensor mounted on the Mobile Mapping System (MMS) into the unified single coordinate system, the calibration, which is the process to estimate the orientation (boresight) and position (lever-arm) parameters, is required with the reference datasets [Schwarz and El-Sheimy 2004, Habib et al. 2010, Chan et al. 2010]. When the boresight and lever-arm parameters defining the geometric relationship between each sensing data and GNSS/INS data are determined, georeferenced data can be generated. However, even after precise calibration, the boresight and lever-arm parameters of an MMS can be shaken and the errors that deteriorate the accuracy of the georeferenced data might accumulate. Accordingly, for the stable operation of multiple sensors, precise calibration must be conducted periodically. (a) Sphere target used for registration of terrestrial laser scanning data; (b) sphere target detected in a point cloud (the green sphere is a fitted sphere model). Network video camera: AXIS F1005-E GNSS/INS unit: OxTS Survey+ Terrestrial laser scanner (TLS): Faro Focus 3D Mobile laser scanner: Velodyne HDL 32-E
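A minimal sketch of the sphere-target fitting used to register terrestrial laser scans, here as a linear least-squares (algebraic) sphere fit on points segmented around a target; real workflows add robust outlier rejection, and the function name and usage comment are illustrative.

# Algebraic least-squares sphere fit: find center (a, b, c) and radius r
# from points lying on the sphere surface by solving a linear system.
import numpy as np

def fit_sphere(points):
    """points: (N, 3) array of 3D points on (or near) a sphere surface."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([2 * x, 2 * y, 2 * z, np.ones(len(points))])
    b = x**2 + y**2 + z**2
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Usage sketch: center, r = fit_sphere(segmented_target_points)
# Matching sphere centers seen by the TLS and by the mobile scanner gives
# 3D-3D correspondences for estimating the boresight/lever-arm parameters.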
  • 23. Pipeline data Fusion / Registration #9 Dense Semantic Labeling of Very-High-Resolution Aerial Imagery and LiDAR with Fully-Convolutional Neural Networks and Higher-Order CRFs Yansong Liu, Sankaranarayanan Piramanayagam, Sildomar T. Monteiro, Eli Saber http://openaccess.thecvf.com/content_cvpr_2017_workshops/w18/papers/Liu_Dense_Semantic_Labeling_CVPR_2017_paper.pdf Our proposed decision-level fusion scheme: training one fully-convolutional neural network on the color-infrared image (CIR) and one logistic regression using hand-crafted features. Two probabilistic results: PFCN and PLR are then combined in a higher-order CRF framework Main original contributions of our work are: 1) the use of energy based CRFs for efficient decision- level multisensor data fusion for the task of dense semantic labeling. 2) the use of higher-order CRFs for generating labeling outputs with accurate object boundaries. 3) the proposed fusion scheme has a simpler architecture than training two separate neural networks, yet it still yields the state-of-the- art dense semantic labeling results. Guiding multimodal registration with learned optimization updates Gutierrez-Becker B, Mateus D, Peter L, Navab N Medical Image Analysis Volume 41, October 2017, Pages 2-17 https://doi.org/10.1016/j.media.2017.05.002 Training stage (left): A set of aligned multimodal images is used to generate a training set of images with known transformations. From this training set we train an ensemble of trees mapping the joint appearance of the images to displacement vectors. Testing stage (right): We register a pair of multimodal images by predicting with our trained ensemble the required displacements δ for alignment at different locations z. The predicted displacements are then used to devise the updates of the transformation parameters to be applied to the moving image. The procedure is repeated until convergence is achieved. Corresponding CT (left) and MR-T1 (middle) images of the brain obtained from the RIRE dataset. The highlighted regions are corresponding areas between both images (right). Some multimodal similarity metrics rely on structural similarities between images obtained using different modalities, like the ones inside the blue boxes. However in many cases structures which are clearly visible in one imaging modality correspond to regions with homogeneous voxel values in the other modality (red and green boxes).
  • 24. Future Image restoration Natural Images (RGB)
  • 25. Pipeline RGB image Restoration #1 https://arxiv.org/abs/1704.02738 Our method includes a sub-pixel motion compensation (SPMC) layer that can better handle inter-frame motion for this task, and a detail fusion (DF) network that can effectively fuse image details from multiple images after SPMC alignment. “Hardware super-resolution” can of course also be done entirely via deep learning. https://petapixel.com/2015/02/21/a-practical-guide-to-creating-superresolution-photos-with-photoshop/
  • 26. PipelineRGB image Restoration #2A “Data-driven Super-resolution” what super-resolution typically means in the deep learning space Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution” External Prior Guided Internal Prior Learning for Real Noisy Image Denoising Jun Xu, Lei Zhang, David Zhang (Submitted on 12 May 2017) https://arxiv.org/abs/1705.04505 Denoised images of a region cropped from the real noisy image from DSLR “Nikon D800 ISO 3200 A3”, Nam et al. 2016 (+video) by different methods. The scene was shot 500 times with the same camera and camera setting. The mean image of the 500 shots is roughly taken as the “ground truth”, with which the PSNR index can be computed. The images are better viewed by zooming in on screen Benchmarking Denoising Algorithms with Real Photographs Tobias Plötz, Stefan Roth (Submitted on 5 Jul 2017) https://arxiv.org/abs/1707.01313 “We then capture a novel benchmark dataset, the Darmstadt Noise Dataset (DND), with consumer cameras of differing sensor sizes. One interesting finding is that various recent techniques that perform well on synthetic noise are clearly outperformed by BM3D on photographs with real noise. Our benchmark delineates realistic evaluation scenarios that deviate strongly from those commonly used in the scientific literature.” Image formation process underlying the observed low-ISO image xr and high-ISO image xn . They are generated from latent noise-free images yr and yn , respectively, which in turn are related by a linear scaling of image intensities (LS), a small camera translation (T), and a residual low- frequency pattern (LF). To obtain the denoising ground truth yp , we apply post-processing to xr aiming at undoing these undesirable transformations. Mean PSNR (in dB) of the denoising methods tested on our DND benchmark. We apply denoising either on linear raw intensities, after a variance stabilizing transformation (VST, Anscombe), or after conversion to the sRGB space. Likewise, we evaluate the result either in linear raw space or in sRGB space. The noisy images have a PSNR of 39.39 dB (linear raw) and 29.98 dB (sRGB). Difference between blue channels of low- and high-ISO images from Fig. 1 after various post- processing stages. Images are smoothed for display to highlight structured residuals, attenuating the noise.
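A minimal sketch of the evaluation logic described above: treating the mean of many shots of a static scene as a pseudo ground truth and scoring a denoised single shot by PSNR. Images are assumed to be aligned and in the [0, 255] range; the array names are placeholders.

# PSNR of a denoised frame against a mean-of-N-shots pseudo ground truth.
import numpy as np

def psnr(estimate, reference, max_val=255.0):
    mse = np.mean((estimate.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# stack: (N, H, W, 3) array of N aligned shots of the same static scene
# ground_truth = stack.mean(axis=0)          # e.g. the mean of 500 shots
# score = psnr(denoised_single_shot, ground_truth)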
  • 27. PipelineRGB image Restoration #2b “Data-driven Super-resolution” what super-resolution typically means in the deep learning space MemNet: A Persistent Memory Network for Image Restoration Ying Tai, Jian Yang, Xiaoming Liu, Chunyan Xu (Submitted on 7 Aug 2017) https://arxiv.org/abs/1708.02209 https://github.com/tyshiwo/MemNet. Output of the “hardware super-resolution” can be used as a target for the “data-driven super-resolution” The same MemNet structure achieves the state-of-the-art performance in image denoising, super-resolution and JPEG deblocking. Due to the strong learning ability, our MemNet can be trained to handle different levels of corruption even using a single model. Training Setting: Following the method of Mao et al. (2016), for image denoising, the grayscale image is used; while for SISR and JPEG deblocking, the luminance component is fed into the model. Deep Generative Adversarial Compression Artifact Removal Leonardo Galteri, Lorenzo Seidenari, Marco Bertini, Alberto Del Bimbo (Submitted on 8 Apr 2017) https://arxiv.org/abs/1704.02518 In this work we address the problem of artifact removal using convolutional neural networks. The proposed approach can be used as a post-processing technique applied to decompressed images, and thus can be applied to different compression algorithms (typically applied in YCrCb color space) such as JPEG, intra-frame coding of H.264/AVC and H.265/HEVC. Compared to super resolution techniques, working on compressed images instead of down-sampled ones, is more practical, since it does not require to change the compression pipeline, that is typically hardware based, to subsample the image before its coding; moreover, camera resolutions have increased during the latest years, a trend that we can expect to continue.
  • 28. PipelineRGB image Restoration #3 An attempt to improve smartphone camera quality with DSLR high quality image as the ‘gold standard’ with deep learning https://arxiv.org/abs/1704.02470 Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool Computer Vision Laboratory, ETH Zurich, Switzerland “Quality transfer”
  • 30. Pipeline image enhancement #1 Aesthetics enhancement: “AI-driven Interior Design”; “re-colorization” of scanned indoor scenes or intrinsic-decomposition-based editing. Limitations. We have to manually correct inaccurate segmentation, which is a limitation of our method; however, segmentation errors are seldom encountered during experiments. Since our method is object-based, our segmentation method does not consider the color patterns among similar components of an image object. Currently, our system is not capable of segmenting the mesh according to the colored components with similar geometry for this kind of object, which is another limitation of our method. An intrinsic image decomposition method could be helpful to our image database, for extracting lighting-free textures to be further used in rendering colorized scenes. However, such methods are not robust enough to be directly applied to the various images in a large image database. On the other hand, intrinsic image decomposition is not essential to achieve good results in our experiments, so we did not incorporate it in our work, but we will further study it to improve our database.
  • 31. Pipelineimage enhancement #2 “Auto-adjust” RGB texture maps for indoor scans with user interaction We use the CIELab color space for both the input and output images. We can use 3-channel Lab color as the color features. However, it generates color variations in smooth regions since each color is processed independently. To alleviate this issue, we add the local neighborhood information by concatenating the Lab color and the L2 normalized first-layer convolutional feature maps of ResNet-50. Although the proposed method provides the users with automatically adjusted photos, some users may want their photos to be retouched by their own preference. In the first row of Fig. 2 for example, a user may want only the color of the people to be changed. For such situations, we provide a way for the users to give their own adjustment maps to the system. Figure 4 shows some examples of the personalization. When the input image is forwarded, we substitute the extracted semantic adjustment map with the new adjustment map from the user. As shown in the figure, the proposed method effectively creates the personalized images adjusted by user’s own style. Deep Semantics-Aware Photo Adjustment Seonghyeon Nam, Seon Joo Kim (Submitted on 26 Jun 2017) https://arxiv.org/abs/1706.08260
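A minimal sketch of the per-pixel feature described above (CIELab color concatenated with L2-normalised first-layer conv feature maps of ResNet-50), assuming PyTorch/torchvision and scikit-image are available. The `weights` argument follows recent torchvision versions and the ImageNet mean/std normalisation is omitted for brevity, so this approximates rather than reproduces the authors' exact setup.

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision
from skimage import color

def per_pixel_features(rgb_uint8):
    """Concatenate CIELab color with L2-normalised ResNet-50 conv1 feature maps (sketch)."""
    lab = color.rgb2lab(rgb_uint8)                              # (H, W, 3)
    h, w = lab.shape[:2]

    resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
    x = torch.from_numpy(rgb_uint8 / 255.0).float().permute(2, 0, 1)[None]   # (1, 3, H, W)
    with torch.no_grad():
        fmap = resnet.conv1(x)                                  # first conv layer only
    fmap = F.interpolate(fmap, size=(h, w), mode="bilinear", align_corners=False)
    fmap = F.normalize(fmap, p=2, dim=1)[0].permute(1, 2, 0).numpy()         # (H, W, 64)

    return np.concatenate([lab, fmap], axis=-1)                 # (H, W, 3 + 64)
```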
  • 32. Pipelineimage enhancement #3 Aesthetic-Driven Image Enhancement by Adversarial Learning Yubin Deng, Chen Change Loy, Xiaoou Tang (Submitted on 17 Jul 2017) https://arxiv.org/abs/1707.05251 Examples of image enhancement given original input (a). The architecture of our proposed EnhanceGAN framework: the ResNet module is the feature extractor (for the image in CIELab color space); in this work, we use ResNet-101 with the last average pooling layer and the final fc layer removed. The switch icons in the discriminator network represent zero-masking during stage-wise training. “Auto-adjust” RGB texture maps for indoor scans with GANs
  • 33. Pipelineimage enhancement #4 “Auto-adjust” RGB texture maps for indoor scans with GANs for “auto-matting” Creatism: A deep-learning photographer capable of creating professional work Hui Fang, Meng Zhang (Submitted on 11 Jul 2017) https://arxiv.org/abs/1707.03491 https://google.github.io/creatism/ Datasets were created that contain ratings of photographs based on aesthetic quality [Murray et al., 2012] [Kong et al., 2016] [Lu et al., 2015]. Using our system, we mimic the workflow of a landscape photographer, from framing for the best composition to carrying out various post-processing operations. The environment for our virtual photographer is simulated by a collection of panorama images from Google Street View. We design a "Turing-test"-like experiment to objectively measure the quality of its creations, where professional photographers blindly rate a mixture of photographs from different sources. We work with professional photographers to empirically define 4 levels of aesthetic quality: ● 1: Point-and-shoot photos taken without consideration. ● 2: Good photos from the majority of the population without an art background. Nothing artistic stands out. ● 3: Semi-pro. Great photos showing clear artistic aspects. The photographer is on the right track to becoming a professional. ● 4: Pro-work. Clearly each professional has his/her unique taste that needs calibration; we use the AVA dataset [Murray et al., 2012] to bootstrap a consensus among them. Assume there exists a universal aesthetics metric, Φ. By definition, Φ needs to incorporate all aesthetic aspects, such as saturation, detail level, composition... To define Φ with examples, the number of images needs to grow exponentially to cover more aspects [Jaroensri et al., 2015]. To make things worse, unlike traditional problems such as object recognition, what we need are not only natural images but also pro-level photos, which are far fewer in quantity.
  • 34. Pipelineimage enhancement #5 “Auto-adjust” images based on different user groups (or personalizing for different markets for indoor scan products) Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams CSAIL, MI; Adobe Research; Media Lab, MIT; Harvard and Google Brain (Submitted on 17 Apr 2017) https://arxiv.org/abs/1704.04997 The main goals of our proposed models: (a) Multimodal photo edits: For a given photo, there may be multiple valid aesthetic choices that are quite different from one another. (b) User categorization: A synthetic example where different user clusters tend to prefer different slider values. Group 1 users prefer to increase the exposure and temperature for the baby images; group 2 users reduce clarity and saturation for similar images. Predictive log-likelihood for users in the test set of different datasets. For each user in the test set, we compute the predictive log-likelihood of 20 images, given 0 to 30 images and their corresponding sliders from the same user. 30 sample trajectories and the overall average ± s.e. is shown for casual, frequent and expert users. The figure shows that knowing more about the user (up to around 10 images) can increase the predictive log-likelihood. The log-likelihood is normalized by subtracting off the predictive log-likelihood computed given zero images. Note the different y-axis in the plots. The rightmost plot is provided for comparing the average predictive log-likelihood across datasets.
  • 35. Pipelineimage enhancement #6 Combining semantic segmentation for higher quality “Instagram filters” Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features Feida Zhu ; Zhicheng Yan ; Jiajun Bu ; Yizhou Yu IEEE Transactions on Image Processing ( Volume: 26, Issue: 7, July 2017 ) https://doi.org/10.1109/TIP.2017.2703099 Color and tone stylization in images and videos strives to enhance unique themes with artistic color and tone adjustments. It has a broad range of applications from professional image post-processing to photo sharing over social networks. Mainstream photo enhancement software, such as Adobe Lightroom and Instagram, provides users with predefined styles, which are often hand-crafted through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of image contents, and the resulting global color transform limits the range of artistic styles they can represent. On the other hand, stylistic enhancement needs to apply distinct adjustments to various semantic regions. Such an ability enables a broader range of visual styles. Traditional professional video editing software (Adobe After Effects, Nuke, etc.) offers a suite of predefined operations with tunable parameters that apply common global adjustments (exposure/color correction, white balancing, sharpening, denoising, etc.). Local adjustments within specific spatiotemporal regions are usually accomplished with masking layers created through intensive user interaction. Both parameter tuning and masking layer creation are labor-intensive processes. An example of learning semantics-aware photo adjustment styles. Left: Input image. Middle: Manually enhanced by a photographer. Distinct adjustments are applied to different semantic regions. Right: Automatically enhanced by our deep learning model trained from image exemplars. (a) Input image. (b) Ground truth. (c) Our result. Given a set of exemplar image pairs, each representing a photo before and after pixel-level color (in CIELab space) and tone adjustments following a particular style, we wish to learn a computational model that can automatically adjust a novel input photo in the same style. We still cast this learning task as a regression problem as in Yan et al. (2016). For completeness, let us first review their problem definition and then present our new deep learning based architecture and solution.
  • 36. Pipelineimage enhancement #7A Combining semantic segmentation for higher quality “Instagram filters” Deep Bilateral Learning for Real-Time Image Enhancement Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Samuel W. Hasinoff, Frédo Durand MIT CSAIL, Google Research, MIT CSAIL / Inria, Université Côte d’Azur (Submitted on 10 Jul 2017) https://arxiv.org/abs/1707.02880 | https://github.com/mgharbi/hdrnet | https://groups.csail.mit.edu/graphics/hdrnet/ https://youtu.be/GAe0qKKQY_I Our novel neural network architecture can reproduce sophisticated image enhancements with inference running in real time at full HD resolution on mobile devices. It can not only be used to dramatically accelerate reference implementations, but can also learn subjective effects from human retouching (“copycat” filter). By performing most of its computation within a bilateral grid and by predicting local affine color transforms, our model is able to strike the right balance between expressivity and speed. To build this model we have introduced two new layers: a data-dependent lookup that enables slicing into the bilateral grid, and a multiplicative operation for affine transformation. By training in an end-to-end fashion and optimizing our loss function at full resolution (despite most of our network being at a heavily reduced resolution), our model is capable of learning full-resolution and non-scale- invariant effects.
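The two operations highlighted above for HDRNet — slicing into a low-resolution bilateral grid with a luminance-guided lookup, and applying the resulting per-pixel affine color transform — can be sketched in NumPy/SciPy as follows. The grid here is a stand-in for the coefficients such a network would predict, and all names and shapes are illustrative rather than the paper's actual implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def slice_and_apply(image, grid):
    """Apply per-pixel 3x4 affine color transforms sliced from a low-res bilateral grid.

    image : (H, W, 3) float array in [0, 1]
    grid  : (D, Gh, Gw, 12) affine coefficients over (luma, y, x) bins, e.g. CNN output
    """
    h, w, _ = image.shape
    d, gh, gw, _ = grid.shape
    luma = image.mean(axis=-1)                                  # guide map in [0, 1]

    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([luma * (d - 1),                          # position along the luma bins
                       yy / (h - 1) * (gh - 1),                 # position along the y bins
                       xx / (w - 1) * (gw - 1)])                # position along the x bins

    # Trilinear "slicing": interpolate each of the 12 coefficients at every full-res pixel.
    coeffs = np.stack([map_coordinates(grid[..., k], coords, order=1, mode="nearest")
                       for k in range(12)], axis=-1)            # (H, W, 12)
    A = coeffs[..., :9].reshape(h, w, 3, 3)
    b = coeffs[..., 9:]

    return np.einsum("hwij,hwj->hwi", A, image) + b             # per-pixel affine transform
```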
  • 37. Pipelineimage enhancement #8 Blind image quality assessment, e.g. for quantifying RGB scan quality in real time RankIQA: Learning from Rankings for No-reference Image Quality Assessment Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov (Submitted on 26 Jul 2017) https://arxiv.org/abs/1707.08347 The classical approach trains a deep CNN regressor directly on the ground-truth. Our approach trains a network from an image ranking dataset. These ranked images can be easily generated by applying distortions of varying intensities. The network parameters are then transferred to the regression network for finetuning. This allows for the training of deeper and wider networks. Siamese network output for JPEG distortion considering 6 levels. The graph illustrates that the Siamese network successfully manages to separate the different distortion levels. Blind Deep S3D Image Quality Evaluation via Local to Global Feature Aggregation Heeseok Oh ; Sewoong Ahn ; Jongyoo Kim ; Sanghoon Lee IEEE Transactions on Image Processing ( Volume: 26, Issue: 10, Oct. 2017 ) https://doi.org/10.1109/TIP.2017.2725584
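A hedged sketch of the ranking stage described above: pairs are generated by distorting the same images at two intensities, and a Siamese scorer is trained with a margin ranking loss so the less-distorted image scores higher. `scorer` and `distort` are placeholders for a CNN and a degradation routine, not the RankIQA code itself.

```python
import torch
import torch.nn as nn

# scorer: any CNN mapping an image batch (B, 3, H, W) to a quality score (B, 1);
# distort(imgs, level): hypothetical function returning the batch degraded at the given level.
ranking_loss = nn.MarginRankingLoss(margin=1.0)

def ranking_step(scorer, clean_batch, distort, levels=(1, 3)):
    """One training step on synthetically ranked pairs: lower distortion => higher score."""
    weak = distort(clean_batch, level=levels[0])       # mildly distorted version
    strong = distort(clean_batch, level=levels[1])     # heavily distorted version
    s_weak, s_strong = scorer(weak), scorer(strong)
    target = torch.ones_like(s_weak)                   # +1: first input should rank higher
    return ranking_loss(s_weak, s_strong, target)
```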
  • 39. Pipelineimage Styling #1 Aesthetics enhancement: High Dynamic Range from SfM Large scale structure-from-motion (SfM) algorithms have recently enabled the reconstruction of highly detailed 3-D models of our surroundings simply by taking photographs. In this paper, we propose to leverage these reconstruction techniques to automatically estimate the outdoor illumination conditions for each image in a SfM photo collection. We introduce a novel dataset of outdoor photo collections, where the ground truth lighting conditions are known at each image. We also present an inverse rendering approach that recovers a high dynamic range estimate of the lighting conditions for each low dynamic range input image. Our novel database is used to quantitatively evaluate the performance of our algorithm. Results show that physically plausible lighting estimates can faithfully be recovered, both in terms of light direction and intensity. Lighting Estimation in Outdoor Image Collections Jean-Francois Lalonde (Laval University); Iain Matthews (Disney Research) 3D Vision (3DV), 2014 2nd International Conference on https://www.disneyresearch.com/publication/lighting-estimation-in-outdoor-image-collections/ https://doi.org/10.1109/3DV.2014.112 The main limitation of our approach is that it can recover precise lighting parameters only when lighting actually creates strongly visible effects—such as cast shadows, shading differences amongst surfaces of different orientations—on the image. When the camera does not observe significant lighting variations, for example when the sun is shining on a part of the building that the camera does not observe, or when the camera only see a very small fraction of the landmark with little geometric details, our approach recovers a coarse estimate of the full lighting conditions. In addition, our approach is sensitive to errors in geometry estimation, or to the presence of unobserved, nearby objects. Because it does not know about these objects, our method tries to explain their cast shadows with the available geometry, which may result in errors. Our approach is also sensitive to inter-reflections. Incorporating more sophisticated image formation models such as radiosity could help alleviating this problem, at the expense of significantly more computation. Finally, our approach relies on knowledge of the camera exposure and white balance settings, which might be less applicable to the case of images downloaded on the Internet. We plan to explore these issues in future work. Exploring material recognition for estimating reflectance and illumination from a single image Michael Weinmann; Reinhard Klein MAM '16 Proceedings of the Eurographics 2016 Workshop on Material Appearance Modeling https://doi.org/10.2312/mam.20161253 We demonstrate that reflectance and illumination can be estimated reliably for several materials that are beyond simple Lambertian surface reflectance behavior because of exhibiting mesoscopic effects such as interreflections and shadows. Shading Annotations in the Wild Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala (Submitted on 2 May 2017) https://arxiv.org/abs/1705.01156 http://opensurfaces.cs.cornell.edu/saw/ We use this data to train a convolutional neural network to predict per-pixel shading information in an image. We demonstrate the value of our data and network in an application to intrinsic images, where we can reduce decomposition artifacts produced by existing algorithms.
  • 40. Pipelineimage Styling #2A Aesthetics enhancement: High Dynamic Range #1 Learning High Dynamic Range from Outdoor Panoramas Jinsong Zhang, Jean-François Lalonde (Submitted on 29 Mar 2017 (v1), last revised 8 Aug 2017 (this version, v2)) https://arxiv.org/abs/1703.10200 http://www.jflalonde.ca/projects/learningHDR Qualitative results on the synthetic dataset. Top row: the ground truth HDR panorama, middle row: the LDR panorama, and bottom row: the predicted HDR panorama obtained with our method. To illustrate dynamic range, each panorama is shown at two exposures, with a factor of 16 between the two. For each example, we show the panorama itself (left column), and the rendering of a 3D object lit with the panorama (right column). The object is a “spiky sphere” on a ground plane, seen from above. Our method accurately predicts the extremely high dynamic range of outdoor lighting in a wide variety of lighting conditions. A tonemapping of γ = 2.2 is used for display purposes. Real cameras have non-linear response functions. To simulate this, we randomly sample real camera response functions from the Database of Response Functions (DoRF) [Grossberg and Nayar, 2003], and apply them to the linear synthetic data before training. Examples from our real dataset. For each case, we show the LDR panorama captured by the Ricoh Theta S camera, a consumer grade point-and-shoot 360º camera (left), and the corresponding HDR panorama captured by the Canon 5D Mark III DSLR mounted on a tripod, equipped with a Sigma 8mm fisheye lens (right, shown at a different exposure to illustrate the high dynamic range). We present a full end-to-end learning approach to estimate the extremely high dynamic range of outdoor lighting from a single, LDR 360º panorama. Our main insight is to exploit a large dataset of synthetic data composed of a realistic virtual city model, lit with real world HDR sky light probes [Lalonde et al. 2016 http://www.hdrdb.com/] to train a deep convolutional autoencoder
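The synthetic LDR/HDR pairing described above can be emulated roughly as below: expose a linear HDR image, clip the highlights, apply a non-linear response and quantise to 8 bits. The simple gamma curve and the random exposure jitter are stand-ins for the DoRF response functions and the paper's actual augmentation, so treat this only as a data-generation sketch.

```python
import numpy as np

def hdr_to_ldr(hdr, exposure=1.0, gamma=2.2, rng=None):
    """Render a linear HDR image (H, W, 3) to an 8-bit LDR observation (training-pair sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    exposure = exposure * rng.uniform(0.5, 2.0)            # random exposure jitter
    linear = np.clip(hdr * exposure, 0.0, 1.0)             # saturate the highlights
    response = linear ** (1.0 / gamma)                     # stand-in for a sampled DoRF curve
    return (response * 255.0 + 0.5).astype(np.uint8)       # quantise to 8 bits
```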
  • 41. Pipelineimage Styling #2b High Dynamic Range #2: Learn illumination for relighting purposes Learning to Predict Indoor Illumination from a Single Image Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde (Submitted on 1 Apr 2017 (v1), last revised 25 May 2017 (this version, v2)) https://arxiv.org/abs/1704.00090
  • 42. Pipelineimage Styling #3a Improving photocompositing and relighting of RGB textures Deep Image Harmonization Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang (Submitted on 28 Feb 2017) https://arxiv.org/abs/1703.00069 Our method can adjust the appearances of the composite foreground to make it compatible with the background region. Given a composite image, we show the harmonized images generated by Xue et al. (2012), Zhu et al. (2015) and our deep harmonization network. The overview of the proposed joint network architecture. Given a composite image and a provided foreground mask, we first pass the input through an encoder for learning feature representations. The encoder is then connected to two decoders, including a harmonization decoder for reconstructing the harmonized output and a scene parsing decoder to predict pixel-wise semantic labels. In order to use the learned semantics and improve harmonization results, we concatenate the feature maps from the scene parsing decoder to the harmonization decoder (denoted as dot-orange lines). In addition, we add skip links (denoted as blue-dot lines) between the encoder and decoders for retaining image details and textures. Note that, to keep the figure clean, we only depict the links for the harmonization decoder, while the scene parsing decoder has the same skip links connected to the encoder. Given an input image (a), our network can adjust the foreground region according to the provided mask (b) and produce the output (c). In this example, we invert the mask from the one in the first row to the one in the second row, and generate harmonization results that account for different context and semantic information.
  • 43. Pipelineimage Styling #3b Sky is not the limit: semantic-aware sky replacement YH Tsai, X Shen, Z Lin, K Sunkavalli; Ming-Hsuan Yang ACM Transactions on Graphics (TOG) - Volume 35 Issue 4, July 2016 https://doi.org/10.1145/2897824.2925942 In order to find proper skies for replacement, we propose a data-driven sky search scheme based on semantic layout of the input image. Finally, to re-compose the stylized sky with the original foreground naturally, an appearance transfer method is developed to match statistics locally and semantically. Sample sky segmentation results. Given an input image, the FCN generates results that localize the sky well but contain inaccurate boundaries and noisy segments. The proposed online model refines segmentations that are complete and accurate, especially around the boundaries (best viewedin color with enlarged images). Overview of the proposed algorithm. Given an input image, we first utilize the FCN to obtain scene parsing results and semantic response for each category. A coarse-to-fine strategy is adopted to segment sky regions (illustrated as the red mask). To find reference images for sky replacement, we develop a method to search images with similar semantic layout. After re-composing images with the found skies, we transfer visual semantics to match foreground statistics between the input image and the reference image. Finally, a set of composite images with different stylized skies are generated automatically. GP-GAN: Towards Realistic High-Resolution Image Blending Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang (Submitted on 21 Mar 2017 (v1), last revised 25 Mar 2017 (this version, v2)) https://arxiv.org/abs/1703.07195 Qualitative illustration of high-resolution image blending. a) shows the composited copy-and-paste image where the inserted object is circled out by red lines. Users usually expect image blending algorithms to make this image more natural. b) represents the result based on Modified Poisson image editing [32]. c) indicates the result from Multi-splines approach. d) is the result of our method Gaussian-Poisson GAN (GP-GAN). Our approach produces better quality images than that from the alternatives in terms of illumination, spatial, and color consistencies. We advanced the state-of-the-art in conditional image generation by combining the ideas from the generative model GAN, Laplacian Pyramid, and Gauss-Poisson Equation. This combination is the first time a generative model could produce realistic images in arbitrary resolution. In spite of the effectiveness, our algorithm fails to generate realistic images when the composited images are far away from the distribution of the training dataset. We aim to address this issue in future work. Improving photocompositing and relighting of RGB textures
  • 44. Pipelineimage Styling #3c Live User-Guided Intrinsic Video for Static Scenes Abhimitra Meka ; Gereon Fox ; Michael Zollhofer ; Christian Richardt ; Christian Theobalt IEEE Transactions on Visualization and Computer Graphics ( Volume: PP, Issue: 99 ) https://doi.org/10.1109/TVCG.2017.2734425 Improving photocompositing and relighting of RGB textures User constraints, in the form of constant shading and reflectance strokes, can be placed directly on the real-world geometry using an intuitive touch- based interaction metaphor, or using interactive mouse strokes. Fusing the decomposition results and constraints in three-dimensional space allows for robust propagation of this information to novel views by re-projection. We propose a novel approach for live, user-guided intrinsic video decomposition. We first obtain a dense volumetric reconstruction of the scene using a commodity RGB-D sensor. The reconstruction is leveraged to store reflectance estimates and user-provided constraints in 3D space to inform the ill-posed intrinsic video decomposition problem. Our approach runs at real-time frame rates, and we apply it to applications such as relighting, recoloring and material editing. Our novel user-guided intrinsic video approach enables real-time applications such as recoloring, relighting and material editing. Constant reflectance strokes improve the decomposition by moving the high-frequency shading of the cloth to the shading layer. Comparison to state-of-the-art intrinsic video decomposition techniques on the ‘girl’ dataset. Our approach matches the real-time performance of Meka et al. (2016), while achieving the same quality as previous off-line techniques
  • 45. Pipelineimage Styling #4 Beyond low-level style transfer for high-level manipulation Generative Semantic Manipulation with Contrasting GAN Xiaodan Liang, Hao Zhang, Eric P. Xing (Submitted on 1 Aug 2017) https://arxiv.org/abs/1708.00315 Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo→sketch and artist painting style transfer. However, existing models are only capable of transferring low-level information (e.g. color or texture changes), but fail to edit high-level semantic meanings (e.g., geometric structure or content) of objects. Some example semantic manipulation results by our model, which takes one image and a desired object category (e.g. cat, dog) as inputs and then learns to automatically change the object semantics by modifying their appearance or geometric structure. We show the original image (left) and manipulated result (right) in each pair. Although our method can achieve compelling results in many semantic manipulation tasks, it shows little success for some cases which require very large geometric changes, such as car↔truck and car↔bus. Integrating spatial transformation layers for explicitly learning pixel-wise offsets may help resolve very large geometric changes. To be more general, our model can be extended to replace the mask annotations with predicted object masks or automatically learned attentive regions via attention modeling. This paper pushes forward research on the unsupervised setting by demonstrating the possibility of manipulating high-level object semantics rather than the low-level color and texture changes of previous works. In addition, it would be interesting to develop techniques that are able to manipulate object interactions and activities in images/videos in future work.
  • 46. Pipelineimage Styling #5A Aesthetics enhancement: Style Transfer | Introduction #1 Neural Style Transfer: A Review Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song (Submitted on 11 May 2017) https://arxiv.org/abs/1705.04058 A list of mentioned papers in this review, corresponding codes and pre-trained models are publicly available at: https://github.com/ ycjing/Neural-Style-Transfer-Papers One of the reasons why Neural Style Transfer catches eyes in both academia and industry is its popularity in some social networking sites (e.g., Twitter and Facebook). A mobile application Prisma [36] is one of the first industrial applications that provides the Neural Style Transfer algorithm as a service. Before Prisma, the general public almost never imagines that one day they are able to turn their photos into art paintings in only a few minutes. Due to its high quality, Prisma achieved great success and is becoming popular around the world. Another use of Neural Style Transfer is to act as user-assisted creation tools. Although, to the best of our knowledge, there are no popular applications that applied the Neural Style Transfer technique in creation tools, we believe that it will be a promising potential usage in the future. Neural Style Transfer is capable of acting as a creation tool for painters and designers. Neural Style Transfer makes it more convenient for a painter to create an artifact of a specific style, especially when creating computer-made fine art images. Moreover, with Neural Style Transfer algorithms it is trivial to produce stylized fashion elements for fashion designers and stylized CAD drawings for architects in a variety of styles, which is costly to produce them by hand.
  • 47. Pipelineimage Styling #5b Aesthetics enhancement: Style Transfer | Introduction #2 Neural Style Transfer: A Review Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Mingli Song (Submitted on 11 May 2017) https://arxiv.org/abs/1705.04058 A list of mentioned papers in this review, corresponding codes and pre-trained models are publicly available at: https://github.com/ ycjing/Neural-Style-Transfer-Papers Promising directions for future research in Neural Style Transfer mainly focus on two aspects. The first aspect is to solve the existing aforementioned challenges for current algorithms, i.e., problem of parameter tuning, problem of stroke orientation control and problem existing in “Fast” and “Faster” Neural Style Transfer algorithms. The second aspect of promising directions is to focus on new extensions to Neural Style Transfer (e.g., Fashion Style Transfer and Character Style Transfer). There are already some preliminary work related with this direction, such as the recent work of Yang et al. (2016) on Text Effects Transfer. These interesting extensions may become trending topics in the future and related new areas may be created subsequently.
  • 48. Pipelineimage Styling #5C Aesthetics enhancement: Video Style Transfer DeepMovie: Using Optical Flow and Deep Neural Networks to Stylize Movies Alexander G. Anderson, Cory P. Berg, Daniel P. Mossing, Bruno A. Olshausen (Submitted on 26 May 2016) https://arxiv.org/abs/1605.08153 https://github.com/anishathalye/neural-style Coherent Online Video Style Transfer Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua (Submitted on 27 Mar 2017 (v1), last revised 28 Mar 2017 (this version, v2)) https://arxiv.org/abs/1703.09211 The main contribution of this paper is to use optical flow to initialize the style transfer optimization so that the texture features move with the objects in the video. Finally, we suggest a method to incorporate optical flow explicitly into the cost function. Overview of Our Approach: We begin by applying the style transfer algorithm to the first frame of the movie using the content image as the initialization. Next, we calculate the optical flow field that takes the first frame of the movie to the second frame. We apply this flow-field to the rendered version of the first frame and use that as the initialization for the style transfer optimization for the next frame. Note, for instance, that a blue pixel in the flow field image means that the underlying object in the video at that pixel moved to the left from frame one to frame two. Intuitively, in order to apply the flow field to the styled image, you move the parts of the image that have a blue pixel in the flow field to the left. We propose the first end-to-end network for online video style transfer, which generates temporally coherent stylized video sequences in near real-time. Two key ideas include an efficient network by incorporating short- term coherence, and propagating short-term coherence to long-term, which ensures the consistency over larger period of time. Our network can incorporate different image stylization networks. We show that the proposed method clearly outperforms the per-frame baseline both qualitatively and quantitatively. Moreover, it can achieve visually comparable coherence to optimization-based video style transfer, but is three orders of magnitudes faster in runtime. There are still some limitations in our method. For instance, limited by the accuracy of ground-truth optical flow (given by DeepFlow2 [Weinzaepfel et al. 2013] ), our results may suffer from some incoherence where the motion is too large for the flow to track. And after propagation over a long period, small flow errors may accumulate, causing blurriness. These open questions are interesting for further exploration in the future work.
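A rough sketch of the flow-based initialisation described above, using OpenCV's Farnebäck flow in place of DeepFlow: the previously stylized frame is warped into the geometry of the next frame and can then seed the next stylization. Occlusion handling and the temporal loss terms of the cited papers are omitted, so this only illustrates the warping step.

```python
import cv2
import numpy as np

def warp_previous_stylization(stylized_prev, gray_prev, gray_next):
    """Warp the stylized previous frame into the next frame's geometry (initialisation only)."""
    # Backward flow: for each pixel of the next frame, where it came from in the previous frame.
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(gray_next, gray_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_next.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(stylized_prev, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REFLECT)
```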
  • 49. Pipelineimage Styling #6A Aesthetics enhancement: Texture synthesis and upsampling TextureGAN: Controlling Deep Image Synthesis with Texture Patches Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays (Submitted on 9 Jun 2017) https://arxiv.org/abs/1706.02823 TextureGAN pipeline. A feed-forward generative network is trained end-to-end to directly transform a 4- channel input to a high- res photo with realistic textural details. Photo-realistic Facial Texture Transfer Parneet Kaur, Hang Zhang, Kristin J. Dana (Submitted on 14 Jun 2017) https://arxiv.org/abs/1706.04306 Overview of our method. Facial identity is preserved using Facial Semantic Regularization which regularizes the update of meso-structures using a facial prior and facial semantic structural loss. Texture loss regularizes the update of local textures from the style image. The output image is initialized with the content image and updated at each iteration by back-propagating the error gradients for the combined losses. Content/style photos: Martin Scheoller/Art+Commerce. Identity-preserving Facial Texture Transfer (FaceTex). The textural details are transferred from style image to content image while preserving its identity. FaceTex outperforms existing methods perceptually as well as quantitatively. Column 3 uses input 1 as the style image and input 2 as the content. Column 4 uses input 1 as the content image and input 2 as the style image. Figure 3 shows more examples and comparison with existing methods. Input photos: Martin Scheoller/Art+Commerce.
  • 50. Pipelineimage Styling #6B Aesthetics enhancement: Texture synthesis with style transfer Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses Eric Risser, Pierre Wilmot, Connelly Barnes Artomatix, University of Virginia (Submitted on 31 Jan 2017 (v1), last revised 1 Feb 2017 (this version, v2)) https://arxiv.org/abs/1701.08893 Our style transfer and texture synthesis results. The input styles are shown in (a), and style transfer results are in (b, c). Note that the angular shapes of the Picasso painting are successfully transferred on the top row, and that the more subtle brush strokes are transferred on the bottom row. The original content images are inset in the upper right corner. Unless otherwise noted, our algorithm is always run with default parameters (we do not manually tune parameters). Input textures are shown in (d) and texture synthesis results are in (e). For the texture synthesis, note that the algorithm synthesizes creative new patterns and connectivities in the output. Different statistics that can be used for neural network texture synthesis.
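The histogram losses above penalise mismatches between feature-activation histograms; the remapping they build on is classical histogram matching, sketched here per image channel purely for illustration (the paper applies the idea to VGG feature maps, not raw pixels).

```python
import numpy as np

def match_histograms(source, template):
    """Per-channel histogram matching: remap `source` so its empirical CDF follows `template`'s."""
    out = np.empty_like(source, dtype=np.float64)
    for c in range(source.shape[-1]):
        src, tmp = source[..., c].ravel(), template[..., c].ravel()
        s_values, s_idx, s_counts = np.unique(src, return_inverse=True, return_counts=True)
        t_values, t_counts = np.unique(tmp, return_counts=True)
        s_cdf = np.cumsum(s_counts) / src.size          # empirical CDF of the source channel
        t_cdf = np.cumsum(t_counts) / tmp.size          # empirical CDF of the template channel
        matched = np.interp(s_cdf, t_cdf, t_values)     # template value at the same quantile
        out[..., c] = matched[s_idx].reshape(source.shape[:-1])
    return out
```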
  • 51. Pipelineimage Styling #6C Aesthetics enhancement: Enhancing texture maps Depth Texture Synthesis for Realistic Architectural Modeling Félix Labrie-Larrivée ; Denis Laurendeau ; Jean-François Lalonde Computer and Robot Vision (CRV), 2016 13th Conference on https://doi.org/10.1109/CRV.2016.77 In this paper, we present a novel approach that improves the resolution and geometry of 3D meshes of large scenes with such repeating elements. By leveraging structure from motion reconstruction and an off-the-shelf depth sensor, our approach captures a small sample of the scene in high resolution and automatically extends that information to similar regions of the scene. Using RGB and SfM depth information as a guide and simple geometric primitives as canvas, our approach extends the high resolution mesh by exploiting powerful, image-based texture synthesis approaches. The final results improves on standard SfM reconstruction with higher detail. Our approach benefits from reduced manual labor as opposed to full RGBD reconstruction, and can be done much more cheaply than with LiDAR- based solutions. In the future, we plan to work on a more generalized 3D texture synthesis procedure capable of synthesizing a more varied set of objects, and able to reconstruct multiple parts of the scene by exploiting several high resolution scan samples at once in an effort to address the tradeoff mentioned above. We also plan to improve the robustness of the approach to a more varied set of large scale scenes, irrespective of the lighting conditions, material colors, and geometric configurations. Finally, we plan to evaluate how our approach compares to SfM on a more quantitative level by leveraging LiDAR data as ground truth. Overview of the data collection and alignment procedure. Top row: a collection of photos of the scene is acquired with a typical camera, and used to generate a point cloud via SfM [Agarwal et al. 2009] and dense multi-view stereo (MVS) [ Furukawa and Ponce, 2012]. Bottom row: a repeating feature of the scene (in this example, the left-most window) is recorded with a Kinect sensor, and reconstructed into a high resolution mesh via the RGB-D SLAM technique KinectFusion [ Newcombe et al. 2011]. The mesh is then automatically aligned to the SfM reconstruction using bundle adjustment and our automatic scale adaptation technique (see sec. III-C). Right: the high resolution Kinect mesh is correctly aligned to the low resolution SfM point cloud
  • 52. Pipelineimage Styling #6D Aesthetics enhancement: Towards photorealism with good maps One Ph.D. position (supervision by Profs Niessner and Rüdiger Westermann) is available at our chair in the area of photorealistic rendering for deep learning and online reconstruction. Research in this project includes the development of photorealistic realtime rendering algorithms that can be used in deep learning applications for scene understanding, and for high-quality scalable rendering of point scans from depth sensors and RGB stereo image reconstruction. If you are interested in applying, you should have a strong background in computer science, i.e., efficient algorithms and data structures, and GPU programming, have experience implementing C/C++ algorithms, and you should be excited to work on state-of-the-art research in 3D computer graphics. https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-learning-and-online-reconstruction.html Ph.D. Position – Photorealistic Rendering for Deep Learning and Online Reconstruction Photorealism Explained Blender Guru Published on May 25, 2016 http://www.blenderguru.com/tutorials/photorealism-explained/ https://youtu.be/R1-Ef54uTeU Stop wasting time creating texture maps by hand. All materials on Poliigon come with the relevant normal, displacement, reflection and gloss maps included. Just plug them into your software, and your material is ready to render. https://www.poliigon.com/ How to Make Photorealistic PBR Materials - Part 1 Blender Guru Published on Jun 28, 2016 http://www.blenderguru.com/tutorials/pbr-shader-tutorial-pt1/ https://youtu.be/V3wghbZ-Vh4?t=24m5s Physically Based Rendering (PBR)
  • 53. Pipelineimage Styling #7 Styling line graphics (e.g. floorplans, 2D CADs) and monochrome images, e.g. for a desired visual identity Real-Time User-Guided Image Colorization with Learned Deep Priors Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros (Submitted on 8 May 2017) https://arxiv.org/abs/1705.02999 Our proposed method colorizes a grayscale image (left), guided by sparse user inputs (second), in real-time, providing the capability for quickly generating multiple plausible colorizations (middle to right). Photograph of Migrant Mother by Dorothea Lange, 1936 (Public Domain). Network architecture: We train two variants of the user interaction colorization network. Both variants use the blue layers for predicting a colorization. The Local Hints Network also uses red layers to (a) incorporate user points U_l and (b) predict a color distribution Ẑ. The Global Hints Network uses the green layers, which transform the global input U_g by 1 × 1 conv layers and add the result into the main colorization network. Each box represents a conv layer, with the vertical dimension indicating feature map spatial resolution, and the horizontal dimension indicating the number of channels. Changes in resolution are achieved through subsampling and upsampling operations. In the main network, when resolution is decreased, the number of feature channels is doubled. Shortcut connections are added to upsampling convolution layers. Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN Lvmin Zhang, Yi Ji, Xin Lin (Submitted on 11 Jun 2017 (v1), last revised 13 Jun 2017 (this version, v2)) https://arxiv.org/abs/1706.03319 Examples of combination results on sketch images (top-left) and style images (bottom-left). Our approach automatically applies the semantic features of an existing painting to an unfinished sketch. Our network has learned to classify the hair, eyes, skin and clothes, and has the ability to paint these features according to a sketch. In this paper, we integrated a residual U-net to apply the style to the grayscale sketch with an auxiliary classifier generative adversarial network (AC-GAN, Odena et al. 2016). The whole process is automatic and fast, and the results are creditable in the quality of art style as well as colorization. Limitation: the pretrained VGG is for ImageNet photograph classification, but not for paintings. In the future, we will train a classification network only for paintings to achieve better results. Furthermore, due to the large quantity of layers in our residual network, the batch size during training is limited to no more than 4. It remains for future study to reach a balance between the batch size and quantity of layers.
  • 54. Future Image restoration Depth Images (Kinect, etc.)
  • 55. PipelineDepth image enhancement #1a Image Formation #1 Pinhole Camera Model: ideal projection of a 3D object on a 2D image. Fernandez et al. (2017) Dot patterns of a Kinect for Windows (a) and two Kinects for Xbox (b) and (c) are projected on a flat wall from a distance of 1000 mm. Note that the projection of each pattern is similar, and related by a 3-D rotation depending on the orientation of the Kinect diffuser installation. The installation variability can clearly be observed from differences in the bright dot locations (yellow stars), which differ by an average distance of 10 pixels. Also displayed in (d) is the idealized binary replication of the Kinect dot pattern [Kinect Pattern Uncovered], which was used in this project to simulate IR images. – Landau et al. (2016)
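For reference, the ideal pinhole projection mentioned above amounts to a couple of lines once the intrinsics are known; lens distortion, which does matter for real Kinect-class sensors, is deliberately ignored in this sketch.

```python
import numpy as np

def project_pinhole(points_cam, fx, fy, cx, cy):
    """Project 3-D points given in the camera frame (N, 3) to pixel coordinates (N, 2).

    Ideal pinhole model: u = fx * X/Z + cx,  v = fy * Y/Z + cy  (no lens distortion).
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.stack([u, v], axis=-1)

# Example: a point 2 m in front of a VGA depth camera with ~580 px focal length.
# print(project_pinhole(np.array([[0.1, 0.0, 2.0]]), 580, 580, 320, 240))
```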
  • 56. PipelineDepth image enhancement #1b Image Formation #2 Characterizations of Noise in Kinect Depth Images: A Review Tanwi Mallick ; Partha Pratim Das ; Arun Kumar Majumdar IEEE Sensors Journal ( Volume: 14, Issue: 6, June 2014 ) https://doi.org/10.1109/JSEN.2014.2309987 Kinect outputs for a scene. (a) RGB Image. (b) Depth data rendered as an 8- bit gray-scale image with nearer depth values mapped to lower intensities. Invalid depth values are set to 0. Note the fixed band of invalid (black) pixels on left. (c) Depth image showing too near depths in blue, too far depths in red and unknown depths due to highly specular objects in green. Often these are all taken as invalid zero depth. Shadow is created in a depth image (Yu et al. 2013) when the incident IR from the emitter gets obstructed by an object and no depth can be estimated. PROPERTIES OF IR LIGHT [Rose]
  • 57. Pipeline Depth image enhancement #1c Image Formation #3 Authors’ experiments on structural noise using a plane in 400 frames. (a) Error at 1.2m. (b) Error at 1.6m. (c) Error at 1.8m. Smisek et al. (2013) calibrate a Kinect against a stereo-rig (comprising two Nikon D60 DSLR cameras) to estimate and improve its overall accuracy. They have taken images and fitted planar objects at 18 different distances (from 0.7 to 1.3 meters) to estimate the error between the depths measured by the two sensors. The experiments corroborate that the accuracy varies inversely with the square of depth [2]. However, even after the calibration of Kinect, the procedure still exhibits relatively complex residual errors (Fig. 8). Fig. 8. Residual noise of a plane. (a) Plane at 86cm. (b) Plane at 104cm. Authors’ experiments on temporal noise. Entropy and SD of each pixel in a depth frame over 400 frames for a stationary wall at 1.6m. (a) Entropy image. (b) SD image. Authors’ experiments with vibrating noise showing ZD samples as white dots. A pixel is taken as noise if it is zero in frame i and nonzero in frames i±1. Note that noise follows depth edges and shadow. (a) Frame (i−1). (b) Frame i. (c) Frame (i+1). (d) Noise for frame i.
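The per-pixel temporal statistics used in these noise characterisations (standard deviation and entropy over a few hundred frames of a static scene) can be reproduced along these lines; the bin count for the entropy estimate is an arbitrary choice of this sketch.

```python
import numpy as np

def temporal_noise_maps(frames, n_bins=64):
    """Per-pixel standard deviation and entropy over a (N, H, W) stack of depth frames
    of a static scene (e.g. N = 400 frames of a flat wall)."""
    frames = frames.astype(np.float64)
    sd_map = frames.std(axis=0)

    # Per-pixel Shannon entropy of the depth values, quantised into n_bins levels.
    lo, hi = frames.min(), frames.max()
    quantised = np.clip(((frames - lo) / max(hi - lo, 1e-9) * n_bins).astype(int),
                        0, n_bins - 1)
    entropy = np.zeros(frames.shape[1:])
    for b in range(n_bins):
        p = (quantised == b).mean(axis=0)          # frequency of bin b at every pixel
        p_safe = np.where(p > 0, p, 1.0)           # avoid log(0)
        entropy -= p * np.log2(p_safe)
    return sd_map, entropy
```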
  • 58. PipelineDepth image enhancement #1d Image Formation #4 The filtered intensity samples generated from unsaturated IR dots (blue dots) were used to fit the intensity model (red line), which follows an inverse square model for the distance between the sensor and the surface point Landau et al. (2016) (a) Multiplicative speckle distribution is unitless, and can be represented as a gamma distribution Γ (4.54, 0.196). (b) Additive detector noise distribution can be represented as a normal distribution Ν (−0.126, 10.4), and has units of 10-bit intensity. Landau et al. (2016) The standard error in depth estimation (mm) as a function of radial distance (pix) is plotted for the (a) experimental and (b) simulated data sets of flat walls at various depths (mm). The experimental standard depth error increases faster with an increase in radial distance due to lens distortion. Landau et al. (2016)
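A rough simulation of the reported intensity and noise model: inverse-square fall-off, multiplicative gamma speckle Γ(4.54, 0.196) and additive Gaussian detector noise. The second parameter of the normal distribution is treated here as a standard deviation in 10-bit counts, and the source-strength constant is arbitrary, so this is only a sketch in the spirit of Landau et al., not their simulator.

```python
import numpy as np

def simulate_ir_intensity(depth_m, albedo=1.0, a=2000.0, rng=None):
    """Simulate 10-bit Kinect IR dot intensities with inverse-square fall-off,
    multiplicative gamma speckle and additive Gaussian detector noise (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    ideal = a * albedo / depth_m ** 2                                        # inverse-square model
    speckle = rng.gamma(shape=4.54, scale=0.196, size=np.shape(depth_m))     # multiplicative noise
    detector = rng.normal(loc=-0.126, scale=10.4, size=np.shape(depth_m))    # additive noise (DN)
    return np.clip(ideal * speckle + detector, 0, 1023)                      # saturate to 10 bits
```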
  • 59. PipelineDepth image enhancement #2A Metrological Calibration #1 A New Calibration Method for Commercial RGB-D Sensors Walid Darwish, Shenjun Tang, Wenbin Li and Wu Chen Sensors 2017, 17(6), 1204; doi:10.3390/s17061204 Based on these calibration algorithms, different calibration methods have been implemented and tested. Methods include the use of 1D [Liu et al. 2012] 2D [Shibo and Qing 2012] , and 3D [Gui et al. 2014] calibration objects that work with the depth images directly; calibration of the manufacture parameters of the IR camera and projector [Herrera et al. 2012] ; or photogrammetric bundle adjustments used to model the systematic errors of the IR sensors [ Davoodianidaliki and Saadatseresht 2013; Chow and Lichti 2013] . To enhance the depth precision, additional depth error models are added to the calibration procedure [7,8,21,22,23]. All of these error models are used to compensate only for the distortion effect of the IR projector and camera. Other research works have been conducted to obtain the relative calibration between an RGB camera and an IR camera by accessing the IR camera [24,25,26]. This can achieve relatively high accuracy calibration parameters for a baseline between IR and RGB cameras, while the remaining limitation is that the distortion parameters for the IR camera cannot represent the full distortion effect for the depth sensor. This study addressed these issues using a two-step calibration procedure to calibrate all of the geometric parameters of RGB-D sensors. The first step was related to the joint calibration between the RGB and IR cameras, which was achieved by adopting the procedure discussed in [27] to compute the external baseline between the cameras and the distortion parameters of the RGB camera. The second step focused on the depth sensor calibration. Point cloud of two perpendicular planes (blue color: default depth; red color: modeled depth): highlighted black dashed circles shows the significant impact of the calibration method on the point cloud quality. The main difference between both sensors is the baseline between the IR camera and projector. The longer the sensor’s baseline, the longer working distance can be achieved. The working range of Kinect v1 is 0.80 m to 4.0 m, while it is 0.35 m to 3.5 m for Structure Sensor.
  • 60. PipelineDepth image enhancement #2A Metrological Calibration #2 Photogrammetric Bundle Adjustment With Self- Calibration of the PrimeSense 3D Camera Technology: Microsoft Kinect IEEE Access ( Volume: 1 ) 2013 https://doi.org/10.1109/ACCESS.2013.2271860 Roughness of point cloud before calibration. (Bottom) Roughness of point cloud after calibration. The colours indicate the roughness as measured by the normalized smallest eigenvalue. Estimated Standard Deviation of the Observation Residuals To quantify the external accuracy of the Kinect and the benefit of the proposed calibration, a target board located at 1.5–1.8 m away with 20 signalized targets was imaged using an in- house program based on the Microsoft Kinect SDK and with RGBDemo. Spatial distances between the targets were known from surveying using the FARO Focus3D terrestrial laser scanner with a standard deviation of 0.7 mm. By comparing the 10 independent spatial distances measured by the Kinect to those made by the Focus3D, the RMSE was 7.8 mm using RGBDemo and 3.7 mm using the calibrated Kinect results; showing a 53% improvement to the accuracy. This accuracy check assesses the quality of all the imaging sensors and not just the IR camera-projector pair alone. The results show improvements in geometric accuracy up to 53% compared with uncalibrated point clouds captured using the popular software RGBDemo. Systematic depth discontinuities were also reduced and in the check-plane analysis the noise of the Kinect point cloud was reduced by 17%.
  • 61. PipelineDepth image enhancement #2B Metrological Calibration #3 Evaluating and Improving the Depth Accuracy of Kinect for Windows v2 Lin Yang ; Longyu Zhang ; Haiwei Dong ; Abdulhameed Alelaiwi ; Abdulmotaleb El Saddik IEEE Sensors Journal (Volume: 15, Issue: 8, Aug. 2015) https://doi.org/10.1109/JSEN.2015.2416651 Illustration of accuracy assessment of Kinect v2. (a) Depth accuracy. (b) Depth resolution. (c) Depth entropy. (d) Edge noise. (e) Structural noise. The target plates in (a- c) and (d-e) are parallel and perpendicular with the depth axis, respectively. Accuracy error distribution of Kinect for Windows v2.
  • 62. PipelineDepth image enhancement #2c Metrological Calibration #4 A Comparative Error Analysis of Current Time-of-Flight Sensors IEEE Transactions on Computational Imaging (Volume: 2, Issue: 1, March 2016) https://doi.org/10.1109/TCI.2015.2510506 For evaluating the presence of wiggling, ground truth distance information is required. We calculate the true distance by setting up a stereo camera system. This system consists of the ToF camera to be evaluated and a high resolution monochrome camera (IDS UI-1241LE), which we call the reference camera. The cameras are calibrated with the algorithm of Zhang (2000), with point correspondences computed with ROCHADE (Placht et al. 2014). Ground truth is calculated by intersecting the rays of all ToF camera pixels with the 3D plane of the checkerboard. For higher accuracy, we compute this plane from corners detected in the reference image and transform the plane into the coordinate system of the ToF camera. This experiment aims to quantify the so-called amplitude-related distance error and also to show that this effect is not related to scattering. This effect can be observed when looking at a planar surface with high reflectivity variations. With some sensors the distance measurements for pixels with different amplitudes do not lie on the same plane, even though they should. To the best of our knowledge no evaluation setup has been presented for this error source so far. In the past this error has been typically observed with images of checkerboards or other high contrast patterns. However, the analysis of single images allows no differentiation between amplitude-related errors and internal scattering.
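The ray/plane ground-truth construction quoted above reduces to intersecting each pixel's viewing ray with the checkerboard plane. A sketch assuming known intrinsics and a plane already expressed in the ToF camera frame follows; names and the plane parameterisation (n·X + d = 0) are illustrative.

```python
import numpy as np

def ray_plane_ranges(fx, fy, cx, cy, width, height, n, d):
    """Ground-truth range per pixel by intersecting camera rays with the plane n·X + d = 0.

    n : (3,) plane normal in the ToF camera frame, d : plane offset (metres).
    Returns an (H, W) map of ray lengths (the quantity a ToF camera measures).
    """
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u, dtype=np.float64)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)      # unit viewing directions
    denom = rays @ np.asarray(n, dtype=np.float64)            # n · ray for every pixel
    denom = np.where(np.abs(denom) > 1e-9, denom, np.nan)     # rays parallel to the plane -> NaN
    return -d / denom                                         # distance along each ray
```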
  • 63. PipelineDepth image enhancement #2c Metrological Calibration #5 Low-Cost Reflectance-Based Method for the Radiometric Calibration of Kinect 2 IEEE Sensors Journal ( Volume: 16, Issue: 7, April 1, 2016 ) https://doi.org/10.1109/JSEN.2015.2508802 In this paper, a reflectance-based radiometric method for the second generation of gaming sensors, Kinect 2, is presented and discussed. In particular, a repeatable methodology generalizable to different gaming sensors by means of a calibrated reference panel with Lambertian behavior is developed. The relationship between the received power and the final digital level is obtained by means of a combination of a linear sensor relationship and signal attenuation, fitted in a least squares adjustment with an outlier detector. The results confirm that the quality of the method (standard deviation better than 2% in laboratory conditions and discrepancies lower than 7%) is valid for exploiting the radiometric possibilities of this low-cost sensor, with applications ranging from pathological analysis (moisture, crusts, etc.) to agricultural and forest resource evaluation. 3D data acquired with Kinect 2 (left) and digital number (DN) distribution (right) for the reference panel at 0.7 m (units: counts). Visible-RGB view of the brick wall (a), intensity-IR digital levels (DN) (b-d) and calibrated reflectance values (e-g) for the three acquisition distances. The objective of this paper was to develop a radiometric calibration equation of an IR projector-camera for the second generation of gaming sensors, Kinect 2, to convert the recorded digital levels into physical values (reflectance). By the proposed equation, the reflectance properties of the IR projector-camera set of Kinect 2 were obtained. This new equation will increase the number of application fields of gaming sensors, favored by the possibility of working outdoors. The process of radiometric calibration should be incorporated as part of an integral process where the geometry obtained is also corrected (i.e., lens distortion, mapping function, depth errors, etc.). As future perspectives, the effects of the diffuse radiance, which does not belong to the sensor footprint and contaminates the received signal, will be evaluated to determine the error budget of the active sensor.
  • 64. PipelineDepth image enhancement #3 ‘Old-school’ depth refining techniques Depth enhancement with improved exemplar-based inpainting and joint trilateral guided filtering Liang Zhang ; Peiyi Shen ; Shu'e Zhang ; Juan Song ; Guangming Zhu Image Processing (ICIP), 2016 IEEE International Conference on https://doi.org/10.1109/ICIP.2016.7533131 In this paper, a novel depth enhancement algorithm with improved exemplar-based inpainting and joint trilateral guided filtering is proposed. The improved exemplar-based inpainting method is applied to fill the holes in the depth images, in which a level set distance component is introduced in the priority evaluation function. Then a joint trilateral guided filter is adopted to denoise and smooth the inpainted results. Experimental results reveal that the proposed algorithm can achieve better enhancement results compared with the existing methods in terms of subjective and objective quality measurements. Robust depth enhancement and optimization based on advanced multilateral filters Ting-An Chang; Yang-Ting Chou; Jar-Ferr Yang EURASIP Journal on Advances in Signal Processing December 2017, 2017:51 https://doi.org/10.1186/s13634-017-0487-7 Results of depth enhancement coupled with hole filling obtained by (a) a noisy depth map, (b) joint bilateral filter (JBF) [16], (c) intensity guided depth superresolution (IGDS) [39], (d) compressive sensing based depth upsampling (CSDU) [40], (e) adaptive joint trilateral filter (AJTF) [18], and (f) the proposed AMF for Art, Books, Doily, Moebius, RGBD_1, and RGBD_2
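For orientation, the building block behind the JBF/AJTF variants compared above — a joint (cross) bilateral filter that smooths depth with weights from both spatial distance and guide-image similarity, skipping zero (invalid) depths — can be written directly. This is a slow brute-force reference implementation, not the filters from either paper.

```python
import numpy as np

def joint_bilateral_depth(depth, guide, radius=5, sigma_s=3.0, sigma_r=10.0):
    """Brute-force joint bilateral filtering of a depth map guided by a grayscale image.

    depth : (H, W) depth map with zeros marking invalid pixels
    guide : (H, W) registered grayscale guide (e.g. luminance of the RGB image)
    """
    h, w = depth.shape
    pad = radius
    d = np.pad(depth.astype(np.float64), pad, mode="edge")
    g = np.pad(guide.astype(np.float64), pad, mode="edge")
    out = np.zeros_like(depth, dtype=np.float64)
    weight_sum = np.zeros_like(out)

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d_shift = d[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            g_shift = g[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            w_range = np.exp(-(g_shift - guide) ** 2 / (2 * sigma_r ** 2))
            wgt = w_spatial * w_range * (d_shift > 0)       # exclude invalid (zero) depths
            out += wgt * d_shift
            weight_sum += wgt
    return np.where(weight_sum > 0, out / np.maximum(weight_sum, 1e-12), depth)
```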
  • 65. PipelineDepth image enhancement #4A Deep learning-based depth refining techniques DepthComp : real-time depth image completion based on prior semantic scene segmentation Atapour-Abarghouei, A. and Breckon, T.P. 28th British Machine Vision Conference (BMVC) 2017 London, UK, 4-7 September 2017. http://dro.dur.ac.uk/22375/ Exemplar results on the KITTI dataset. S denotes the segmented images [3] and D the original (unfilled) disparity maps. Results are compared with [1, 2, 29, 35, 45]. Results of cubic and linear interpolations are omitted due to space. Comparison of the proposed method using different initial segmentation techniques on the KITTI dataset [27]. Original color and disparity image (top-left), results with manual labels (top-right), results with SegNet [3] (bottom-left) and results with mean-shift [26] (bottom-right). Fast depth image denoising and enhancement using a deep convolutional network Xin Zhang and Ruiyuan Wu Acoustics, Speech and Signal Processing (ICASSP), 2016 IEE https://doi.org/10.1109/ICASSP.2016.7472127
  • 66. PipelineDepth image enhancement #4b Deep learning-based depth refining techniques Guided deep network for depth map super-resolution: How much can color help? Wentian Zhou ; Xin Li ; Daryl Reynolds Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE https://doi.org/10.1109/ICASSP.2017.7952398 https://anvoy.github.io/publication.html Depth map upsampling using joint edge-guided convolutional neural network for virtual view synthesizing Yan Dong; Chunyu Lin; Yao Zhao; Chao Yao Journal of Electronic Imaging Volume 26, Issue 4 http://dx.doi.org/10.1117/1.JEI.26.4.043004 Depth map upsampling. Input: (a) low-resolution depth map and (b) the corresponding color image. Output: (c) recovered high-resolution depth map. When the depth edges become unreliable, our network tends to rely on color-based prediction network (CBPN) for restoring more accurate depth edges. Therefore, contribution of color image increases when the reliability of the LR depth map decreases (e.g., as noise gets stronger). We adopt the popular deep CNN to learn non-linear mapping between LR and HR depth maps. Furthermore, a novel color-based prediction network is proposed to properly exploit supplementary color information in addition to the depth enhancement network. In our experiments, we have shown that deep neural network based approach is superior to several existing state-of-the-art methods. Further comparisons are reported to confirm our analysis that the contributions of color image vary significantly depending on the reliability of LR depth maps.
  • 67. Future Image restoration Depth Images (Laser scanning)
  • 68. PipelineLaser range Finding #1a Versatile Approach to Probabilistic Modeling of Hokuyo UTM-30LX IEEE Sensors Journal ( Volume: 16, Issue: 6, March 15, 2016 ) https://doi.org/10.1109/JSEN.2015.2506403 When working with Laser Range Finding (LRF), it is necessary to know the sensor’s measurement principle and its properties. There are several measurement principles used in LRFs [Nejad and Olyaee 2006], [Łabęcki et al. 2012], [Adams 1999]: ● Triangulation ● Time of flight (TOF) ● Frequency modulation continuous wave (FMCW) ● Phase shift measurement (PSM) (the TOF and PSM range equations are sketched below) The geometry of terrestrial laser scanning; identification of errors, modeling and mitigation of scanning geometry Soudarissanane, S.S. TU Delft. Doctoral Thesis (2016) http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5 Distance measurement principle of time-of-flight laser scanners (top) and phase-based laser scanners (bottom). Laser Range Finding: Image formation #1
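The TOF and phase-shift (PSM) principles listed above reduce to two small range equations, sketched here together with the ambiguity interval of a phase-based scanner; the example values are purely illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def range_time_of_flight(round_trip_time_s):
    """Pulse TOF: the pulse travels to the target and back, so range = c * t / 2."""
    return C * round_trip_time_s / 2.0

def range_phase_shift(delta_phi_rad, modulation_freq_hz):
    """Phase-shift measurement: range = c * delta_phi / (4 * pi * f_mod),
    unique only within the ambiguity interval c / (2 * f_mod)."""
    return C * delta_phi_rad / (4.0 * math.pi * modulation_freq_hz)

def ambiguity_interval(modulation_freq_hz):
    """Maximum unambiguous range of a phase-based scanner."""
    return C / (2.0 * modulation_freq_hz)

# Example: a 10 MHz modulation gives a ~15 m ambiguity interval;
# a measured phase shift of pi/2 then corresponds to ~3.75 m.
```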
• 69. Pipeline Laser Range Finding #1b
Laser Range Finding : Image formation #2

The geometry of terrestrial laser scanning; identification of errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S. TU Delft, Doctoral Thesis (2016). http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Two-way link budget between the receiver (Rx) and the transmitter (Tx) in a Free Space Path (FSP) propagation model. Schematic representation of the signal propagation from the transmitter to the receiver.
Effect of increasing incidence angle and range on the signal deterioration: (left) signal deterioration due to increasing incidence angle α, (right) signal deterioration due to increasing range ρ, with ρmin = 0 m and ρmax = 100 m.
Relationship between the scan angle and the normal vector orientation used for the segmentation of a point cloud with respect to planar features. A point P = [θ, φ, ρ] is measured on a plane with normal parameters N = [α, β, γ]. The different angles used for the range image gradients are plotted.
Theoretical number of points: (left) number of points with respect to the orientation of the patch and the distance; practical example of a 1×1 m plate placed at 3 m, oriented at 0° and rotated to 60°.
Reference plate measurement set-up: a white coated plywood board mounted on a tripod via a screw clamp mechanism provided with a 2° precision goniometer.
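A hedged toy version of the deterioration behaviour described above (assuming a Lambertian surface and a free-space 1/ρ² falloff, with all instrument constants dropped) makes the combined effect of incidence angle and range easy to tabulate:

```python
import numpy as np

def relative_received_power(range_m, incidence_angle_rad, range_ref_m=1.0):
    """Toy deterioration model: Lambertian backscatter scales with the cosine
    of the incidence angle and free-space propagation with 1/range^2.
    Instrument constants are dropped, so values are relative to a patch
    hit at normal incidence from range_ref_m."""
    return np.cos(incidence_angle_rad) * (range_ref_m / range_m) ** 2

# tabulate the combined effect of range and grazing incidence
for rho in (1.0, 10.0, 50.0):
    for alpha_deg in (0.0, 45.0, 80.0):
        p = relative_received_power(rho, np.radians(alpha_deg))
        print(f"rho = {rho:5.1f} m  alpha = {alpha_deg:4.1f} deg  ->  {p:.2e}")
```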
• 70. Pipeline Laser Range Finding #1c
Laser Range Finding : Image formation #3

The geometry of terrestrial laser scanning; identification of errors, modeling and mitigation of scanning geometry
Soudarissanane, S.S. TU Delft, Doctoral Thesis (2016). http://doi.org/10.4233/uuid:b7ae0bd3-23b8-4a8a-9b7d-5e494ebb54e5
Terrestrial Laser Scanning (TLS) good practice of survey planning.

Future directions: At the time this research started, terrestrial laser scanners were mainly used by research institutes and manufacturers. Nowadays, however, terrestrial laser scanners are present in almost every field of work, e.g. forensics, architecture, civil engineering, the gaming industry and the movie industry. Mobile mapping systems, such as scanners capturing a scene while driving a car, or scanners mounted on drones, currently use the same range determination techniques as terrestrial laser scanners. The number of applications that make use of 3D point clouds is growing rapidly, and the need for a product of sound quality is all the more significant as it impacts the quality of a wide range of end-products.
• 71. Pipeline Laser Range Finding #1d
Laser Range Finding : Image formation #4

Ray-Tracing Method for Deriving Terrestrial Laser Scanner Systematic Errors
Derek D. Lichti, Ph.D., P.Eng.
Journal of Surveying Engineering, Volume 143, Issue 2, May 2017. https://www.doi.org/10.1061/(ASCE)SU.1943-5428.0000213

Error model of direct georeferencing procedure of terrestrial laser scanning
Pandžić, Jelena; Pejić, Marko; Božić, Branko; Erić, Verica
Automation in Construction, Volume 78, June 2017, Pages 13-23. https://doi.org/10.1016/j.autcon.2017.01.003
• 72. Pipeline Laser Range Finding #2a : Calibration #1

Statistical Calibration Algorithms for Lidars
Anas Alhashimi, Luleå University of Technology, Control Engineering Licentiate thesis (2016), ORCID iD: 0000-0001-6868-2210
Estimating d without being aware of the mode hopping, i.e. assuming a certain λ0 without knowing that the average λ jumps between different lasing modes, thus shows up as a multimodal measurement of d. Potential temperature-bias dependencies for the polynomial model. The plot explains the cavity modes, gain profile and lasing modes for a typical laser diode: the upper drawing shows the wavelength ν1 as the dominant lasing mode, while the lower drawing shows both wavelengths ν1 and ν2 competing; this latter case is responsible for the mode-hopping effects.

A rigorous cylinder-based self-calibration approach for terrestrial laser scanners
Ting On Chan; Derek D. Lichti; David Belton
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 99, January 2015. https://doi.org/10.1016/j.isprsjprs.2014.11.003
The proposed method and its variants were first applied to two simulated datasets, to compare their effectiveness, and then to three real datasets captured by three different types of scanners: a Faro Focus 3D (a phase-based panoramic scanner); a Velodyne HDL-32E (a pulse-based multi-spinning-beam scanner); and a Leica ScanStation C10 (a dual operating-mode scanner).
In situ self-calibration is essential for terrestrial laser scanners (TLSs) to maintain high accuracy in many applications such as structural deformation monitoring (Lindenbergh, 2010). This is particularly true for aged TLSs and instruments operated for long hours outdoors under varying environmental conditions. Although plane-based methods are now widely adopted for TLS calibration, they suffer from high parameter correlation when there is low diversity in the plane orientations (Chow et al., 2013). In practice, not all locations offer large and smooth planar features that can be used for calibration, and even where planar features are available, their planarity is not always guaranteed. Because of the drawbacks of point-based and plane-based calibration, an alternative geometric feature, namely circular cylindrical features (e.g. Rabbani et al., 2007), should be considered and incorporated into the self-calibration procedure.
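As a simplified illustration of what plane-based self-calibration estimates (reduced here, as an assumption for the sketch, to a single additive range bias against a known reference plane; real TLS self-calibration solves many more additional parameters together with the plane parameters in an adjustment), a closed-form least-squares solve looks like this:

```python
import numpy as np

def estimate_range_bias(directions, measured_ranges, plane_n, plane_d):
    """Closed-form least-squares estimate of a constant additive range bias b,
    assuming the corrected points (rho_i - b) * u_i lie on the known reference
    plane n.x = d. 'directions' are unit ray vectors u_i in the scanner frame."""
    a = directions @ plane_n                       # a_i = n . u_i
    return float(np.sum(a * (measured_ranges * a - plane_d)) / np.sum(a * a))

# synthetic check: plane x = 5 m, rays within +/-30 deg of the x-axis,
# true additive range bias of +2 cm and 3 mm of range noise
rng = np.random.default_rng(0)
theta, phi = rng.uniform(-np.pi / 6, np.pi / 6, size=(2, 200))
u = np.column_stack([np.cos(theta) * np.cos(phi),
                     np.sin(theta) * np.cos(phi),
                     np.sin(phi)])
n, d, true_bias = np.array([1.0, 0.0, 0.0]), 5.0, 0.02
rho = d / (u @ n) + true_bias + rng.normal(0.0, 0.003, 200)
print(estimate_range_bias(u, rho, n, d))           # expected ~0.02
```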
• 73. Pipeline Laser Range Finding #2b : Calibration #2

Calibration of a multi-beam Laser System by using a TLS-generated Reference
Gordon, M.; Meidow, J.
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013, pp. 85-90. http://dx.doi.org/10.5194/isprsannals-II-5-W2-85-2013
It is assumed that the accuracy of this point cloud is considerably higher than that of the multi-beam LiDAR and that the data represent faces of man-made objects at different distances. The Velodyne HDL-64E S2 system is inspected as the best-known representative of this kind of sensor system, while the Z+F Imager 5010 serves as the reference. Besides the improvement of point accuracy by considering the calibration results, the significance of the parameters related to the sensor model is tested and the uncertainty of measurements w.r.t. the measured distances is considered. The standard deviation of the planar misclosure is nearly halved, from 3.2 cm to 1.7 cm. The variance component estimation, as well as the standard deviation of the range residuals, reveals that the manufacturer's stated distance accuracy of 2 cm is somewhat optimistic. The histograms of the planar misclosures and the residuals show that these quantities are not normally distributed; the distance-dependent change in misclosure variance is one reason. Other sources were investigated by Glennie and Lichti (2010): the incidence angle and the vertical angle. A further possibility is the focal distance, which differs for each laser and averages 8 m for the lower block and 15 m for the upper block; this may introduce a distance-dependent but nonlinear variance change. Further research is needed to find the sources of these observations.

Extrinsic calibration of a multi-beam LiDAR system with improved intrinsic laser parameters using v-shaped planes and infrared images
Po-Sen Huang; Wen-Bin Hong; Hsiang-Jen Chien; Chia-Yen Chen
IVMSP Workshop, 2013 IEEE 11th. https://doi.org/10.1109/IVMSPW.2013.6611921
The Velodyne HDL-64E S2, the LiDAR system studied in this work, is a mobile scanner consisting of 64 pairs of laser emitter-receiver rigidly attached to a rotating motor, providing real-time panoramic range data with measurement errors of around 2.5 mm. The paper proposes a method to use IR images as feedback in finding optimized intrinsic and extrinsic parameters of the LiDAR-vision scanner. First, the IR-based calibration technique is applied to a LiDAR system that fires multiple beams, which significantly increases the problem's complexity and difficulty. Second, the adjustment is applied not only to the extrinsic parameters, but also to the laser parameters as well as the intrinsic parameters of the camera. Third, two different objective functions are used to avoid generalization failure of the optimized parameters.
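The planar-misclosure statistic used by Gordon and Meidow to quantify the calibration gain can be reproduced in a few lines: fit a reference plane to the TLS points and report the standard deviation of the multi-beam LiDAR point-to-plane residuals before and after applying the calibration. The sketch below uses synthetic placeholder arrays whose noise levels merely echo the reported 3.2 cm / 1.7 cm figures; function names are invented for the example.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a TLS reference patch: centroid plus the
    right singular vector belonging to the smallest singular value."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    n = vt[-1]
    return n, float(n @ c)          # plane n.x = d

def planar_misclosure_std(lidar_points, plane_n, plane_d):
    """Standard deviation of the point-to-plane residuals (misclosures)."""
    n = plane_n / np.linalg.norm(plane_n)
    return (lidar_points @ n - plane_d).std(ddof=1)

# synthetic placeholder data: a wall patch at z = 2 m
rng = np.random.default_rng(1)
xy = rng.uniform(0, 4, size=(500, 2))
tls = np.column_stack([xy, 2.0 + rng.normal(0, 0.002, 500)])   # reference cloud
raw = np.column_stack([xy, 2.0 + rng.normal(0, 0.032, 500)])   # before calibration
cal = np.column_stack([xy, 2.0 + rng.normal(0, 0.017, 500)])   # after calibration

n, d = fit_plane(tls)
print(planar_misclosure_std(raw, n, d), planar_misclosure_std(cal, n, d))
```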