Emerging 3D Scanning Technologies for PropTech
Falling costs with rising quality via hardware innovations and deep learning
Outline of the presentation
Structure from Motion (SfM): low-cost passive sensing
360° imaging: omnidirectional immersive images and videos
Range sensing: structured light, e.g. Matterport and Kinect
Laser scanning: LiDARs, e.g. from Velodyne
Data-driven processing: deep learning
3D datasets: what to train your deep learning pipelines with
Future prospects: a short overview of future applications
The presentation is meant as a technical introduction to typical hardware and software
processing techniques used in real estate and construction site scanning.
Computer scientists new to proptech organizations and the real estate field in general might
find this presentation especially useful. The reader is assumed to be familiar with the basics
of deep learning.
Data structures for real estate scans
RGB+D: pixel grid presenting color and depth
Example from Prof. Li
Mesh (polygon) from voxel data (“3D pixels”)
Voxel grid meshing using marching cubes (StackExchange)
Point cloud: typically unordered data (i.e. not on a grid but sparse)
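To make the RGB+D and point cloud representations concrete, here is a minimal sketch (assuming a pinhole depth camera with known intrinsics fx, fy, cx, cy; the values in the example call are purely illustrative) that back-projects an RGB+D pixel grid into an unordered, colored point cloud with NumPy:

import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project an RGB+D pixel grid into an unordered, colored point cloud.

    depth: (H, W) metric depth map (0 where invalid).
    rgb:   (H, W, 3) color image aligned with the depth map.
    fx, fy, cx, cy: pinhole intrinsics of the depth camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)   # (N, 3) geometry, no grid structure left
    colors = rgb[valid]                    # (N, 3) color per point
    return points, colors

# Toy example: a flat wall 2 m in front of the camera.
depth = np.full((480, 640), 2.0)
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
pts, cols = depth_to_point_cloud(depth, rgb, fx=525.0, fy=525.0, cx=319.5, cy=239.5)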
PropTech resources for domain insights
https://www.inman.com/
Inman Hacker Connect is created by and for the real
estate technology community. Debate, discuss and
define the future of real estate’s most pressing tech
issues at Hacker Connect. Join more than 400
engineers, developers, designers, product managers,
database architects, webmasters, and technology
executives from across the real estate space. Build
partnerships, connect with peers, tackle thorny tech
issues, learn best practices, discover innovative
breakthroughs and collaborate during special
hands-on keyboard sessions at this day-long, tech-
first event.
WHY YOU SHOULD ATTEND Hear from industry
leaders on APIs, bots, data security, ownership, user
experience, blockchain and more. Take part in
collaborative hands-on-keyboard sessions and
come out with a new tool to apply to your job. Learn
how to better integrate data and workflows, and be more
competitive in your recruitment efforts.
https://www.inman.com/event/hacker-17-sf/ http://www.moderneventures.com/accelerator/
https://gust.com/accelerators/moderne-accelerator
Pi Labs is Europe’s first venture capital platform
investing exclusively in early stage ventures in the
property tech vertical. London, United Kingdom.
http://pilabs.co.uk/
http://www.jamesdearsley.co.uk/
“The only PropTech site for the latest Property
Technology news and views”
#PropTech community across Europe. Join us for our next event in #Berlin
http://futureproptech.de/
Structure from Motion (SfM)
Low-cost passive sensing
Structure from Motion Basics
Structure-from-Motion (SfM). Instead of a
single stereo pair, the SfM technique requires
multiple, overlapping photographs as input to
feature extraction and 3-D reconstruction
algorithms. - Westoby et al
praehistorische-archaeologie.de - Florian Tubbesing
Structure from Motion can achieve good
accuracy compared to laser scanners.
James and Robson (2012)
Cited by 281 Articles, and see Related articles
This volcanic bomb (~10 cm across) from Soufrière Hills
volcano was scanned by an Arius3d laser scanner (
Stuart Robson, University College London) and also
reconstructed using the SfM-MVS technique, with the
results scaled by sfm_georef. Differences between cross
sections through the two models have RMS values of
~0.3 mm. Point cloud: low res (6 Mb)
http://www.lancaster.ac.uk/staff/jamesm/software/sfm_georef.htm
The SfM method essentially computes the relative camera
poses between all related photos. Once every relative
camera pose is found, the scheme uses these transforms
to reconstruct all feature points by triangulation. There
are thus two main sub-problems (see the sketch below):
1) Image registration (e.g. SIFT, SURF, ORB)
2) Pose estimation (e.g. Perspective-n-Point with RANSAC)
By Dr Calle Olsson
https://www.youtube.com/watch?v=i7ierVkXYa8
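As a concrete illustration of the two sub-problems above, below is a hedged two-view sketch using OpenCV: ORB is used instead of SIFT, the intrinsic matrix K and the image filenames are placeholder assumptions, and a full SfM pipeline would add incremental registration and bundle adjustment on top of this.

import cv2
import numpy as np

# Two overlapping photos and the intrinsic matrix K are assumed to exist;
# the filenames and K values below are placeholders.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# 1) Image registration: detect and match local features (ORB here).
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2) Pose estimation: essential matrix with RANSAC, then recover relative R, t.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate the surviving matches into a sparse 3D structure (up to scale).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inl = mask.ravel() > 0
X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
points_3d = (X_h[:3] / X_h[3]).T   # homogeneous -> Euclidean, shape (N, 3)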
Structure from Motion Literature References
https://doi.org/10.1016/j.geomorph.2012.08.021
Cited by 631 articles, and see Related articles
https://arxiv.org/abs/1701.08493
‘Structure-from-Motion’ (SfM) operates under
the same basic tenets as stereoscopic
photogrammetry, namely that 3-D structure
can be resolved from a series of overlapping,
offset images. However, it differs fundamentally
from conventional photogrammetry, in that the
geometry of the scene, camera positions and
orientation is solved automatically without the
need to specify a priori, a network of targets
which have known 3-D positions. Instead, these
are solved simultaneously using a highly
redundant, iterative bundle adjustment
procedure, based on a database of features
automatically extracted from a set of multiple
overlapping images (Snavely et al 2008).
Finally, even though there exist various theoretical works in the literature
that study fundamental problems in SfM and/or provide rigorous analysis of
stability and robustness of specific methods, we believe that the SfM
community would still highly benefit from rigorous results on fundamental
problems (e.g., what is the theoretically maximal amount of mismatched
features or level of noise in the images that can be tolerated for a stable
structure recovery, and can this be achieved efficiently?) and theoretical
analysis of stability, robustness and computational efficiency of existing
or new methods.
SLAM: Simultaneous localization and mapping
SLAM, Visual Odometry, Structure from Motion, Multiple View Stereo
Yu Huang, Senior Architect, Autonomous Driving@Baidu USA
https://www.slideshare.net/yuhuang/visual-slam-structure-from-motion-multiple-view-stereo
Samsung R&D Institute
Necessary Skills / Attributes:
● 5+ years’ experience delivering computer vision based products using C++ or Python
(Masters or PhD study will be considered).
● Theoretical and practical understanding of multi-view geometry and 3D
reconstruction.
● Experience with machine learning techniques within a computer vision context.
● PhD/MS in Computer Vision, Artificial Intelligence or Machine Learning.
● Expertise with Deep Neural Networks using TensorFlow or Keras.
SLAM stands for Simultaneous Localization and Mapping and one way to understand
it is to imagine yourself entering an unfamiliar building for the first time. As you move about
the building, you don't completely forget where you have already been. Indeed, at any
moment you have a pretty good idea where you are within the current map that you have
so far constructed in your head, and unless you have a really bad sense of direction, you
could probably turn around and get back out of the building without too much trouble.
Finding your way around the building is a good example of simultaneously
constructing a map and localizing yourself within that map.
http://www.pirobot.org/blog/0015/
SLAM: Traditional algorithm comparison
http://dx.doi.org/10.1186/s41074-017-0027-2
The framework is mainly composed of three modules as follows.
1) Initialization
2) Tracking
3) Mapping
Additional modules for stable and accurate vSLAM
+ Relocalization
+ Global map optimization
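A schematic (non-functional) skeleton of that module breakdown could look as follows; all class and method names are purely illustrative, not any particular library's API.

class MinimalVisualSLAM:
    """Schematic skeleton of the module breakdown above; not a working tracker."""

    def __init__(self):
        self.map_points = []   # sparse 3D landmarks
        self.keyframes = []    # camera poses + observations used for mapping
        self.pose = None       # current camera pose estimate

    def initialize(self, frame_a, frame_b):
        """1) Initialization: bootstrap the map and the first pose from two views."""
        ...

    def track(self, frame):
        """2) Tracking: estimate the current pose against the existing map."""
        ...

    def update_map(self, frame):
        """3) Mapping: add keyframes and triangulate new landmarks."""
        ...

    def relocalize(self, frame):
        """+ Relocalization: recover the pose after tracking failure."""
        ...

    def optimize_global_map(self):
        """+ Global map optimization: e.g. pose-graph optimization or bundle
        adjustment after loop closure, to remove accumulated drift."""
        ...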
“ From the technical point of views, there is no definitive difference between SLAM and real-time SfM.”
Even though visual SLAM algorithms have been developed since 2003, vSLAM is
still an active research field. Each algorithm has different characteristics. We need
to choose an appropriate algorithm by considering a purpose of an application.
Visual Odometry
Taketomi et al. (2017):
http://dx.doi.org/10.1186/s41074-017-0027-2
“Odometry is to estimate the sequential changes of
sensor positions over time using sensors such as
wheel encoder to acquire relative sensor movement.
Camera-based odometry called visual odometry
(VO) is also one of the active research fields in the
literature [16, 17].
From the technical point of views, vSLAM and VO
are highly relevant techniques because both
techniques basically estimate sensor positions.
According to the survey papers in robotics [18, 19],
the relationship between vSLAM and VO can be
represented as follows.
vSLAM = VO + global map optimization
The relationship between vSLAM and VO can also
be found from the papers [20, 21] and the papers [22,
23]. In the paper [20, 22], a technique on VO was first
proposed. Then, a technique on vSLAM was
proposed by adding the global optimization in VO [21,
23].”
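The relationship "vSLAM = VO + global map optimization" can be sketched in a few lines: visual odometry chains noisy relative transforms, so drift accumulates, and the global optimization step (omitted here) is what a vSLAM system adds to correct it. The 4x4 SE(3) matrices are assumed inputs.

import numpy as np

def accumulate_visual_odometry(relative_poses):
    """Visual odometry: chain per-frame relative transforms T_k (4x4 SE(3)
    matrices) into absolute camera poses; noise in each T_k accumulates as drift."""
    pose = np.eye(4)
    trajectory = [pose]
    for T in relative_poses:
        pose = pose @ T
        trajectory.append(pose)
    return trajectory

# vSLAM = VO + global map optimization: after a loop closure is detected, a
# pose-graph / bundle-adjustment step would jointly re-optimize `trajectory`
# and the map to cancel that drift; the optimization itself is omitted here.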
Towards stable visual odometry & SLAM solutions
for autonomous vehicles
https://www.youtube.com/watch?v=T5Y6OPG-d08
NavStik Hackerspace | Projects at Hackerspace
Visual Odometry using Optic Flow
Software: Open-source VisualSFM
VisualSFM: A Visual Structure from Motion
System, by Changchang Wu
Cited by 326 articles, and see Related articles
VisualSFM is a GUI application for 3D reconstruction using structure
from motion (SFM). The reconstruction system integrates several of my
previous projects: SIFT on GPU(SiftGPU), Multicore Bundle Adjustment,
and Towards Linear-time Incremental Structure from Motion
. VisualSFM runs fast by exploiting multicore parallelism for feature
detection, feature matching, and bundle adjustment.
Using VisualSFM and Meshlab as an offline alternative
to Autodesk's excellent 123D catch. I walk you through my
workflow for converting multiple images into a 3D model
suitable for use in Blender.
Tutorial for amateur photographers by Jamie Fuller.
https://www.youtube.com/watch?v=V4iBb_j6k_g
Open Source Photogrammetry with VisualSFM:
Ditching 123D Catch. July 12, 2013, by Jesse
Indoor Navigation from Multiple Images
By Jaan Tollander de Balsch, 2016, Aalto
https://jaantollander.github.io/SCI-C1000/prototype.html
What is the best method for 3D object
modelling and reconstruction from photos
or videos taken by flying robots or drones?
What is the accuracy of such reconstruction
methods with regards to the vibrations of the
flying drones, quality of camera and resolution?
Is it possible to improve the results by organizing
multiple flights and overlaying/accumulating the
data in the point cloud? Is there any free
software available?
Software: Python Photogrammetry Toolbox (PPT) GUI
Real photo x SfM with texture color x SfM with simple shader. Made
with Python Photogrammetry Toolbox GUI and rendered in Blender
with Cycles.
http://184.106.205.13/arcteam/ppt.php
https://github.com/archeos/ppt-gui/
Converting pictures into a 3D mesh with PPT, MeshLab and Blender
http://arc-team-open-research.blogspot.co.uk/2012/09/converting-pictures-into-3d-mesh-with.html
Blender camera tracking + Python Photogrammetry Toolbox
http://arc-team-open-research.blogspot.co.uk/2012/11/blender-camera-tracking-python.html
The video show the skull reconstructed in 3D with Python Photogrammetry Toolkit GUI.
Smilodon, the 3D reconstruction of the saber-toothed cat
http://arc-team-open-research.blogspot.co.uk/2013/03/
Open-source libraries for SfM
OpenSfM is a Structure from Motion
library written in Python on top of
OpenCV. The library serves as a
processing pipeline for reconstructing
camera poses and 3D scenes from
multiple images.
https://github.com/mapillary/OpenSfM
656 stars
OpenSfM
OpenMVG (Multiple View Geometry)
"open Multiple View Geometry" is a
library for computer-vision scientists and
especially targeted to the Multiple View
Geometry community.
https://github.com/openMVG/openMVG
1,856 stars
OpenMVG
https://doi.org/10.1007/978-3-319-56414-2_5
http://imagine.enpc.fr/~marletr/publi/RRPR-2016-Moulon-et-al.pdf
Sung and Lin (2017): “VisualSFM uses the preemptive
feature matching, the incremental
structure from motion and the re-triangulation
techniques. The incremental feature matching
can greatly speed up the process because
this kind of matching will first sort all feature
points and match only first h feature points for
each photo.”
Sung and Lin (2017): “OpenMVG also
contains incremental structure from
motion technique. Besides that, they
proposed a new iterative sampling
method called a contrario Random
Sample Consensus (AC-RANSAC) as a
substitution to the original RANSAC in
order to acquire higher precision and
better performance. The AC-RANSAC
using the “a contrario” methodology in
order to find a model that best fits the
data with a threshold T that adapts
automatically to the noise. Hence, it is
able to find a model and its associated
noise without a fixed threshold.”
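For reference, a minimal plain-RANSAC loop is sketched below on a toy 2D line-fitting problem. Note that this uses a fixed inlier threshold, which is exactly what the AC-RANSAC described above replaces with a noise-adaptive threshold, and the line model is only a stand-in for the fundamental/essential-matrix models used in SfM.

import numpy as np

def ransac(data, fit_model, point_error, n_sample, threshold, n_iters=1000, seed=None):
    """Plain RANSAC with a fixed inlier threshold (AC-RANSAC adapts it to the noise).

    data:        (N, d) observations
    fit_model:   maps a minimal sample to model parameters
    point_error: maps (model, data) to per-point residuals
    """
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_iters):
        sample = data[rng.choice(len(data), n_sample, replace=False)]
        model = fit_model(sample)
        inliers = point_error(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy usage: robustly fit a 2D line y = a*x + b to points with gross outliers.
def fit_line(sample):
    return np.polyfit(sample[:, 0], sample[:, 1], 1)

def line_error(model, data):
    return np.abs(np.polyval(model, data[:, 0]) - data[:, 1])

x = np.linspace(0.0, 1.0, 200)
pts = np.stack([x, 2.0 * x + 1.0 + 0.01 * np.random.randn(200)], axis=1)
pts[::10, 1] += 5.0   # inject gross outliers
model, inliers = ransac(pts, fit_line, line_error, n_sample=2, threshold=0.05)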
Open-source libraries for SfM + SLAM
OpenChisel
https://github.com/personalrobotics/OpenChisel
An open-source version of the Chisel chunked TSDF
library. It contains two packages:
open_chisel
open_chisel is an implementation of a generic
truncated signed distance field (TSDF) 3D mapping
library; based on the Chisel mapping framework
developed originally for Google's Project Tango. It is
a complete re-write of the original mapping system
(which is proprietary). open_chisel is chunked and
spatially hashed, inspired by this work from
Niessner et al., making it more memory-efficient than
fixed-grid mapping approaches, and more performant
than octree-based approaches. A technical
description of how it works can be found in our
RSS 2015 paper.
http://ri.cmu.edu/pub_files/2015/7/ChiselPaper.pdf
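The core of a TSDF map is a per-voxel running average of truncated signed distances. The sketch below shows only that update over a fixed dense NumPy grid; open_chisel's chunking and spatial hashing, and any marching-cubes meshing, are deliberately left out, and the volume origin/axis conventions are assumptions.

import numpy as np

def integrate_depth_into_tsdf(tsdf, weights, depth, K, cam_pose, voxel_size, trunc):
    """Fuse one depth map into a dense TSDF volume (fixed grid, not chunked/hashed).

    tsdf, weights: (X, Y, Z) float arrays (running truncated SDF and weights).
    depth: (H, W) metric depth image; K: 3x3 intrinsics; cam_pose: 4x4 camera-to-world.
    """
    nx, ny, nz = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
    pts_world = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3) * voxel_size
    world_to_cam = np.linalg.inv(cam_pose)
    pts_cam = pts_world @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    z = pts_cam[:, 2]
    z_safe = np.where(z > 1e-6, z, 1.0)                     # avoid divide-by-zero
    u = np.round(K[0, 0] * pts_cam[:, 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts_cam[:, 1] / z_safe + K[1, 2]).astype(int)
    h, w = depth.shape
    visible = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    measured = np.zeros_like(z)
    measured[visible] = depth[v[visible], u[visible]]
    # Truncated signed distance along the viewing ray, normalized to [-1, 1].
    sdf = np.clip(measured - z, -trunc, trunc) / trunc
    update = visible & (measured > 0) & (measured - z > -trunc)
    t_flat, w_flat = tsdf.reshape(-1), weights.reshape(-1)  # views into the volumes
    t_flat[update] = (t_flat[update] * w_flat[update] + sdf[update]) / (w_flat[update] + 1.0)
    w_flat[update] += 1.0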
Research-grade SfM: old-school mono video
http://dx.doi.org/10.1186/s13640-017-0168-3
Inspired by the structure from motion systems, we
propose a system that reconstructs sparse feature
points to a 3D point cloud using a mono video
sequence so as to achieve higher computation
efficiency. The system keeps tracking all detected
feature points and calculates both the amount of these
feature points and their moving distances. We only use
the key frames to estimate the current position of the
camera in order to reduce the computation load and
the noise interference on the system. Furthermore, for
the sake of avoiding duplicate 3D points, the system
reconstructs the 2D point only when the point shifts
out of the boundary of a camera. In our experiments,
we show that our system is able to be implemented on
tablets and can achieve state-of-the-art accuracy with
a denser point cloud with high speed.
Research-grade SfM: Deep Learning -based #1
Research-grade SfM: Deep Learning -based #2
https://arxiv.org/abs/1702.01381, 2 May 2017
We evaluated the performance of our proposal on the DTU dataset comparing it
with two traditional feature based methods, namely SURF (Cited by 8683
articles) and ORB ( Cited by 2739 articles).
The system is trained in an end-to-end manner utilising transfer
learning from a large scale classification dataset. In addition, a
variant of the proposed architecture containing a spatial pyramid
pooling (SPP) layer is evaluated and shown to further improve the
performance.
RegNet is able to correct even large decalibrations such as
depicted in the top image. The inputs for the deep neural
network are an RGB image and a projected depth map. RegNet
is able to establish correspondences between the two
modalities which enables it to estimate a 6 DOF extrinsic
calibration.
Additionally, with an iterative execution of multiple CNNs, that
are trained on different magnitudes of decalibration, our
approach compares favorably to state-of-the-art methods in
terms of a mean calibration error of 0.28º for the rotational and
6 cm for the translation components even for large
decalibrations up to 1.5 m and 20º.
https://arxiv.org/abs/1702.02295
Research-grade Pose/Structure: Deep Learning -based #1
Essentially the same technology for stereo matching and depth map generation as for SfM
https://arxiv.org/abs/1703.04309 https://arxiv.org/abs/1704.07813
Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our
approach: 1) monocular depth performs
comparably with supervised methods that
use either ground-truth pose or depth for
training, and 2) pose estimation performs
favorably compared to established SLAM
systems under comparable input settings.
Research-grade Pose/Structure: Deep Learning -based #2
GANs on everything, so here as well :) The usefulness of VisualSFM/ openSFM/ openMVG for defensible startup products?
Inversion is often ambiguous, e.g., many compositions of 3D shape and camera pose give rise to the same 2D projection. To
address this ambiguity, we impose priors on the predicted latent factors, through an adversarial discriminator network
trained to discriminate between predicted factors and ground-truth ones. Training adversarial inversion does not require
input-output paired annotations, but merely a collection of ground-truth factors, unrelated (unpaired) to the current input.
Our model can thus be self-supervised by unlabelled image data, by minimizing a joint reconstruction and adversarial
loss, complementing any direct supervision provided by paired annotations.
Applying adversarial inversion to super-resolution and inpainting results in automated “visual plastic surgery”
Structure-from-motion (SfM) results with and without adversarial priors. The results of the baseline (columns 5th and 8th)
are obtained from a model with a depth smoothness prior, trained with early stopping at 40K iterations (before divergence).
SfM on Mobile Devices
https://arxiv.org/abs/1611.09498
https://doi.org/10.1109/ICCV.2013.15 | Cited by 141 articles, see Related articles
https://doi.org/10.1016/j.cviu.2016.09.007
After introducing the reconstruction algorithms at the base of our approach, we show how to build
applications able to generate 3D floor plans scaled to their real-world metric dimensions and
capable of managing scenes not necessarily limited by Manhattan World assumptions. Then, exploiting
the resulting structural and visual model, we propose a client-server interactive exploration system
implementing a low-DOF navigation interface, specifically developed for touch interaction on
smartphones and tablets.
https://doi.org/10.1145/2999508.2999526
SfM on Mobile Devices: Case Dacuda
Magic Leap, the augmented reality
startup that has raised $1.4 billion in
funding but has yet to release a product,
has made an acquisition to expand its
work in computer vision and deep
learning, and to build out its operations
into Europe.
The company has acquired the 3D division
of Dacuda, a computer vision startup
based out of Zurich. One of
Dacuda’s focuses had been
developing algorithms for consumer-
grade cameras (and not just cameras, but
any device with a camera function) to
capture 2D and 3D imaging in real time,
“making 3D content as easy as taking a
video.”
https://techcrunch.com/2017/02/18/confirmed-magic-leap-acquires-3d-division-of-d
As you can see, no detail about what the two might be working on. The acquisition was first rumored
last week — after Dacuda posted a note on its blog about selling its 3D division, and then
some Dacuda employees updated their LinkedIn profiles as Magic Leap employees (one example
here). Tom’s Hardware then speculated it could signal Magic Leap using technology developed by
Dacuda to enable room-scale, six degrees of freedom tracking (essentially to improve its image
capturing sensors in 3D environments).
The ecosystem there is attracting other big-name M&A. Faceshift, a motion capture startup
acquired by Apple in 2015, was also founded in Zurich. Facebook’s Oculus VR in August 2016
also quietly acquired a startup called Zurich Eye, incubated at the University of Zurich and ETH,
the federal institute of technology. Zurich Eye became the basis of Oculus and Facebook’s office in
the city. Zurich Eye, ironically, was co-founded by three former software engineers from Dacuda
(they all now work for Oculus VR).
For example, in October the company had linked up with MindMaze, another virtual/augmented
reality startup out of Switzerland, to build a platform they were calling “MMI, the world’s first
multisensory computing platform for mobile-based, immersive and social virtual reality
applications,” MindMaze noted.
MindMaze said it planned to “deploy the technology for users globally to address a void left by
Google’s DayDream View for positional tracking and multiplayer interactions.” We have contacted
Magic Leap for comment and will update this post if and when we learn more.
Apple ARKit Technology
https://developer.apple.com/arkit/
Since the iPhone 6, iPhones have used what Apple calls “Focus Pixels”, which is its term for phase
detection AF. Fast Company reports that system will be replaced with laser autofocus possibly as soon
as the next iPhone, which is set to debut this fall. It is likely that Apple would use both AF technologies,
as Google does in its Pixel line of phones. The technology would serve a dual purpose, also allowing for
better depth perception with the inbuilt camera for augmented reality apps. ARKit rolls out with iOS 11
this fall, so it would make sense to also include the VCSEL laser system in the phone launching at the
same time.
https://petapixel.com/2017/07/20/apple-bring-3d-laser-autofocus-iphone-cameras-report-says/
https://www.theverge.com/2017/6/26/15872332/apple-arkit-ios-11-augmented-reality-developer-excitement
Apple ARKit Example Applications
https://twitter.com/madewithARKit
Measuring kitchen dimensions
http://bit.ly/2tJ5KV8 app by @SmartPicture3D
Measure distances with your
iPhone. Clever little #ARKit app by
@BalestraPatrick http://bit.ly/2sFl8RB
Inter-dimensional iPhone
AR portals are closer than they
appear http://bit.ly/2sufO0d ARkit
demo by @nedd
Demo Shows How Augmented Reality Will
Make Advertising More Immersive. Mixed
reality producer Bilawal Singh Sidhu show peek of
what the world of advertising could be with the
ARKit. #adtech
https://mobile-ar.reality.news/news/apple-ar-demo-shows-augmented-reality-will-make-advertising-more-immersive-0178905/
Google’s response to ARKit: ARCore
DAVID JAGNEUX, UPLOADVR@UPLOADVR SEPTEMBER 2, 2017 6:00 AM “Earlier this week, Google
announced ARCore, a software-based solution for making more Android devices AR-capable without the need for depth
sensors and extra cameras. It will even work on the Google Pixel, Galaxy S8, and several other devices very soon and
supports Java, Unity, and Unreal from day one. In short, it’s kind of like Google’s answer to Apple’s ARKit.”
- https://venturebeat.com/2017/09/02/googles-first-arcore-goal-100-million-ar-capable-android-phones/
“Another example, which is especially relevant for
developers that build traditional smartphone apps in
Java, is that we want to make it easier than ever for
people to get into 3D modeling that haven’t done it
before,” Bavor says. “We know there are a lot of people
that want to get into 3D development and AR but
aren’t experts in Maya, or Unity, or anything. So Blocks
is an app we built with the intention of enabling
people that have never done a 3D model in their
life to feel comfortable building 3D assets. We even
made it easy to export right from Blocks and pull into
ARCore apps you’re developing.”
ARCore: too early to tell how it will do against the “Apple Cult”
Verge Adi Robertson
https://youtu.be/NhJydpMkpug
FusedVR https://youtu.be/dNXBvDKRg1M
https://venturebeat.com/2017/08/29/google-launches-arcore-sdk-in-preview-ar-on-android-phones-no-extra-hardware-required/
https://youtu.be/ttdPqly4OF8
Super Ventures Blog Matt Miesnieks
CEO 6D.ai, Partner @Super_Ventures, AR technology & cycling
https://medium.com/super-ventures-blog/how-is-arcore-better-than-arkit-5223e6b3e79d
● Isn’t ARCore just Tango-lite?
● The iPhone-8-keynote sized elephant in the room
● So should I build on ARCore now?
● Is ARCore better than ARKit?
Scottie Gardonio Aug 30
AR / VR enthusiast. Creative Manager. Passionate graphic designer.
https://medium.com/iotforall/arcore-vs-arkit-google-counters-apple-33483c08d3da
ARCore vs. ARKit: Google Counters Apple
Let the Dueling Begin
Google announcing inside-out 6-DOF tracking support for Daydream back at Google IO earlier this year.
Deep Learning on Mobile Devices
https://techcrunch.com/2017/05/17/googles-tensorflow-lite-brings-machine-learning-to-android-devices/
http://blog.stratospark.com/creating-a-deep-learning-ios-app-with-keras-and-tensorflow.html
● 3D Face Capture
● 3D Scene Reconstruction
● 2.5D Scene Reconstruction and Computational Photography
● SLAM and Object Tracking
● Augmented Reality
● Google Cardboard SDK for iOS
https://doi.org/10.1109/IPSN.2016.7460664 | Cited by 28 articles, see Related articles
Thursday 20 July 2017, Movidius USB stick
https://techcrunch.com/2017/07/20/movidius-launches-a-79-deep-learning-usb-stick/
Snapchat secretly acquires Seene, a computer vision
startup that lets ...
https://techcrunch.com/.../snapchat-secretly-acquires-seene-a-
computer-vision-startup-... 3 Jun 2016
https://doi.org/10.1109/PDP.2017.98
https://arxiv.org/abs/1705.06224
360° imaging
360° (omnidirectional imaging): Introduction
The Panoptic Camera platform developed
jointly by Microelectronic Systems
Laboratory (LSM) and Signal Processing
Laboratory (LTS2) of EPFL.*
http://lsm.epfl.ch/page-52820-en.html
Wikipedia: “360-degree videos, also known as immersive videos[1] or spherical videos,[2] are video recordings where a view in every direction is recorded
at the same time, shot using an omnidirectional camera or a collection of cameras. During playback the viewer has control of the viewing direction like a
panorama.”
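The “control of the viewing direction” in such a player boils down to resampling the equirectangular frame into a perspective (normal field of view) crop for the chosen yaw/pitch. A minimal sketch with NumPy and OpenCV follows; the equirectangular row/column conventions and the function name are assumptions, not any particular player's API.

import numpy as np
import cv2

def extract_view(equirect, yaw, pitch, fov_deg=90.0, out_size=(480, 640)):
    """Render a normal-field-of-view (perspective) crop from an equirectangular
    360° frame for a given viewing direction (yaw, pitch in radians).

    Longitude wrap-around at the image seam is ignored in this sketch."""
    H, W = equirect.shape[:2]
    out_h, out_w = out_size
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))   # virtual pinhole focal length
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2.0, np.arange(out_h) - out_h / 2.0)
    # Ray directions in the virtual camera frame, then rotate by pitch and yaw.
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    d = dirs @ (Ry @ Rx).T
    # Rays -> spherical angles -> equirectangular pixel coordinates.
    lon = np.arctan2(d[..., 0], d[..., 2])        # [-pi, pi] maps to image width
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))    # [-pi/2, pi/2] maps to image height
    map_x = ((lon / np.pi + 1.0) * 0.5 * W).astype(np.float32)
    map_y = ((lat / (np.pi / 2.0) + 1.0) * 0.5 * H).astype(np.float32)
    return cv2.remap(equirect, map_x, map_y, cv2.INTER_LINEAR)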
Consumer-level camera review
http://thewirecutter.com/reviews/best-360-degree-camera/
By DANIEL CULPAN, Wednesday 12 August 2015
http://www.wired.co.uk/article/9-mind-blowing-360-degree-videos
Scuba Diving Short Film in 360° Green Island, Taiwan
https://youtu.be/2OzlksZBTiA
360° as part of “10 Breakthrough Technologies of 2017”
https://www.technologyreview.com/s/603496/10-breakthrough-technologies-2017-the-360-degree-selfie/
Seasonal changes to vegetation fascinate Koen Hufkens. So last fall Hufkens, an
ecological researcher at Harvard, devised a system to continuously broadcast
images from a Massachusetts forest to a website called VirtualForest.io. And
because he used a camera that creates 360° pictures, visitors can do more than
just watch the feed; they can use their mouse cursor (on a computer) or finger (on a
smartphone or tablet) to pan around the image in a circle or scroll up to view the
forest canopy and down to see the ground.
Journalists from the New York Times and Reuters are using $350
Samsung Gear 360 cameras to produce spherical photos and videos that
document anything from hurricane damage in Haiti to a refugee camp in Gaza.
One New York Times video that depicts people in Niger fleeing the militant group
Boko Haram puts you in the center of a crowd receiving food from aid groups.
Or consider the spherical videos of medical procedures that the Los Angeles
startup Giblib makes to teach students about surgery. The company films the
operations by attaching a $500 360fly 4K camera, which is the size of a baseball,
to surgical lights above the patient. The 360° view enables students to see not just
the surgeon and surgical site, but also the way the operating room is organized and
how the operating room staff interacts.
These applications are feasible because of the smartphone boom and
innovations in several technologies that combine images from multiple lenses and
sensors. For instance, 360° cameras require more horsepower than regular
cameras and generate more heat, but that is handled by the energy-efficient chips
that power smartphones. Both the 360fly and the $499 ALLie camera use
Qualcomm Snapdragon processors similar to those that run Samsung’s high-
end handsets.
Once people discover spherical videos, research suggests, they shift their
viewing behavior quickly. The company Humaneyes, which is developing an
$800 camera that can produce 3-D spherical images, says people need to watch
only about 10 hours of 360° content before they instinctively start trying to interact
with all videos. When you see 360° imagery that truly transports you somewhere
else, you want it more and more.
Low-cost end: Samsung Gear and Galaxy
Samsung Gear 360, ~£250
Samsung Gear VR, ~£100
Samsung Galaxy S6-8, smartphone, ~£200-£700
http://www.samsung.com/uk/wearables/gear-360-c200/
If you’re clamoring to shoot in 360 degrees, the Gear 360 balances
simple design with workable image quality — but you really need a
Samsung phone (and a Gear VR, and a good hunk of money) to get
the most out of it. And, for now, that's fine.
This version of the Gear 360 is more likely to be looked back on as a
relic anyway, a recognizable but eventually dismissible attempt at a
new idea, and the foundation for whatever Samsung does next.
Low-cost end #2: Ricoh Theta
Ricoh’s Theta V 4K camera sports 360-
degree video and wireless playback
RYAN WINTERHALTER, UPLOADVR@UPLOADVR
SEPTEMBER 02, 2017 07:03 PM
https://venturebeat.com/2017/09/02/ricohs-theta-v-4k-camera-sports-360-degree-video-and-wireless-playback/
Ricoh is unveiling its latest 360-degree camera this morning. Dubbed the Ricoh Theta V, the $430 4K camera
is the latest in the line which launched in 2013 with the Ricoh Theta.
Available for pre-order now, and shipping in mid-September, the Theta V features 3,820-by-1,920 resolution
video capture. That’s a massive improvement on the earlier Theta S, which offered a sub-1,080p 1,920-by-960,
and the Theta SC, which allowed for 1,920-by-1,080 recording.
Perhaps the biggest usability improvement to the Theta V is the inclusion of remote playback. Users can now
wirelessly stream their video to an external display directly from the camera. Previous devices in the Theta line
(except the developer-only Theta R) required users to export their raw footage into a computer to stitch the
image and create a useable video. That’s now all done on the device. Videographers can watch their footage
on any display, and move the POV by moving the camera itself.
The Theta V boosts sound quality as well. Four microphones capture data from their respective dimensions,
creating spatial audio that allows users to hear where the sound is coming from within the recording.
Ricoh Theta V hands-on
Published Aug 31, 2017 | Jeff Keller
Based on some quick tests of a non-final Theta V,
both stills and videos are noticeably better than
those from its predecessor. We're looking forward
to getting our hands on a production model in a few
weeks and putting it through its paces.
For higher quality audio
capture, Ricoh is offering
the TA-1 3D Microphone
($269). Developed by
Audio Technica, the mic
attaches via the tripod
mount and uses a
standard 3.5mm audio
jack.
Higher end: GoPro, Nokia Ozo, Facebook Surround, etc.
GoPro (NASDAQ:GPRO) recently unveiled the Omni, a six-camera rig
for filming interactive spherical videos that can be explored through a
smartphone's movements, a user's finger swipes, or a virtual reality
headset. The device is the smaller sibling of the 16-camera Odyssey
rig ($15,000), which hasn't been launched despite being announced
nearly a year ago. Let's take a look at four key things investors should
know about the Omni ($3,500), and how they might impact GoPro's
future.
https://www.fool.com/investing/general/2016/04/14/4-things-investors-need-to-know-about-gopro-incs-o.aspx
What's next for GoPro? GoPro investors don't have many catalysts
to look forward to this year. The Omni is too pricey relative to its
peers to gain any mainstream traction. The Karma drone, which is
due to arrive within the next two months, faces tough competition
from market leader DJI Innovations. By the time the Hero 5 cameras
arrive near the end of the year, the mainstream market could be
saturated with cheap VR and flying cameras.
Introducing Facebook Surround
360: An open, high-quality 3D-360
video capture system
Brian K Cabral, April 12, 2016
● Facebook has designed and built a durable, high-
quality 3D-360 video capture system.
● The system includes a design for camera hardware
and the accompanying stitching code, and we will
make both available on GitHub this summer. We're
open-sourcing the camera and the software to
accelerate the growth of the 3D-360 ecosystem —
developers can leverage the designs and code, and
content creators can use the camera in their
productions.
● The system exports 4K, 6K, and 8K video for each
eye. The 8K videos double industry standard output
and can be played on Gear VR with Facebook's
custom Dynamic Streaming technology.
https://code.facebook.com/posts/1755691291326688/introducing-facebook-surround-360-an-open-high-quality-3d-360-video-capture-system/
https://www.theverge.com/2016/4/25/11421992/disney-nokia-ozo-camera-virtual-reality-star-wars-marvel
Ever since Nokia announced its
360-degree Ozo virtual reality camera it has positioned the
system as a high-end option for Hollywood filmmakers, and
today the company is announcing a partnership with Disney
that should help deliver on that promise. As part of the deal,
Ozo cameras will be put into the hands of Disney filmmakers
and its marketing teams to create 360-degree, virtual reality
content across all of the studio’s various brands.
Lytro Immerge: The world's first professional Light Field solution for cinematic VR
roadtovr.com/lytros-immerge-360
https://www.lytro.com/immerge
Consequently, to create a virtual reality that even the human eye cannot distinguish from the real
world, we must achieve the perfect immersive viewing experience, such that human viewers feel
they can walk into the scene. This is known as the virtual walk-in effect, and it requires light-field
technology—3D imaging technology that emerged from the field of computational
imaging/photography to capture the light rays that people perceive from different locations and
directions. When combined with computer vision and deep learning, light- field technology
provides a viable path for producing low-cost, high-quality VR content, positioning this technology
to be the most profitable segment of the VR industry.
“DepthLytro”: depth sensing with light field techniques
Refocusing in spite of foreground occlusions: (a) Scene containing a
monkey toy being partially occluded by a plant in the foreground, (b)
traditional synthetic aperture refocusing on light field is partially effective in
removing the effect of foreground plants, (c) synthetic aperture refocusing
of depth displays corruption due to occlusion, (d) histogram of depth
clearly shows two clusters corresponding to plant and monkey, (e) virtual
aperture refocusing after removal of plant pixels shows sharp depth image
of monkey, (f) Quantitative comparison of indicated scan line of the
monkey’s head for (c) and (e)
We use coding techniques from Tadano et al. (2015) to image beyond
backscattering nets. Notice how the corrupted depth maps are improved
using the codes. We show how digital refocusing can be performed on the
images without the scattering occluders by combining depth fields with
coded TOF.
https://arxiv.org/abs/1509.00816
Post-processing for 360° imaging
https://doi.org/10.1007/s00371-017-1368-7
Overall process. a Input image. b Lines detected and classified: red for
vertical lines and yellow for horizontal lines. c Great circles from the
classified lines. Green dots are vanishing points computed from
horizontal (yellow) lines. d Upright adjustment result
We implemented our method using C++ and the OpenCV library on a 64-bit Windows
PC with an Intel i7- 6700K 4.00GHz CPU and 32GB RAM. For an input image of size
5376 × 2688 px, it takes a few hundred milliseconds (less than one second) to
obtain the final rotation matrix R for upright adjustment.
https://arxiv.org/abs/1703.10798
http://vllab1.ucmerced.edu/~wlai24/360hyperlapse
Pipeline of the proposed algorithm. Given a 360° video, we first stabilize the sequence to smooth the relative rotation
between adjacent frames. We estimate the focus of expansion (i.e., the direction of forward motion) as a prior information for
our camera path planning. To extract the regions of interest, we compute the spatial-temporal saliency and semantic
segmentation. The detected regions of interest are used to guide the camera path planning. Finally, we use an adaptive 2D
video stabilization to render a smooth hyperlapse.
360° Deep Learning #1
http://dx.doi.org/10.3390/s17061341
https://arxiv.org/abs/1705.01759
Watching a 360º sports video
requires a viewer to
continuously select a viewing
angle, either through a
sequence of mouse clicks or
head movements. To relieve
the viewer from this “360
piloting” task, we propose
“deep 360 pilot” – a deep
learning-based agent for
piloting through 360º sports
videos automatically
Panel (a) overlaps three panoramic frames
sampled from a 360° skateboarding video
with two skateboarders. One skateboarder
is more active than the other in this
example. For each frame, the proposed
“deep 360 pilot” selects a view – a
viewing angle, where a Natural Field of View
(NFoV) (cyan box) is centered at. It first
extracts candidate objects (yellow boxes),
and then selects a main object (green dash
boxes) in order to determine a view (just like
a human agent). Panel (b) shows the NFoV
from a viewer’s perspective.
360° Deep Learning #2
Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery
Yu-Chuan Su, Kristen Grauman (Submitted on 2 Aug 2017) https://arxiv.org/abs/1708.00919
We propose to learn a spherical
convolutional network that translates a
planar CNN to process 360° imagery
directly in its equirectangular projection.
Our approach learns to reproduce the flat
filter outputs on 360° data, sensitive to
the varying distortion effects across the
viewing sphere. The key benefits are
1) Efficient feature extraction for
360°images and video, and
2) The ability to leverage powerful pre-
trained networks researchers have
carefully honed (together with massive
labeled image training sets) for
perspective images.
We validate our approach compared to
several alternative methods in terms of
both raw CNN output accuracy as well as
applying a state-of-the-art "flat" object
detector to 360° data. Our method yields
the most accurate results while saving
orders of magnitude in computation
versus the existing exact reprojection
solution.
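The distortion that motivates such a spherical convolution is easy to quantify: in an equirectangular image every row has the same pixel count, but the circle of latitude it covers shrinks with cos(latitude). A tiny illustrative computation (the image height is chosen arbitrarily):

import numpy as np

# A fixed planar kernel therefore spans a very different solid angle near the
# poles than at the equator, which is what a learned spherical convolution adapts to.
H = 512                                          # image height in pixels (arbitrary)
latitudes = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2.0
stretch = 1.0 / np.cos(latitudes)                # horizontal over-sampling per row
print(stretch[H // 2], stretch[int(0.95 * H)])   # ~1.0 at the equator, >6x near a pole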
360°: The role in PropTech? #1a
Use for real estate agents: still a novelty/gimmick? (from 2014 until 2017)
MAY 26, 2014 By James Dearsley
http://www.jamesdearsley.co.uk/is-the-property-industry-interested-in-360-degree-hd-filming/
USES OF 360 DEGREE HD FILMING IN REAL ESTATE:
1. Sales and Marketing. Firstly, from a realtor or estate agent perspective there are several uses
here of 360 degree cameras, the first being obvious, that of sales and marketing. It will be simple
and efficient to take a quick film of each room, or just walk through the property with these devices
to record what you need
2. Property Management issues. We have also seen interest from companies looking to use these
bits of equipment for inventory taking. Seeing as they are of HD quality it means you can quickly
take photographs of properties which can later be looked at in more detail should problems arise in
letting disputes.
3. Virtual Reality. With Facebook recently buying Oculus Rift for $2 Billion, it is getting less far
fetched. Considering the price of an Oculus is relatively cheap (reckoned to be less than
$500/£360 when released next year) it would not be surprising if Facebook are hoping for a lot of
people to be purchasing these (Candy Crush Saga in Virtual Reality anyone?!). It isn’t just Facebook
though; Sony have a VR headset in production as does Samsung (it was recently announced) and so
this space is going to move quickly. By using these cameras you can put your clients into these
homes very quickly and easily – either in the office, if you get a set of these yourself, or, in time, in
their own home if Facebook get their way.
https://www.forbes.com/sites/forbesagencycouncil/2017/06/28/want-to-use-360-degree-photo-and-video-11-things-to-consider/#22fffa955002
1. I would recommend that marketers stay on the sidelines until the industry
matures. - Kristopher Jones, LSEO.com
4. Use A Strategic Approach The capabilities of 360-degree photo/video have
powerful applications in many industries, including real estate, retail and tourism. A
360-degree view has a better chance of selling a house than a static image. -
Brock Murray, seoplus+
7. Prepare For Tomorrow's Consumer Expectations Today, 360-degree photos
and videos are very helpful in industries such as the auto industry or real estate where
visualizing the product is essential. As VR continues to grow, 360-degree photos and
videos will likely become a standard. The consumers' expectations will likely adjust to
needing to learn more about the overall "360-degree" experience of the restaurant for
example, not just a picture of the dish. - Ahmad Kareh, Twistlab Marketing
11. Create An Emotional Connection 360-degree multimedia is a brilliant tool for
meaningful storytelling, as it allows the consumer to be transported to the experience
you want them to have, bringing the story to life. Companies should take advantage of
these tools to transform products into experiences, cultivating an immersive and
emotional connection with the brand. - Joey Hodges, Demonstrate PR
JUN 28, 2017 by Forbes Agency Council
360°: The role in PropTech? #1b
Use for real estate agents
A four-wheeled tripod outfitted with a computer, 360-
degree camera and sensors can roam properties,
producing highly choreographed, immersive videos that
would be difficult — if not impossible — to replicate with
a normal video camera.
VirtualAPT (Brooklyn, NYC) now offers a residential tour service at $1/ft² (~$10.8/m²); for commercial use it charges
a monthly fee per building, or $0.50/ft² (~$5.4/m²) for separate units.
Generated by technology from companies such as Matterport, 3-D home tours allow users to jump between
360-degree photos — sometimes situated within a 3-D model.
● A rover can shoot 360-degree footage of
a home while moving along a pre-plotted
route.
● Made by VirtualAPT, the videos can
include on-camera presentations from
real estate agents.
● They're an alternative to 3-D homes tours
from companies such as Matterport.
https://www.youtube.com/watch?v=JhfQK-tDvGU
360°: The role in PropTech? #2a
Use for construction and as a tool for constructing 4D/5D/6D BIM (Building Information Model)
Construction site manager
manually taking photos of the
progress.
- Time-consuming to walk through
and take photos
- No full coverage of site
- Might forget some spots
- Nice initial 3D BIM not properly
maintained during construction site.
+ Ideally have a drone inspecting the
whole construction site with an on-
board 360 degree video and a
LIDAR / laser scanner.
+ One can go back in time and see
who of the subcontractors for
example are responsible for possible
problems
https://doi.org/10.1186/s40327-014-0016-9
360°: The role in PropTech? #2b
360 videos registered or not to 3D BIM model allows inspection of the progress (“4D BIM”) in the
construction site also retrospectively, and can possibly reduce legal battles when it is clearer who is
the one to be held responsible in case of discrepancies between as-built and as-planned data.
VISUAL ASSET MANAGEMENT Visual Asset Management (VAM) service digitizes industrial
and infrastructure assets using 360 degree images, 3D Models, and relative asset information.
3D MODELING We thrive on enabling 3D realistic visualization to projects while preserving the
minute details necessary to portray our world.
360 VIDEO 360 video enables viewers to be at the center of any medium, allowing for a unique
visual experience and situational awareness from any device.
VIRTUAL REALITY OcuTech’s virtual reality solutions stimulate creative thinking and enhanced
information sharing allowing for one of kind virtual experience.
Ocutech from Houston, Texas, USA is already providing these types of services:
https://ocutech360.com/3d-architectural-visualization-solution/#3dvrvideo
360° imaging + SfM
360° into smartphones: how big will it be?
https://www.engadget.com/2017/07/10/future-of-smartphone-camera/
1) Augmented reality
2) Dual-lens cameras
3) Better lenses
4) 4K recording
5) Thermal imaging
6) Optical zoom
7) 360 video
“Several smartphone makers, including Samsung and Huawei, have already released add-on 360-
degree cameras for their handsets, but this is something that could eventually be integrated into the
phones themselves. Immersive 360-degree videos are gradually making their mark, with Facebook
among the big firms pushing the technology, while virtual reality companies are gradually introducing
more 360-VR content that can be viewed from mobile phones.”
https://techcrunch.com/2016/08/30/the-future-of-mobile-video-is-virtual-reality/
Are 360 cameras the future?
https://youtu.be/i8EUerX90-0 TechAltar
So will teens in big numbers ever apply Snapchat bunny ears to immersive 360 degree videos?
360° into smartphones: plenty of options coming #1
Acer’s new Holo 360 degree camera
is essentially a smartphone
Acer has announced its entry into the VR
video market with a device that’s half
360-degree camera, half smartphone.
http://www.trustedreviews.com/news/acer-s-new-holo-360-degree-camera-is-essentially-a-smartphone-2953609
Paul Monckton CONTRIBUTOR
I write about photography and related subjects
https://www.forbes.com/sites/paulmonckton/2016/05/31/worlds-first-live-smartphone-vr-camera/#9fea6921a8b0
Yesterday at this year’s Computex trade show in Taipei,
Quanta Computer and ImmerVision jointly announced what
is claimed to be the world’s first 360-degree live VR
streaming camera for smartphones, with demos starting from
today. The, as yet unnamed, camera fits in the palm of the
hand and is designed to attach magnetically to any
smartphone. It comes with a 360-degree by 187-degree lens
and uses a Sony Exmor-HDR imaging sensor to produce 16
megapixel panoramic images.
ImmerVision's Panamorph lens makes more efficient use of an image sensor
(Image credit: ImmerVision)
THIS ADD-ON CAMERA WILL TURN YOUR
SMARTPHONE INTO A 360 CAMERA, JULY 26, 2017
ION360 U 4K 360-Degree Smartphone Camera
is comprised of a 360 camera that goes on top of
Essential's 360 Camera Is the World's Smallest
360-Degree Personal Camera for a Smartphone
30 May 2017
http://gadgets.ndtv.com/mobiles/news/essentials-360-camera-is-the-worlds-smallest-360-degree-personal-camera-for-a-smartphone-1705826
After months of teasing, Android creator Andy Rubin has
finally unveiled the Essential Phone that features a near
bezel-less display that tries to outdo Samsung's Galaxy
S8. Essential's 360 camera, which weighs around 35
grams and is being called the world's smallest 360-
degree personal camera by the company, includes dual
12-megapixel fisheye sensors that can capture 4K 360
video at 30fps. The camera also features 4 microphones
to capture sound in 3D. The 360 camera can be bought
along with the Essential Phone for an additional $50, or
can be bought separately which will cost you $199.
@essential, Palo Alto, CA, essential.com
360° into smartphones: plenty of options coming #2
ProTruly’s Darling
https://www.theverge.com/2017/3/5/14809182/protruly-darling-360-degree-camera-smartphone
A company called HT Optical
that makes the cameras
found on ProTruly’s devices.
The company said that it is
working on a much smaller
360 camera module that will
actually fit into a 7.6 mm thick
smartphone and will be
capable of capturing 16 MP
photos and shoot 4K videos.
What’s even more interesting
is that the module will only
add an extra 1 mm to the
overall thickness of a device.
https://www.theverge.com/circuitbreaker/2017/2/22/14698026/huawei-360-degree-camera-honor-vr-smartphones
http://360rumors.com/
https://www.vrfocus.com/2017/07/360-degree-video-editing-app-for-smartphones/
V360 - 360 video editor, Avincel Group Inc.
360-Degree Video Editing App For Smartphones: V360 editing suite already out for Android, with iOS version coming soon.
360° into smartphones: convergence with AI players, of course
https://www.embedded-vision.com/news/movidius-low-power-vpu-technology-delivers-4k-vr-pixel-processing-performance-motorola%E2%80%99s-newest
Movidius’ Myriad 2 Vision Processing Unit (VPU) technology,
known for its image signal processing and computer vision
capabilities with high energy efficiency, was selected by
Motorola Mobility to power their newest Moto Mod: the 360
Camera. Moto Mods are unique modular accessories for
Motorola smartphones that bring advanced functionality
beyond traditional smartphone features. Motorola’s newest
Moto Mod brings users the ability to live stream 360° videos
while preserving battery life.
Say Hello to the moto z² Force Edition with moto mods
https://www.youtube.com/watch?v=0moMnChM6Ds
https://www.wsj.com/articles/intel-to-buy-semiconductor-startup-movidius-1473170441
https://www.altera.com/solutions/industry/automotive/applications/drive-assistance/surround-view-camera.html
http://www.nvidia.co.uk/object/drive-px-uk.html
360° Video SfM: Obvious extension to combine both
Instead of manually rotating your camera, image all angles simultaneously while going through the
rooms in an apartment.
https://uploadvr.com/adobe-algorithm-6dof-360-cam/
http://variety.com/2017/digital/news/adobe-6dof-vr-video-algorithms-1202394491/
Adobe Motion Parallax demo
https://youtu.be/37Z4f6p1HOY
https://www.roadtovr.com/adobes-new-research-aims-give-depth-monoscopic-360-video/: Other techniques to achieve 6-DoF VR video
usually require light-field cameras like HypeVR’s crazy 6k/60 FPS, LiDAR rig or Lytro’s giant Immerge camera. While these undoubtedly will
produce a higher quality 3D effect, they’re also custom-built and ungodly expensive.
6-DOF VR videos with a single 360-camera
Jingwei Huang ; Zhili Chen ; Duygu Ceylan ; Hailin Jin, Virtual Reality (VR), 2017 IEEE
http://dx.doi.org/10.1109/VR.2017.7892229, 18-22 March 2017
Given a 360-video captured by a single spherical panorama camera, in an offline pre-processing stage, we recover
the camera motion and the scene geometry first by performing structure-from-motion (SfM) followed by dense
reconstruction. Then, in real-time we playback the video in a VR headset where we track the 6-DOF motion of the
headset and synthesize new views by a novel warping algorithm.
360° Video SfM: Korea Advanced Institute of Science and Technology (KAIST)
Spherical panoramic cameras (Ricoh Theta S, Samsung
Gear 360 and LG 360)
Our sphere sweeping algorithm
enables to compute all-around
dense depth maps, minimizing the
loss of spatial resolution. With the
estimated all-around image and
depth map, we have shown
practical utilities by introducing
360° stereoscopic and anaglyph
images as VR contents.
European Conference on Computer Vision ECCV
2016: Computer Vision – ECCV 2016 pp 156-172
https://doi.org/10.1007/978-3-319-46487-9_10
All-Around Depth from Small Motion with a Spherical Panoramic Camera. Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe, In So Kweon
Range Sensing
Structured Light and Time-of-Flight
Microsoft Kinect: Democratizing structured light scanning
https://arxiv.org/abs/1505.05459
Structured light A sequence of known patterns is
sequentially projected onto an object, which gets
deformed by geometric shape of the object. The
object is then observed from a camera from a
different direction. By analyzing the distortion of
the observed pattern, i.e. the disparity from the
original projected pattern, depth information can
be extracted
The Time-of-Flight (ToF) technology is based on
measuring the time that light emitted by an illumination
unit requires to travel to an object and back to the sensor
array. The Kinect ToF camera applies this CW intensity
modulation approach. Due to the distance between the
camera and the object (sensor and illumination are
assumed to be at the same location), and the finite speed
of light c, a time shift φ [s] is caused in the optical signal,
which is equivalent to a phase shift in the periodic signal.
This shift is detected in each sensor pixel by a so-called
mixing process. The time shift can be easily transformed
into the sensor-object distance as the light has to travel
the distance twice.
Cited by 65 articles - see Related articles
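The phase-to-distance conversion mentioned above is a one-liner; the sketch below uses an assumed modulation frequency and an example phase value rather than actual Kinect parameters.

import numpy as np

C = 299_792_458.0      # speed of light [m/s]
f_mod = 80e6           # modulation frequency [Hz] (assumed, not a Kinect spec)
phi = np.pi / 2.0      # measured phase shift [rad] (example value)

# Round trip: phase shift -> time shift -> distance (light travels the path twice).
distance = C * phi / (4.0 * np.pi * f_mod)   # ~0.47 m for this example
ambiguity_range = C / (2.0 * f_mod)          # distances wrap beyond ~1.87 m
print(distance, ambiguity_range)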
KinectFusion Scanning with Kinect
https://doi.org/10.1145/2047196.2047270 Cited by 1356 articles, see Related articles
https://arxiv.org/abs/1704.01047
https://arxiv.org/abs/1612.02859
The semantic cue from floorplan
(i.e., door detection) resolves
ambiguities. The figure shows the
best placement based on the unary
potential with or without the
semantic cue
We show qualitative results on ModelNet using the TSDF encoding (Curless and Levoy, 1996) and 4 views. The
same TSDF truncation threshold has been used for traditional fusion, our OctNetFusion approach and the ground
truth generation process. While the baseline approach is not able to resolve conflicting TSDF information from
different viewpoints, our approach learns to produce a smooth and accurate 3D model from highly noisy input.
By learning the structure of real world 3D objects and scenes, our approach is further able to
reconstruct occluded regions and to fill gaps in the reconstruction. We evaluate our approach
extensively on both synthetic and real-world datasets for volumetric fusion. Further, we apply
our approach to the problem of 3D shape completion from a single view where our approach
achieves state-of-the-art results.
Kinect tweaks: depth resolution improvements with polarization measurement?
http://news.mit.edu/2015/object-recognition-robots-0724
https://youtu.be/m6sStUk3UVk
http://news.mit.edu/2015/algorithms-boost-3-d-imaging-resolution-1000-times-1201
https://doi.org/10.1007/s11263-017-1025-7
https://doi.org/10.1364/OE.25.001173
Range Sensing: Plenty of Options
http://3dscanexpert.com/photogrammetry-benchmarks-remake-vs-photoscan-vs-realitycapture-vs-zephyr/
This post is just an example based on a single photoset from a single
object. That makes it zero percent scientific. Also, RealityCapture
might have won this Drag Race in terms of both speed with the Fast
preset and quality with the Normal preset, but an organic object like
this is very favorable to its algorithms. Read my Full RC Review to see
that it can’t always handle non-organic objects well.
COMMERCIAL SOFTWARE
http://3dscanexpert.com/
By Nick Lievendag Entrepreneur at the intersection of Creativity × Technology. Writes, Speaks and Consults about 3D
Capture (3D Scanning & Photogrammetry). Founder of 3D Scan Expert.
Matterport dominating Real Estate scanning
This $4,500 camera turns the real world into the virtual one. Today, Matterport
’s hardware is a hit with real estate agents. But fueled by the $30 million Series C
it just raised, Matterport’s software and partnership with Google’s Project Tango
could let you wave your phone around to create VR tours of anywhere you want.
https://techcrunch.com/2015/06/25/matterport/
https://www.crunchbase.com/organization/matterport#/entity
Matterport spawned out of the Xbox Kinect hacker scene in 2010. Founder
Matt Bell had been working for a gesture recognition company that relied on a
$50,000 camera and expert operators to produce a huge CAD file that could
only be accessed through a specialized application. Bell was flabbergasted by
the power of the $150 Kinect. He realized the potential for a relatively cheap
device with similar technology that could let anyone map out rooms to create
3D models accessible straight from the web.
https://youtu.be/HZX8RupfQls
Matterport Research on semantic indoor segmentation
We collected the data using the Matterport Camera, which combines 3
structured-light sensors to capture 18 RGB and depth images during a
360° rotation at each scan location. The output is the reconstructed 3D
textured meshes of the scanned area, the raw RGB-D images, and camera
metadata. We used this data as a basis to generate additional RGB-D data
and make point clouds by sampling the meshes. We semantically annotated
the data directly on the 3D point cloud, rather than images, and then
projected the per point labels on the 3D mesh and the image domains.
https://arxiv.org/abs/1702.01105 | Cited by 3 - Related articles
https://arxiv.org/abs/1702.07600
https://www.fastcompany.com/3059281/introducing-hover-an-ai-powered-indoor-safe-camera-drone
+
Indoor scanning with tripod-based Matterport
still requires a lot of manual work, and at some
point will be updated to autonomous AI-
powered indoor drone for better user
experience.
Matterport Technology patents
Capturing and aligning multiple 3-dimensional scenes
www.google.com/patents/US8879828 - Grant - Filed Jun 29, 2012 - Issued Nov 4, 2014 - Matthew Bell - Matterport, Inc.
Multi-modal method for interacting with 3d models
www.google.com/patents/US20130342533 - App. - Filed Jun 24, 2013 - Published Dec 26, 2013 - Matthew Bell - Matterport, Inc.
Identifying and filling holes across multiple aligned three-dimensional scenes
www.google.com/patents/US8861840 - Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.
Building a three-dimensional composite scene
www.google.com/patents/US8861841 - Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.
Processing and/or transmitting 3D data
www.google.com/patents/US9396586 - Grant - Filed Mar 14, 2014 - Issued Jul 19, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Semantic understanding of 3d data
www.google.com/patents/US20160055268 - App. - Filed Jun 6, 2014 - Published Feb 25, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Selecting two-dimensional imagery data for display within a three-dimensional model
www.google.com/patents/EP3120329A1?cl=en - App. - Filed Mar 13, 2015 - Published Jan 25, 2017 - Matthew Tschudy Bell - Matterport, Inc.
Classifying, separating and displaying individual stories of a three-dimensional model of a multi-story structure based on captured image data of the multi-story structure
www.google.com/patents/US20160217225 - App. - Filed Jan 28, 2016 - Published Jul 28, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Semantic understanding of 3d data
US 20160055268 A1
ABSTRACT Systems and techniques for processing three-
dimensional (3D) data are presented. Captured three-
dimensional (3D) data associated with a 3D model of an
architectural environment is received and at least a portion of
the captured 3D data associated with a flat surface is
identified. Furthermore, missing data associated with the
portion of the captured 3D data is identified and additional 3D
data for the missing data is generated based on other data
associated with the portion of the captured 3D data.
REFERENCED BY
US9576184 Textura Planswift Corporation
Detection of a perimeter of a region of interest in a floor plan document
US20130328872 Tekla Corporation
Computer aided modeling
US20150227644 Pictometry International Corp.
Method and system for displaying room interiors on a floor plan
US20160063722 Textura Planswift Corporation
Detection of a perimeter of a region of interest in a floor plan document
US20160379405 Jim S Baca
Technologies for generating computer models, devices, systems, and
methods utilizing the same
Google Tango Technology
http://www.deccanchronicle.com/technology/gadgets/210717/is-google-tango-relevant-in-2017.html
https://arstechnica.co.uk/gadgets/2016/12/google-tango-phab-2-pro-review/
A Project Tango device ‘sees’ the environment around it
through a combination of three core functions.
First up is motion tracking, which allows the device to
understand its position and orientation using a range of
sensors (including accelerometer and gyroscope).
Then there’s depth perception, which examines the
shape of the world around you. Intel provides a vital cog in
this respect with its RealSense 3D camera. With this
component on board, a device can gain accurate gesture
control and snappy 3D object rendering among other
things.
Finally, Project Tango incorporates area learning, which
means that it maps out and remembers the area around it.
Point Cloud Framework for Rendering 3D
Models Using Google Tango
Maxen Chung, Santa Clara University
Julian Callin, Santa Clara University
http://scholarcommons.scu.edu/cseng_senior/84
https://doi.org/10.1007/s11227-016-1891-8
Project Tango Tablet Development Kit, recently introduced by
Google, Inc. Equipped with the most powerful processor available
to date on a consumer-level mobile platform (i.e., NVIDIA Tegra K1
whose 192 programmable CUDA-enabled GPU cores use the
same efficient Kepler architecture found in the world’s most
powerful supercomputers and workstations) along with several
sensors (motion tracking camera, 3D depth sensor,
accelerometer, ambient light sensor, barometer, compass, GPS,
gyroscope), this mobile device can readily utilize GPU computing
making it an ideal platform for developing real-time contextual
awareness applications for the visually impaired (VI). Moreover,
being compact, lightweight, potentially wearable, relatively
discreet and affordable render it aesthetically appealing, socially
acceptable and accessible for VI users
Google Tango Example Applications #1
We broke the news yesterday that Google
was producing a prototype 3D sensing
smartphone called Project Tango. We also
broke down the capabilities of the vision
processor inside the device and talked
about what it means for the future of
phones.
Now, we’ve got an exclusive look in the
video below at a real 3D indoor map of a
room captured with one of the prototype
devices by Matterport.
https://techcrunch.com/2014/02/21/heres-an-actual-3d-indoor-map-of-a-room-captured-with-googles-project-tango-phone/
https://matterport.com/mobile-3d-capture/
https://developers.google.com/tango/apis/overview
Daydream is Google’s platform for virtual
reality. It consists of Daydream-ready phones,
Daydream-ready headsets and controllers, and
Daydream apps. Daydream View is the first
Daydream-ready headset and controller
designed and developed by Google. It also
comes with a touch-and-motion enabled
controller so you can easily interact with VR
apps.
With the Daydream View, you will be able to
explore new worlds through Google Street View
and Fantastic Beasts. Kick back in your
personal cinema with YouTube, Netflix, Hulu,
and HBO. Get in the game with Gunjack 2,
LEGO® BrickHeadz, and Need for Speed.
That’s just the beginning of the VR possibilities
with Daydream.
http://www.techphlie.com/
2017/07/what-is-google-ta
ngo-and-daydream.html
Google has notably been pushing AR/VR
technologies with its latest Android OS. The
most prominent introduction however, has
been the ASUS ZenFone AR launch that took
place at CES, 2017, earlier this year.
Google Tango Example Applications #2
Google Tango SDK
examples: how to
make a floor plan in
50 seconds
Alexander Grau
Google Tango and
Revit
Leonardo Manzione
https://www.youtube.com/watch?v=A-4cuJ1kOQ4
“Google Tango” without depth sensors
I have always believed that bringing 3D to consumers could only work without the need for
dedicated depth sensors. This pure-software approach is already being embraced for
Augmented Reality with Apple’s upcoming ARKit and Google’s ARCore which was announced
last week. Both can give modern smartphones AR-capabilities by just using the regular camera(s),
instead of using dedicated sensors like Tango.
https://3dscanexpert.com/sony-3d-creator-brings-sensor-less-3d-scanning-consumers/
But yesterday, at IFA Berlin, Sony announced its
latest smartphone, the XZ1. Which has all the
bells and whistles you expect from a flagship
Android phone but also an app called 3D Creator
. It basically does exactly what Microsoft showed
last year, but is actually available — albeit
exclusive for the XZ1.
https://www.sonymobile.com/global-en/products/phones/xperia-xz1/3d-creator/
Apple Depth Sensing
The iPhone X’s notch is basically a Kinect
by Paul Miller @futurepaul, Sep 17, 2017, 10:00am EDT
https://www.theverge.com/circuitbreaker/2017/9/17/16315510/iphone-x-notch-kinect-apple-primesense-microsoft
And now, in late 2017, Apple is going to sell a phone with a front-facing depth camera. Unlike the original Kinect,
which was built to track motion in a whole living room, the sensor is primarily designed for scanning faces and
powers Apple’s Face ID feature. Apple’s “TrueDepth” camera blasts “more than 30,000 invisible dots” and can
create incredibly detailed scans of a human face. In fact, while Apple’s Animoji feature is impressive,
the developer API behind it is even wilder: Apple generates, in real time, a full animated 3D mesh of your face,
while also approximating your face’s lighting conditions to improve the realism of AR applications.
How Apple’s iPhone X TrueDepth Camera Works
By David Cardinal on September 14, 2017
Beyond the Camera: Facial Motions and Changing Features. Getting a depth estimate for
portions of a scene is only the beginning of what’s required for Apple’s implementation of
secure facial recognition and Animojis. For example, a mask could be used to hack a facial
recognition system that relied solely on the shape of the face. So Apple is using processing
power to learn and recognize 50 different facial motions that are much harder to forge. They
also provide the basis for making Animoji figures seem to mimic the phone’s owner.
How Secure is Face ID? Given how willing Apple is to commit to using Face ID for financial
transactions, I’m sure they have pushed the limits beyond either simple 3D models or 2D
motion. It is likely they are relying on the phone’s ability to recognize minute facial
movements and feed them into a machine learning system on the A11 Bionic chip that will
add another layer of security to the system. That piece will also be key in helping the phone
decide whether you’re the same person when you put on a pair of glasses, a hat, or grow a
beard — all of which Apple claims Face ID will handle.
Laser scanning
LiDAR technology
Laser Scanning: LiDAR (Light Detection And Ranging)
http://dx.doi.org/10.1038/nphoton.2010.148
http://dx.doi.org/10.1080/19479832.2013.811124
3D building modeling
(BIM) using images and
LiDAR: a review
https://techcrunch.com/2017/07/12/nyu-releases-the-largest-lidar-dataset-ever-to-help-urban-development/
http://ia.cr/2017/613
https://www.theregister.co.uk/2017/06/27/lidar_spoofed_bad_news_for_self_driving_cars/
Velodyne: The most in the news due to autonomous driving
http://velodynelidar.com/
https://www.youtube.com/watch?v=8nTFjVm9sTQ https://www.youtube.com/watch?v=nXlqv_k4P8Q
http://spectrum.ieee.org/cars-that-think/transportation/sensors/velodyne-announces-a-solidstate-lidar
http://spectrum.ieee.org/cars-that-think/transportation/sensors/israeli-stealth-startup-innoviz-promises-100-solidstate-automotive-lidar-by-2018
http://spectrum.ieee.org/transportation/advanced-cars/cheap-lidar-the-key-to-making-selfdriving-cars-affordable
Riegl: A range of different laser scanners
http://www.riegl.com/products/unmanned-scanning/
RIEGL VZ-400 Indoor Scanned Data
by Jamis Choi, Published on Apr 1, 2010
https://www.youtube.com/watch?v=hOf0hpCn92I
Scanning made simple with RiSOLVE - RIEGL's new 3D Scene Capture Software
Published on Oct 4, 2012 (feat. horrible lounge music)
https://www.youtube.com/watch?v=lbxvzMlTWyg
Riegl system in practice
https://doi.org/10.1109/IROS.2016.7759501
Namely, we propose a method for the automatic selection of feature coordinate
locations, and introduce the concept of localized automatic relevance
determination (LARD) to the Hilbert Maps framework, in which different
dimensions in the projected Hilbert space operate within independent length scale
values. The proposed technique was tested against other state-of-the-art 3D
scene reconstruction tools in three different datasets: a simulated indoors
environment, RIEGL laser scans and dense LSD-SLAM pointclouds. The results
testify to the proposed framework’s ability to model complex structures and
correctly interpolate over unobserved areas of the input space while achieving
real-time training and querying performances.
Handheld Scanning: GeoSLAM ZEB-REVO
Handheld Laser Scanning -
ZEB-REVO
The ZEB-REVO is the latest, lightweight
revolving laser scanner from GeoSLAM.
Handheld, pole-mounted or attached to a
mobile platform, the ZEB-REVO can
record more than 40,000 measurement
points per second from the survey
environment.
NEW ZEB-CAM
The new ZEB-CAM is an optional upgrade
for standard ZEB-REVO systems. Simply
attach ZEB-CAM to the underside of a
standard REVO and begin scanning
immediately.
The ZEB-CAM captures live video footage
of the survey environment and adds
contextual video and imagery to scan data
to aid feature identification.
Optical flow technology is utilised to
accurately synchronise the video and scan
together in GeoSLAM's Desktop software.
http://www.3dlasermapping.com/zeb-revo-handheld-laser-scanning/
https://youtu.be/k8q5xr_eLgk
GeoSLAM vs. Leica: Portable scanning quality
http://dx.doi.org/10.1117/12.2270761
The paper investigates the performances of two portable
mobile mapping systems (MMSs), the handheld GeoSLAM
ZEB-REVO and Leica Pegasus:Backpack, in two typical
user-case scenarios: an indoor two-floors building and an
outdoor open city square.
Note! This paper would have
been even nicer with a
‘gold standard’ giving the
“correct measurements”
instead of just comparing
two “good enough” scanners.
Research Scanners: Sensor Fusion
The Indoor Multi-sensor Acquisition System
(IMAS) presented in this paper consists of a wheeled
platform equipped with two 2D laser heads, RGB
cameras, thermographic camera, thermohygrometer,
and luxmeter. One of the laser scanning sensors is
foreseen to obtain the building map and the navigation
information, and the other one to the 3D environment
reconstruction. The thermographic and optical
images, and the geometric and comfort data are
synchronized and automatically linked to trajectory
positions, so that they are georeferenced in the
building in terms of a relative positioning system.
Software interface for virtual immersive navigation and ex situ data analysis.
http://dx.doi.org/10.3390/s16060785
Applied Point Cloud Scans: Accessibility
Point Clouds to Indoor/Outdoor Accessibility
Diagnosis
J. Balado, L. Díaz-Vilariño, P. Arias, I. Garrido
https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W4/287/2017/isprs-annals-IV-2-W4-287-2017.pdf
This work presents an approach to automatically detect structural floor elements such as steps or ramps in
the immediate environment of buildings, elements that may affect the accessibility to buildings. The
methodology is based on Mobile Laser Scanner (MLS) point cloud and trajectory information. The
methodology is tested in a real case study, consisting of 100 m of an urban street. Ground elements are
correctly classified in an acceptable computation time. Steps and ramps also are exported to GIS software to
enrich building models from Open Street Map with information about accessible/inaccessible entrances and
their locations.
http://www.wired.co.uk/article/wayfindr-app
A project initiated by the Royal London Society for the
Blind's (RLSB) Youth Forum has led to the prototyping of
a new app called Wayfindr, which has been built especially
to help blind and partially sighted people use London's
transport network independently. The app relies on
smartphones and iBeacons and has been developed in
collaboration with global digital product design studio
ustwo
Our Open Standard gives you
the tools to create inclusive
and consistent experiences for
your vision impaired
customers. From transport
networks and shopping
centres, to hospitals and any
other indoor space - we can
help. Through our on-site trials
and consultancy we will work
together with you to
understand how digital
wayfinding can make your
estate accessible.
https://www.wayfindr.net/
Post-processing
Raw point clouds are massive and possibly contain a lot of redundant data points
Data Quality: compromise between file size, computational time and quality
3D model reconstruction from point cloud processed either with OpenSFM, VisualSFM or Pix4D
(top row) to mesh model (middle row) to final textured 3D model (bottom row), across a series of
downsampled Sky Ranger UAV images including full resolution (first column), half resolution
(second column) and quarter resolution (last column).
Bolick and Harguess (2016), http://dx.doi.org/10.1117/12.2224677
Garbage in, garbage out holds true as always. The more high-quality images / points
you have as input, the higher the reconstruction quality will obviously be.
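To make the resolution/quality compromise above concrete, here is a minimal sketch of uniform random subsampling to half and quarter resolution; the toy data and the pure-numpy workflow are my own assumptions, not the tools compared above.

```python
# Sketch: uniform random subsampling of a point cloud to half / quarter size,
# mirroring the full / half / quarter resolution comparison above (illustrative).
import numpy as np

def random_subsample(points, keep_fraction, seed=0):
    """points: (N, D) array; returns roughly keep_fraction * N rows."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(points) * keep_fraction))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

full = np.random.rand(1_000_000, 3)               # stand-in for a dense scan
half = random_subsample(full, 0.5)
quarter = random_subsample(full, 0.25)
print(len(full), len(half), len(quarter))
```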
Top-left: points sampled on a sphere and corrupted
with a lot of noise. Top-right: reconstructed surface
mesh. Bottom-left: smoothed point set. Bottom-
right: reconstructed surface mesh.
Reconstruction error (mm) against number of points
for the Bimba con Nastrino point set with 1.6M points
as well as for simplified versions.
CGAL 4.10 - Poisson Surface Reconstruction
The sensitivity of biological finite element models to the
resolution of surface geometry: a case study of
crocodilian crania: “Example of the simplified models. C.
moreletti models composed of 20k, 30k, 90k and 300k
surface (mesh) elements.”
https://doi.org/10.7717/peerj.988
point cloud & mesh processing
MAY 27 2017, posted by Taylor Wang
The final goal is to get a fully editable NURBS CAD
model so that it can be modified by any CAD
software to improve the design or reproduce the
product.
Point Cloud Library (PCL): The most popular open-source library
http://unanancyowen.com/en/pcl-with-velodyne/
https://www.youtube.com/watch?v=7BUFxkyH1r0
https://doi.org/10.1109/MRA.2012.2206675
Cited by 186 articles - see Related articles
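PCL itself is a C++ library; as a hedged illustration of one of its typical filters (statistical outlier removal), here is an equivalent sketch in plain numpy/scipy, with all parameter values chosen arbitrarily rather than taken from PCL's defaults.

```python
# Sketch of a PCL-style statistical outlier removal filter in plain numpy/scipy
# (illustrative only, not PCL's actual implementation).
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, k=16, std_mult=1.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_mult standard deviations above the global mean distance."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)        # first neighbour is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    threshold = mean_dist.mean() + std_mult * mean_dist.std()
    return points[mean_dist <= threshold]

noisy = np.vstack([np.random.rand(5000, 3),       # dense surface-ish samples
                   np.random.rand(50, 3) * 10])   # sparse far-away outliers
clean = statistical_outlier_removal(noisy)
print(noisy.shape, clean.shape)
```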
Other libraries: CGAL and research code
Drift correction for proper image registration
https://doi.org/10.1109/ROBOT.2010.5509312
Correcting for drift (distortion) between different
scans or overlapping point clouds with added
velocity information for ICP (Iterative Closest Point)
algorithm.
(a) is a given environment. Blue points in (b) show distortion of the scan, and red points
in (b) show the compensated scan. A transformation estimated using distorted data includes
inevitable errors (c). A transformation estimated from the rectified scan gives us more
accurate results (d).
Kaarta - Common point cloud registration issues
http://www.kaarta.com/cloud-registration-issues/
Published: 8 March 2017
http://dx.doi.org/10.3390/s17030539
Keywords: LiDAR; inertial measurement unit; iterative closest
point; iterated sigma point Kalman filter; time delay calibration
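For intuition, here is a minimal point-to-point ICP baseline (nearest-neighbour correspondences plus an SVD-based rigid fit); the velocity/IMU extensions discussed in the papers above are deliberately left out of this sketch.

```python
# Minimal point-to-point ICP sketch: nearest neighbours + SVD rigid alignment.
# Real registration pipelines add drift / velocity / IMU terms on top of this.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(source, target, iterations=30):
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iterations):
        _, idx = tree.query(src)                  # correspondences by nearest point
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
    return src

# toy usage: align a slightly rotated and shifted copy back onto the original
target = np.random.rand(2000, 3)
angle = 0.1
Rz = np.array([[np.cos(angle), -np.sin(angle), 0],
               [np.sin(angle),  np.cos(angle), 0],
               [0, 0, 1]])
source = target @ Rz.T + 0.05
aligned = icp(source, target)
```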
Data Reduction and simplification for storage
Imran Ashraf ; Soojung Hur ; Yongwan Park
https://doi.org/10.1109/ACCESS.2017.2699686
LiDAR produces a large point cloud, but when generating
images for a limited field of view, data sparsity results in
poor-quality images. Moreover, the 3D-to-2D data transformation
also involves data reduction, which further deteriorates the
quality of the images.
http://dx.doi.org/10.1117/12.2270833
31 October 2016
https://doi.org/10.1109/TIP.2016.2623488
https://www.google.com/patents/US9582939
https://arxiv.org/abs/1609.00893
Keywords: Tensor networks, Function-related tensors, CP decomposition,
Tucker models, tensor train (TT) decompositions, matrix product states (MPS),
matrix product operators (MPO), basic tensor operations, multiway component
analysis, multilinear blind source separation, tensor completion,
linear/multilinear dimensionality reduction, large-scale optimization problems,
symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear
equations, pseudo-inverse of very large matrices, Lasso and Canonical
Correlation Analysis (CCA)
https://doi.org/10.1016/j.isprsjprs.2016.06.012
In-base point cloud management pipeline in the point cloud server (PCS).
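A common first step for reduction and storage is voxel-grid downsampling, keeping one averaged point per occupied voxel; a minimal numpy sketch follows, with the voxel size and toy scene extents chosen arbitrarily.

```python
# Sketch: voxel-grid downsampling, keeping one averaged point per occupied
# voxel (illustrative stand-in for the reduction methods cited above).
import numpy as np

def voxel_downsample(points, voxel_size):
    """points: (N,3). Returns one centroid per occupied voxel."""
    coords = np.floor((points - points.min(0)) / voxel_size).astype(np.int64)
    # one integer key per voxel (multiplier larger than any coordinate index)
    keys = coords[:, 0] + coords[:, 1] * 1_000_003 + coords[:, 2] * 1_000_003**2
    order = np.argsort(keys)
    keys, points = keys[order], points[order]
    _, start, counts = np.unique(keys, return_index=True, return_counts=True)
    sums = np.add.reduceat(points, start, axis=0)
    return sums / counts[:, None]

cloud = np.random.rand(1_000_000, 3) * 10.0       # toy 10 m x 10 m x 10 m scene
thin = voxel_downsample(cloud, voxel_size=0.25)
print(cloud.shape, thin.shape)
```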
Data Reduction: Compressing Point Clouds
Dynamic polygon cloud compression
Eduardo Pavez ; Philip A. Chou (2017)
https://doi.org/10.1109/ICASSP.2017.7952694
We introduce a compressible representation of 3D
geometry (including its attributes, such as color texture)
intermediate between polygonal meshes and point clouds
called a polygon cloud. Polygon clouds, compared to
polygonal meshes, are more robust to live capture noise
and artifacts. Furthermore, dynamic polygon clouds,
compared to dynamic point clouds, are easier to
compress, if certain challenges are addressed. In this
paper, we propose methods for compressing dynamic
polygon clouds using transform coding of color and
motion residuals.
Real-time compression of point cloud
streams
Julius Kammerl ; Nico Blodow ; Radu Bogdan Rusu ;
Suat Gedikli ; Michael Beetz ; Eckehard Steinbach
(2012)
https://doi.org/10.1109/ICRA.2012.6224647
We present a novel lossy compression approach for point
cloud streams which exploits spatial and temporal
redundancy within the point data. Our proposed compression
framework can handle general point cloud streams of
arbitrary and varying size, point order and point density.
Furthermore, it allows for controlling coding complexity and
coding precision. To compress the point clouds, we perform
a spatial decomposition based on octree data structures.
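The spatial-decomposition idea behind such octree codecs can be sketched in a few lines: quantise coordinates onto a fixed grid, keep the sorted occupied-cell indices, delta-encode them and run a generic entropy coder over the bytes. This is only a lossy toy stand-in, not the codec from the paper; grid depth and the use of zlib are my own choices.

```python
# Toy lossy point cloud compression: quantise onto a 2^bits grid, keep unique
# occupied cells, delta-encode the sorted cell indices and zlib-compress them.
import numpy as np
import zlib

def compress(points, bits=10):
    lo, hi = points.min(0), points.max(0)
    q = np.round((points - lo) / (hi - lo) * (2**bits - 1)).astype(np.int64)
    cells = (q[:, 0] << (2 * bits)) | (q[:, 1] << bits) | q[:, 2]  # pack 3 axes
    cells = np.unique(cells)                        # occupied cells, sorted
    deltas = np.diff(cells, prepend=0)              # small numbers compress well
    blob = zlib.compress(deltas.astype(np.int64).tobytes())
    return blob, (lo, hi, bits)

def decompress(blob, meta):
    lo, hi, bits = meta
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    cells = np.cumsum(deltas)
    mask = (1 << bits) - 1
    q = np.stack([(cells >> (2 * bits)) & mask,
                  (cells >> bits) & mask,
                  cells & mask], axis=1)
    return q / (2**bits - 1) * (hi - lo) + lo       # cell centres, deduplicated

cloud = np.random.rand(200_000, 3)
blob, meta = compress(cloud)
recon = decompress(blob, meta)
print(len(blob), "bytes for", len(recon), "decoded points")
```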
3D Reconstruction Framework for
Multiple Remote Robots on Cloud
System
Phuong Minh Chu, Seoungjae Cho, Simon Fong, Yong Woon
Park and Kyungeun Cho (2017)
http://dx.doi.org/10.3390/sym9040055
This paper proposes a cloud-based framework that
optimizes the three-dimensional (3D) reconstruction of multiple
types of sensor data captured from multiple remote robots. A
working environment using multiple remote robots requires
massive amounts of data processing in real-time, which cannot
be achieved using a single computer. In the proposed
framework, reconstruction is carried out in cloud-based servers
via distributed data processing.
Data-driven processing
Like in all fields of computer vision, real-time scanning, post-processing and semantic
understanding are improved with recent deep learning and artificial intelligence techniques
Deep Learning on non-Euclidean problems
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst
https://doi.org/10.1109/MSP.2017.2693418
https://arxiv.org/abs/1705.10819
Deep Learning: Point clouds
https://arxiv.org/abs/1704.03847
https://arxiv.org/abs/1705.03428
Deep Learning: PointNet++
PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
Stanford University, (Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02413
Illustration of our hierarchical feature learning architecture and its application for set segmentation and classification using points in 2D
Euclidean space as an example. Single scale point grouping is visualized here.
Left: Point cloud with random point
dropout.
Right: Curve showing advantage of
our density adaptive strategy in
dealing with non-uniform density.
DP means random input dropout
during training; otherwise training is
on uniformly dense points
Scannet labeling results. PointNet captures the
overall layout of the room correctly but fails to
discover the furniture. Our approach, in contrast,
is much better at segmenting objects besides
the room layout.
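For orientation, here is a minimal PointNet-style classifier (shared per-point MLP followed by order-invariant max pooling) in PyTorch; it omits the hierarchical set abstraction that PointNet++ adds, and all layer widths are illustrative guesses rather than the published architecture.

```python
# Minimal PointNet-style classifier: shared per-point MLP + max pooling.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # shared MLP applied to every point independently (1x1 convolutions)
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, xyz):                           # xyz: (batch, num_points, 3)
        feats = self.point_mlp(xyz.transpose(1, 2))   # (batch, 1024, num_points)
        global_feat = feats.max(dim=2).values         # order-invariant pooling
        return self.classifier(global_feat)

logits = TinyPointNet()(torch.rand(8, 2048, 3))       # toy batch of 8 clouds
print(logits.shape)                                   # torch.Size([8, 10])
```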
Deep Learning: 2D Feature Descriptors
Instead of using the old-school SIFT, SURF, ORB, etc., feature description / matching
can be done with a data-driven deep learning network as well
Note: This model was trained with SfM data, which does not have strong
rotation changes. Newer models work better in this case, which will be
released soon. In the meantime, you can also use the models in the
learn-orientation, benchmark-orientation.
https://github.com/cvlab-epfl/LIFT
https://arxiv.org/abs/1603.09114 | Cited by 23 Related articles
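For comparison, the "old-school" baseline takes only a few lines with OpenCV (ORB keypoints plus brute-force Hamming matching); the image file names below are placeholders.

```python
# Classical baseline for contrast with learned descriptors such as LIFT:
# ORB keypoints + brute-force Hamming matching in OpenCV (sketch only).
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input images
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("matches.jpg", vis)
```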
Deep Learning: 3D Feature Descriptors
https://arxiv.org/abs/1706.04496
We present a view-based convolutional network that produces local, point-based shape descriptors.
The network is trained such that geometrically and semantically similar points across different 3D
shapes are embedded close to each other in descriptor space (left). Our produced descriptors are
quite generic — they can be used in a variety of shape analysis applications, including dense
matching, prediction of human affordance regions, partial scan-to-shape matching, and shape
segmentation (right).
In contrast to findings in the image analysis community where learned 2D
descriptors are ubiquitous and general (e.g. LIFT), learned 3D descriptors have
not been as powerful as 2D counterparts because they (1) rely on limited training
data originating from small-scale shape databases, (2) are computed at low spatial
resolutions resulting in loss of detail sensitivity, and (3) are designed to operate on
specific shape classes, such as deformable shapes.
We generate training correspondences
automatically by leveraging highly structured
databases of consistently segmented shapes
with labeled parts. The largest such database
is the segmented ShapeNetCore dataset [
Yi et al. 2016, https://www.shapenet.org/] that
includes 17K man-made shapes distributed in
16 categories
Mesh: generative shapes with GANs
https://arxiv.org/abs/1705.02090
Our key insight is that 3D shapes are effectively
characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as
adjacency and symmetry. We develop a recursive neural net
(RvNN) based autoencoder to map a flat, unlabeled, arbitrary
part layout to a compact code. The code effectively captures
hierarchical structures of man-made 3D objects of varying
structural complexities despite being fixed-dimensional: an
associated decoder maps a code back to a full hierarchy. The
learned bidirectional mapping is further tuned using an
adversarial setup to yield a generative model of plausible
structures, from which novel structures can be sampled.
It would be interesting to thoroughly investigate the effect
of code length on structure encoding. Finally, it is worth
exploring recent developments in GANs, e.g. Wasserstein
GAN [Arjovsky et al. 2017], in our problem setting. It would
also be interesting to compare with plain VAE and other
generative adaptations.
Point Cloud: generative GANs for point clouds #1a
https://arxiv.org/abs/1707.02392
We build an end-to-end pipeline for 3D point clouds that uses an autoencoder (AE) to
create a latent representation, and a Generative Adversarial Networks (GAN) to generate
new samples in that latent space. Our AE is designed with a structural loss tailored to
unordered point clouds. Our learned latent space, while compact, has excellent class-
discriminative ability: per our classification results, it outperforms recent GAN-based
representations by 4.3%. In addition, the latent space allows for vector arithmetic, which
we apply in a number of shape editing scenarios, such as interpolation and structural
manipulation.
We argue that jointly learning the representation and training the GAN is unnecessary for
our modality. We propose a workflow that first learns a representation by training an AE
with a compact bottleneck layer, then trains a plain GAN in that fixed latent
representation. One benefit of this approach is that AEs are a mature technology: training
them is much easier and they are compatible with more architectures than GANs. We
point to theory that supports this idea, and verify it empirically: we show that GANs
trained in our learned AE-based latent space generate visibly improved results,
even with a generator and discriminator as shallow as a single hidden layer. Within a
handful of epochs, we generate geometries that are recognized in their right object class at
a rate close to that of ground truth data. Importantly, we report significantly better diversity
measures (10x divergence reduction) over the state of the art, establishing that we cover
more of the original data distribution. In summary, we contribute:
● An effective cross-category AE-based latent representation on point clouds.
● The first (monolithic) GAN architecture operating on 3D point clouds.
● A surprisingly simpler, state-of-the-art GAN working in the AE’s latent space.
1) Autoencoder
For fixed latent representation
Vector arithmetic
2) Generative Adversarial Network
Using the fixed latent representation
In our latent-space GAN, instead of operating on the raw point cloud input, we pass the data through
our pre-trained autoencoder, trained separately for each object class with the Earth Mover’s distance
(EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-
dimensional bottleneck variable of the AE. Finally, once the GAN training is over, the output of the
generator is decoded to a point cloud via the AE decoder. We found that very shallow designs for both
the generator and discriminator (in our case, 1 hidden layer for the generator and 2 for the
discriminator) are sufficient to produce realistic results
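A structural sketch of that recipe (512-d AE bottleneck, one-hidden-layer generator, two-hidden-layer discriminator operating on the latent codes) in PyTorch; the hidden widths are my own guesses and a Chamfer distance stands in for the EMD loss used in the paper.

```python
# Structural sketch of a latent-space GAN for point clouds (not the paper's code).
import torch
import torch.nn as nn

class PointCloudAE(nn.Module):
    def __init__(self, num_points=2048, code_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(               # shared per-point MLP
            nn.Conv1d(3, 128, 1), nn.ReLU(),
            nn.Conv1d(128, code_dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_points * 3),
        )
        self.num_points = num_points

    def encode(self, xyz):                          # xyz: (B, N, 3)
        return self.encoder(xyz.transpose(1, 2)).max(dim=2).values

    def forward(self, xyz):
        code = self.encode(xyz)
        return self.decoder(code).view(-1, self.num_points, 3), code

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a, b: (B, N, 3)."""
    d = torch.cdist(a, b)                           # (B, N, N) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# shallow GAN operating purely on the fixed 512-d latent codes
generator = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 512))
discriminator = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                              nn.Linear(256, 64), nn.ReLU(),
                              nn.Linear(64, 1))

ae = PointCloudAE()
clouds = torch.rand(4, 2048, 3)
recon, codes = ae(clouds)
loss = chamfer(recon, clouds)                       # train the AE first with this
fake_codes = generator(torch.randn(4, 128))         # then the GAN in code space
print(loss.item(), fake_codes.shape, discriminator(codes).shape)
```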
Point Cloud: generative GANs for point clouds #1b
Interpolating between different point clouds, using our latent
space representation. Note the interpolation between
structurally and topologically different shapes.
Generative results using our latent-space GAN. Note the
variability and fidelity of the result.
For a recap on GANs, you could see for example:
https://arxiv.org/abs/1701.07875
Cited by 106 - Related articles
What do GANs for point clouds mean in practice?
Point-cloud super-resolution (e.g. Ledig et al. 2016 for natural images), to improve
model appearance (e.g. remove staircasing), and inpainting (e.g. Iizuka et al. 2017)
to handle occlusion and gaps from indoor scans (“shape completion”). “Visual
plastic surgery” in other words (Tung et al. 2017)
Sung et al. (2015)
Data-driven Structural Priors for Shape Completion
Mönch et al. (2010)
Staircase-Aware Smoothing of Medical Surface Meshes
Hardware Point Cloud Super-resolution: multiple scans
https://doi.org/10.2312/SPBG/SPBG06/009-015
Cited by 47 articles
On the left, one scan of the parrot
statue, with a sample spacing of
about 1mm. Center, we combine 100
nearly identical such scans to
produce the surface in the center,
produced on a grid with sample
spacing of about 0.3mm. Notice the
noise reduction and the improvement
in the detail, for instance in the face,
neck and wing feathers. On the right,
a photograph of the parrot statue.
Super-resolution reconstruction
using only 30 input scans at the left
and increasing to 140 at the right.
Noise is reduced dramatically at the
beginning but more slowly at the end.
Surfaces were reconstructed from
subsets which were pre-registered
using all 140 scans.
For absolute measurement accuracy (e.g. Biljecki et al. 2017), one can scan the same space multiple times
A thin strip of the super-resolved
surface, and the nearby sample
points from the input scans. The
input is very noisy, but the points are
densely and randomly distributed
near the surface with few outliers, so
the average gives an accurate
representation of the surface.
(a) One scan. (b) Final super-resolved surface from 100 scans. (c) Photo of
the object (a plaster cast of a subway token). The bottom row shows some
results of other kinds of processing, to evaluate the importance of the various
steps of the algorithm. (d) One scan, bilinearly interpolated onto the finer grid
and smoothed. Detail is missing. (e) The entire algorithm except for the final
bilateral filtering step. The noise removed by the filtering seems to be residual
registration error, which perhaps could be improved. (f) Just averaging 100
scans taken without moving the scanner, using the same Gaussian kernel. Noise
is decreased, but there is aliasing from the lower-resolution grid obscuring detail
visible in (b).
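The core averaging idea is simple to sketch: bin many registered, noisy scans onto a finer grid and average per cell, so noise drops roughly with the square root of the number of contributing points. The cell size and toy data below are arbitrary illustrative choices.

```python
# Sketch of multi-scan super-resolution by grid binning and averaging.
import numpy as np

def fuse_scans(scans, cell_size):
    """scans: list of (N_i, 3) registered point clouds. Returns one averaged
    point per occupied cell of a grid with spacing cell_size."""
    pts = np.vstack(scans)
    coords = np.floor((pts - pts.min(0)) / cell_size).astype(np.int64)
    cells, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((len(cells), 3))
    counts = np.zeros(len(cells))
    np.add.at(sums, inverse, pts)
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]

# toy example: 100 noisy scans of the flat unit square z = 0
truth = np.random.rand(20000, 3) * [1, 1, 0]
scans = [truth + np.random.normal(0, 0.01, truth.shape) for _ in range(100)]
fused = fuse_scans(scans, cell_size=0.005)
print(np.abs(fused[:, 2]).mean())   # residual z-noise, well below the 0.01 input noise
```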
Deep Learning Super-Resolution
Plenty of options for image / video / volume super-resolution
https://arxiv.org/abs/1706.03142
https://arxiv.org/abs/1704.02738
https://arxiv.org/abs/1704.02470 https://arxiv.org/abs/1612.00085
Novel texture enhancement framework
creates an HR style image that is rich in
details, which can be used to restore
high-frequency texture details back into
the initial HR image via the style transfer
algorithm.
Four examples of SR results for nearest
neighbor and cubic interpolation, the
best-performing sparse coding, 3D-
FSRCNN, and 3D-SRU-Net
configurations. Arrows indicate regions
in which at least one SR result mis-
interprets a cell boundary or an
ultrastructural feature. Scale bar 500
nm.
Our method includes a sub-pixel
motion compensation (SPMC) layer
that can better handle inter-frame
motion for this task. Our detail
fusion (DF) network that can
effectively fuse image details from
multiple images after SPMC
alignment
Point-cloud super-resolution
Upsampling ‘on-the-fly’ to avoid “data explosion”?
Jason Schreier
4/17/17 12:05pm Horizon Zero Dawn, Kotaku
http://kotaku.com/horizon-zero-dawn-uses-all-sorts-
of-clever-tricks-to-lo-1794385026
Games like this don’t just look incredible because of ‘hyper-realism’
but because their engineers use all sorts of tricks [LOD’ing, or Level
of Detail; Mipmapping; frustum culling, etc.] to save memory.
The engine is designed to produce models in CityGML and does so in multiple
LODs. Besides the generation of multiple geometric LODs, we implement the
realisation of multiple levels of spatiosemantic coherence, geometric reference
variants, and indoor representations. The datasets produced by Random3Dcity
are suited for several applications, as we show in this paper with documented
uses. The developed engine is available under an open-source licence at Github
at http://github.com/tudelft3d/Random3Dcity
http://doi.org/10.5194/isprs-annals-IV-4-W1-51-2016
Filip Biljecki, Hugo Ledoux, Jantien Stoter
Level of detail texture filtering with dithering
and mipmaps US 5831624 A
Original Assignee 3Dfx Interactive Inc
https://www.google.com/patents/US5831624
Level-of-detail rendering: colors identify different
subdivision levels as stated in the top left corner.
Feature-Adaptive Rendering of Loop
Subdivision Surfaces on Modern GPUs
November 2014 DOI: 10.1007/s11390-014-1486-x
ManyLoDs: Parallel Many-View
Level-of-Detail Selection for Real-
Time Global Illumination
Matthias Hollander, Tobias Ritschel, Elmar Eisemann, Tamy Boubekeur
(2011) http://dx.doi.org/10.1111/j.1467-8659.2011.01982.x
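A toy sketch of the LOD idea for point clouds: precompute progressively decimated versions and pick one per frame from the camera distance, mipmap-style. The thresholds and decimation factors are arbitrary illustrative choices, not taken from any of the systems cited above.

```python
# Toy level-of-detail (LOD) pipeline: precompute decimated clouds, then select
# a level per frame from the camera distance (mipmap-style doubling thresholds).
import numpy as np

def build_lods(points, levels=4, keep=0.25, seed=0):
    """LOD 0 is the full cloud; each further level keeps `keep` of the previous."""
    rng = np.random.default_rng(seed)
    lods = [points]
    for _ in range(1, levels):
        prev = lods[-1]
        idx = rng.choice(len(prev), size=max(1, int(len(prev) * keep)), replace=False)
        lods.append(prev[idx])
    return lods

def select_lod(distance, base=5.0, levels=4):
    """Double the switch distance per level: <5 m -> LOD 0, <10 m -> LOD 1, ..."""
    level = int(np.floor(np.log2(max(distance, 1e-6) / base))) + 1
    return int(np.clip(level, 0, levels - 1))

model = np.random.rand(500_000, 3)
lods = build_lods(model)
for d in (2.0, 8.0, 30.0, 200.0):
    print(d, "m ->", len(lods[select_lod(d)]), "points")
```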
3D Content generation: Volumetric Capture
Generate content by scanning real-life scenes and objects
Kul Wadhwa's and Roddy O'Hara's Uncorporeal
http://www.uncorporeal.com/
Uncorporeal: volumetric capture systems for VR & AR content
creation. The team includes a technical Oscar-winner and
engineering and product leadership from WETA, Google X, Lucas
ILM, and Wikimedia.
https://venturebeat.com/2016/10/13/pathbreaker-ventures-raises-12-milli
on-to-invest-in-emerging-tech-such-as-vr-ar-and-robotics/
Ryan Gembala, founder of Pathbreaker Ventures
believes connected homes and cars and
autonomous vehicles will create a lot of
opportunities in vertical applications for startups.
And he also thinks that space technologies such as
small satellites, analysis of space-captured data,
consumer transport, space mining, and others are
interesting.
REALITYVIRTUAL.CO - A NEW ZEALAND BASED
CREATIVE TECHNOLOGIES RESEARCH &
DEVELOPMENT COLLECTIVE WITH AN ENTHUSIASM
TOWARDS THE VISUAL REALM:
● unique post production & signal processing techniques
including the development of deep learning image
enhancement & automation throughout our 3D pipeline
for PBR workflow
● strong emphasis on advanced robotics & autonomous
operations for large data acquisition of 3D
environments.
3D Scene Creation with Photogrammetry
3D Content generation: Automatic photorealism #1
It can still be quite labor-intensive to create realistic content
Get to know Rense de Boer, a technical art director from
Sweden, who is not only pushing the envelope of photo-real
CGI environments, but he’s doing it all in a real-time engine!
Art by Rens
https://news.developer.nvidia.com/artist-spotlight-creating-photorealistic-cgi-environments-in-real-time/
https://www.youtube.com/watch?v=bXouFfqSfxg
One Ph.D. position (supervision by Profs Niessner and Rüdiger
Westermann) is available at our chair in the area of photorealistic rendering
for deep learning and online reconstruction
Research in this project includes the development of photorealistic realtime rendering
algorithms that can be used in deep learning applications for scene understanding, and for
high-quality scalable rendering of point scans from depth sensors and RGB stereo image
reconstruction. If you are interested in applying, you should have a strong background in
computer science, i.e., efficient algorithms and data structures, and GPU programming,
have experience implementing C/C++ algorithms, and you should be excited to work on
state-of-the-art research in 3D computer graphics.
https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-le
arning-and-online-reconstruction.html
Ph.D. Position – Photorealistic Rendering for
Deep Learning and Online Reconstruction
3D Content generation: Automatic photorealism #2
Converting LiDAR scans to visually high-quality 3D content
Atom View is a new piece of software that allows content creators to
translate real-world scans into assets for virtual environments. Not only
does it aim to produce realistic results but also reduce the workflow for
content creation. The standalone app takes files captured from
volumetric cameras, offline graphics renderers, 360 lidar and more.
Volumetric capture is a promising area of development that could one day
allow content creators to skip over several of the more laborious steps of
traditional 3D content creation with better results. With Atom View, users can
even edit objects once they’ve been imported.
https://youtu.be/YxRI_3gKP8g
3D Content generation: Style transfer for maps
Neural Networks and The Future of 3D Procedural Content Generation
by Sam Snider-Held, Creative Technologist at MediaMonks, focusing on the intersection of AR, VR, AI, UX, and
Style transfer output on the left, real terrain on the right. Both are planes
whose vertices are being displaced by the height map texture.
Now it was time to create my own style transfer light field and light field renderer. I
basically reimplemented Andrew Lowndes’ WebGL light field renderer in Unity.
What this post demonstrates is the idea that neural networks could
radically change how we generate 3D content. I went with light fields
because currently my GPU is not fast enough to run style transfer or any
other generative network at 60 FPS. But if we do get to that point, it’s
entirely possible to see generative neural networks become an alternative
rendering pipeline to the standard rasterization approach. In this way,
neural networks could generate each frame of a game in real time,
based on real-time feedback from the user.
But it also potentially allows for a much more powerful creative approach, for
the creator and the end user. Imagine playing Gears of War, but then telling the
computer “Keep the gameplay, story, and 3d models, but make it look like
Zelda: Breath of the Wild.” This is how creating or playing a future gaming
experience could be, all because computers now know what things “look like”
and can make other things “look like” them too.
3D Content generation: from Video to 3D
Production-Level Facial Performance Capture Using Deep
Convolutional Neural Networks In Proceedings of SCA'17, Los Angeles,
CA, USA, July 28-30, 2017
http://research.nvidia.com/publication/facial-performance-capture-deep
-neural-networks
Samuli Laine, Tero Karras, Timo Aila, Antti Herva (Remedy
Entertainment), Shunsuke Saito (Pinscreen, University of Southern
California), Ronald Yu (Pinscreen, University of Southern California), Hao
Li (USC Institute for Creative Technologies, University of Southern
California, Pinscreen), Jaakko Lehtinen (NVIDIA, Aalto University)
NVIDIA and game developer Remedy (Alan Wake, Quantum Break) showcased their
team-up solution to streamlining motion capture and animation using a deep learning
neural network, running on NVIDIA’s powerful DGX-1 server. After being “trained” with
information on previously produced animations, the network is able to generate
sophisticated 3D facial animation from videos of live actors, greatly alleviating the
time and labor burden of traditional mo-cap animation — it can even learn enough to
generate facial animation from just an audio clip. The companies believe this system
could eventually produce animation that’s just as good or better than traditionally
produced fare.
http://www.animationmagazine.net/events/siggraph-facial-animation-advances-fabri
c-engine-the-french-contingent/
“We present a real-time deep learning framework for video-based facial
performance capture -- the dense 3D tracking of an actor's face given a monocular
video. Our pipeline begins with accurately capturing a subject using a high-end
production facial capture pipeline based on multi-view stereo tracking and artist-
enhanced animations.
With 5-10 minutes of captured footage, we train a convolutional neural network to
produce high-quality output, including self-occluded regions, from a monocular
video sequence of that subject. Since this 3D facial performance capture is fully
automated, our system can drastically reduce the amount of labor involved in the
development of modern narrative-driven video games or films involving realistic
digital doubles of actors and potentially hours of animated dialogue per character. “
3D Content generation: from Video (& Audio) to Video
Face2Face: Real-time Face Capture and Reenactment of RGB Videos
Justus Thies (1), Michael Zollhöfer (2), Marc Stamminger (1), Christian Theobalt (2), Matthias Nießner (3)
(1) University of Erlangen-Nuremberg, (2) Max Planck Institute for Informatics, (3) Stanford University
http://www.graphics.stanford.edu/~niessner/thies2016face.html
https://doi.org/10.1109/CVPR.2016.262
Neural Face Editing
with Intrinsic Image
Disentangling
Zhixin Shu, Ersin Yumer,
Sunil Hadap, Kalyan Sunkavalli,
Eli Shechtman, Dimitris Samaras
(Submitted on 13 Apr 2017)
https://arxiv.org/abs/1704.04131
University of Washington researchers have developed new
algorithms that solve a thorny challenge in the field of computer
vision: turning audio clips into a realistic, lip-synced video of the
person speaking those words.
As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017,
the team successfully generated highly-realistic video of former
president Barack Obama talking about terrorism, fatherhood, job
creation and other topics using audio clips of those speeches and
existing weekly video addresses that were originally on a different
topic.
Synthesizing Obama: learning lip sync from audio
Supasorn Suwajanakorn, Steven M. Seitz,
Ira Kemelmacher-Shlizerman
ACM Transactions on Graphics (TOG), Volume 36 Issue 4,
July 2017, https://doi.org/10.1145/3072959.3073640
http://www.washington.edu/news/2017/07/11/lip-syncing-obama-new-tools-turn-audio-clips-into-realistic-video/
3D Content generation: Style transfer to fantasy
https://uploadvr.com/google-tango-app-turns-real-world-into-the-matrix/
The Tango Matrix Scanner from VR
and AR developer Null Real uses the
special cameras fitted to Tango-ready
Android phones to turn the walls,
floors and ceilings of the environment
around you into the virtual data
streams that Keanu Reeves sees
towards the end of the legendary 1999
sci-fi flick.
Interactive Content generation
IEEE Transactions on Affective Computing > Volume: 2 Issue: 3
Experience-Driven Procedural Content Generation
Date of Publication: 05 April 2011 https://doi.org/10.1109/T-AFFC.2011.6
“Procedural content generation (PCG) is an increasingly important area of
technology within modern human-computer interaction (HCI) design.
Personalization of user experience via affective and cognitive modeling, coupled
with real-time adjustment of the content according to user needs and
preferences are important steps toward effective and meaningful PCG. Games,
Web 2.0, interface, and software design are among the most popular applications
of automated content generation. “
Emotion in Games Pp 155-166
Part of the Socio-Affective Computing book series (SAC, volume 4)
Emotion-Driven Level Generation
Julian Togelius, Georgios N. Yannakakis
https://doi.org/10.1007/978-3-319-41316-7_9
AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment
Targeting Horror via Level and Soundscape
Generation
Phil Lopes, Antonios Liapis, Georgios N. Yannakakis
This paper presented improvements to the Sonancia system, a multi-faceted
level generator for the horror genre. The additions include a level generation
system that optimizes towards a designer-defined tension curve, while still
providing a degree of variability. The paper also presented some initial
methodologies for creating soundscapes of generated levels by directly using the
distribution of monsters in the level’s path from the starting player position to the
goal. Several experiments studied the impact of designer tension curves on level
generation and sonification, as well as the efficiency of the GA in generating larger
maps.
and adaptive to user behavior. Great strides have already been made
with motion capture, haptics, eye-tracking, and natural language
processing. What has been missing is a serious effort to link mixed
reality to the ultimate computing platform—the human brain.
Our add-ons give your AR/VR Headset eye tracking
superpowers.
Pupil Labs. https://pupil-labs.com/vr-ar/
3D Datasets
Point clouds, meshes, and RGB+D(epth)
Point Clouds: Benchmark Dataset
http://semantic3d.net/ | ”ImageNet of point clouds” | Dataset for semantic segmentation of unordered point cloud data
What do we
provide?
We have created a framework for the fair
evaluation of semantic classification in
3D space. In this framework we provide:
● A large set of point clouds with
over one billion of labelled points.
● Ground truth, hand-labelled by
professional assessors.
● A common evaluation tool providing
the established intersection-union
measure along with the full confusion
matrix.
semantic-8
semantic-8 is a benchmark for classification with 8 class labels, namely {1: man-made terrain, 2: natural terrain, 3: high
vegetation, 4: low vegetation, 5: buildings, 6: hard scape, 7: scanning artefacts, 8: cars}. An additional label {0: unlabeled points}
marks points without ground truth and should not be used for training! In total over a billion points are provided. Please check
out the reduced benchmark if your method is too computationally demanding for the full data set.
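The evaluation the benchmark describes (full confusion matrix plus per-class intersection-over-union, ignoring label 0) can be computed in a few lines of numpy; the class count below matches semantic-8, but the toy labels are placeholders.

```python
# Sketch: confusion matrix and per-class IoU for semantic point labelling,
# with label 0 (unlabelled) excluded from both training and evaluation.
import numpy as np

def confusion_and_iou(gt, pred, num_classes=9):
    """gt, pred: integer label arrays (0 = unlabelled, 1..8 = semantic-8 classes)."""
    mask = gt > 0                                   # unlabelled points are excluded
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt[mask], pred[mask]), 1)
    tp = np.diag(cm)
    iou = tp / np.maximum(cm.sum(0) + cm.sum(1) - tp, 1)
    return cm, iou[1:]                              # drop class 0 from the report

gt = np.random.randint(0, 9, size=100_000)          # toy ground truth
pred = np.random.randint(1, 9, size=100_000)        # toy predictions
cm, iou = confusion_and_iou(gt, pred)
print(iou)
```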
Point Clouds: Synthetic datasets
You could always mesh point clouds, and convert meshes to point clouds.
https://arxiv.org/abs/1702.08558
https://arxiv.org/abs/1706.06782
https://arxiv.org/abs/1703.06907
https://arxiv.org/abs/1505.00171
SynthCam3D is a library of synthetic indoor
scenes collected from various online 3D
repositories and hosted at
http://robotvault.bitbucket.org.
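Converting meshes to point clouds, as mentioned above, is typically done by area-weighted sampling on the triangles; here is a minimal numpy sketch with a toy two-triangle mesh (the function name and parameters are illustrative).

```python
# Sketch: area-weighted uniform sampling of points on a triangle mesh,
# i.e. one simple way to turn a mesh dataset into a point cloud dataset.
import numpy as np

def sample_mesh(vertices, faces, n_points=4096, seed=0):
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                  # (T, 3, 3)
    # triangle areas from the cross product of two edge vectors
    areas = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0],
                                          tri[:, 2] - tri[:, 0]), axis=1)
    choice = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # uniform barycentric coordinates inside each chosen triangle
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    t = tri[choice]
    return t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0]) + v[:, None] * (t[:, 2] - t[:, 0])

# toy mesh: two triangles forming a unit square
V = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
F = np.array([[0, 1, 2], [0, 2, 3]])
cloud = sample_mesh(V, F)
```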
Point Clouds: Indoor Dataset
Announcing the Matterport3D Research Dataset
https://hackernoon.com/announcing-the-matterport3d-research-dataset-815cae932939
We’re excited that groups at Stanford, Princeton, and TUM have painstakingly hand-labeled a
wide range of spaces offered up by customers and made these labeled spaces public in the form
of the Matterport 3D dataset.
This dataset contains 10,800 aligned 3D panoramic views (RGB + depth per pixel) from
194,400 RGB + depth images of 90 building-scale scenes. All of these scenes were captured with
Matterport’s Pro 3D Camera. The 3D models of the scenes have been hand-labeled with instance-
level object segmentation. If you’re passionate about 3D and interested in an even bigger dataset,
Matterport internally has roughly 7500x as much 3D data as is in this dataset
You can access the dataset and sample code here and read the paper here.
We’d like to thank Angel Chang, Angela Dai, Thomas Funkhouser,
Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song,
Andy Zeng, Yinda Zhang for their work in labeling this dataset and
developing algorithms to run on it. We’d also like to thank all the
Matterport camera owners who gave us permission to include
their 3D models in this dataset.
Mesh Dataset: ModelNet
The goal of the Princeton ModelNet project is to provide researchers in
computer vision, computer graphics, robotics and cognitive science, with a
comprehensive clean collection of 3D CAD models for objects. To build the
core of the dataset, we compiled a list of the most common object categories
in the world, using the statistics obtained from the SUN database. Once we
established a vocabulary for objects, we collected 3D CAD models belonging
to each object category using online search engines by querying for each
object category term. Then, we hired human workers on Amazon Mechanical
Turk to manually decide whether each CAD model belongs to the specified
categories, using our in-house designed tool with quality control. To obtain a
very clean dataset, we choose 10 popular object categories, and manually
deleted the models that did not belong to these categories. Furthermore, we
manually aligned the orientation of the CAD models for this 10-class subset as
well. We provide both the 10-class subset and the full dataset for download.
http://modelnet.cs.princeton.edu/
Skolkovo Institute of Science and Technology
https://arxiv.org/abs/1704.01222
A Kd-tree built on the point cloud of eight
points (left), and the associated Kd-network
built for classification (right). We number
nodes in the Kd-tree from the root to leaves.
Therefore leaf nodes, which correspond to
original points, are numbered starting from 8.
The arrows indicate information flow during
forward pass (inference). The leftmost bars
correspond to leaf (point) representations. The
rightmost bar corresponds to inferred class
posteriors v0. Circles correspond to linear
(affine) transformations with learnable
parameters. Colors of the circles indicate
parameter sharing, as splits of the same type
(same orientation, same tree level – three
“green” splits in this example) share the
transformation parameters.
Mesh Dataset: ShapeNet
https://arxiv.org/abs/1512.03012
Overview https://www.shapenet.org/
ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D
models and manually verified category and alignment annotations. It covers 55
common object categories with about 51,300 unique 3D models. The 12 object
categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset,
are all covered by ShapeNetCore.
ShapeNetSem is a smaller, more densely annotated subset consisting of 12,000
models spread over a broader set of 270 categories. In addition to manually
verified category labels and consistent alignments, these models are annotated
with real-world dimensions, estimates of their material composition at the
category level, and estimates of their total volume and weight.
ShapeNet Model Viewer and Renderer
https://github.com/ShapeNet/shapenet-viewer
This Java+Scala code was used to render the ShapeNet model screenshots and
thumbnails. It can handle loading of OBJ+MTL, COLLADA DAE, KMZ, and PLY
format 3D meshes.
This is a realtime OpenGL-based renderer. If you would like to use a raytracing
framework for rendering, then a fork of the Mitsuba renderer has been created by
Jian Shi to handle ShapeNet models.
Mesh Correspondence: “Googling shapes”
We are organizing a large-scale 3D shape retrieval contest as
part of the Eurographics 2017 3D Object Retrieval Workshop.
More information available www.shapenet.org/shrec17.
Mesh Correspondence in Practice
You could, for example, have a 3D CAD model (or a whole architectural BIM model) and want to search for similar parts
from GrabCad, TraceParts, Thingiverse or Pinshape, and build more value on top of that, for example via shape completion,
finite element analysis (FEM), generative design or manufacturability analysis for CNC milling / 3D printing.
Learning Localized Geometric Features
Using 3D-CNN: An Application to
Manufacturability Analysis of Drilled Holes
Aditya Balu, Sambit Ghadai, Kin Gwn Lore, Gavin Young,
Adarsh Krishnamurthy, Soumik Sarkar
https://arxiv.org/abs/1612.02141
3D convolutional neural network for the classification of
whether or not a design is manufacturable [ design for
manufacturability (DFM)]. In this example, a block with
a drilled hole with specific diameter and depth is
considered.
Automated Design for Manufacturing and Supply Chain Using Geometric
Data Mining and Machine Learning
Hoefer, Michael Jeffrey Daniel. Iowa State University, Master’s thesis
https://search.proquest.com/openview/aaa80836db1abd17b6414d1f9c65349e/1
2015 14th International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics)
CAD Parts-Based Assembly Modeling by Probabilistic Reasoning
Kai-Ke Zhang ; Kai-Mo Hu ; Li-Cheng Yin ; Dong-Ming Yan ; Bin Wang, 26-28 Aug. 2015
https://doi.org/10.1109/CADGRAPHICS.2015.29
RGBD Dataset ScanNet
RGBD Dataset SceneNet
Own Datasets: Labeling is very time-consuming
Both in academic research and in industry, labelled data is extremely valuable. In academia, it is easy to use the same
datasets to benchmark the performance of the models, but in practice in business, another way to improve the
performance is to simply add more labelled data to your proprietary dataset. To optimize the labeling efforts, several
frameworks have been developed:
https://arxiv.org/abs/1707.04796
Adriana Kovashka, Olga Russakovsky, Li Fei-Fei and Kristen Grauman (2016),
"Crowdsourcing in Computer Vision", Foundations and Trends®
in Computer Graphics and Vision: Vol. 10: No. 3, pp 177-243.
http://dx.doi.org/10.1561/0600000071
What can I use to quickly build a labeling tool for my training data?
I need to label my training data - web documents in my case, but it actually
does not matter. Is there a generic framework or tool that I can use to quickly
build UI for a labeling tool for a particular kind of data like documents, images
and etc? Ideally the tool should allow multiple people to label the same data
set.
- > CrowdFlower, datapure; pilab-annotator / pylabelme (stackoverflow.com)
3D Mesh Labeling via Deep Convolutional Neural Networks
Guo et al. (2015), ACM Transactions on Graphics (TOG) Volume 35 Issue 1,
December 2015, https://doi.org/10.1145/2835487
Learning Hierarchical Shape Segmentation and Labeling from
Online Repositories
Li Yi, Leonidas Guibas, Aaron Hertzmann, Vladimir G. Kim, Hao Su, Ersin Yumer;
(Submitted on 4 May 2017) https://arxiv.org/abs/1705.01661
Future Prospects
What do these emerging scanning technologies mean in practice for real estate and construction?
AR/VR: The role in construction?
What’s next for VR in construction?
Published: 15 March 2017, Tridify’s Nigel Alexander
http://aecmag.com/59-features/1296-what-s-next-for-vr-in-construction
(top) Using 4D simulation, VR can help improve safety on
construction sites, (bottom) VR can be used to identify potential
hazards in office environments
This technology will simultaneously empower the sales and marketing side of the
construction industry, by enabling developers to showcase projects early on. It will
also help save costs by reducing wastage and rework, by making it easier for all
parties to collaborate on the design and layout of buildings.
VR meets the IoT
As smart cities become increasingly popular with developers, we’ll be creating environments that can be easily
linked to VR. Buildings are being fitted with vast numbers of electronic chips for smart monitoring. One of the
results of this is that you start to collect data – an aspect of which is ‘movement’. At any given period of time, you
might have X number of people moving through a particular area, but what might this actually mean and what are
the possible outcomes?
Health & safety
If, for example, you had a 2D plan of an office, you could look at it, and consider what might be hazardous about it
– but imagine if you had instead a virtual, three-dimensional environment where it was easy to add or subtract
objects and elements. You could visualise how a desk blocks an emergency exit, or show that an electrical box is
not secured. From there, it would just be a question of going around and looking at the hazard, and identifying what
measures were needed, perhaps a desk exclusion zone.
Green construction
Finally, there are important environmental considerations, too. If you can create a building in data before you
construct it, it’s quite simply much cheaper to make any necessary modifications. Not only are you free to play
around with a virtual environment, you can present it to the end user and gain relevant feedback to know exactly
what is needed. No developer has the omniscience to predict every element of what a customer will need – a
surgeon in a hospital for example, is going to have a far better knowledge of what they will need from an emergency
environment than a developer ever will.
SLAM 3D Maps: As Pervasive as Google Maps
http://augmentedpixels.com/slam-3d-maps-augmented-reality
-robotics-will-worth-google-maps/
Adapted from Vitaliy Goncharuk (CEO and Founder of Augmented Pixels)
Most new mobile devices will have at least stereo camera capabilities (and/or
structured light sensing, and even solid-state LiDARs if they become cheap enough). Each
device will be creating a 3D SLAM map, and with the sheer number of such
devices, indoor and outdoor maps can be maintained with minimal effort using
crowdsourcing.
“Just imagine that thanks to 3D SLAM cloud maps a man with a mobile
phone and AR glasses will be able to interact with other people with mobile
phones/AR Glasses and robots in real time in the same coordinate system.
This opens up great prospects for improvement of current patterns of
behavior (indoor navigation, etc.) and for creation of fundamentally new
services, patterns of behavior and accumulation of fundamentally new
knowledge, which will exceed the value of Google Maps in many times!”
Google Maps Indoor Maps
https://www.google.com/intl/en_uk/maps/about/partners/indoormaps/
Google Business View, https://www.google.com/streetview/hire/
Use indoor maps to view floor plans
You can see and navigate inside places like airports, department stores,
and malls using the Google Maps app.
Note: Indoor maps is only available in selected locations. See a
list of places that have indoor maps.
In addition to indoor navigation (useful, for example, for the visually
impaired, indoor drones, etc.), Google wants to visualize indoor
spaces as it has done with Street View.
In the USA, businesses can already have their interiors photographed in 360º
(for example, to show how a restaurant looks inside).
Indoor Maps Beyond commerce and ad tech
We've got Google Maps to help us out when we need to navigate
outdoors, but Google can only map out so many indoor locations
without getting creepy. And that's where Stimulant comes in. This
"innovation studio" built a Microsoft HoloLens app that lets you map out
an area, define locations, and use the headset to get instant directions
to any defined location.
Stimulant's HoloLens App Helps
Navigate Inside Buildings
BY ADAM DACHIS 09/03/2016
https://hololens.reality.news/news/stimulants-hololens-app-helps-navigate-inside-buildings-0171946/
https://vimeo.com/168415931
http://mashable.com/2017/05/17/google-visual-positioning-service-tango-augmented-reality
BY RAYMOND WONG MAY 17, 2017
At its Google I/O developers conference, Google announced a new technology
called Visual Positioning Service (VPS), a Tango-enabled mapping system that
uses augmented reality on phones and tablets to help navigate indoor locations.
Google says VPS makes use of machine learning, computer vision and mapping
coordinates to do just that. Along with audio interfaces, Google says VPS could
help the visually impaired find their way around the world, where they previously
would have had difficulties.
Google Maps meets Matterport and NCTech #1
Matterport partners with Google to bring
3D Street View perspectives indoors
Posted May 9, 2017 by Lucas Matney (@lucas_matney)
When you’re looking at moving into a new space, Street View is
often a useful tool to get the general vibe of the area but it’s almost
impossible to really tell what spaces look like indoors without
physically being there.
Today, users clicking through locations on Google Street View will
start seeing quite a bit more businesses pop up that they can
actually jump into and explore themselves. This is possible
thanks to a partnership between Google and Matterport.
Google has already been doing a bit of indoor surveying through
partnerships with individual 360 photographers, but this
partnership opens Street View up to a much larger library of
content. Matterport has an index of over a half-million indoor
spaces that users can view using either a web viewer or VR
headset. It will ultimately be up to the individual partners of
Matterport to decide if their content ends up being viewable
on Street View, but the company believes this partnership will
greatly expand the reach of its customers.
Matterport is ultimately not the only partner to whom Google is
opening its Street View API, but it is the sole company which will
be offering 3D views of spaces in addition to 360-degree
scans which should allow for more compelling views as Google
embraces new technologies like virtual reality.
Google Street View Teams with NCTech and Matterport
BY SEAN HIGGINS, SPAR 3D EDITOR ON MAY 17, 2017
TECHNOLOGY: HARDWARE, INDUSTRY, RELATED & NEW TECHNOLOGIES, SOFTWARE INDUSTRIES:
ARCHITECTURE ENGINEERING & CONSTRUCTION (AEC), TRANSPORTATION & INFRASTRUCTURE
Though the Scottish company NCTech has garnered attention recently for its LASiris VR 3D
capture device, it is also well known for its 360° HDR cameras. On May 11th, the company
announced that it will be producing one such camera for Google Street View. This summer, the
company is making a move to “help small businesses market their venues” by offering a Street
View API that enables users to publish their captures to Google’s platform with “the click of a
button.” Once published, the scans will be available on Google Maps and in Google Search.
“Matterport is excited to partner with Google to enhance the way business owners market to
customers around the world,” said Bill Brown, CEO of Matterport. “Our all-in-one solution helps
businesses promote their venues and provide a preview of what customers can expect.”
Google Maps meets Matterport and NCTech #2
Spar3D: This summer, the company is making a move to “help small businesses market their venues” by offering
a Street View API that enables users to publish their captures to Google’s platform with “the click of a
button.” Once published, the scans will be available on Google Maps and in Google Search.
https://www.nctechimaging.com/downloads-files/iris360-brochure-a5.pdf
Support portal: https://nctech.zendesk.com/hc/en-us/community/topics/200412007-iris360
Google Maps meets Matterport for customers #1
Matterport Spaces: Reach Millions on
Google Street View
Attract more customers and win more business with 3D, VR,
and more! Create immersive 3D and VR experiences with
Matterport today and publish your virtual walkthroughs to Google
Street View in just a few months. Join our Beta program below!
https://matterport.com/gsv/
What is Google Street View and
Matterport for Business
Listings?
https://support.matterport.com/hc/en-us/articles/115006844048-FAQ-Matterport-for-Business-Listings-Publish-to-Google-Street-View-
What kinds of places can I publish?
● Business Listings — retail and restaurants
● Places of Interest — museums and landmarks
● Multifamily — apartment complexes
● Travel and Hospitality — hotels and resorts
● Vacation & Short Term Rentals — nightly rentals only
● Commercial Real Estate — office spaces
Private homes (residential real estate) cannot be
published to Google Street View. Nightly rentals are
allowed.
Google Maps meets Matterport for customers #2
Why is Google putting lidar on its new
Street View cars?
BY SEAN HIGGINS, SPAR 3D EDITOR ON SEPTEMBER 9, 2017
https://www.spar3d.com/blogs/the-other-dimension/google-putting-lidar-new-street-view-cars/
Earlier this week, your friends at Arstechnica published
a little piece about Google’s new Street View cars and
their inclusion of lidar sensors. On top of the new
cars, you’ll spot an integrated system that includes 7
cameras and two Velodyne Pucks, though it’s uncertain
exactly which version of the puck it is. These sensors
aren’t for automating the cars, and we know this
because they’re placed at an unusual angle—45°
rather than the 15° you’d commonly find on self-driving
cars.
To that, I would add my own speculation that Google
is gathering outdoor building data that they can
connect to interior captures taken by sensors from
Matterport or NCTech, both companies that have
partnered with the Google Street View initiative. With a
3D data set that bridges the indoors and outdoors,
we could get the indoor navigation so many have been
asking for. We’d also get better AR, and–much to
Google’s pleasure I’m sure–much more precise ways to
show us advertising.
Google wants you to help feed its image-hungry
algorithms. The tech industry’s recent interest in virtual
reality has made 360 degree cameras relatively cheap.
This summer, Google began certifying some cameras as “
Street View ready,” meaning you can upload your own
panoramas through the Street View mobile app to live on
the company’s service. That footage will be processed by
Google’s image recognition algorithms for fresh map data
just like its own imagery.
https://www.wired.com/story/googles-new-street-view-cameras-will-help-algorithms-index-the-real-world/
Google Maps meets Matterport for customers #3
3D, VR, 360° and Street View Photographers | Real Estate Agents
Need a 3D/VR/360°/Street View Photographer? 3,236 We Get Around Network Members
in 104 Countries!
Dan Smigrod (23 Sept 2017): “Soon, Matterport will enable Room Labels
to be automatically generated using Artificial Intelligence (AI) as discussed
below and in the Matterport Satisfaction & New Feature Survey 2017.”
Matterport: "We’ve used it internally to build a system that segments
spaces captured by our users into rooms and classifies each room. It’s
even capable of handling situations in which two types of room (e.g. a
kitchen and a dining room) share a common enclosure without a door or
divider. In the future, this will help our customers skip the task of having to
label rooms in their floor plan views," according to Matterport Co-Founder
Matt Bell in this article published Thursday (21 September 2017).
"Ultimately, we want to do for the real world what Google did for the web –
enable any space to be indexed, searched, sorted, and understood,
enabling you to find exactly what you’re looking for. Want to find a place to
live that has three large bedrooms, a sleek modern kitchen, a balcony with
a view of a pond, a living room with a built-in fireplace, and floor-to-ceiling
windows? No problem! Want to inventory all the furniture in your office, or
compare your construction site’s plumbing and HVAC installations against
the CAD model? Also easy!"
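Matterport's room-segmentation and labeling system is proprietary, but the classification half of the idea can be sketched generically: fine-tune a pretrained CNN to map an image of a room to a room-type label. The snippet below is only an illustrative assumption (PyTorch/torchvision, a made-up label set), not Matterport's method.

```python
# Generic room-type classifier sketch (NOT Matterport's system): fine-tune a
# pretrained CNN to map a rendered room image to a room label.
import torch
import torch.nn as nn
from torchvision import models

ROOM_TYPES = ["kitchen", "bedroom", "bathroom", "living_room", "office"]  # hypothetical

model = models.resnet18(pretrained=True)          # newer torchvision uses weights=...
model.fc = nn.Linear(model.fc.in_features, len(ROOM_TYPES))  # replace the final layer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, targets):
    """images: (B, 3, 224, 224) float tensor, targets: (B,) long tensor of room ids."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data (real training would use labelled room crops).
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, len(ROOM_TYPES), (4,))
print(train_step(x, y))
```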
https://www.metroplex360.com/virtual-tours-google-streetview/
https://www.mp2sv.com/
We provide support for any platform including:
DSLR Photospheres, Ricoh Theta S, iGuide, RealVision, Matterport 360 Snapshots
Google Maps meets Earth VR
Google Earth VR app gets
support for Street View
Posted Sep 14, 2017 by Lucas Matney (@lucasmtny)
https://techcrunch.com/2017/09/14/google-earth-vr-app-gets-support-for-street-view/
Google Earth VR is getting a little update today that brings your
views to street-level in the world-exploring virtual reality app.
The app is adding Street View so that users can
easily transition between 3D satellite views and 360 camera
captures on the ground level.
Introducing Google Earth VR, our next step to help the world see the world. With Earth
VR, you can fly over a city, stand at the edge of a mountain, and even soar into space.
Google Earth VR is available now on Steam for the HTC Vive.
https://www.youtube.com/watch?v=SCrkZOx5Q1M
Beyond structure: How do people interact with the space?
Keywords: indoor semantic inference; activity recognition; multi-length windows; virtual samples;
virtual features; deep learning
http://dx.doi.org/10.3390/s17061214
The architecture of the stacked autoencoder used in DeepMap+
http://www.behavioranalyticsretail.com/7-technologies-to-track-people/
Recognizing human actions from unknown and unseen (novel) views is a challenging problem. We propose a
Robust Non-Linear Knowledge Transfer Model (R-NKTM) for human action recognition from novel views. The proposed
R-NKTM is a deep fully-connected neural network that transfers knowledge of human actions from any unknown view to
a shared high-level virtual view by finding a non-linear virtual path that connects the views
https://arxiv.org/abs/1602.00828
Imaging Novel techniques: Transient imaging
https://doi.org/10.1109/ICCPHOT.2017.7951478
Can we reconstruct the entire internal shape of a room if all we can directly
observe is a small portion of one internal wall, presumably through a window in the
room? While conventional wisdom may indicate that this is not possible, motivated by
recent work on `looking around corners', we show that one can exploit light echoes to
reconstruct the internal shape of hidden rooms.
Can we reconstruct the
shape of a hidden closed
room with a small
peephole?
We will show an experimental setup, built with a transient camera and a
pico-second laser, that can infer the shape of a hidden room. The transient
camera consists of a single SPAD detector (single-photon avalanche diode)
with 30 ps jitter. The pico-second laser has a jitter comparable to that of a
SPAD, and emits pulses at 530-570 nm wavelength. Coherence lengths at this
bandwidth are too low to do interferometric measurements. Instead, we rely
on the arrival times of photon echoes in our algorithm.
Block diagram of experimental setup: The three core components of our
setup are the illumination hardware, the SPAD electronics, and the reconstruction
algorithm. The illumination hardware consists of a pulsed laser, which sends
periodic pulses of short duration, and a galvo to control the position of the
beam. The SPAD electronics consist of a SPAD to detect a photon and a
Picoharp to compute the timing of that photon. The data from the Picoharp is fed
to a computer where the reconstruction algorithm computationally determines
the shape of the room.
Adithya Kumar Pediredla ; Mauro Buttafava ; Alberto Tosi ; Oliver Cossairt ; Ashok Veeraraghavan
Published in: Computational Photography (ICCP), 2017 IEEE International Conference on
Date of Conference: 12-14 May 2017
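The measurement principle rests on converting photon arrival times into optical path lengths. The toy sketch below (placeholder bin width and histogram) shows only that first step; the actual room reconstruction additionally models multi-bounce light paths and solves an inverse problem, which is beyond this illustration.

```python
# Toy illustration of the basic time-of-flight relation behind transient imaging:
# a photon detected t seconds after the laser pulse has travelled c * t metres
# in total, so a direct (single-bounce) echo corresponds to a surface at c*t/2.
import numpy as np

C = 3.0e8                    # speed of light, m/s
BIN_WIDTH = 4e-12            # Picoharp-style timing bin, 4 ps (assumed value)

def bin_to_path_length(bin_index):
    """Total optical path length (m) for a photon landing in a given time bin."""
    return C * bin_index * BIN_WIDTH

def direct_depth(bin_index):
    """Depth of a directly visible surface assuming a single bounce."""
    return bin_to_path_length(bin_index) / 2.0

# A synthetic transient histogram: counts of photons per arrival-time bin.
histogram = np.zeros(2000)
histogram[850] = 120         # pretend echo peak at bin 850
peak_bin = int(np.argmax(histogram))
print(f"peak at bin {peak_bin}: total path {bin_to_path_length(peak_bin):.3f} m, "
      f"single-bounce depth {direct_depth(peak_bin):.3f} m")
```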
Imaging Novel techniques: Compressed Sensing #1
Compressed sensing (CS, also known as compressive sensing, compressive sampling, or
sparse sampling) is a signal processing technique for efficiently acquiring and
reconstructing a signal. CS enables a potentially large reduction in the sampling and
computation costs for sensing signals that have a sparse or compressible representation.
Compressed sensing
DL Donoho - IEEE Transactions on information theory, 2006
Cited by 19,074 articles, see Related articles
An introduction to compressive sampling
EJ Candès, MB Wakin - IEEE signal processing magazine, 2008
Cited by 6779 articles, see Related articles
A Framework for Compressive-Sensing of 3D Point
Clouds
Vahid Behravan ; Gurjeet Singh ; Patrick Y. Chiang (2016)
https://doi.org/10.1109/CIS.2016.0024
“A key question in any power-efficient LiDAR system (e.g. wireless sensor applications) is
how many points we need to capture to fully obtain the scene point cloud. The fewer points
we need to capture, the less energy is needed to transmit this data to the receiver; it also
increases the frame rate. Compressive sensing is a method that enables reduction of the
LiDAR data.
Experimental results show that excluding edge points from the error calculation gives us better
criteria to decide the best compression ratio in the system.”
A new approach to apply compressive sensing to
LIDAR sensing
Richard C. Lau; T. K. Woodward (2016)
http://dx.doi.org/10.1117/12.2058777
Most CS methods require sequential capture of a large number of random data projections,
which is not advantageous to LIDAR systems, wherein reduction of 3D data sampling is
desirable. In this paper, we introduce a new method called Resampling Compressive
Sensing (RCS) that can be applied to a single capture of a LIDAR point cloud to reconstruct
a 3-dimensional representation of the scene with a significant reduction in the required
amount of data. Examples of 50 to 80% reduction in point count are shown for sample
point cloud data. The proposed new CS method leads to a new data collection paradigm
that is general and different from traditional CS sensing such as the single-pixel camera
architecture.
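To make the idea concrete, the sketch below shows generic compressed-sensing recovery (not the RCS method above): a sparse signal is measured with a random matrix and recovered by iterative soft-thresholding (ISTA) on the l1-regularised least-squares objective. Problem sizes are arbitrary assumptions.

```python
# Generic compressed-sensing recovery sketch: measure a sparse signal x with a
# random matrix A (m << n) and recover it by iterative soft-thresholding (ISTA)
# on  min_x 0.5*||A x - y||^2 + lam*||x||_1.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 400, 100, 8                      # ambient dim, measurements, sparsity

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                             # compressed measurements

def ista(A, y, lam=0.01, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```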
Imaging Novel techniques: Compressed Sensing #2
Compressive sensing for reconstruction of 3D point
clouds in smart systems
Ivo Stančić ; Milos Brajović ; Irena Orović ; Josip Musić (2017)
https://doi.org/10.1109/SOFTCOM.2016.7772129
Introduction of simple structured-light scanners makes possible fast
scanning, effective robot detection and evasion of obstacles.
Nevertheless, some obstacles may still be difficult to detect and
recognize, primarily due to limitations of scanner's hardware which
results in a low number of reconstructed surface points. In this paper a
compressed sensing technique, primarily used for the reconstruction
of 2D images, is utilized to enhance the quality of 3D scan, by
increasing the number of reconstructed 3D points to the scanner's
theoretical maximum.
Sparse representation for colors of 3D point cloud
via virtual adaptive sampling
Junhui Hou ; Lap-Pui Chau ; Ying He ; Philip A. Chou (2017)
https://doi.org/10.1109/ICASSP.2017.7952692
“it is common that a point cloud contains millions of points, leading to
huge amounts of data, so effective and efficient compression
schemes have to be developed due to limited network bandwidth
and storage space. The acquired data may be defective due to
occlusion or other factors (e.g., noise and holes), and thus,
preprocessing operations have to be performed to restore it”
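As a hedged illustration of such point-count reduction and clean-up (not the sparse-representation scheme of the paper above), the snippet below uses the Open3D library, assuming its current PointCloud API, to voxel-downsample a cloud and drop statistical outliers before further processing.

```python
# Simple point-cloud reduction / clean-up sketch (illustrative only): voxel
# downsampling cuts point count and storage, and a statistical outlier filter
# removes isolated noise points before compression or further use.
import numpy as np
import open3d as o3d

points = np.random.rand(100_000, 3).astype(np.float64)   # placeholder for a real scan

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

down = pcd.voxel_down_sample(voxel_size=0.02)             # one point per 2 cm voxel
clean, kept_idx = down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

print(len(pcd.points), "->", len(down.points), "->", len(clean.points), "points")
```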
Amir Adler, Michael Elad, Michael Zibulevsky https://arxiv.org/abs/1610.09615
The contributions of this paper are twofold:
(1) It presents for the first time, to the best knowledge of the authors, the
utilization of a deep neural network for the tasks of compressive linear
sensing and non-linear inference; and
(2) During training, the proposed network jointly optimizes the compressive
sensing matrix and the inference operator, leading to a significant
advantage compared to the state of the art for the task of image
classification.
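A minimal sketch in the same spirit, assuming PyTorch and illustrative dimensions (not the authors' architecture): a bias-free linear layer plays the role of the learnable sensing matrix and is trained jointly with a small inference network that classifies directly from the compressive measurements.

```python
# Minimal sketch of jointly learned compressive sensing + inference: a
# learnable linear sensing matrix (no bias) followed by an inference network,
# trained end-to-end so classification happens directly on the measurements.
import torch
import torch.nn as nn

n_pixels, n_measurements, n_classes = 784, 64, 10    # e.g. flattened 28x28 images

model = nn.Sequential(
    nn.Linear(n_pixels, n_measurements, bias=False),  # learned sensing matrix Phi
    nn.Linear(n_measurements, 256), nn.ReLU(),
    nn.Linear(256, n_classes),                        # non-linear inference operator
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, y):
    """x: (B, n_pixels) signals, y: (B,) class labels; updates Phi and the classifier."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data; real use would train on an image dataset.
x = torch.randn(32, n_pixels)
y = torch.randint(0, n_classes, (32,))
print(train_step(x, y))
```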
Igor Carron | Nuit Blanche
Jason Laska's thesis presentation slides entitled:
Regime Change: Sampling Rate vs. Bit-Depth in Compressive Sensing
Compression Convergence with artificial intelligence
http://dx.doi.org/10.1038/nature14541
Data compression and probabilistic modelling are two sides of the same coin, and Bayesian
machine-learning methods are increasingly advancing the state-of-the-art in compression. The
connection between compression and probabilistic modelling was established in the
mathematician Claude Shannon’s seminal work on the source coding theorem, which states that
the number of bits required to compress data in a lossless manner is bounded by the entropy of
the probability distribution of the data.
The link to Bayesian machine learning is that the better the probabilistic model one learns,
the higher the compression rate can be (MacKay, 2003). These models need to be flexible
and adaptive, since different kinds of sequences have very different statistical patterns (say,
Shakespeare’s plays or computer source code). It turns out that some of the world’s best
compression algorithms [for example, Sequence Memoizer (Wood et al. 2011) and PPM with
dynamic parameter updates (Steinruecken et al. 2015)] are equivalent to Bayesian non-
parametric models of sequences, and improvements to compression are being made through a
better understanding of how to learn the statistical structure of sequences.
Future advances in compression will come with advances in probabilistic machine learning,
including special compression methods for non-sequence data such as images, graphs and
other structured objects.
The key distinction between problems in which a probabilistic approach is important and
problems that can be solved using non-probabilistic machine-learning approaches is whether
uncertainty has a central role. Moreover, most conventional optimization-based machine-
learning approaches have probabilistic analogues that handle uncertainty in a more principled
manner. For example, Bayesian neural networks represent the parameter uncertainty in neural
networks (Neal, 1996), and mixture models are a probabilistic analogue for clustering methods (
MacKay, 2003).
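The source-coding bound mentioned above is easy to check numerically: the entropy of the source distribution lower-bounds the average number of bits per symbol of any lossless code, so a more predictable (better-modelled) source compresses further. A tiny example with made-up distributions:

```python
# The source-coding bound in one line: no lossless code can use fewer bits per
# symbol (on average) than the entropy of the source distribution. A better
# probabilistic model lowers the achievable code length, which is the link to
# Bayesian machine learning discussed above.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_uniform = [0.25, 0.25, 0.25, 0.25]      # 4 equally likely symbols
p_skewed  = [0.70, 0.15, 0.10, 0.05]      # same alphabet, more predictable source

print("uniform source:", entropy_bits(p_uniform), "bits/symbol")          # 2.0
print("skewed source: ", round(entropy_bits(p_skewed), 3), "bits/symbol")  # ~1.319
```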
http://dx.doi.org/10.1111/j.1467-8659.2006.00957.x
https://doi.org/10.1016/j.cag.2009.03.019
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.604.8269
Dropout is used as a
practical tool to obtain
uncertainty estimates
in large vision models and
reinforcement learning
(RL) tasks
https://arxiv.org/abs/1705.07832
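A common way to get those estimates is Monte Carlo dropout: keep dropout active at test time and read the spread of repeated stochastic forward passes as predictive uncertainty. The sketch below is a minimal PyTorch illustration with an assumed toy regression network, not the setup of the cited paper.

```python
# Monte Carlo dropout sketch: keep dropout sampling at test time and treat the
# spread of repeated stochastic forward passes as an uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Return predictive mean and std over n_samples stochastic passes."""
    model.train()                      # keeps Dropout layers active at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.randn(5, 8)                  # five hypothetical inputs
mean, std = mc_dropout_predict(model, x)
print(mean.squeeze(), std.squeeze())   # higher std ~ higher model uncertainty
```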
Imaging Fusion with Deep Learning and AI
Memristor Image Processor Uses
Sparse Coding to See
By Katherine Bourzac - Posted 25 May 2017 | 12:00 GMT
http://spectrum.ieee.org/tech-talk/semiconductors/optoelectronics/memristor-camera-chip-uses-sparse-coding-to-see
Now researchers led by Wei Lu at the University of
Michigan have designed hardware specifically to run
brain-like “sparse coding” algorithms. Their system
learns and stores visual patterns, and can recognize
natural images while using very little power compared
to machine learning programs run on GPUs and CPUs.
Lu hopes these designs, described this week in the
journal Nature Nanotechnology, will be layered on
image sensors in self-driving cars.
The key, he says, is thinking about hardware and
software in tandem. “Most approaches to machine
learning are about the algorithm,” says Lu. Conventional
processors use a lot of energy to run these algorithms,
because they are not designed to process large
amounts of data, he says. “I want to design efficient
hardware that naturally fits with the algorithm,” he says.
Running a machine-learning algorithm on a powerful
processor can require 300 watts of power, says Lu. His
prototype uses 20 milliwatts to process video in real
time. Lu says that’s due to a few years of careful work
modifying the hardware and software designs together.
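For readers unfamiliar with the algorithmic side, the snippet below illustrates sparse coding itself in software, assuming scikit-learn and a random dictionary (learned dictionaries and memristor hardware are what the article is actually about): each input is represented by only a few dictionary atoms.

```python
# Software illustration of sparse coding: represent a signal as a combination
# of only a few dictionary atoms, here via orthogonal matching pursuit (OMP).
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_atoms, n_features = 64, 100

dictionary = rng.normal(size=(n_atoms, n_features))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)   # unit-norm atoms

# A signal that truly is a mix of 3 atoms, plus a little noise.
signal = 2.0 * dictionary[3] - 1.5 * dictionary[17] + 0.8 * dictionary[42]
signal = signal + 0.01 * rng.normal(size=n_features)

coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)
codes = coder.transform(signal.reshape(1, -1))

print("non-zero coefficients at atoms:", np.nonzero(codes[0])[0])  # expect 3, 17, 42
```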
Chronocam - A new standard: Bio-inspired vision sensing + processing
http://www.chronocam.com/wp-content/uploads/2016/09/Technology.pdf
Processing Distributed processing... for drone LiDAR hives?
http://www.wired.co.uk/article/improbable-quest-to-build-the-matrix
SpatialOS
https://improbable.io/
SpatialOS is a cloud-based computational
platform that lets you use many servers and engines
to power a single world. The platform coordinates a
swarm of micro-services called workers, which
overlap and dynamically reorganize to power a huge,
seamless world. The platform also lets you handle a
huge number of concurrent agents across different
devices in one world.
http://www.wired.co.uk/article/drone-swarms-change-warfare
LOCUST - Swarming Navy Drones
In the future, a flying drone or a hive of flying drones could do automatic inspection and
scanning of both indoor real estate (think of scaling Google Indoor Maps without a human
having to go there and move the tripod) and outdoor construction sites in an automated
and autonomous fashion. One just needs affordable solid-state LiDARs and/or 360º
imaging. R&D on military drones is massive right now, for obvious reasons:
DragonflEye Project Wants to Turn
Insects Into Cyborg Drones
Anti-drone radio wave startup
SkySafe secures $11.5M from
Andreessen Horowitz
Posted Jul 20, 2017 by Josh Constine (@joshconstine)
https://techcrunch.com/2017/07/20/skysafe

Emerging 3D Scanning Technologies for PropTech

  • 1.
    Emerging 3D ScanningTechnologies for PropTech Falling costs with rising quality via hardware innovations and deep learning
  • 2.
    Outlineofthepresentation StructurefromMotion(SfM) Low-cost passivesensing 360°imaging Omnidirectional immersiveimagesandvideos Rangesensing Structuredlight, Matterport,Kinectforexample Laserscanning LiDARs fromVelodyne for example Data-drivenprocessing DeepLearning 3DDatasets Withwhat totrain yourdeeplearningpipelines FutureProspects Short overview of future applications Thepresentationismeant asatechnical introductionfor typical hardware andsoftware processingtechniquesusedinreal estateandconstruction site scanning. Computerscientistsnew to proptechorganizations andreal estate fieldin generalmight especiallyfindthispresentation useful.One assumesthat thereaderisfamiliarwiththe basics ofdeeplearning.
  • 3.
    Datastructuresfor realestatescans RGB+D Pixelgrid presenting colorand depth Example from Prof. Li Mesh(Polygon) from voxel data(“3Dpixels”) Voxel grid meshing using marching cubes (StackExchange) PointCloud unordered datatypically (i.e. not on agrid but sparse
  • 4.
    PropTechResources for domaininsights https://www.inman.com/ InmanHacker Connect is created by and for the real estate technology community. Debate, discuss and define the future of real estate’s most pressing tech issues at Hacker Connect. Join more than 400 engineers, developers, designers, product managers, database architects, webmasters, and technology executives from across the real estate space. Build partnerships, connect with peers, tackle thorny tech issues, learn best practices discover innovative breakthroughs and collaborate during special hands-on keyboard sessions at this day-long, tech- first event. WHY YOU SHOULD ATTEND Hear from industry leaders on APIs, bots, data security, ownership, user experience, blockchain and more. Take part in collaborative hands-on-keyboard sessions and come out with a new tool to apply to your job. Learn how to better integrate data, workflows and be competitive in your recruitment efforts https://www.inman.com/event/hacker-17-sf/ http://www.moderneventures.com/accelerator/ https://gust.com/accelerators/moderne-accelerator (Pi Labs) is Europe’s first venture capital platform investing exclusively in early stage ventures in the property tech vertical. London, United Kingdom. http://pilabs.co.uk/ http://www.jamesdearsley.co.uk/ “The only PropTech site for the latest Property Technology news and views” #PropTech community across Europe. Join us for our next event in #Berlin http://futureproptech.de/
  • 5.
  • 6.
    StructurefromMotionBasics Structure-from-Motion (SfM). Insteadof a single stereo pair, the SfM technique requires multiple, overlapping photographs as input to feature extraction and 3-D reconstruction algorithms. - Westoby et al praehistorische-archaeologie.de - Florian Tubbesing Structure from Motion can achieve good accuracy compared to laser scanners. James and Robson (2012) Cited by 281 Articles, and see Related articles This volcanic bomb (~10 cm across) from Soufrière Hills volcano was scanned by an Arius3d laser scanner ( Stuart Robson, University College London) and also reconstructed using the SfM-MVS technique, with the results scaled by sfm_georef. Differences between cross sections through the two models have RMS values of ~0.3 mm. Point cloud: low res (6 Mb) http://www.lancaster.ac.uk/staff/jamesm/software/sfm_georef.htm SfM method basically computes the relative camera positions between all related photos. After every relative camera position is found, the scheme uses these matrices to reconstruct all feature points using triangulation. Thus there are two main problems: 1) Image registration (e.g. SIFT, SURF, ORB, etc) 2) Pose Estimation (e.g. Perspective-n-Point with RANSAC) By Dr Calle Olsson https://www.youtube.com/watch?v=i7ierVkXYa8
  • 7.
    StructurefromMotionLiteratureReferences https://doi.org/10.1016/j.geomorph.2012.08.021 Cited by 631articles, and see Related articles https://arxiv.org/abs/1701.08493 Structure-from-Motion’ (SfM) operates under the same basic tenets as stereoscopic photogrammetry, namely that 3-D structure can be resolved from a series of overlapping, offset images. However, it differs fundamentally from conventional photogrammetry, in that the geometry of the scene, camera positions and orientation is solved automatically without the need to specify a priori, a network of targets which have known 3-D positions. Instead, these are solved simultaneously using a highly redundant, iterative bundle adjustment procedure, based on a database of features automatically extracted from a set of multiple overlapping images (Snavely et al 2008). Finally, even though there exist various theoretical works in the literature that study fundamental problems in SfM and/or provide rigorous analysis of stability and robustness of specific methods, we believe that the SfM community would still highly benefit from rigorous results on fundamental problems (e.g., what is the theoretically maximal amount of mismatched features or level of noise in the images that can be tolerated for a stable structure recovery, and can this be achieved efficiently?) and theoretical analysis of stability, robustness and computational efficiency of existing or new methods
  • 8.
    SLAM Simultaneouslocalizationandmapping SLAM, VisualOdometry, Structure from Motion, Multiple View Stereo Yu Huang, Senior Architect, Autonomous Driving@Baidu USA https://www.slideshare.net/yuhuang/visual-slam-structure-from-motion-multiple-view-stereo Samsung R&D Institute Necessary Skills / Attributes: ● 5+ years’ experience delivering computer vision based products using C++ or Python (Masters or PhD study will be considered). ● Theoretical and practical understanding of multi-view geometry and 3D reconstruction. ● Experience with machine learning techniques within a computer vision context. ● PhD/MS in Computer Vision, Artificial Intelligence or Machine Learning. ● Expertise with Deep Neural Networks using TensorFlow or Keras. SLAM stands for Simultaneous Localization and Mapping and one way to understand it is to imagine yourself entering an unfamiliar building for the first time. As you move about the building, you don't completely forget where you have already been. Indeed, at any moment you have a pretty good idea where you are within the current map that you have so far constructed in your head, and unless you have a really bad sense of direction, you could probably turn around and get back out of the building without too much trouble. Finding your way around the building is a good example of simultaneously constructing a map and localizing yourself within that map. http://www.pirobot.org/blog/0015/
  • 9.
    SLAM Traditionalalgorithm comparison http://dx.doi.org/10.1186/s41074-017-0027-2 Theframework is mainly composed of three modules as follows. 1) Initialization 2) Tracking 3) Mapping Additional modules for stable and accurate vSLAM + Relocalization +Global map optimization “ From the technical point of views, there is no definitive difference between SLAM and real-time SfM.” Even though visual SLAM algorithms have been developed since 2003, vSLAM is still an active research field. Each algorithm has different characteristics. We need to choose an appropriate algorithm by considering a purpose of an application.
  • 10.
    VisualOdometry Taketomi et al.(2017): http://dx.doi.org/10.1186/s41074-017-0027-2 “Odometry is to estimate the sequential changes of sensor positions over time using sensors such as wheel encoder to acquire relative sensor movement. Camera-based odometry called visual odometry (VO) is also one of the active research fields in the literature [16, 17]. From the technical point of views, vSLAM and VO are highly relevant techniques because both techniques basically estimate sensor positions. According to the survey papers in robotics [18, 19], the relationship between vSLAM and VO can be represented as follows. vSLAM = VO + global map optimization The relationship between vSLAM and VO can also be found from the papers [20, 21] and the papers [22, 23]. In the paper [20, 22], a technique on VO was first proposed. Then, a technique on vSLAM was proposed by adding the global optimization in VO [21, 23].” Towards stable visual odometry & SLAM solutions for autonomous vehicles https://www.youtube.com/watch?v=T5Y6OPG-d08 NavStik Hackerspace | Projects at Hackerspace Visual Odometry using Optic Flow
  • 11.
    SoftwareOpen-sourceVisualSFM VisualSFM:AVisualStructurefromMotion System Changchang Wu Citedby 326 articles, and see Related articles VisualSFM is a GUI application for 3D reconstruction using structure from motion (SFM). The reconstruction system integrates several of my previous projects: SIFT on GPU(SiftGPU), Multicore Bundle Adjustment, and Towards Linear-time Incremental Structure from Motion . VisualSFM runs fast by exploiting multicore parallelism for feature detection, feature matching, and bundle adjustment. Using VisualSFM and Meshlab as an offline alternative to Autodesk's excellent 123D catch. I walk you through my workflow for converting multiple images into a 3D model suitable for use in Blender. Tutorial for amateur photographers by Jamie Fuller. https://www.youtube.com/watch?v=V4iBb_j6k_g OpenSourcePhotogrammetrywithVisualSFM: Ditching123DCatchJuly12,2013 by Jesse Indoor Navigation from Multiple Images By Jaan Tollander de Balsch, 2016, Aalto https://jaantollander.github.io/SCI-C1000/pr ototype.html What is the best method for 3D object modelling and reconstruction from photos or videos taken by flying robots or drones? What is the accuracy of such reconstruction methods with regards to the vibrations of the flying drones, quality of camera and resolution? Is it possible to improve the results by organizing multiple flights and overlaying/accumulating the data in the point cloud? Is there any free software available?
  • 12.
    SoftwarePythonPhotogrammetryToolbox(PPT)GUI Real photo xSfM with texture color x SfM with simple shader. Made with Python Photogrammetry Toolbox GUI and rendered in Blender with Cycles. http://184.106.205.13/arcteam/ppt.php https://github.com/archeos/ppt-gui/ Converting pictures into a 3D mesh with PPT, MeshLab and Blender http://arc-team-open-research.blogspot.co.uk/2012/09/converting-pi ctures-into-3d-mesh-with.html Blender camera tracking + Python Photogrammetry Toolbox http://arc-team-open-research.blogspot.co.uk/2012/11/blender-camer a-tracking-python.html The video show the skull reconstructed in 3D with Python Photogrammetry Toolkit GUI. Smilodon, the 3D reconstruction of the saber-toothed cat http://arc-team-open-research.blogspot.co.uk/2013/03/
  • 13.
    Open-sourcelibraries forSfM OpenSfM isa Structure from Motion library written in Python on top of OpenCV. The library serves as a processing pipeline for reconstructing camera poses and 3D scenes from multiple images. https://github.com/mapillary/OpenSfM 656 stars OpenSfM OpenMVG (Multiple View Geometry) "open Multiple View Geometry" is a library for computer-vision scientists and especially targeted to the Multiple View Geometry community. https://github.com/openMVG/openMVG 1,1856 stars OpenMVG https://doi.org/10.1007/978-3-319-56414-2_5 http://imagine.enpc.fr/~marletr/publi/RRPR-2016 -Moulon-et-al.pdf Sung and Lin (2017): “VisualSFM uses the pre- emptive feature matching, the incremental structure from motion and the re-triangulation techniques. The incremental feature matching can greatly speed up the process because this kind of matching will first sort all feature points and match only first h feature points for each photo.” Sung and Lin (2017): “OpenMVG also contains incremental structure from motion technique. Besides that, they proposed a new iterative sampling method called a contrario Random Sample Consensus (AC-RANSAC) as a substitution to the original RANSAC in order to acquire higher precision and better performance. The AC-RANSAC using the “a contrario” methodology in order to find a model that best fits the data with a threshold T that adapts automatically to the noise. Hence, it is able to find a model and its associated noise without a fixed threshold.”
  • 14.
    Open-sourcelibraries forSfM+SLAM OpenChisel https://github.com/personalrobotics/OpenChisel An open-sourceversion of the Chisel chunked TSDF library. It contains two packages: open_chisel open_chisel is an implementation of a generic truncated signed distance field (TSDF) 3D mapping library; based on the Chisel mapping framework developed originally for Google's Project Tango. It is a complete re-write of the original mapping system (which is proprietary). open_chisel is chunked and spatially hashed inspired by this work from Neissner et. al, making it more memory-efficient than fixed-grid mapping approaches, and more performant than octree-based approaches. A technical description of how it works can be found in our RSS 2015 paper. http://ri.cmu.edu/pub_files/2015/7/ChiselPaper.pdf
  • 15.
    Research-gradeSfM old-school monovideo http://dx.doi.org/10.1186/s13640-017-0168-3 Inspiredby the structure from motion systems, we propose a system that reconstructs sparse feature points to a 3D point cloud using a mono video sequence so as to achieve higher computation efficiency. The system keeps tracking all detected feature points and calculates both the amount of these feature points and their moving distances. We only use the key frames to estimate the current position of the camera in order to reduce the computation load and the noise interference on the system. Furthermore, for the sake of avoiding duplicate 3D points, the system reconstructs the 2D point only when the point shifts out of the boundary of a camera. In our experiments, we show that our system is able to be implemented on tablets and can achieve state-of-the-art accuracy with a denser point cloud with high speed.
  • 16.
  • 17.
    Research-gradeSfM DeepLearning -based#2 https://arxiv.org/abs/1702.01381,2 May 2017 We evaluated the performance of our proposal on the DTU dataset comparing it with two traditional feature based methods, namely SURF (Cited by 8683 articles) and ORB ( Cited by 2739 articles). The system is trained in an end-to-end manner utilising transfer learning from a large scale classification dataset. In addition, a variant of the proposed architecture containing a spatial pyramid pooling (SPP) layer is evaluated and shown to further improve the performance. RegNet is able to correct even large decalibrations such as depicted in the top image. The inputs for the deep neural network are an RGB image and a projected depth map. RegNet is able to establish correspondences between the two modalities which enables it to estimate a 6 DOF extrinsic calibration. Additionally, with an iterative execution of multiple CNNs, that are trained on different magnitudes of decalibration, our approach compares favorably to state-of-the-art methods in terms of a mean calibration error of 0.28º for the rotational and 6 cm for thetranslation components even for large decalibrations up to 1.5 m and 20º . https://arxiv.org/abs/1702.02295
  • 18.
    Research-gradePose/Structure DeepLearning -based#1 Essentiallythe same technology for stereo matching and depth map generation as for SfM https://arxiv.org/abs/1703.04309 https://arxiv.org/abs/1704.07813 Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: 1) monocular depth performs comparably with supervised methods that use either ground-truth pose or depth for training, and 2) pose estimation performs favorably compared to established SLAM systems under comparable input settings.
  • 19.
    Research-gradePose/Structure DeepLearning -based#2 GANson everything, so here as well :) The usefulness of VisualSFM/ openSFM/ openMVG for defensible startup products? Inversion is often ambiguous, e.g., many compositions of 3D shape and camera pose give rise to the same 2D projection. To address this ambiguity, we impose priors on the predicted latent factors, through an adversarial discriminator network trained to discriminate between predicted factors and ground-truth ones. Training adversarial inversion does not require input-output paired annotations, but merely a collection of ground-truth factors, unrelated (unpaired) to the current input. Our model can thus be self-supervised by unlabelled image data, by minimizing a joint reconstruction and adversarial loss, complementing any direct supervision provided by paired annotations. Applying adversarial inversion to super-resolution and inpainting results in automated “visual plastic surgery” Structure-from-motion(SfM) results with and without adversarial priors. The results of the baseline (columns 5th and 8th) are obtained from a model with depth smooothness prior, trained with early stopping at 40K iterations (before divergence).
  • 20.
    SfMonMobileDevices https://arxiv.org/abs/1611.09498 https://doi.org/10.1109/ICCV.2013.15 | Citedby 141 articles, see Related articles https://doi.org/10.1016/j.cviu.2016.09.007 After introducing the reconstruction algorithms at the base of our approach, we show how to build applications able to generate 3D floor plans scaled to their real-world metric dimensions and capable to manage scene not necessary limited by Manhattan World assumptions. Then, exploiting the resulting structural and visual model, we propose a client-server interactive exploration system implementing a low-DOF navigation interface, specifically developed for touch interaction on smartphones and tablets. https://doi.org/10.1145/2999508.2999526
  • 21.
    SfMonMobileDevices CaseDacuda Magic Leap,the augmented reality startup that has raised $1.4 billion in funding but has yet to release a product, has made an acquisition to expand its work in computer vision and deep learning, and to build out its operations into Europe. The company has acquired the 3D division of Dacuda, a computer vision startup based out of Zurich. One of Dacuda’s focuses had been developing algorithms for consumer- grade cameras (and not just cameras, but any device with a camera function) to capture 2D and 3D imaging in real time, “making 3D content as easy as taking a video.” https://techcrunch.com/2017/02/18/confir med-magic-leap-acquires-3d-division-of-d As you can see, no detail about what the two might be working on. The acquisition was first rumored last week — after Dacuda posted a note on its blog about selling its 3D division, and then some Dacuda employees updated their LinkedIn profiles as Magic Leap employees (one example here). Tom’s Hardware then speculated it could signal Magic Leap using technology developed by Dacuda to enable room-scale, six degrees of freedom tracking (essentially to improve its image capturing sensors in 3D environments). The ecosystem there is attracting other big-name M&A. Faceshift, a motion capture startup acquired by Apple in 2015, was also founded in Zurich. Facebook’s Oculus VR in August 2016 also quietly acquired a startup called Zurich Eye, incubated at the University of Zurich and ETH, the federal institute of technology. Zurich Eye became the basis of Oculus and Facebook’s office in the city. Zurich Eye, ironically, was co-founded by a three former software engineers from Dacuda (they all now work for Oculus VR). For example, in October the company had linked up with MindMaze, another virtual/augmented reality startup out of Switzerland, to build a platform they were calling “MMI, the world’s first multisensory computing platform for mobile-based, immersive and social virtual reality applications,” MindMaze noted. MindMaze said it planned to “deploy the technology for users globally to address a void left by Google’s DayDream View for positional tracking and multiplayer interactions.” We have contacted Magic Leap for comment and will update this post if and when we learn more.
  • 22.
    AppleARKit Technology https://developer.apple.com/arkit/ Since theiPhone 6, iPhones have used what Apple calls “Focus Pixels”, which is its term for phase detection AF. Fast Company reports that system will be replaced with laser autofocus possibly as soon as the next iPhone, which is set to debut this fall. It is likely that Apple would use both AF technologies, as Google does in its Pixel line of phones. The technology would serve a dual purpose, also allowing for better depth perception with the inbuilt camera for augmented reality apps. ARKit rolls out with iOS 11 this fall, so it would make sense to also include the VSCEL laser system in the phone launching at the same time. https://petapixel.com/2017/07/20/apple-bring-3d-laser-autofocus-iphone-cameras-report-says/ https://www.theverge.com/2017/6/26/15872332/apple-arkit-ios-11-augmented-reality-developer-excitement
  • 23.
    AppleARKit ExampleApplications https://twitter.com/madewithARKit Measuring kitchendimensions http://bit.ly/2tJ5KV8 app by→ @SmartPicture3D Measure distances with your iPhone. Clever little #ARKit app by @BalestraPatrick http://bit.ly/2sFl8RB Inter-dimensional iPhone AR portals are closer than they appear http://bit.ly/2sufO0d ARkit demo by @nedd Demo Shows How Augmented Reality Will Make Advertising More Immersive. Mixed reality producer Bilawal Singh Sidhu show peek of what the world of advertising could be with the ARKit. #adtech https://mobile-ar.reality.news/news/apple-ar-demo-shows- augmented-reality-will-make-advertising-more-immersive-0 178905/
  • 24.
    Google’s responsetoARKit ARCore DAVIDJAGNEUX, UPLOADVR@UPLOADVR SEPTEMBER 2, 2017 6:00 AM “Earlier this week, Google announced ARCore, a software-based solution for making more Android devices AR-capable without the need for depth sensors and extra cameras. It will even work on the Google Pixel, Galaxy S8, and several other devices very soon and supports Java, Unity, and Unreal from day one. In short, it’s kind of like Google’s answer to Apple’s ARKit.” - https://venturebeat.com/2017/09/02/googles-first-arcore-goal-100-million-ar-capable-android-phones/ “Another example, which is especially relevant for developers that build traditional smartphone apps in Java, is that we want to make it easier than ever for people to get into 3D modeling that haven’t done it before,” Bavor says. “We know there are a lot of people that want to get into 3D development and AR but aren’t experts in Maya, or Unity, or anything. So Blocks is an app we built with the intention of enabling people that have never done a 3D model in their life to feel comfortable building 3D assets. We even made it easy to export right from Blocks and pull into ARCore apps you’re developing.”
  • 25.
    ARCore tooearlytotellhowitwilldoagainst“AppleCult” Verge AdiRobertson https://youtu.be/NhJydpMkpug FusedVR https://youtu.be/dNXBvDKRg1M https://venturebeat.com/2017/08/29/google-launches-arcore -sdk-in-preview-ar-on-android-phones-no-extra-hardware-re quired/ https://youtu.be/ttdPqly4OF8 Super Ventures Blog Matt Miesnieks CEO 6D.ai, Partner @Super_Ventures, AR technology & cycling https://medium.com/super-ventures-blog/how-is-arcore-better-than-arkit-5223e6b3e79d ● Isn’t ARCore just Tango-lite? ● The iPhone-8-keynote sized elephant in the room ● So should I build on ARCore now? ● Is ARCore better than ARKit? Scottie Gardonio Aug 30 AR / VR enthusiast. Creative Manager. Passionate graphic designer. https://medium.com/iotforall/arcore-vs-arkit-google-counters-apple-33483c08d3da ARCore vs. ARKit: Google Counters Apple Let the Dueling Begin Google announcing inside-out 6-DOF tracking support for Daydream back at Google IO earlier this year.
  • 26.
    DeepLearningonMobileDevices https://techcrunch.com/2017/05/17/googles-tensorflow-lite-brings-machine-learning-to-android-devices/ http://blog.stratospark.com/creating-a-deep-learning-ios-app-with-keras-and-tensorflow.html ● 3D FaceCapture ● 3D Scene Reconstruction ● 2.5D Scene Reconstruction and Computational Photography ● SLAM and Object Tracking ● Augmented Reality ● Google Cardboard SDK for iOS https://doi.org/10.1109/IPSN.2016.7460664 | Cited by 28 articles, see Related articles Thursday 20 July 2017, Movidius USB stick https://techcrunch.com/2017/07/20/movidius-launches-a-79-deep-learning-usb-stick/ Snapchat secretly acquires Seene, a computer vision startup that lets ... https://techcrunch.com/.../snapchat-secretly-acquires-seene-a- computer-vision-startup-... 3 Jun 2016 https://doi.org/10.1109/PDP.2017.98 https://arxiv.org/abs/1705.06224
  • 27.
  • 28.
    360°(omnidirectionalimaging) Introduction The PanopticCamera platform developed jointly by Microelectronic Systems Laboratory (LSM) and Signal Processing Laboratory (LTS2) of EPFL.* http://lsm.epfl.ch/page-52820-en.html Wikipedia: “360-degree videos, also known as immersive videos[1] or spherical videos ,[2] are video recordings where a view in every direction is recorded at the same time, shot using an omnidirectional camera or a collection of cameras. During playback the viewer has control of the viewing direction like a panorama.” Consumer-level camera review http://thewirecutter.com/reviews/best-360-degree-camera/ By DANIEL CULPANWednesday 12 August 2015 http://www.wired.co.uk/article/9-mind-blowing-360-degree-videos Scuba Diving Short Film in 360° Green Island, Taiwan https://youtu.be/2OzlksZBTiA
  • 29.
    360°aspartof “10BreakthroughTechnologiesof2017” https://www.technologyreview.com/s/603496/10-breakthrough-technologies-2017-the-360-degree-selfie/ Seasonal changesto vegetation fascinate Koen Hufkens. So last fall Hufkens, an ecological researcher at Harvard, devised a system to continuously broadcast images from a Massachusetts forest to a website called VirtualForest.io. And because he used a camera that creates 360°pictures, visitors can do more than just watch the feed; they can use their mouse cursor (on a computer) or finger (on a smartphone or tablet) to pan around the image in a circle or scroll up to view the forest canopy and down to see the ground. Journalists from the New York Times and Reuters are using $350 Samsung Gear 360 cameras to produce spherical photos and videos that document anything from hurricane damage in Haiti to a refugee camp in Gaza. One New York Times video that depicts people in Niger fleeing the militant group Boko Haram puts you in the center of a crowd receiving food from aid groups. Or consider the spherical videos of medical procedures that the Los Angeles startup Giblib makes to teach students about surgery. The company films the operations by attaching a $500 360fly 4K camera, which is the size of a baseball, to surgical lights above the patient. The 360° view enables students to see not just the surgeon and surgical site, but also the way the operating room is organized and how the operating room staff interacts. These applications are feasible because of the smartphone boom and innovations in several technologies that combine images from multiple lenses and sensors. For instance, 360° cameras require more horsepower than regular cameras and generate more heat, but that is handled by the energy-efficient chips that power smartphones. Both the 360fly and the $499 ALLie camera use Qualcomm Snapdragon processors similar to those that run Samsung’s high- end handsets. Once people discover spherical videos, research suggests, they shift their viewing behavior quickly. The company Humaneyes, which is developing an $800 camera that can produce 3-D spherical images, says people need to watch only about 10 hours of 360° content before they instinctively start trying to interact with all videos. When you see 360°imagery that truly transports you somewhere else, you want it more and more.
  • 30.
    Low-costendSamsung Gear andGalaxy SamsungGear360, ~£250 Samsung GearVR, ~£100 Samsung Galaxy S6-8, smartphone, ~£200-£700 http://www.samsung.com/uk/wearables/gear-360-c200/ If you’re clamoring to shoot in 360 degrees, the Gear 360 balances simple design with workable image quality — but you really need a Samsung phone (and a Gear VR, and a good hunk of money) to get the most out of it. And, for now, that's fine. This version of the Gear 360 is more likely to be looked back on as a relic anyway, a recognizable but eventually dismissible attempt at a new idea, and the foundation for whatever Samsung does next.
  • 31.
    Low-costend#2Ricoh Theta Ricoh’s ThetaV 4K camera sports 360- degree video and wireless playback RYAN WINTERHALTER, UPLOADVR@@UPLOADVR SEPTEMBER 02, 2017 07:03 PM https://venturebeat.com/2017/09/02/ricohs-theta-v-4k-camera-sport s-360-degree-video-and-wireless-playback/ Ricoh is unveiling its latest 360-degree camera this morning. Dubbed the Ricoh Theta V, the $430 4K camera is the latest in the line which launched in 2013 with the Ricoh Theta. Available for pre-order now, and shipping in mid-September, the Theta V features 3,820-by-1,920 resolution video capture. That’s a massive improvement on the earlier Theta S, which offered a sub-1,080p 1,920-by-960, and the Theta SC, which allowed for 1,920-by-1,080 recording. Perhaps the biggest usability improvement to the Theta V is the inclusion of remote playback. Users can now wirelessly stream their video to an external display directly from the camera. Previous devices in the Theta line (except the developer-only Theta R) required users to export their raw footage into a computer to stitch the image and create a useable video. That’s now all done on the device. Videographers can watch their footage on any display, and move the POV by moving the camera itself. The Theta V boosts sound quality as well. Four microphones capture data from their respective dimensions, creating spatial audio that allows users to hear where the sound is coming from within the recording. Ricoh Theta V hands-on Published Aug 31, 2017 | Jeff Keller Based on some quick tests of a non-final Theta V, both stills and videos are noticeably better than those from its predecessor. We're looking forward to getting our hands on a production model in a few weeks and putting it through its paces. For higher quality audio capture, Ricoh is offering the TA-1 3D Microphone ($269). Developed by Audio Technica, the mic attaches via the tripod mount and uses a standard 3.5mm audio jack.
  • 32.
    HigherEndGoPro, Nokia Ozo,FacebookSurround, etc. GoPro (NASDAQ:GPRO) recently unveiled the Omni, a six-camera rig for filming interactive spherical videos that can be explored through a smartphone's movements, a user's finger swipes, or a virtual reality headset. The device is the smaller sibling of the 16-camera Odyssey rig ($15,000), which hasn't been launched despite being announced nearly a year ago. Let's take a look at four key things investors should know about the Omni ($3,500), and how they might impact GoPro's future. https://www.fool.com/investing/general/2016/04/14/4-things-inves tors-need-to-know-about-gopro-incs-o.aspx What's next for GoPro? GoPro investors don't have many catalysts to look forward to this year. The Omni is too pricey relative to its peers to gain any mainstream traction. The Karma drone, which is due to arrive within the next two months, faces tough competition from market leader DJI Innovations. By the time the Hero 5 cameras arrive near the end of the year, the mainstream market could be saturated with cheap VR and flying cameras. Introducing Facebook Surround 360: An open, high-quality 3D-360 video capture system Brian K Cabral, April 12, 2016 ● Facebook has designed and built a durable, high- quality 3D-360 video capture system. ● The system includes a design for camera hardware and the accompanying stitching code, and we will make both available on GitHub this summer. We're open-sourcing the camera and the software to accelerate the growth of the 3D-360 ecosystem — developers can leverage the designs and code, and content creators can use the camera in their productions. ● The system exports 4K, 6K, and 8K video for each eye. The 8K videos double industry standard output and can be played on Gear VR with Facebook's custom Dynamic Streaming technology. https://code.facebook.com/posts/1755691291326688/introduc ing-facebook-surround-360-an-open-high-quality-3d-360-vid eo-capture-system/ https://www.theverge.com/2016/4/25/11421992/disney-nokia-oz o-camera-virtual-reality-star-wars-marvel Ever since Nokia announced its 360-degree Ozo virtual reality camera it has positioned the system as a high-end option for Hollywood filmmakers, and today the company is announcing a partnership with Disney that should help deliver on that promise. As part of the deal, Ozo cameras will be put into the hands of Disney filmmakers and its marketing teams to create 360-degree, virtual reality content across all of the studio’s various brands.
  • 33.
    LytroImmerge The world'sfirstprofessional Light Field solution forcinematicVR roadtovr.com/lytros-immerge-360 https://www.lytro.com/immerge Consequently, to create a virtual reality that even the human eye cannot distinguish from the real world, we must achieve the perfect immersive viewing experience, such that human viewers feel they can walk into the scene. This is known as the virtual walk-in effect, and it requires light-field technology—3D imaging technology that emerged from the field of computational imaging/photography to capture the light rays that people perceive from different locations and directions. When combined with computer vision and deep learning, light- field technology provides a viable path for producing low-cost, high-quality VR content, positioning this technology to be the most profitable segment of the VR industry.
  • 34.
    “DepthLytro”‘Depth sensing withlight fieldtechniques Refocusing in spite of foreground occlusions: (a) Scene containing a monkey toy being partially occluded by a plant in the foreground, (b) traditional synthetic aperture refocusing on light field is partially effective in removing the effect of foreground plants, (c) synthetic aperture refocusing of depth displays corruption due to occlusion, (d) histogram of depth clearly shows two clusters corresponding to plant and monkey, (e) virtual aperture refocusing after removal of plant pixels shows sharp depth image of monkey, (f) Quantitative comparison of indicated scan line of the monkey’s head for (c) and (e) We use coding techniques from Tadano et al. (2015) to image beyond backscattering nets. Notice how the corrupted depth maps are improved using the codes. We show how digital refocusing can be performed on the images without the scattering occluders by combining depth fields with coded TOF. https://arxiv.org/abs/1509.00816
  • 35.
Post-processing for 360° imaging https://doi.org/10.1007/s00371-017-1368-7 Overall process: (a) Input image. (b) Lines detected and classified: red for vertical lines and yellow for horizontal lines. (c) Great circles from the classified lines. Green dots are vanishing points computed from horizontal (yellow) lines. (d) Upright adjustment result. We implemented our method using C++ and the OpenCV library on a 64-bit Windows PC with an Intel i7-6700K 4.00 GHz CPU and 32 GB RAM. For an input image of size 5376 × 2688 px, it takes a few hundred milliseconds (less than one second) to obtain the final rotation matrix R for upright adjustment. https://arxiv.org/abs/1703.10798 http://vllab1.ucmerced.edu/~wlai24/360hyperlapse Pipeline of the proposed algorithm: Given a 360° video, we first stabilize the sequence to smooth the relative rotation between adjacent frames. We estimate the focus of expansion (i.e., the direction of forward motion) as prior information for our camera path planning. To extract the regions of interest, we compute the spatial-temporal saliency and semantic segmentation. The detected regions of interest are used to guide the camera path planning. Finally, we use an adaptive 2D video stabilization to render a smooth hyperlapse.
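The upright-adjustment step above comes down to resampling the equirectangular panorama with the estimated rotation matrix R. A minimal sketch with NumPy and OpenCV (the line detection and vanishing-point estimation that produce R are not shown; the angle conventions are assumptions):

```python
import cv2
import numpy as np

def rotate_equirectangular(img, R):
    """Resample an equirectangular panorama so that it is 'uprighted'
    by the rotation matrix R estimated in an upright-adjustment step."""
    h, w = img.shape[:2]
    # pixel grid -> spherical angles (longitude, latitude)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    lon = (u + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / h * np.pi
    # angles -> unit viewing directions
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1).reshape(-1, 3)
    # rotate each output direction by R^T to find where to sample the input
    src = dirs @ R
    lon_s = np.arctan2(src[:, 0], src[:, 2])
    lat_s = np.arcsin(np.clip(src[:, 1], -1.0, 1.0))
    map_x = ((lon_s + np.pi) / (2.0 * np.pi) * w).reshape(h, w).astype(np.float32)
    map_y = ((np.pi / 2.0 - lat_s) / np.pi * h).reshape(h, w).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```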
  • 36.
360° Deep Learning #1 http://dx.doi.org/10.3390/s17061341 https://arxiv.org/abs/1705.01759 Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer from this “360 piloting” task, we propose “deep 360 pilot”, a deep learning-based agent for piloting through 360° sports videos automatically. Panel (a) overlaps three panoramic frames sampled from a 360° skateboarding video with two skateboarders. One skateboarder is more active than the other in this example. For each frame, the proposed “deep 360 pilot” selects a view, i.e. a viewing angle at which a Natural Field of View (NFoV) (cyan box) is centered. It first extracts candidate objects (yellow boxes), and then selects a main object (green dashed boxes) in order to determine a view (just like a human agent). Panel (b) shows the NFoV from a viewer's perspective.
  • 37.
360° Deep Learning #2 Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery. Yu-Chuan Su, Kristen Grauman (submitted on 2 Aug 2017) https://arxiv.org/abs/1708.00919 We propose to learn a spherical convolutional network that translates a planar CNN to process 360° imagery directly in its equirectangular projection. Our approach learns to reproduce the flat filter outputs on 360° data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient feature extraction for 360° images and video, and 2) the ability to leverage powerful pre-trained networks researchers have carefully honed (together with massive labeled image training sets) for perspective images. We validate our approach compared to several alternative methods in terms of both raw CNN output accuracy as well as applying a state-of-the-art "flat" object detector to 360° data. Our method yields the most accurate results while saving orders of magnitude in computation versus the existing exact reprojection solution.
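As a toy illustration of the idea (not the authors' SphConv implementation), one can give each latitude band its own convolution whose horizontal extent grows toward the poles, with wrap-around padding in longitude; the band count, kernel widths and class name below are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RowAdaptiveSphereConv(nn.Module):
    """Toy distortion-aware convolution for equirectangular images: each
    latitude band gets its own filter, wider near the poles, and longitude
    is padded circularly (wrap-around)."""
    def __init__(self, in_ch, out_ch, n_bands=8, base_k=3, max_k=9):
        super().__init__()
        self.n_bands = n_bands
        self.convs = nn.ModuleList()
        for b in range(n_bands):
            d = abs((b + 0.5) / n_bands - 0.5) * 2                    # 0 at equator, 1 at poles
            kw = base_k + 2 * int(round(d * (max_k - base_k) / 2))    # odd kernel width
            self.convs.append(nn.Conv2d(in_ch, out_ch, kernel_size=(3, kw)))

    def forward(self, x):                          # x: (N, C, H, W), H divisible by n_bands
        H, W = x.shape[2:]
        band_h = H // self.n_bands
        rows = []
        for b, conv in enumerate(self.convs):
            y0, y1 = b * band_h, (b + 1) * band_h
            band = x[:, :, max(y0 - 1, 0):min(y1 + 1, H), :]          # one row of vertical context
            pad_top, pad_bot = int(y0 == 0), int(y1 == H)
            band = F.pad(band, (0, 0, pad_top, pad_bot), mode='replicate')
            kw = conv.kernel_size[1]
            band = F.pad(band, (kw // 2, kw // 2, 0, 0), mode='circular')
            rows.append(conv(band))                                   # (N, out_ch, band_h, W)
        return torch.cat(rows, dim=2)

# feats = RowAdaptiveSphereConv(3, 16)(torch.randn(1, 3, 64, 128))    # -> (1, 16, 64, 128)
```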
  • 38.
360°: The role in PropTech? #1a Use for real estate agents, still a novelty/gimmick? (from 2014 until 2017) MAY 26, 2014, by James Dearsley http://www.jamesdearsley.co.uk/is-the-property-industry-interested-in-360-degree-hd-filming/ USES OF 360 DEGREE HD FILMING IN REAL ESTATE: 1. Sales and Marketing. Firstly, from a realtor or estate agent perspective there are several uses here of 360 degree cameras, the first being obvious, that of sales and marketing. It will be simple and efficient to take a quick film of each room, or just walk through the property with these devices to record what you need. 2. Property Management issues. We have also seen interest from companies looking to use these bits of equipment for inventory taking. Seeing as they are of HD quality it means you can quickly take photographs of properties which can later be looked at in more detail should problems arise in letting disputes. 3. Virtual Reality. With Facebook recently buying Oculus Rift for $2 billion, it is getting less far fetched. Considering the price of an Oculus is relatively cheap (reckoned to be less than $500/£360 when released next year) it would not be surprising if Facebook are hoping for a lot of people to be purchasing these (Candy Crush Saga in Virtual Reality anyone?!). It isn't just Facebook though; Sony have a VR headset in production as does Samsung (it was recently announced) and so this space is going to move quickly. By using these cameras you can put your clients into these homes very quickly and easily – either in the office, if you get a set of these yourself, or, in time, in their own home if Facebook get their way. https://www.forbes.com/sites/forbesagencycouncil/2017/06/28/want-to-use-360-degree-photo-and-video-11-things-to-consider/#22fffa955002 1. I would recommend that marketers stay on the sidelines until the industry matures. - Kristopher Jones, LSEO.com 4. Use A Strategic Approach. The capabilities of 360-degree photo/video have powerful applications in many industries, including real estate, retail and tourism. A 360-degree view has a better chance of selling a house than a static image. - Brock Murray, seoplus+ 7. Prepare For Tomorrow's Consumer Expectations. Today, 360-degree photos and videos are very helpful in industries such as the auto industry or real estate where visualizing the product is essential. As VR continues to grow, 360-degree photos and videos will likely become a standard. The consumers' expectations will likely adjust to needing to learn more about the overall "360-degree" experience of the restaurant for example, not just a picture of the dish. - Ahmad Kareh, Twistlab Marketing 11. Create An Emotional Connection. 360-degree multimedia is a brilliant tool for meaningful storytelling, as it allows the consumer to be transported to the experience you want them to have, bringing the story to life. Companies should take advantage of these tools to transform products into experiences, cultivating an immersive and emotional connection with the brand. - Joey Hodges, Demonstrate PR. JUN 28, 2017, by Forbes Agency Council
  • 39.
360°: The role in PropTech? #1b Use for real estate agents. A four-wheeled tripod outfitted with a computer, 360-degree camera and sensors can roam properties, producing highly choreographed, immersive videos that would be difficult — if not impossible — to replicate with a normal video camera. VirtualAPT (Brooklyn, NYC) offers a residential tour service, now at $1/ft² (~$10.8/m²), and for commercial uses, for a monthly fee per building or $0.50/ft² (~$5.4/m²) for separate units. Generated by technology from companies such as Matterport, 3-D home tours allow users to jump between 360-degree photos — sometimes situated within a 3-D model. ● A rover can shoot 360-degree footage of a home while moving along a pre-plotted route. ● Made by VirtualAPT, the videos can include on-camera presentations from real estate agents. ● They're an alternative to 3-D home tours from companies such as Matterport. https://www.youtube.com/watch?v=JhfQK-tDvGU
  • 40.
360°: The role in PropTech? #2a Use for construction and as a tool for constructing 4D/5D/6D BIM (Building Information Model). Today: a construction site manager manually taking photos of the progress. - Time-consuming to walk through and take photos. - No full coverage of the site. - Might forget some spots. - A nice initial 3D BIM is not properly maintained during construction. + Ideally, have a drone inspecting the whole construction site with an on-board 360-degree video camera and a LiDAR / laser scanner. + One can go back in time and see, for example, which of the subcontractors is responsible for possible problems. https://doi.org/10.1186/s40327-014-0016-9
  • 41.
360°: The role in PropTech? #2b 360° videos, registered or not to a 3D BIM model, allow inspection of the progress (“4D BIM”) on the construction site also retrospectively, and can possibly reduce legal battles when it is clearer who is to be held responsible in case of discrepancies between as-built and as-planned data. VISUAL ASSET MANAGEMENT: Visual Asset Management (VAM) service digitizes industrial and infrastructure assets using 360 degree images, 3D models, and related asset information. 3D MODELING: We thrive on enabling realistic 3D visualization of projects while preserving the minute details necessary to portray our world. 360 VIDEO: 360 video enables viewers to be at the center of any medium, allowing for a unique visual experience and situational awareness from any device. VIRTUAL REALITY: OcuTech's virtual reality solutions stimulate creative thinking and enhanced information sharing, allowing for a one-of-a-kind virtual experience. OcuTech from Houston, Texas, USA is already providing these types of services. https://ocutech360.com/3d-architectural-visualization-solution/#3dvrvideo
  • 42.
  • 43.
360° into smartphones: how big will it be? https://www.engadget.com/2017/07/10/future-of-smartphone-camera/ 1) Augmented reality 2) Dual-lens cameras 3) Better lenses 4) 4K recording 5) Thermal imaging 6) Optical zoom 7) 360 video. “Several smartphone makers, including Samsung and Huawei, have already released add-on 360-degree cameras for their handsets, but this is something that could eventually be integrated into the phones themselves. Immersive 360-degree videos are gradually making their mark, with Facebook among the big firms pushing the technology, while virtual reality companies are gradually introducing more 360-VR content that can be viewed from mobile phones.” https://techcrunch.com/2016/08/30/the-future-of-mobile-video-is-virtual-reality/ Are 360 cameras the future? https://youtu.be/i8EUerX90-0 TechAltar. So, will teens in big numbers ever apply Snapchat bunny ears to immersive 360-degree videos?
  • 44.
360° into smartphones: plenty of options coming #1 Acer's new Holo 360-degree camera is essentially a smartphone. Acer has announced its entry into the VR video market with a device that's half 360-degree camera, half smartphone. http://www.trustedreviews.com/news/acer-s-new-holo-360-degree-camera-is-essentially-a-smartphone-2953609 Paul Monckton, CONTRIBUTOR: I write about photography and related subjects. https://www.forbes.com/sites/paulmonckton/2016/05/31/worlds-first-live-smartphone-vr-camera/#9fea6921a8b0 Yesterday at this year's Computex trade show in Taipei, Quanta Computer and ImmerVision jointly announced what is claimed to be the world's first 360-degree live VR streaming camera for smartphones, with demos starting from today. The, as yet unnamed, camera fits in the palm of the hand and is designed to attach magnetically to any smartphone. It comes with a 360-degree by 187-degree lens and uses a Sony Exmor-HDR imaging sensor to produce 16 megapixel panoramic images. ImmerVision's Panamorph lens makes more efficient use of an image sensor (image credit: ImmerVision). THIS ADD-ON CAMERA WILL TURN YOUR SMARTPHONE INTO A 360 CAMERA, JULY 26, 2017: the ION360 U 4K 360-Degree Smartphone Camera is comprised of a 360 camera that goes on top of. Essential's 360 Camera Is the World's Smallest 360-Degree Personal Camera for a Smartphone, 30 May 2017 http://gadgets.ndtv.com/mobiles/news/essentials-360-camera-is-the-worlds-smallest-360-degree-personal-camera-for-a-smartphone-1705826 After months of teasing, Android creator Andy Rubin has finally unveiled the Essential Phone that features a near bezel-less display that tries to outdo Samsung's Galaxy S8. Essential's 360 camera, which weighs around 35 grams and is being called the world's smallest 360-degree personal camera by the company, includes dual 12-megapixel fisheye sensors that can capture 4K 360 video at 30fps. The camera also features 4 microphones to capture sound in 3D. The 360 camera can be bought along with the Essential Phone for an additional $50, or can be bought separately, which will cost you $199. @essential, Palo Alto, CA, essential.com
  • 45.
360° into smartphones: plenty of options coming #2 ProTruly's Darling https://www.theverge.com/2017/3/5/14809182/protruly-darling-360-degree-camera-smartphone A company called HT Optical makes the cameras found on ProTruly's devices. The company said that it is working on a much smaller 360 camera module that will actually fit into a 7.6 mm thick smartphone and will be capable of capturing 16 MP photos and shooting 4K videos. What's even more interesting is that the module will only add an extra 1 mm to the overall thickness of a device. https://www.theverge.com/circuitbreaker/2017/2/22/14698026/huawei-360-degree-camera-honor-vr-smartphones http://360rumors.com/ https://www.vrfocus.com/2017/07/360-degree-video-editing-app-for-smartphones/ V360 - 360 video editor by Avincel Group Inc. A 360-degree video editing app for smartphones: the V360 editing suite is already out for Android, with an iOS version coming soon.
  • 46.
360° into smartphones: convergence with AI players, of course. https://www.embedded-vision.com/news/movidius-low-power-vpu-technology-delivers-4k-vr-pixel-processing-performance-motorola%E2%80%99s-newest Movidius' Myriad 2 Vision Processing Unit (VPU) technology, known for its image signal processing and computer vision capabilities with high energy efficiency, was selected by Motorola Mobility to power their newest Moto Mod: the 360 Camera. Moto Mods are unique modular accessories for Motorola smartphones that bring advanced functionality beyond traditional smartphone features. Motorola's newest Moto Mod brings users the ability to live stream 360° videos while preserving battery life. Say hello to the moto z² Force Edition with moto mods https://www.youtube.com/watch?v=0moMnChM6Ds https://www.wsj.com/articles/intel-to-buy-semiconductor-startup-movidius-1473170441 https://www.altera.com/solutions/industry/automotive/applications/drive-assistance/surround-view-camera.html http://www.nvidia.co.uk/object/drive-px-uk.html
  • 47.
360° Video SfM: an obvious extension is to combine both. Instead of manually rotating your camera, image all angles simultaneously while going through the rooms in an apartment. https://uploadvr.com/adobe-algorithm-6dof-360-cam/ http://variety.com/2017/digital/news/adobe-6dof-vr-video-algorithms-1202394491/ Adobe Motion Parallax demo https://youtu.be/37Z4f6p1HOY https://www.roadtovr.com/adobes-new-research-aims-give-depth-monoscopic-360-video/: Other techniques to achieve 6-DoF VR video usually require light-field cameras like HypeVR's crazy 6K/60 FPS LiDAR rig or Lytro's giant Immerge camera. While these undoubtedly will produce a higher quality 3D effect, they're also custom-built and ungodly expensive. 6-DOF VR videos with a single 360-camera. Jingwei Huang, Zhili Chen, Duygu Ceylan, Hailin Jin, Virtual Reality (VR), 2017 IEEE, http://dx.doi.org/10.1109/VR.2017.7892229, 18-22 March 2017. Given a 360-video captured by a single spherical panorama camera, in an offline pre-processing stage, we recover the camera motion and the scene geometry first by performing structure-from-motion (SfM) followed by dense reconstruction. Then, in real time we play back the video in a VR headset where we track the 6-DOF motion of the headset and synthesize new views by a novel warping algorithm.
  • 48.
360° Video SfM: Korea Advanced Institute of Science and Technology (KAIST). Spherical panoramic cameras (Ricoh Theta S, Samsung Gear 360 and LG 360). Our sphere sweeping algorithm enables computing all-around dense depth maps, minimizing the loss of spatial resolution. With the estimated all-around image and depth map, we have shown practical utilities by introducing 360° stereoscopic and anaglyph images as VR content. European Conference on Computer Vision, ECCV 2016: Computer Vision – ECCV 2016, pp 156-172, https://doi.org/10.1007/978-3-319-46487-9_10 All-Around Depth from Small Motion with a Spherical Panoramic Camera. Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe, In So Kweon
  • 49.
  • 50.
Microsoft Kinect: democratizing structured-light scanning https://arxiv.org/abs/1505.05459 Structured light: A sequence of known patterns is sequentially projected onto an object and gets deformed by the geometric shape of the object. The object is then observed from a camera from a different direction. By analyzing the distortion of the observed pattern, i.e. the disparity from the original projected pattern, depth information can be extracted. The Time-of-Flight (ToF) technology is based on measuring the time that light emitted by an illumination unit requires to travel to an object and back to the sensor array. The Kinect ToF camera applies this continuous-wave (CW) intensity modulation approach. Due to the distance between the camera and the object (sensor and illumination are assumed to be at the same location), and the finite speed of light c, a time shift φ [s] is caused in the optical signal, which is equivalent to a phase shift in the periodic signal. This shift is detected in each sensor pixel by a so-called mixing process. The time shift can be easily transformed into the sensor-object distance as the light has to travel the distance twice. Cited by 65 articles - see Related articles
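The CW phase-shift relation described above gives distance d = c·φ / (4π·f_mod), with an unambiguous range of c / (2·f_mod) before the phase wraps. A quick numeric sanity check (the 80 MHz modulation frequency is an assumption used only for illustration; Kinect-class sensors mix several frequencies to extend the range):

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def phase_to_distance(phase_shift_rad, mod_freq_hz):
    """CW ToF: the light travels to the object and back, so
    d = c * delta_phi / (4 * pi * f_mod)."""
    return C * phase_shift_rad / (4.0 * np.pi * mod_freq_hz)

def unambiguous_range(mod_freq_hz):
    """Phase wraps at 2*pi, so ranges beyond c / (2 * f_mod) alias."""
    return C / (2.0 * mod_freq_hz)

f = 80e6                                  # assumed modulation frequency
print(unambiguous_range(f))               # ~1.87 m for a single frequency
print(phase_to_distance(np.pi, f))        # half the unambiguous range, ~0.94 m
```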
  • 51.
KinectFusion: scanning with Kinect https://doi.org/10.1145/2047196.2047270 Cited by 1356 articles, see Related articles https://arxiv.org/abs/1704.01047 https://arxiv.org/abs/1612.02859 The semantic cue from the floorplan (i.e. door detection) resolves ambiguities. The figure shows the best placement based on the unary potential with or without the semantic cue. We show qualitative results on ModelNet using the TSDF encoding (Curless and Levoy, 1996) and 4 views. The same TSDF truncation threshold has been used for traditional fusion, our OctNetFusion approach and the ground truth generation process. While the baseline approach is not able to resolve conflicting TSDF information from different viewpoints, our approach learns to produce a smooth and accurate 3D model from highly noisy input. By learning the structure of real-world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill gaps in the reconstruction. We evaluate our approach extensively on both synthetic and real-world datasets for volumetric fusion. Further, we apply our approach to the problem of 3D shape completion from a single view, where our approach achieves state-of-the-art results.
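For reference, the TSDF encoding mentioned above (Curless and Levoy, 1996) is essentially a per-voxel running weighted average of truncated signed distances. A minimal single-view integration step, with placeholder intrinsics/pose names and no colour, GPU code or marching cubes (all values and names are assumptions for illustration):

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, origin, voxel_size, depth, K, T_cw, trunc=0.05):
    """One TSDF integration step (running weighted average).
    tsdf, weights : (X, Y, Z) arrays, tsdf initialised to 1.0, weights to 0.0
    origin        : world position of voxel (0, 0, 0); voxel_size in metres
    depth         : (H, W) depth map in metres, 0 where invalid
    K             : 3x3 intrinsics; T_cw : 4x4 world-to-camera pose"""
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing='ij')
    pts_w = origin + (np.stack([ii, jj, kk], -1).reshape(-1, 3) + 0.5) * voxel_size
    pts_c = (np.c_[pts_w, np.ones(len(pts_w))] @ T_cw.T)[:, :3]     # world -> camera
    z = pts_c[:, 2]
    z_safe = np.where(z > 1e-6, z, 1e-6)
    uv = pts_c @ K.T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)
    H, W = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d_meas = np.zeros(len(z))
    d_meas[valid] = depth[v[valid], u[valid]]
    valid &= d_meas > 0
    sdf = d_meas - z                          # positive in front of the surface
    valid &= sdf > -trunc                     # ignore voxels far behind the surface
    tsdf_obs = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1), weights.reshape(-1)
    t[valid] = (t[valid] * w[valid] + tsdf_obs[valid]) / (w[valid] + 1.0)
    w[valid] += 1.0
    return t.reshape(tsdf.shape), w.reshape(weights.shape)
```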
  • 52.
Kinect tweaks: depth resolution improvements with polarization measurement? http://news.mit.edu/2015/object-recognition-robots-0724 https://youtu.be/m6sStUk3UVk http://news.mit.edu/2015/algorithms-boost-3-d-imaging-resolution-1000-times-1201 https://doi.org/10.1007/s11263-017-1025-7 https://doi.org/10.1364/OE.25.001173
  • 53.
Range Sensing: plenty of options http://3dscanexpert.com/photogrammetry-benchmarks-remake-vs-photoscan-vs-realitycapture-vs-zephyr/ This post is just an example based on a single photoset from a single object. That makes it zero percent scientific. Also, RealityCapture might have won this Drag Race in terms of both speed with the Fast preset and quality with the Normal preset, but an organic object like this is very favorable to its algorithms. Read my Full RC Review to see that it can't always handle non-organic objects well. COMMERCIAL SOFTWARE http://3dscanexpert.com/ By Nick Lievendag. Entrepreneur at the intersection of Creativity × Technology. Writes, speaks and consults about 3D Capture (3D Scanning & Photogrammetry). Founder of 3D Scan Expert.
  • 54.
Matterport dominating real estate scanning. This $4,500 camera turns the real world into the virtual one. Today, Matterport's hardware is a hit with real estate agents. But fueled by the $30 million Series C it just raised, Matterport's software and partnership with Google's Project Tango could let you wave your phone around to create VR tours of anywhere you want. https://techcrunch.com/2015/06/25/matterport/ https://www.crunchbase.com/organization/matterport#/entity Matterport spawned out of the Xbox Kinect hacker scene in 2010. Founder Matt Bell had been working for a gesture recognition company that relied on a $50,000 camera and expert operators to produce a huge CAD file that could only be accessed through a specialized application. Bell was flabbergasted by the power of the $150 Kinect. He realized the potential for a relatively cheap device with similar technology that could let anyone map out rooms to create 3D models accessible straight from the web. https://youtu.be/HZX8RupfQls
  • 55.
Matterport research on semantic indoor segmentation. We collected the data using the Matterport Camera, which combines 3 structured-light sensors to capture 18 RGB and depth images during a 360° rotation at each scan location. The output is the reconstructed 3D textured meshes of the scanned area, the raw RGB-D images, and camera metadata. We used this data as a basis to generate additional RGB-D data and make point clouds by sampling the meshes. We semantically annotated the data directly on the 3D point cloud, rather than on images, and then projected the per-point labels on the 3D mesh and the image domains. https://arxiv.org/abs/1702.01105 | Cited by 3 - Related articles https://arxiv.org/abs/1702.07600 https://www.fastcompany.com/3059281/introducing-hover-an-ai-powered-indoor-safe-camera-drone + Indoor scanning with the tripod-based Matterport still requires a lot of manual work, and at some point will be updated to an autonomous AI-powered indoor drone for a better user experience.
  • 56.
Matterport technology patents:
● Capturing and aligning multiple 3-dimensional scenes (www.google.com/patents/US8879828, Grant - Filed Jun 29, 2012 - Issued Nov 4, 2014 - Matthew Bell - Matterport, Inc.)
● Multi-modal method for interacting with 3D models (www.google.com/patents/US20130342533, App. - Filed Jun 24, 2013 - Published Dec 26, 2013 - Matthew Bell - Matterport, Inc.)
● Identifying and filling holes across multiple aligned three-dimensional scenes (www.google.com/patents/US8861840, Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.)
● Building a three-dimensional composite scene (www.google.com/patents/US8861841, Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.)
● Processing and/or transmitting 3D data (www.google.com/patents/US9396586, Grant - Filed Mar 14, 2014 - Issued Jul 19, 2016 - Matthew Tschudy Bell - Matterport, Inc.)
● Semantic understanding of 3D data (www.google.com/patents/US20160055268, App. - Filed Jun 6, 2014 - Published Feb 25, 2016 - Matthew Tschudy Bell - Matterport, Inc.)
● Selecting two-dimensional imagery data for display within a three-dimensional model (www.google.com/patents/EP3120329A1?cl=en, App. - Filed Mar 13, 2015 - Published Jan 25, 2017 - Matthew Tschudy Bell - Matterport, Inc.)
● Classifying, separating and displaying individual stories of a three-dimensional model of a multi-story structure based on captured image data of the multi-story structure (www.google.com/patents/US20160217225, App. - Filed Jan 28, 2016 - Published Jul 28, 2016 - Matthew Tschudy Bell - Matterport, Inc.)
Semantic understanding of 3D data, US 20160055268 A1. ABSTRACT: Systems and techniques for processing three-dimensional (3D) data are presented. Captured three-dimensional (3D) data associated with a 3D model of an architectural environment is received and at least a portion of the captured 3D data associated with a flat surface is identified. Furthermore, missing data associated with the portion of the captured 3D data is identified and additional 3D data for the missing data is generated based on other data associated with the portion of the captured 3D data. REFERENCED BY: US9576184 (Textura Planswift Corporation) Detection of a perimeter of a region of interest in a floor plan document; US20130328872 (Tekla Corporation) Computer aided modeling; US20150227644 (Pictometry International Corp.) Method and system for displaying room interiors on a floor plan; US20160063722 (Textura Planswift Corporation) Detection of a perimeter of a region of interest in a floor plan document; US20160379405 (Jim S. Baca) Technologies for generating computer models, devices, systems, and methods utilizing the same
  • 57.
Google Tango technology http://www.deccanchronicle.com/technology/gadgets/210717/is-google-tango-relevant-in-2017.html https://arstechnica.co.uk/gadgets/2016/12/google-tango-phab-2-pro-review/ A Project Tango device 'sees' the environment around it through a combination of three core functions. First up is motion tracking, which allows the device to understand its position and orientation using a range of sensors (including accelerometer and gyroscope). Then there's depth perception, which examines the shape of the world around you. Intel provides a vital cog in this respect with its RealSense 3D camera. With this component on board, a device can gain accurate gesture control and snappy 3D object rendering, among other things. Finally, Project Tango incorporates area learning, which means that it maps out and remembers the area around it. Point Cloud Framework for Rendering 3D Models Using Google Tango. Maxen Chung, Santa Clara University; Julian Callin, Santa Clara University http://scholarcommons.scu.edu/cseng_senior/84 https://doi.org/10.1007/s11227-016-1891-8 Project Tango Tablet Development Kit, recently introduced by Google, Inc. Equipped with the most powerful processor available to date on a consumer-level mobile platform (i.e., NVIDIA Tegra K1, whose 192 programmable CUDA-enabled GPU cores use the same efficient Kepler architecture found in the world's most powerful supercomputers and workstations) along with several sensors (motion tracking camera, 3D depth sensor, accelerometer, ambient light sensor, barometer, compass, GPS, gyroscope), this mobile device can readily utilize GPU computing, making it an ideal platform for developing real-time contextual awareness applications for the visually impaired (VI). Moreover, being compact, lightweight, potentially wearable, relatively discreet and affordable renders it aesthetically appealing, socially acceptable and accessible for VI users.
  • 58.
Google Tango example applications #1 We broke the news yesterday that Google was producing a prototype 3D sensing smartphone called Project Tango. We also broke down the capabilities of the vision processor inside the device and talked about what it means for the future of phones. Now, we've got an exclusive look in the video below at a real 3D indoor map of a room captured with one of the prototype devices by Matterport. https://techcrunch.com/2014/02/21/heres-an-actual-3d-indoor-map-of-a-room-captured-with-googles-project-tango-phone/ https://matterport.com/mobile-3d-capture/ https://developers.google.com/tango/apis/overview Daydream is Google's platform for virtual reality. It consists of Daydream-ready phones, Daydream-ready headsets and controllers, and Daydream apps. Daydream View is the first Daydream-ready headset and controller designed and developed by Google. It also comes with a touch-and-motion enabled controller so you can easily interact with VR apps. With the Daydream View, you will be able to explore new worlds through Google Street View and Fantastic Beasts. Kick back in your personal cinema with YouTube, Netflix, Hulu, and HBO. Get in the game with Gunjack 2, LEGO® BrickHeadz, and Need for Speed. That's just the beginning of the VR possibilities with Daydream. http://www.techphlie.com/2017/07/what-is-google-tango-and-daydream.html Google has notably been pushing AR/VR technologies with its latest Android OS. The most prominent introduction, however, has been the ASUS ZenFone AR launch that took place at CES 2017 earlier this year.
  • 59.
Google Tango example applications #2 Google Tango SDK examples: how to make a floor plan in 50 seconds, Alexander Grau. Google Tango and Revit, Leonardo Manzione. https://www.youtube.com/watch?v=A-4cuJ1kOQ4
  • 60.
“Google Tango” without depth sensors. I have always believed that bringing 3D to consumers could only work without the need for dedicated depth sensors. This pure-software approach is already being embraced for Augmented Reality with Apple's upcoming ARKit and Google's ARCore, which was announced last week. Both can give modern smartphones AR capabilities by just using the regular camera(s), instead of using dedicated sensors like Tango. https://3dscanexpert.com/sony-3d-creator-brings-sensor-less-3d-scanning-consumers/ But yesterday, at IFA Berlin, Sony announced its latest smartphone, the XZ1, which has all the bells and whistles you expect from a flagship Android phone but also an app called 3D Creator. It basically does exactly what Microsoft showed last year, but is actually available — albeit exclusive to the XZ1. https://www.sonymobile.com/global-en/products/phones/xperia-xz1/3d-creator/
  • 61.
Apple depth sensing: the iPhone X's notch is basically a Kinect. By Paul Miller, @futurepaul, Sep 17, 2017, 10:00am EDT https://www.theverge.com/circuitbreaker/2017/9/17/16315510/iphone-x-notch-kinect-apple-primesense-microsoft And now, in late 2017, Apple is going to sell a phone with a front-facing depth camera. Unlike the original Kinect, which was built to track motion in a whole living room, the sensor is primarily designed for scanning faces and powers Apple's Face ID feature. Apple's “TrueDepth” camera blasts “more than 30,000 invisible dots” and can create incredibly detailed scans of a human face. In fact, while Apple's Animoji feature is impressive, the developer API behind it is even wilder: Apple generates, in real time, a full animated 3D mesh of your face, while also approximating your face's lighting conditions to improve the realism of AR applications. How Apple's iPhone X TrueDepth Camera Works, by David Cardinal, September 14, 2017. Beyond the Camera: Facial Motions and Changing Features. Getting a depth estimate for portions of a scene is only the beginning of what's required for Apple's implementation of secure facial recognition and Animojis. For example, a mask could be used to hack a facial recognition system that relied solely on the shape of the face. So Apple is using processing power to learn and recognize 50 different facial motions that are much harder to forge. They also provide the basis for making Animoji figures seem to mimic the phone's owner. How Secure is Face ID? Given how willing Apple is to commit to using Face ID for financial transactions, I'm sure they have pushed the limits beyond either simple 3D models or 2D motion. It is likely they are relying on the phone's ability to recognize minute facial movements and feed them into a machine learning system on the A11 Bionic chip that will add another layer of security to the system. That piece will also be key in helping the phone decide whether you're the same person when you put on a pair of glasses, a hat, or grow a beard — all of which Apple claims Face ID will handle.
  • 62.
  • 63.
Laser scanning: LiDAR (Light Detection And Ranging) http://dx.doi.org/10.1038/nphoton.2010.148 http://dx.doi.org/10.1080/19479832.2013.811124 3D building modeling (BIM) using images and LiDAR: a review https://techcrunch.com/2017/07/12/nyu-releases-the-largest-lidar-dataset-ever-to-help-urban-development/ http://ia.cr/2017/613 https://www.theregister.co.uk/2017/06/27/lidar_spoofed_bad_news_for_self_driving_cars/
  • 64.
  • 65.
Riegl: a range of different laser scanners http://www.riegl.com/products/unmanned-scanning/ RIEGL VZ-400 Indoor Scanned Data by Jamis Choi, published on Apr 1, 2010 https://www.youtube.com/watch?v=hOf0hpCn92I Scanning made simple with RiSOLVE - RIEGL's new 3D Scene Capture Software, published on Oct 4, 2012 (feat. horrible lounge music) https://www.youtube.com/watch?v=lbxvzMlTWyg
  • 66.
Riegl system in practice https://doi.org/10.1109/IROS.2016.7759501 Namely, we propose a method for the automatic selection of feature coordinate locations, and introduce the concept of localized automatic relevance determination (LARD) to the Hilbert Maps framework, in which different dimensions in the projected Hilbert space operate within independent length scale values. The proposed technique was tested against other state-of-the-art 3D scene reconstruction tools on three different datasets: a simulated indoor environment, RIEGL laser scans and dense LSD-SLAM point clouds. The results testify to the proposed framework's ability to model complex structures and correctly interpolate over unobserved areas of the input space while achieving real-time training and querying performance.
  • 67.
Handheld scanning: GeoSLAM ZEB-REVO. Handheld Laser Scanning - ZEB-REVO. The ZEB-REVO is the latest lightweight revolving laser scanner from GeoSLAM. Handheld, pole-mounted or attached to a mobile platform, the ZEB-REVO can record more than 40,000 measurement points per second from the survey environment. NEW ZEB-CAM: The new ZEB-CAM is an optional upgrade for standard ZEB-REVO systems. Simply attach ZEB-CAM to the underside of a standard REVO and begin scanning immediately. The ZEB-CAM captures live video footage of the survey environment and adds contextual video and imagery to scan data to aid feature identification. Optical flow technology is utilised to accurately synchronise the video and scan together in GeoSLAM's desktop software. http://www.3dlasermapping.com/zeb-revo-handheld-laser-scanning/ https://youtu.be/k8q5xr_eLgk
  • 68.
GeoSLAM vs. Leica: portable scanning quality http://dx.doi.org/10.1117/12.2270761 The paper investigates the performance of two portable mobile mapping systems (MMSs), the handheld GeoSLAM ZEB-REVO and the Leica Pegasus:Backpack, in two typical user-case scenarios: an indoor two-floor building and an outdoor open city square. Note! This paper would have been even nicer with a 'gold standard' giving the “correct measurements” instead of just comparing two “good enough” scanners.
  • 69.
Research scanners: sensor fusion. The Indoor Multi-sensor Acquisition System (IMAS) presented in this paper consists of a wheeled platform equipped with two 2D laser heads, RGB cameras, a thermographic camera, a thermohygrometer, and a luxmeter. One of the laser scanning sensors is foreseen to obtain the building map and the navigation information, and the other one for the 3D environment reconstruction. The thermographic and optical images, and the geometric and comfort data, are synchronized and automatically linked to trajectory positions, so that they are georeferenced in the building in terms of a relative positioning system. Software interface for virtual immersive navigation and ex situ data analysis. http://dx.doi.org/10.3390/s16060785
  • 70.
Applied point cloud scans: accessibility. Point Clouds to Indoor/Outdoor Accessibility Diagnosis. J. Balado, L. Díaz-Vilariño, P. Arias, I. Garrido https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W4/287/2017/isprs-annals-IV-2-W4-287-2017.pdf This work presents an approach to automatically detect structural floor elements such as steps or ramps in the immediate environment of buildings, elements that may affect the accessibility to buildings. The methodology is based on Mobile Laser Scanner (MLS) point clouds and trajectory information. The methodology is tested in a real case study, consisting of 100 m of an urban street. Ground elements are correctly classified in an acceptable computation time. Steps and ramps are also exported to GIS software to enrich building models from OpenStreetMap with information about accessible/inaccessible entrances and their locations (a toy sketch of this kind of step/ramp labelling is shown below). http://www.wired.co.uk/article/wayfindr-app A project initiated by the Royal London Society for the Blind's (RLSB) Youth Forum has led to the prototyping of a new app called Wayfindr, which has been built especially to help blind and partially sighted people use London's transport network independently. The app relies on smartphones and iBeacons and has been developed in collaboration with global digital product design studio ustwo. Our Open Standard gives you the tools to create inclusive and consistent experiences for your vision impaired customers. From transport networks and shopping centres, to hospitals and any other indoor space - we can help. Through our on-site trials and consultancy we will work together with you to understand how digital wayfinding can make your estate accessible. https://www.wayfindr.net/
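A toy sketch of step/ramp labelling (not the authors' MLS pipeline): rasterise ground-only points into a coarse height grid and label cells whose height change to a neighbour is an abrupt jump (step/kerb) or a gentle but non-flat gradient (ramp). All thresholds and the per-cell aggregation are made-up illustration choices:

```python
import numpy as np

def label_ground_cells(ground_pts, cell=0.25, ramp_slope=0.05, step_jump=0.08):
    """Rasterise ground points (N, 3) into a height grid and label each cell
    from the height change to its +x / +y neighbour."""
    xy0 = ground_pts[:, :2].min(0)
    ij = np.floor((ground_pts[:, :2] - xy0) / cell).astype(int)
    H = np.full(tuple(ij.max(0) + 1), np.nan)
    H[ij[:, 0], ij[:, 1]] = ground_pts[:, 2]        # last point per cell wins (toy choice)
    dzx = np.abs(np.diff(H, axis=0, append=np.nan))
    dzy = np.abs(np.diff(H, axis=1, append=np.nan))
    dz = np.fmax(dzx, dzy)                          # worst neighbour difference per cell
    labels = np.full(H.shape, 'flat', dtype=object)
    labels[dz / cell > ramp_slope] = 'ramp'         # e.g. steeper than a 5 % grade
    labels[dz > step_jump] = 'step'                 # e.g. more than an 8 cm riser/kerb
    labels[np.isnan(H)] = 'empty'
    return H, labels
```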
  • 71.
  • 72.
Data quality: a compromise between file size, computational time and quality. 3D model reconstruction from point cloud processed either with OpenSfM, VisualSFM or Pix4D (top row) to mesh model (middle row) to final textured 3D model (bottom row) across a series of downsampled Sky Ranger UAV images, including full resolution (first column), half resolution (second column) and quarter resolution (last column). Bolick and Harguess (2016), http://dx.doi.org/10.1117/12.2224677 Garbage in, garbage out holds true as always: the more high-quality images / points you have as input, the higher the reconstruction quality will obviously be. Top-left: points sampled on a sphere and corrupted with a lot of noise. Top-right: reconstructed surface mesh. Bottom-left: smoothed point set. Bottom-right: reconstructed surface mesh. Reconstruction error (mm) against number of points for the Bimba con Nastrino point set with 1.6M points as well as for simplified versions. CGAL 4.10 - Poisson Surface Reconstruction. The sensitivity of biological finite element models to the resolution of surface geometry: a case study of crocodilian crania: “Example of the simplified models. C. moreletti models composed of 20k, 30k, 90k and 300k surface (mesh) elements.” https://doi.org/10.7717/peerj.988 Point cloud & mesh processing, MAY 27 2017, posted by Taylor Wang. The final goal is to get a fully editable NURBS CAD model so that it can be modified by any CAD software to improve the design or reproduce the product.
  • 73.
Point Cloud Library (PCL): the most popular open-source library http://unanancyowen.com/en/pcl-with-velodyne/ https://www.youtube.com/watch?v=7BUFxkyH1r0 https://doi.org/10.1109/MRA.2012.2206675 Cited by 186 articles - see Related articles
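PCL itself is a C++ library; to keep the code samples here in one language, below is a hedged Python sketch of the same bread-and-butter operations (voxel filtering, outlier removal, normal estimation, ICP) using the Open3D package as a stand-in. Module paths follow recent Open3D releases (older versions expose registration under o3d.registration), and the file names and parameter values are placeholders:

```python
import open3d as o3d

# Load, downsample, denoise and estimate normals (PCL VoxelGrid /
# StatisticalOutlierRemoval / NormalEstimation equivalents)
pcd = o3d.io.read_point_cloud("scan.ply")                       # hypothetical input file
down = pcd.voxel_down_sample(voxel_size=0.05)
down, _ = down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))

# Align against a second scan with point-to-point ICP
target = o3d.io.read_point_cloud("scan2.ply").voxel_down_sample(voxel_size=0.05)
reg = o3d.pipelines.registration.registration_icp(
    down, target, max_correspondence_distance=0.2,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(reg.transformation)    # 4x4 rigid transform aligning the two scans
```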
  • 74.
  • 75.
Drift correction for proper image registration https://doi.org/10.1109/ROBOT.2010.5509312 Correcting for drift (distortion) between different scans or overlapping point clouds with added velocity information for the ICP (Iterative Closest Point) algorithm. (a) is a given environment. Blue points in (b) show distortion of the scan, and red points in (b) show the compensated scan. The transformation estimated using distorted data includes inevitable errors (c). The transformation estimated from the rectified scan gives us more accurate results (d). Kaarta - Common point cloud registration issues http://www.kaarta.com/cloud-registration-issues/ Published: 8 March 2017 http://dx.doi.org/10.3390/s17030539 Keywords: LiDAR; inertial measurement unit; iterative closest point; iterated sigma point Kalman filter; time delay calibration
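The vanilla point-to-point ICP core that these motion-compensation methods extend fits in a few lines: match each source point to its nearest destination point, then solve for the rigid transform with the Kabsch/SVD fit. The velocity-aware de-distortion itself is not shown; this is only the baseline the papers above improve on:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(src, dst, iters=30, tol=1e-6):
    """Minimal point-to-point ICP between two (N, 3) clouds; returns R, t
    such that R @ p + t maps src points onto dst."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    cur, prev_err = src.copy(), np.inf
    for _ in range(iters):
        d, idx = tree.query(cur)                     # nearest-neighbour correspondences
        matched = dst[idx]
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)        # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:                # avoid reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        cur = cur @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step       # accumulate the transform
        err = d.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```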
  • 76.
Data reduction and simplification for storage. Imran Ashraf, Soojung Hur, Yongwan Park https://doi.org/10.1109/ACCESS.2017.2699686 LiDAR produces a large point cloud but, while generating images for a limited field of view, data sparsity results in poor quality images. Moreover, the 3D to 2D data transformation also involves data reduction, which further deteriorates the quality of images. http://dx.doi.org/10.1117/12.2270833 31 October 2016 https://doi.org/10.1109/TIP.2016.2623488 https://www.google.com/patents/US9582939 https://arxiv.org/abs/1609.00893 Keywords: tensor networks, function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical Correlation Analysis (CCA) https://doi.org/10.1016/j.isprsjprs.2016.06.012 In-base point cloud management pipeline in the point cloud server (PCS).
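The most common simplification step referenced throughout these pipelines is voxel-grid downsampling, keeping one centroid per occupied voxel. A minimal NumPy version (the voxel size is an arbitrary example value):

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Reduce an (N, 3) point cloud by averaging the points inside each
    occupied voxel of edge length `voxel`."""
    keys = np.floor((points - points.min(0)) / voxel).astype(np.int64)
    # one integer id per occupied voxel, then average the points sharing it
    _, inv, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inv, points)
    return sums / counts[:, None]

# pts = np.loadtxt("scan.xyz")     # hypothetical input file
# print(len(pts), "->", len(voxel_downsample(pts, 0.10)), "points")
```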
  • 77.
Data reduction: compressing point clouds. Dynamic polygon cloud compression, Eduardo Pavez, Philip A. Chou (2017) https://doi.org/10.1109/ICASSP.2017.7952694 We introduce a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds, called a polygon cloud. Polygon clouds, compared to polygonal meshes, are more robust to live capture noise and artifacts. Furthermore, dynamic polygon clouds, compared to dynamic point clouds, are easier to compress, if certain challenges are addressed. In this paper, we propose methods for compressing dynamic polygon clouds using transform coding of color and motion residuals. Real-time compression of point cloud streams, Julius Kammerl, Nico Blodow, Radu Bogdan Rusu, Suat Gedikli, Michael Beetz, Eckehard Steinbach (2012) https://doi.org/10.1109/ICRA.2012.6224647 We present a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data. Our proposed compression framework can handle general point cloud streams of arbitrary and varying size, point order and point density. Furthermore, it allows for controlling coding complexity and coding precision. To compress the point clouds, we perform a spatial decomposition based on octree data structures. 3D Reconstruction Framework for Multiple Remote Robots on Cloud System, Phuong Minh Chu, Seoungjae Cho, Simon Fong, Yong Woon Park and Kyungeun Cho (2017) http://dx.doi.org/10.3390/sym9040055 This paper proposes a cloud-based framework that optimizes the three-dimensional (3D) reconstruction of multiple types of sensor data captured from multiple remote robots. A working environment using multiple remote robots requires massive amounts of data processing in real time, which cannot be achieved using a single computer. In the proposed framework, reconstruction is carried out in cloud-based servers via distributed data processing.
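As a much cruder cousin of the octree coders above, the spatial redundancy they exploit can be illustrated with plain coordinate quantisation, delta coding and deflate. This sketch is lossy (grid resolution) and does not preserve point order; it is only meant to show where the compression gains come from:

```python
import numpy as np
import zlib

def compress_cloud(points, resolution=0.001):
    """Quantise (N, 3) coordinates to a 1 mm grid, sort spatially so that
    consecutive deltas are small, then deflate the delta stream."""
    q = np.round(points / resolution).astype(np.int32)
    q = q[np.lexsort((q[:, 2], q[:, 1], q[:, 0]))]          # spatial sort -> small deltas
    deltas = np.diff(q, axis=0, prepend=0).astype(np.int32)
    return zlib.compress(deltas.tobytes()), len(q)

def decompress_cloud(blob, n_points, resolution=0.001):
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(n_points, 3)
    return np.cumsum(deltas, axis=0) * resolution            # back to metres
```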
  • 78.
  • 79.
Deep learning beyond Euclidean data: non-Euclidean (geometric) deep learning. Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst https://doi.org/10.1109/MSP.2017.2693418 https://arxiv.org/abs/1705.10819
  • 80.
  • 81.
Deep Learning: PointNet++. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas, Stanford University (submitted on 7 Jun 2017) https://arxiv.org/abs/1706.02413 Illustration of our hierarchical feature learning architecture and its application for set segmentation and classification using points in 2D Euclidean space as an example. Single-scale point grouping is visualized here. Left: Point cloud with random point dropout. Right: Curve showing the advantage of our density-adaptive strategy in dealing with non-uniform density. DP means random input dropout during training; otherwise training is on uniformly dense points. ScanNet labeling results: PointNet captures the overall layout of the room correctly but fails to discover the furniture. Our approach, in contrast, is much better at segmenting objects besides the room layout.
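A minimal PointNet-style set classifier (the non-hierarchical predecessor of PointNet++) fits in a few lines of PyTorch: a shared per-point MLP followed by a symmetric max-pool, so the prediction is invariant to point ordering. Layer sizes are assumptions, and the hierarchical grouping that PointNet++ adds is not shown:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """PointNet-style classifier: shared per-point MLP + order-invariant max-pool."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(            # 1x1 Conv1d == MLP shared across points
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, xyz):                        # xyz: (batch, num_points, 3)
        feats = self.point_mlp(xyz.transpose(1, 2))        # (batch, 1024, num_points)
        global_feat = feats.max(dim=2).values              # symmetric pooling over points
        return self.head(global_feat)

# logits = TinyPointNet()(torch.randn(8, 2048, 3))         # 8 clouds of 2048 points each
```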
  • 82.
Deep Learning: 2D feature descriptors. Instead of using the old-school SIFT, SURF, ORB, etc., the feature description / matching can be done with a data-driven deep learning network as well. Note: this model was trained with SfM data, which does not have strong rotation changes. Newer models work better in this case and will be released soon. In the meantime, you can also use the models in learn-orientation and benchmark-orientation. https://github.com/cvlab-epfl/LIFT https://arxiv.org/abs/1603.09114 | Cited by 23 - Related articles
  • 83.
Deep Learning: 3D feature descriptors https://arxiv.org/abs/1706.04496 We present a view-based convolutional network that produces local, point-based shape descriptors. The network is trained such that geometrically and semantically similar points across different 3D shapes are embedded close to each other in descriptor space (left). Our produced descriptors are quite generic — they can be used in a variety of shape analysis applications, including dense matching, prediction of human affordance regions, partial scan-to-shape matching, and shape segmentation (right). In contrast to findings in the image analysis community where learned 2D descriptors are ubiquitous and general (e.g. LIFT), learned 3D descriptors have not been as powerful as their 2D counterparts because they (1) rely on limited training data originating from small-scale shape databases, (2) are computed at low spatial resolutions resulting in loss of detail sensitivity, and (3) are designed to operate on specific shape classes, such as deformable shapes. We generate training correspondences automatically by leveraging highly structured databases of consistently segmented shapes with labeled parts. The largest such database is the segmented ShapeNetCore dataset [Yi et al. 2016, https://www.shapenet.org/] that includes 17K man-made shapes distributed in 16 categories.
  • 84.
Mesh: generative shapes with a GAN https://arxiv.org/abs/1705.02090 Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. It would be interesting to thoroughly investigate the effect of code length on structure encoding. Finally, it is worth exploring recent developments in GANs, e.g. Wasserstein GAN [Arjovsky et al. 2017], in our problem setting. It would also be interesting to compare with a plain VAE and other generative adaptations.
  • 85.
Point Cloud: generative GANs for point clouds #1a https://arxiv.org/abs/1707.02392 We build an end-to-end pipeline for 3D point clouds that uses an autoencoder (AE) to create a latent representation, and a Generative Adversarial Network (GAN) to generate new samples in that latent space. Our AE is designed with a structural loss tailored to unordered point clouds. Our learned latent space, while compact, has excellent class-discriminative ability: per our classification results, it outperforms recent GAN-based representations by 4.3%. In addition, the latent space allows for vector arithmetic, which we apply in a number of shape editing scenarios, such as interpolation and structural manipulation. We argue that jointly learning the representation and training the GAN is unnecessary for our modality. We propose a workflow that first learns a representation by training an AE with a compact bottleneck layer, then trains a plain GAN in that fixed latent representation. One benefit of this approach is that AEs are a mature technology: training them is much easier and they are compatible with more architectures than GANs. We point to theory that supports this idea, and verify it empirically: we show that GANs trained in our learned AE-based latent space generate visibly improved results, even with a generator and discriminator as shallow as a single hidden layer. Within a handful of epochs, we generate geometries that are recognized in their right object class at a rate close to that of ground truth data. Importantly, we report significantly better diversity measures (10x divergence reduction) over the state of the art, establishing that we cover more of the original data distribution. In summary, we contribute: ● An effective cross-category AE-based latent representation on point clouds. ● The first (monolithic) GAN architecture operating on 3D point clouds. ● A surprisingly simpler, state-of-the-art GAN working in the AE's latent space. 1) Autoencoder for a fixed latent representation; vector arithmetic. 2) Generative Adversarial Network using the fixed latent representation. In our latent-space GAN, instead of operating on the raw point cloud input, we pass the data through our pre-trained autoencoder, trained separately for each object class with the Earth Mover's distance (EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-dimensional bottleneck variable of the AE. Finally, once the GAN training is over, the output of the generator is decoded to a point cloud via the AE decoder. We found that very shallow designs for both the generator and discriminator (in our case, 1 hidden layer for the generator and 2 for the discriminator) are sufficient to produce realistic results.
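A skeleton of the latent-space GAN ("l-GAN") recipe described above, in PyTorch: the GAN never sees raw points, only the 512-dimensional codes of a pre-trained point-cloud autoencoder (the AE and its EMD loss are not shown). Hidden widths and the noise dimension are assumptions:

```python
import torch
import torch.nn as nn

LATENT = 512   # AE bottleneck size quoted in the paper

class LatentGenerator(nn.Module):
    """One hidden layer, as in the shallow generator described above."""
    def __init__(self, noise_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, LATENT))
    def forward(self, z):
        return self.net(z)

class LatentDiscriminator(nn.Module):
    """Two hidden layers; outputs a raw score (use BCEWithLogitsLoss or a WGAN loss)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, code):
        return self.net(code)

# Sampling a new shape once everything is trained (`decoder` stands for the
# pre-trained AE decoder, which is not defined here):
#   z = torch.randn(1, 128)
#   fake_code = LatentGenerator()(z)
#   points = decoder(fake_code)        # (1, N, 3) synthetic point cloud
```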
  • 86.
Point Cloud: generative GANs for point clouds #1b Interpolating between different point clouds, using our latent space representation. Note the interpolation between structurally and topologically different shapes. Generative results using our latent-space GAN. Note the variability and fidelity of the results. For a recap on GANs, see for example: https://arxiv.org/abs/1701.07875 Cited by 106 - Related articles. What do GANs for point clouds mean in practice? Point-cloud super-resolution (e.g. Ledig et al. 2016 for natural images), to improve model appearance (e.g. remove staircasing), and inpainting (e.g. Iizuka et al. 2017) to handle occlusion and gaps from indoor scans (“shape completion”). “Visual plastic surgery”, in other words (Tung et al. 2017). Sung et al. (2015) Data-driven Structural Priors for Shape Completion. Mönch et al. (2010) Staircase-Aware Smoothing of Medical Surface Meshes.
  • 87.
Hardware point cloud super-resolution: multiple scans https://doi.org/10.2312/SPBG/SPBG06/009-015 Cited by 47 articles. On the left, one scan of the parrot statue, with a sample spacing of about 1 mm. Center, we combine 100 nearly identical such scans to produce the surface in the center, produced on a grid with sample spacing of about 0.3 mm. Notice the noise reduction and the improvement in the detail, for instance in the face, neck and wing feathers. On the right, a photograph of the parrot statue. Super-resolution reconstruction using only 30 input scans at the left and increasing to 140 at the right. Noise is reduced dramatically at the beginning but more slowly at the end. Surfaces were reconstructed from subsets which were pre-registered using all 140 scans. For absolute measurement accuracy (e.g. Biljecki et al. 2017), one can scan the same space multiple times. A thin strip of the super-resolved surface, and the nearby sample points from the input scans. The input is very noisy, but the points are densely and randomly distributed near the surface with few outliers, so the average gives an accurate representation of the surface. (a) One scan. (b) Final super-resolved surface from 100 scans. (c) Photo of the object (a plaster cast of a subway token). The bottom row shows some results of other kinds of processing, to evaluate the importance of the various steps of the algorithm. (d) One scan, bilinearly interpolated onto the finer grid and smoothed. Detail is missing. (e) The entire algorithm except for the final bilateral filtering step. The noise removed by the filtering seems to be residual registration error, which perhaps could be improved. (f) Just averaging 100 scans taken without moving the scanner, using the same Gaussian kernel. Noise is decreased, but there is aliasing from the lower-resolution grid obscuring detail visible in (b).
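The core of this "super-resolution by averaging" idea is simple enough to sketch: bin the points of many pre-registered scans onto a fine grid and average per cell, so noise shrinks roughly with the square root of the number of scans. The careful sub-sample registration and final bilateral filter of the paper are omitted, and treating the surface as a height field is a simplifying assumption:

```python
import numpy as np

def fuse_scans_on_grid(scans, cell=0.0003):
    """Average many pre-registered scans (list of (N_i, 3) arrays) of the same
    surface on a fine x-y grid (0.3 mm here) and return the mean height per
    cell, NaN where no point fell."""
    pts = np.vstack(scans)
    xy0 = pts[:, :2].min(0)
    ij = np.floor((pts[:, :2] - xy0) / cell).astype(np.int64)
    nx, ny = (ij.max(0) + 1)
    flat = ij[:, 0] * ny + ij[:, 1]                 # 1-D cell index
    z_sum = np.bincount(flat, weights=pts[:, 2], minlength=nx * ny)
    count = np.bincount(flat, minlength=nx * ny)
    z = np.full(nx * ny, np.nan)
    np.divide(z_sum, count, out=z, where=count > 0)
    return z.reshape(nx, ny), xy0
```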
  • 88.
Deep Learning super-resolution: plenty of options for image/video/volume super-resolution https://arxiv.org/abs/1706.03142 https://arxiv.org/abs/1704.02738 https://arxiv.org/abs/1704.02470 https://arxiv.org/abs/1612.00085 A novel texture enhancement framework creates an HR style image that is rich in details, which can be used to restore high-frequency texture details back into the initial HR image via the style transfer algorithm. Four examples of SR results for nearest neighbor and cubic interpolation, the best-performing sparse coding, 3D-FSRCNN, and 3D-SRU-Net configurations. Arrows indicate regions in which at least one SR result mis-interprets a cell boundary or an ultrastructural feature. Scale bar 500 nm. Our method includes a sub-pixel motion compensation (SPMC) layer that can better handle inter-frame motion for this task. Our detail fusion (DF) network can effectively fuse image details from multiple images after SPMC alignment.
  • 89.
Point-cloud super-resolution: upsampling 'on-the-fly' to avoid “data explosion”? Jason Schreier, 4/17/17, 12:05pm, Horizon Zero Dawn, Kotaku http://kotaku.com/horizon-zero-dawn-uses-all-sorts-of-clever-tricks-to-lo-1794385026 Games like this don't just look incredible because of 'hyper-realism' but because their engineers use all sorts of tricks [LOD'ing, or Level of Detail; mipmapping; frustum culling, etc.] to save memory. The engine is designed to produce models in CityGML and does so in multiple LODs. Besides the generation of multiple geometric LODs, we implement the realisation of multiple levels of spatiosemantic coherence, geometric reference variants, and indoor representations. The datasets produced by Random3Dcity are suited for several applications, as we show in this paper with documented uses. The developed engine is available under an open-source licence on GitHub at http://github.com/tudelft3d/Random3Dcity http://doi.org/10.5194/isprs-annals-IV-4-W1-51-2016 Filip Biljecki, Hugo Ledoux, Jantien Stoter. Level of detail texture filtering with dithering and mipmaps, US 5831624 A, original assignee 3Dfx Interactive Inc https://www.google.com/patents/US5831624 Level-of-detail rendering: colors identify different subdivision levels as stated in the top left corner. Feature-Adaptive Rendering of Loop Subdivision Surfaces on Modern GPUs, November 2014, DOI: 10.1007/s11390-014-1486-x ManyLoDs: Parallel Many-View Level-of-Detail Selection for Real-Time Global Illumination. Matthias Hollander, Tobias Ritschel, Elmar Eisemann, Tamy Boubekeur (2011) http://dx.doi.org/10.1111/j.1467-8659.2011.01982.x
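A toy version of the LOD selection trick mentioned above: pick the coarsest precomputed cloud or mesh whose point spacing still projects below a target screen-space error, analogous to mipmap level selection. All numbers are made-up illustration values, not from any engine:

```python
import numpy as np

def pick_lod(distance, base_spacing=0.005, screen_error=0.002):
    """Choose an LOD level k, where level k has point spacing
    base_spacing * 2**k (like mipmap levels). We go coarser as long as the
    next level's spacing, projected by 1/distance, stays under the target."""
    k = 0
    while base_spacing * 2 ** (k + 1) / max(distance, 1e-6) < screen_error:
        k += 1
    return k

# for d in (1.0, 5.0, 20.0, 80.0):
#     print(f"{d:5.1f} m -> LOD {pick_lod(d)}")
```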
  • 90.
3D content generation: volumetric capture. Generate content by scanning real-life scenes and objects. Kul Wadhwa's and Roddy O'Hara's Uncorporeal http://www.uncorporeal.com/ Uncorporeal: volumetric capture systems for VR & AR content creation. The team includes a technical Oscar winner and engineering and product leadership from WETA, Google X, Lucas ILM, and Wikimedia. https://venturebeat.com/2016/10/13/pathbreaker-ventures-raises-12-million-to-invest-in-emerging-tech-such-as-vr-ar-and-robotics/ Ryan Gembala, founder of Pathbreaker Ventures, believes connected homes and cars and autonomous vehicles will create a lot of opportunities in vertical applications for startups. And he also thinks that space technologies such as small satellites, analysis of space-captured data, consumer transport, space mining, and others are interesting. REALITYVIRTUAL.CO - a New Zealand based creative technologies research & development collective with an enthusiasm for the visual realm: ● unique post-production & signal processing techniques, including the development of deep learning image enhancement & automation throughout our 3D pipeline for PBR workflow ● strong emphasis on advanced robotics & autonomous operations for large data acquisition of 3D environments. 3D Scene Creation with Photogrammetry.
  • 91.
3D content generation: automatic photorealism #1 It can still be quite labor-intensive to create realistic content. Get to know Rense de Boer, a technical art director from Sweden, who is not only pushing the envelope of photo-real CGI environments, but is doing it all in a real-time engine! Art by Rens https://news.developer.nvidia.com/artist-spotlight-creating-photorealistic-cgi-environments-in-real-time/ https://www.youtube.com/watch?v=bXouFfqSfxg One Ph.D. position (supervision by Profs Niessner and Rüdiger Westermann) is available at our chair in the area of photorealistic rendering for deep learning and online reconstruction. Research in this project includes the development of photorealistic real-time rendering algorithms that can be used in deep learning applications for scene understanding, and for high-quality scalable rendering of point scans from depth sensors and RGB stereo image reconstruction. If you are interested in applying, you should have a strong background in computer science, i.e., efficient algorithms and data structures, and GPU programming, have experience implementing C/C++ algorithms, and you should be excited to work on state-of-the-art research in 3D computer graphics. https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-learning-and-online-reconstruction.html Ph.D. Position – Photorealistic Rendering for Deep Learning and Online Reconstruction
  • 92.
3D Content generation: Automatic photorealism #2. Converting LiDAR scans into visually high-quality 3D content. Atom View is a new piece of software that allows content creators to translate real-world scans into assets for virtual environments. It aims not only to produce realistic results but also to streamline the content-creation workflow. The standalone app takes files captured from volumetric cameras, offline graphics renderers, 360° lidar and more. Volumetric capture is a promising area of development that could one day allow content creators to skip several of the more laborious steps of traditional 3D content creation, with better results. With Atom View, users can even edit objects once they've been imported. https://youtu.be/YxRI_3gKP8g
  • 93.
3D Content generation: Style transfer for maps. Neural Networks and The Future of 3D Procedural Content Generation, by Sam Snider-Held, Creative Technologist at MediaMonks, working at the intersection of AR, VR, AI, and UX. Style transfer output on the left, real terrain on the right: both are planes whose vertices are displaced by the height-map texture. Now it was time to create my own style transfer light field and light field renderer. I basically reimplemented Andrew Lowndes' WebGL light field renderer in Unity. What this post demonstrates is the idea that neural networks could radically change how we generate 3D content. I went with light fields because currently my GPU is not fast enough to run style transfer or any other generative network at 60 FPS. But if we do get to that point, it is entirely possible to see generative neural networks become an alternative rendering pipeline to the standard rasterization approach. In this way, neural networks could generate each frame of a game in real time, based on real-time feedback from the user. It also potentially allows a much more powerful creative approach, for both the creator and the end user. Imagine playing Gears of War, but then telling the computer "Keep the gameplay, story, and 3D models, but make it look like Zelda: Breath of the Wild." This is how creating or playing a future gaming experience could be, all because computers now know what things "look like" and can make other things "look like" them too.
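A minimal sketch of the "plane displaced by a height-map texture" setup described above, in Python/NumPy; the grid size and height scale are arbitrary assumptions and this is not the blog's actual Unity/WebGL code.

# Displace the vertices of a flat grid by a height map, e.g. a style-transfer
# output saved as a grayscale image; a random array stands in for it here.
import numpy as np

def displace_plane(heightmap, scale=1.0):
    """Return (H*W, 3) vertices of a plane whose y comes from the heightmap."""
    h, w = heightmap.shape
    xs, zs = np.meshgrid(np.arange(w), np.arange(h))
    ys = heightmap * scale
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1).astype(np.float32)

verts = displace_plane(np.random.rand(256, 256), scale=20.0)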
  • 94.
3D Content generation: from Video to 3D. Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks. In Proceedings of SCA'17, Los Angeles, CA, USA, July 28-30, 2017 http://research.nvidia.com/publication/facial-performance-capture-deep-neural-networks Samuli Laine, Tero Karras, Timo Aila, Antti Herva (Remedy Entertainment), Shunsuke Saito (Pinscreen, University of Southern California), Ronald Yu (Pinscreen, University of Southern California), Hao Li (USC Institute for Creative Technologies, University of Southern California, Pinscreen), Jaakko Lehtinen (NVIDIA, Aalto University). NVIDIA and game developer Remedy (Alan Wake, Quantum Break) showcased their team-up solution for streamlining motion capture and animation using a deep neural network running on NVIDIA's powerful DGX-1 server. After being "trained" with information on previously produced animations, the network is able to generate sophisticated 3D facial animation from videos of live actors, greatly alleviating the time and labor burden of traditional mo-cap animation; it can even learn enough to generate facial animation from just an audio clip. The companies believe this system could eventually produce animation that's just as good as or better than traditionally produced fare. http://www.animationmagazine.net/events/siggraph-facial-animation-advances-fabric-engine-the-french-contingent/ "We present a real-time deep learning framework for video-based facial performance capture -- the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character."
  • 95.
3D Content generation: from Video (& Audio) to Video. Face2Face: Real-time Face Capture and Reenactment of RGB Videos. Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, Matthias Nießner (University of Erlangen-Nuremberg, Max Planck Institute for Informatics, Stanford University) http://www.graphics.stanford.edu/~niessner/thies2016face.html https://doi.org/10.1109/CVPR.2016.262 Neural Face Editing with Intrinsic Image Disentangling. Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, Dimitris Samaras (Submitted on 13 Apr 2017) https://arxiv.org/abs/1704.04131 University of Washington researchers have developed new algorithms that solve a thorny challenge in the field of computer vision: turning audio clips into a realistic, lip-synced video of the person speaking those words. As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017, the team successfully generated highly realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics, using audio clips of those speeches and existing weekly video addresses that were originally on a different topic. Synthesizing Obama: learning lip sync from audio. Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman. ACM Transactions on Graphics (TOG), Volume 36, Issue 4, July 2017, https://doi.org/10.1145/3072959.3073640 http://www.washington.edu/news/2017/07/11/lip-syncing-obama-new-tools-turn-audio-clips-into-realistic-video/
  • 96.
  • 97.
3D Content generation: Style transfer to fantasy. https://uploadvr.com/google-tango-app-turns-real-world-into-the-matrix/ The Tango Matrix Scanner from VR and AR developer Null Real uses the special cameras fitted to Tango-ready Android phones to turn the walls, floors and ceilings of the environment around you into the virtual data streams that Keanu Reeves sees towards the end of the legendary 1999 sci-fi flick.
  • 98.
Interactive Content generation. IEEE Transactions on Affective Computing, Volume 2, Issue 3: Experience-Driven Procedural Content Generation. Date of Publication: 05 April 2011 https://doi.org/10.1109/T-AFFC.2011.6 "Procedural content generation (PCG) is an increasingly important area of technology within modern human-computer interaction (HCI) design. Personalization of user experience via affective and cognitive modeling, coupled with real-time adjustment of the content according to user needs and preferences, are important steps toward effective and meaningful PCG. Games, Web 2.0, interface, and software design are among the most popular applications of automated content generation." Emotion in Games, pp. 155-166, part of the Socio-Affective Computing book series (SAC, volume 4): Emotion-Driven Level Generation. Julian Togelius, Georgios N. Yannakakis https://doi.org/10.1007/978-3-319-41316-7_9 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment: Targeting Horror via Level and Soundscape Generation. Phil Lopes, Antonios Liapis, Georgios N. Yannakakis. This paper presented improvements to the Sonancia system, a multi-faceted level generator for the horror genre. The additions include a level generation system that optimizes towards a designer-defined tension curve, while still providing a degree of variability. The paper also presented some initial methodologies for creating soundscapes of generated levels by directly using the distribution of monsters on the level's path from the starting player position to the goal. Several experiments studied the impact of designer tension curves on level generation and sonification, as well as the efficiency of the GA in generating larger maps. Adaptive to user behavior: great strides have already been made with motion capture, haptics, eye-tracking, and natural language processing. What has been missing is a serious effort to link mixed reality to the ultimate computing platform: the human brain. Our add-ons give your AR/VR headset eye tracking superpowers. Pupil Labs. https://pupil-labs.com/vr-ar/
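To illustrate the "optimize a level towards a designer-defined tension curve" idea, here is a toy Python sketch. Sonancia itself uses a genetic algorithm; this simplified stand-in uses a plain hill climber over per-room intensities, and the target curve and room count are made up for the example.

# Toy level optimizer: a level is a sequence of room tension values, and
# fitness is (negative) distance to a designer-defined tension curve.
import numpy as np

rng = np.random.default_rng(1)
target = np.linspace(0.1, 1.0, 10)            # designer tension curve, 10 rooms

def fitness(level):
    return -np.abs(level - target).sum()

level = rng.random(10)
for _ in range(2000):                         # mutate one room at a time
    cand = level.copy()
    cand[rng.integers(10)] = rng.random()
    if fitness(cand) > fitness(level):
        level = cand

print(np.round(level, 2))                     # approaches the target curve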
  • 99.
  • 100.
Point Clouds: Benchmark Dataset. http://semantic3d.net/ | "ImageNet of point clouds" | Dataset for semantic segmentation of unordered point cloud data. What do we provide? We have created a framework for the fair evaluation of semantic classification in 3D space. In this framework we provide: ● A large set of point clouds with over one billion labelled points. ● Ground truth, hand-labelled by professional assessors. ● A common evaluation tool providing the established intersection-over-union measure along with the full confusion matrix. semantic-8 is a benchmark for classification with 8 class labels, namely {1: man-made terrain, 2: natural terrain, 3: high vegetation, 4: low vegetation, 5: buildings, 6: hard scape, 7: scanning artefacts, 8: cars}. An additional label {0: unlabeled points} marks points without ground truth and should not be used for training! In total over a billion points are provided. Please check out the reduced benchmark if your method is too computationally demanding for the full data set.
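A small Python sketch of the intersection-over-union measure computed from a confusion matrix, in the spirit of the semantic-8 evaluation described above; the official benchmark script may differ, and label 0 (unlabeled) is excluded as instructed.

# Per-class IoU from a confusion matrix over the 8 labeled classes.
import numpy as np

def confusion(gt, pred, num_classes=9):
    mask = gt > 0                                    # drop unlabeled points
    idx = gt[mask] * num_classes + pred[mask]
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def per_class_iou(conf):
    """conf[i, j] = number of points with ground truth i predicted as j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1.0)

gt = np.random.randint(0, 9, 10000)                  # toy labels
pred = np.random.randint(0, 9, 10000)
iou = per_class_iou(confusion(gt, pred))
print(iou[1:].mean())                                # mean IoU over the 8 classes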
  • 101.
Point Clouds: Synthetic datasets. You can always mesh point clouds, and convert meshes to point clouds. https://arxiv.org/abs/1702.08558 https://arxiv.org/abs/1706.06782 https://arxiv.org/abs/1703.06907 https://arxiv.org/abs/1505.00171 SynthCam3D is a library of synthetic indoor scenes collected from various online 3D repositories and hosted at http://robotvault.bitbucket.org.
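As a sketch of the mesh-to-point-cloud direction mentioned above, the snippet below samples points uniformly (area-weighted) on a triangle mesh with plain NumPy; the reverse direction (point cloud to mesh) would use e.g. Poisson reconstruction or marching cubes and is not shown.

# Sample a point cloud from a triangle mesh via barycentric coordinates.
import numpy as np

def sample_mesh(vertices, faces, n_points=10000):
    tris = vertices[faces]                                  # (F, 3, 3)
    areas = 0.5 * np.linalg.norm(
        np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]), axis=1)
    choice = np.random.choice(len(faces), n_points, p=areas / areas.sum())
    u, v = np.random.rand(2, n_points)
    flip = u + v > 1.0                                      # reflect into the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tris[choice]
    return t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0]) + v[:, None] * (t[:, 2] - t[:, 0])

# toy example: sample a single unit triangle
pts = sample_mesh(np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0]]), np.array([[0, 1, 2]]))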
  • 102.
Point Clouds: Indoor Dataset. Announcing the Matterport3D Research Dataset https://hackernoon.com/announcing-the-matterport3d-research-dataset-815cae932939 We're excited that groups at Stanford, Princeton, and TUM have painstakingly hand-labeled a wide range of spaces offered up by customers and made these labeled spaces public in the form of the Matterport 3D dataset. This dataset contains 10,800 aligned 3D panoramic views (RGB + depth per pixel) from 194,400 RGB + depth images of 90 building-scale scenes. All of these scenes were captured with Matterport's Pro 3D Camera. The 3D models of the scenes have been hand-labeled with instance-level object segmentation. If you're passionate about 3D and interested in an even bigger dataset, Matterport internally has roughly 7500x as much 3D data as is in this dataset. You can access the dataset and sample code here and read the paper here. We'd like to thank Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang for their work in labeling this dataset and developing algorithms to run on it. We'd also like to thank all the Matterport camera owners who gave us permission to include their 3D models in this dataset.
  • 103.
Mesh Dataset: ModelNet. The goal of the Princeton ModelNet project is to provide researchers in computer vision, computer graphics, robotics and cognitive science with a comprehensive, clean collection of 3D CAD models for objects. To build the core of the dataset, we compiled a list of the most common object categories in the world, using the statistics obtained from the SUN database. Once we established a vocabulary for objects, we collected 3D CAD models belonging to each object category using online search engines by querying for each object category term. Then, we hired human workers on Amazon Mechanical Turk to manually decide whether each CAD model belongs to the specified categories, using our in-house designed tool with quality control. To obtain a very clean dataset, we chose 10 popular object categories and manually deleted the models that did not belong to these categories. Furthermore, we manually aligned the orientation of the CAD models for this 10-class subset as well. We provide both the 10-class subset and the full dataset for download. http://modelnet.cs.princeton.edu/ Skolkovo Institute of Science and Technology https://arxiv.org/abs/1704.01222 A Kd-tree built on a point cloud of eight points (left), and the associated Kd-network built for classification (right). We number nodes in the Kd-tree from the root to the leaves; therefore leaf nodes, which correspond to the original points, are numbered starting from 8. The arrows indicate information flow during the forward pass (inference). The leftmost bars correspond to leaf (point) representations. The rightmost bar corresponds to the inferred class posteriors v0. Circles correspond to linear (affine) transformations with learnable parameters. Colors of the circles indicate parameter sharing: splits of the same type (same orientation, same tree level – three "green" splits in this example) share the transformation parameters.
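A hedged sketch of the balanced Kd-tree that a Kd-network is built on: recursive median splits along the widest axis until each leaf holds a single point. The learnable affine transformations and parameter sharing of the actual Kd-network (arXiv:1704.01222) are omitted here.

# Build a balanced Kd-tree over a point cloud whose size is a power of two.
import numpy as np

def build_kdtree(points):
    if len(points) == 1:
        return {"leaf": points[0]}
    axis = int(np.argmax(points.max(axis=0) - points.min(axis=0)))  # widest axis
    order = np.argsort(points[:, axis])
    half = len(points) // 2
    return {"axis": axis,
            "left": build_kdtree(points[order[:half]]),
            "right": build_kdtree(points[order[half:]])}

tree = build_kdtree(np.random.rand(8, 3))   # 8 points -> a depth-3 balanced tree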
  • 104.
Mesh Dataset: ShapeNet. https://arxiv.org/abs/1512.03012 Overview https://www.shapenet.org/ ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D models and manually verified category and alignment annotations. It covers 55 common object categories with about 51,300 unique 3D models. The 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset, are all covered by ShapeNetCore. ShapeNetSem is a smaller, more densely annotated subset consisting of 12,000 models spread over a broader set of 270 categories. In addition to manually verified category labels and consistent alignments, these models are annotated with real-world dimensions, estimates of their material composition at the category level, and estimates of their total volume and weight. ShapeNet Model Viewer and Renderer https://github.com/ShapeNet/shapenet-viewer This Java+Scala code was used to render the ShapeNet model screenshots and thumbnails. It can handle loading of OBJ+MTL, COLLADA DAE, KMZ, and PLY format 3D meshes. This is a real-time OpenGL-based renderer. If you would like to use a raytracing framework for rendering, then a fork of the Mitsuba renderer has been created by Jian Shi to handle ShapeNet models.
  • 105.
Mesh Correspondence: "Googling shapes". We are organizing a large-scale 3D shape retrieval contest as part of the Eurographics 2017 3D Object Retrieval Workshop. More information is available at www.shapenet.org/shrec17.
  • 106.
Mesh Correspondence in Practice. You could, for example, have a 3D CAD model (or a whole architectural BIM model) and want to search for similar parts on GrabCad, TraceParts, Thingiverse or Pinshape, and then build more value on top of that, for example via shape completion, finite element analysis (FEM), generative design, or manufacturability analysis for CNC milling / 3D printing. Learning Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes. Aditya Balu, Sambit Ghadai, Kin Gwn Lore, Gavin Young, Adarsh Krishnamurthy, Soumik Sarkar https://arxiv.org/abs/1612.02141 A 3D convolutional neural network for classifying whether or not a design is manufacturable [design for manufacturability (DFM)]. In this example, a block with a drilled hole of specific diameter and depth is considered. Automated Design for Manufacturing and Supply Chain Using Geometric Data Mining and Machine Learning. Hoefer, Michael Jeffrey Daniel. Iowa State University, Master's thesis https://search.proquest.com/openview/aaa80836db1abd17b6414d1f9c65349e/1 2015 14th International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics): CAD Parts-Based Assembly Modeling by Probabilistic Reasoning. Kai-Ke Zhang; Kai-Mo Hu; Li-Cheng Yin; Dong-Ming Yan; Bin Wang, 26-28 Aug. 2015 https://doi.org/10.1109/CADGRAPHICS.2015.29
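For illustration only (not the architecture from the papers above), a small PyTorch 3D CNN that takes a voxelized part, such as an assumed 32x32x32 occupancy grid of a block with a drilled hole, and outputs a manufacturable / non-manufacturable score.

# Minimal 3D CNN sketch for binary design-for-manufacturability classification.
import torch
import torch.nn as nn

class VoxelDFMNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),   # 32 -> 16
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 8 * 8 * 8, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, voxels):            # voxels: (batch, 1, 32, 32, 32)
        return self.classifier(self.features(voxels))

logits = VoxelDFMNet()(torch.rand(4, 1, 32, 32, 32))   # toy batch of 4 parts

In practice the occupancy grids would be voxelized from the CAD meshes and the two output logits trained with a cross-entropy loss against manufacturability labels.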
  • 107.
  • 108.
  • 109.
Own Datasets: Labeling is very time-consuming. Both in academic research and in industry, labelled data is extremely valuable. In academia it is easy to use the same datasets to benchmark model performance, but in business practice another way to improve performance is simply to add more labelled data to your proprietary dataset. To optimize the labeling effort, several frameworks have been developed: https://arxiv.org/abs/1707.04796 Adriana Kovashka, Olga Russakovsky, Li Fei-Fei and Kristen Grauman (2016), "Crowdsourcing in Computer Vision", Foundations and Trends® in Computer Graphics and Vision: Vol. 10, No. 3, pp. 177-243. http://dx.doi.org/10.1561/0600000071 What can I use to quickly build a labeling tool for my training data? I need to label my training data - web documents in my case, but it actually does not matter. Is there a generic framework or tool that I can use to quickly build a UI for a labeling tool for a particular kind of data, like documents, images, etc.? Ideally the tool should allow multiple people to label the same data set. -> CrowdFlower, datapure; pilab-annotator / pylabelme (stackoverflow.com) 3D Mesh Labeling via Deep Convolutional Neural Networks. Guo et al. (2015), ACM Transactions on Graphics (TOG), Volume 35, Issue 1, December 2015, https://doi.org/10.1145/2835487 Learning Hierarchical Shape Segmentation and Labeling from Online Repositories. Li Yi, Leonidas Guibas, Aaron Hertzmann, Vladimir G. Kim, Hao Su, Ersin Yumer (Submitted on 4 May 2017) https://arxiv.org/abs/1705.01661
  • 110.
  • 111.
    AR/VR Therolein construction? What’snext for VR in construction? Published: 15 March 2017, Tridify’s Nigel Alexander http://aecmag.com/59-features/1296-what-s-next-for-vr-in-construction (top) Using 4D simulation, VR can help improve safety on construction sites, (bottom) VR can be used to identify potential hazards in office environments This technology will simultaneously empower the sales and marketing side of the construction industry, by enabling developers to showcase projects early on. It will also help save costs by reducing wastage and rework, by making it easier for all parties to collaborate on the design and layout of buildings. VR meets the IoT As smart cities become increasingly popular with developers, we’ll be creating environments that can be easily linked to VR. Buildings are being fitted with vast numbers of electronic chips for smart monitoring. One of the results of this is that you start to collect data – an aspect of which is ‘movement’. At any given period of time, you might have X number of people moving through a particular area, but what might this actually mean and what are the possible outcomes? Health & safety If, for example, you had a 2D plan of an office, you could look at it, and consider what might be hazardous about it – but imagine if you had instead a virtual, three-dimensional environment where it was easy to add or subtract objects and elements. You could visualise how a desk blocks an emergency exit, or show that an electrical box is not secured. From there, it would just be a question of going around and looking at the hazard, and identifying what measures were needed, perhaps a desk exclusion zone. Green construction Finally, there are important environmental considerations, too. If you can create a building in data before you construct it, it’s quite simply much cheaper to make any necessary modifications. Not only are you free to play around with a virtual environment, you can present it to the end user and gain relevant feedback to know exactly what is needed. No developer has the omniscience to predict every element of what a customer will need – a surgeon in a hospital for example, is going to have a far better knowledge of what they will need from an emergency environment than a developer ever will.
  • 112.
SLAM 3D Maps: As Pervasive as Google Maps. http://augmentedpixels.com/slam-3d-maps-augmented-reality-robotics-will-worth-google-maps/ Adapted from Vitaliy Goncharuk (CEO and Founder of Augmented Pixels). Most new mobile devices will have at least stereo camera capabilities (and/or structured-light sensing, and even solid-state LiDARs if cheap enough). Each device will be creating a 3D SLAM map, and with the sheer number of such devices, indoor and outdoor maps can be maintained with minimal effort using crowdsourcing. "Just imagine that thanks to 3D SLAM cloud maps a man with a mobile phone and AR glasses will be able to interact with other people with mobile phones/AR glasses and robots in real time in the same coordinate system. This opens up great prospects for improvement of current patterns of behavior (indoor navigation, etc.) and for creation of fundamentally new services, patterns of behavior and accumulation of fundamentally new knowledge, which will exceed the value of Google Maps many times over!"
  • 113.
Google Maps Indoor Maps. https://www.google.com/intl/en_uk/maps/about/partners/indoormaps/ Google Business View, https://www.google.com/streetview/hire/ Use indoor maps to view floor plans: you can see and navigate inside places like airports, department stores, and malls using the Google Maps app. Note: indoor maps are only available in selected locations. See a list of places that have indoor maps. In addition to indoor navigation (useful, for example, for the visually impaired, indoor drones, etc.), Google wants to visualize indoor spaces as it has done with Street View. In the USA, businesses can already have their interiors photographed in 360º (for example, to show how a restaurant looks inside).
  • 114.
Indoor Map: Beyond commerce and adtech. We've got Google Maps to help us out when we need to navigate outdoors, but Google can only map out so many indoor locations without getting creepy. And that's where Stimulant comes in. This "innovation studio" built a Microsoft HoloLens app that lets you map out an area, define locations, and use the headset to get instant directions to any defined location. Stimulant's HoloLens App Helps Navigate Inside Buildings. BY ADAM DACHIS 09/03/2016 https://hololens.reality.news/news/stimulants-hololens-app-helps-navigate-inside-buildings-0171946/ https://vimeo.com/168415931 http://mashable.com/2017/05/17/google-visual-positioning-service-tango-augmented-reality BY RAYMOND WONG MAY 17, 2017 At its Google I/O developers conference, Google announced a new technology called Visual Positioning Service (VPS), a Tango-enabled mapping system that uses augmented reality on phones and tablets to help navigate indoor locations. Google says VPS makes use of machine learning, computer vision and mapping coordinates to do just that. Along with audio interfaces, Google says VPS could help the visually impaired find their way around the world, where they previously would have had difficulties.
  • 115.
Google Maps meets Matterport and NCTech #1. Matterport partners with Google to bring 3D Street View perspectives indoors. Posted May 9, 2017 by Lucas Matney (@lucas_matney). When you're looking at moving into a new space, Street View is often a useful tool to get the general vibe of the area, but it's almost impossible to really tell what spaces look like indoors without physically being there. Today, users clicking through locations on Google Street View will start seeing quite a few more businesses pop up that they can actually jump into and explore themselves. This is possible thanks to a partnership between Google and Matterport. Google has already been doing a bit of indoor surveying through partnerships with individual 360° photographers, but this partnership opens Street View up to a much larger library of content. Matterport has an index of over half a million indoor spaces that users can view using either a web viewer or a VR headset. It will ultimately be up to the individual partners of Matterport to decide if their content ends up being viewable on Street View, but the company believes this partnership will greatly expand the reach of its customers. Matterport is ultimately not the only partner to whom Google is opening its Street View API, but it is the sole company that will be offering 3D views of spaces in addition to 360-degree scans, which should allow for more compelling views as Google embraces new technologies like virtual reality. Google Street View Teams with NCTech and Matterport. BY SEAN HIGGINS, SPAR 3D EDITOR ON MAY 17, 2017. Though the Scottish company NCTech has garnered attention recently for its LASiris VR 3D capture device, it is also well known for its 360° HDR cameras. On May 11th, the company announced that it will be producing one such camera for Google Street View. This summer, the company is making a move to "help small businesses market their venues" by offering a Street View API that enables users to publish their captures to Google's platform with "the click of a button." Once published, the scans will be available on Google Maps and in Google Search. "Matterport is excited to partner with Google to enhance the way business owners market to customers around the world," said Bill Brown, CEO of Matterport. "Our all-in-one solution helps businesses promote their venues and provide a preview of what customers can expect."
  • 116.
Google Maps meets Matterport and NCTech #2. Spar3D: This summer, the company is making a move to "help small businesses market their venues" by offering a Street View API that enables users to publish their captures to Google's platform with "the click of a button." Once published, the scans will be available on Google Maps and in Google Search. https://www.nctechimaging.com/downloads-files/iris360-brochure-a5.pdf Support portal: https://nctech.zendesk.com/hc/en-us/community/topics/200412007-iris360
  • 117.
Google Maps meets Matterport for customers #1. Matterport Spaces: Reach Millions on Google Street View. Attract more customers and win more business with 3D, VR, and more! Create immersive 3D and VR experiences with Matterport today and publish your virtual walkthroughs to Google Street View in just a few months. Join our Beta program below! https://matterport.com/gsv/ What is Google Street View and Matterport for Business Listings? https://support.matterport.com/hc/en-us/articles/115006844048-FAQ-Matterport-for-Business-Listings-Publish-to-Google-Street-View- What kinds of places can I publish? ● Business Listings — retail and restaurants ● Places of Interest — museums and landmarks ● Multifamily — apartment complexes ● Travel and Hospitality — hotels and resorts ● Vacation & Short Term Rentals — nightly rentals only ● Commercial Real Estate — office spaces Private homes (residential real estate) cannot be published to Google Street View. Nightly rentals are allowed.
  • 118.
Google Maps meets Matterport for customers #2. Why is Google putting lidar on its new Street View cars? BY SEAN HIGGINS, SPAR 3D EDITOR ON SEPTEMBER 9, 2017 https://www.spar3d.com/blogs/the-other-dimension/google-putting-lidar-new-street-view-cars/ Earlier this week, your friends at Arstechnica published a little piece about Google's new Street View cars and their inclusion of lidar sensors. On top of the new cars, you'll spot an integrated system that includes 7 cameras and two Velodyne Pucks, though it's uncertain exactly which version of the puck it is. These sensors aren't for automating the cars, and we know this because they're placed at an unusual angle: 45° rather than the 15° you'd commonly find on self-driving cars. To that, I would add my own speculation that Google is gathering outdoor building data that it can connect to interior captures taken by sensors from Matterport or NCTech, both companies that have partnered with the Google Street View initiative. With a 3D data set that bridges the indoors and outdoors, we could get the indoor navigation so many have been asking for. We'd also get better AR, and, much to Google's pleasure I'm sure, much more precise ways to show us advertising. Google wants you to help feed its image-hungry algorithms. The tech industry's recent interest in virtual reality has made 360-degree cameras relatively cheap. This summer, Google began certifying some cameras as "Street View ready," meaning you can upload your own panoramas through the Street View mobile app to live on the company's service. That footage will be processed by Google's image recognition algorithms for fresh map data just like its own imagery. https://www.wired.com/story/googles-new-street-view-cameras-will-help-algorithms-index-the-real-world/
  • 119.
Google Maps meets Matterport for customers #3. 3D, VR, 360° and Street View Photographers | Real Estate Agents: Need a 3D/VR/360°/Street View Photographer? 3,236 We Get Around Network Members in 104 Countries! Dan Smigrod (23 Sept 2017): "Soon, Matterport will enable Room Labels to be automatically generated using Artificial Intelligence (AI) as discussed below and in the Matterport Satisfaction & New Feature Survey 2017." Matterport: "We've used it internally to build a system that segments spaces captured by our users into rooms and classifies each room. It's even capable of handling situations in which two types of room (e.g. a kitchen and a dining room) share a common enclosure without a door or divider. In the future, this will help our customers skip the task of having to label rooms in their floor plan views," according to Matterport Co-Founder Matt Bell in this article published Thursday (21 September 2017). "Ultimately, we want to do for the real world what Google did for the web – enable any space to be indexed, searched, sorted, and understood, enabling you to find exactly what you're looking for. Want to find a place to live that has three large bedrooms, a sleek modern kitchen, a balcony with a view of a pond, a living room with a built-in fireplace, and floor-to-ceiling windows? No problem! Want to inventory all the furniture in your office, or compare your construction site's plumbing and HVAC installations against the CAD model? Also easy!" https://www.metroplex360.com/virtual-tours-google-streetview/ https://www.mp2sv.com/ We provide support for any platform, including: DSLR Photospheres, Ricoh Theta S, iGuide, RealVision, Matterport 360° Snapshots
  • 120.
Google Maps meets Earth VR. Google Earth VR app gets support for Street View. Posted Sep 14, 2017 by Lucas Matney (@lucasmtny) https://techcrunch.com/2017/09/14/google-earth-vr-app-gets-support-for-street-view/ Google Earth VR is getting a little update today that brings your views to street level in the world-exploring virtual reality app. Street View is being added so that users can easily transition between 3D satellite views and 360° camera captures at ground level. Introducing Google Earth VR, our next step to help the world see the world. With Earth VR, you can fly over a city, stand at the edge of a mountain, and even soar into space. Google Earth VR is available now on Steam for the HTC Vive. https://www.youtube.com/watch?v=SCrkZOx5Q1M
  • 121.
Beyond structure: How do people interact with the space? Keywords: indoor semantic inference; activity recognition; multi-length windows; virtual samples; virtual features; deep learning http://dx.doi.org/10.3390/s17061214 The architecture of the stacked autoencoder used in DeepMap+. http://www.behavioranalyticsretail.com/7-technologies-to-track-people/ Recognizing human actions from unknown and unseen (novel) views is a challenging problem. We propose a Robust Non-Linear Knowledge Transfer Model (R-NKTM) for human action recognition from novel views. The proposed R-NKTM is a deep fully-connected neural network that transfers knowledge of human actions from any unknown view to a shared high-level virtual view by finding a non-linear virtual path that connects the views. https://arxiv.org/abs/1602.00828
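As a generic reference point for the "stacked autoencoder" mentioned above (this is not the DeepMap+ architecture; the layer sizes and input features are made up), a minimal PyTorch sketch:

# Two-layer stacked autoencoder trained with a reconstruction objective.
import torch
import torch.nn as nn

class StackedAE(nn.Module):
    def __init__(self, in_dim=128, hidden=(64, 32)):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(hidden[1], hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(16, 128)                              # a batch of assumed sensor features
loss = nn.functional.mse_loss(StackedAE()(x), x)     # reconstruction loss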
  • 122.
Imaging: Novel techniques: Transient imaging. https://doi.org/10.1109/ICCPHOT.2017.7951478 Can we reconstruct the entire internal shape of a room if all we can directly observe is a small portion of one internal wall, presumably through a window in the room? While conventional wisdom may indicate that this is not possible, motivated by recent work on 'looking around corners', we show that one can exploit light echoes to reconstruct the internal shape of hidden rooms. Can we reconstruct the shape of a hidden closed room with a small peephole? We will show an experimental setup, built with a transient camera and a picosecond laser, that can infer the shape of a hidden room. The transient camera consists of a single SPAD detector (single-photon avalanche diode) with 30 ps jitter. The picosecond laser has a jitter comparable to that of a SPAD, and emits pulses at 530-570 nm wavelength. Coherence lengths at this bandwidth are too low to do interferometric measurements. Instead, we rely on the arrival times of photon echoes in our algorithm. Block diagram of the experimental setup: the three core components are the illumination hardware, the SPAD electronics, and the reconstruction algorithm. The illumination hardware consists of a pulsed laser, which sends periodic pulses of short duration, and a galvo to control the position of the beam. The SPAD electronics consist of a SPAD to detect a photon and a Picoharp to compute the timing of that photon. The data from the Picoharp is fed to a computer where the reconstruction algorithm computationally determines the shape of the room. Adithya Kumar Pediredla; Mauro Buttafava; Alberto Tosi; Oliver Cossairt; Ashok Veeraraghavan. Published in: Computational Photography (ICCP), 2017 IEEE International Conference on. Date of Conference: 12-14 May 2017
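A back-of-the-envelope sketch of why picosecond timing matters in such a setup: a photon echo's arrival time t maps to a travelled optical path length d = c * t, so the quoted 30 ps of timing jitter corresponds to roughly a centimetre of path uncertainty.

# Convert photon-echo arrival times to optical path lengths.
C = 2.998e8                       # speed of light, m/s

def path_length(arrival_time_s):
    return C * arrival_time_s     # total optical path travelled, in metres

print(path_length(30e-12))        # ~0.009 m: the scale set by 30 ps jitter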
  • 123.
Imaging: Novel techniques: Compressed Sensing #1. Compressed sensing (CS, also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal. CS enables a potentially large reduction in the sampling and computation costs for sensing signals that have a sparse or compressible representation. Compressed sensing. DL Donoho - IEEE Transactions on Information Theory, 2006. Cited by 19,074 articles, see Related articles. An introduction to compressive sampling. EJ Candès, MB Wakin - IEEE Signal Processing Magazine, 2008. Cited by 6,779 articles, see Related articles. A Framework for Compressive-Sensing of 3D Point Clouds. Vahid Behravan; Gurjeet Singh; Patrick Y. Chiang (2016) https://doi.org/10.1109/CIS.2016.0024 "A key question in any power-efficient LiDAR system (e.g. wireless sensor applications) is how many points we need to capture to fully obtain the scene point cloud. The fewer points we need to capture, the less energy is needed to transmit the data to the receiver, and the higher the frame rate. Compressive sensing is a method that enables reduction of the LiDAR data. Experimental results show that excluding edge points from the error calculation gives us a better criterion for deciding the best compression ratio in the system." A new approach to apply compressive sensing to LIDAR sensing. Richard C. Lau; T. K. Woodward (2016) http://dx.doi.org/10.1117/12.2058777 Most CS methods require sequential capture of a large number of random data projections, which is not advantageous to LIDAR systems, wherein reduction of 3D data sampling is desirable. In this paper, we introduce a new method called Resampling Compressive Sensing (RCS) that can be applied to a single capture of a LIDAR point cloud to reconstruct a 3-dimensional representation of the scene with a significant reduction in the required amount of data. Examples of 50 to 80% reduction in point count are shown for sample point cloud data. The proposed new CS method leads to a new data collection paradigm that is general and different from traditional CS sensing such as the single-pixel camera architecture.
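A toy compressed-sensing example (unrelated to any specific LiDAR system above): recover a sparse signal from far fewer random measurements than samples by solving the L1-regularised least-squares problem with plain ISTA. The dimensions and regularisation weight are arbitrary assumptions.

# Sparse recovery from compressed measurements via ISTA (soft thresholding).
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                           # signal length, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                                 # m << n compressed measurements

x, lam = np.zeros(n), 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1 / Lipschitz constant of the gradient
for _ in range(500):                           # ISTA iterations
    g = x - step * A.T @ (A @ x - y)
    x = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # small relative error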
  • 124.
Imaging: Novel techniques: Compressed Sensing #2. Compressive sensing for reconstruction of 3D point clouds in smart systems. Ivo Stančić; Milos Brajović; Irena Orović; Josip Musić (2017) https://doi.org/10.1109/SOFTCOM.2016.7772129 The introduction of simple structured-light scanners makes possible fast scanning and effective robot detection and evasion of obstacles. Nevertheless, some obstacles may still be difficult to detect and recognize, primarily due to limitations of the scanner's hardware, which result in a low number of reconstructed surface points. In this paper a compressed sensing technique, primarily used for the reconstruction of 2D images, is utilized to enhance the quality of the 3D scan by increasing the number of reconstructed 3D points to the scanner's theoretical maximum. Sparse representation for colors of 3D point cloud via virtual adaptive sampling. Junhui Hou; Lap-Pui Chau; Ying He; Philip A. Chou (2017) https://doi.org/10.1109/ICASSP.2017.7952692 "It is common that a point cloud contains millions of points, leading to huge amounts of data, so effective and efficient compression schemes have to be developed due to limited network bandwidth and storage space. The acquired data may be defective due to occlusion or other factors (e.g., noise and holes), and thus preprocessing operations have to be performed to restore it." Amir Adler, Michael Elad, Michael Zibulevsky https://arxiv.org/abs/1610.09615 The contributions of this paper are two-fold: (1) it presents, for the first time to the best knowledge of the authors, the utilization of a deep neural network for the tasks of compressive linear sensing and non-linear inference; and (2) during training, the proposed network jointly optimizes the compressive sensing matrix and the inference operator, leading to a significant advantage compared to the state of the art for the task of image classification. Igor Carron | Nuit Blanche: Jason Laska's thesis presentation slides, entitled "Regime Change: Sampling Rate vs. Bit-Depth in Compressive Sensing".
  • 125.
Compression: Convergence with artificial intelligence. http://dx.doi.org/10.1038/nature14541 Data compression and probabilistic modelling are two sides of the same coin, and Bayesian machine-learning methods are increasingly advancing the state of the art in compression. The connection between compression and probabilistic modelling was established in the mathematician Claude Shannon's seminal work on the source coding theorem, which states that the number of bits required to compress data in a lossless manner is bounded by the entropy of the probability distribution of the data. The link to Bayesian machine learning is that the better the probabilistic model one learns, the higher the compression rate can be (MacKay, 2003). These models need to be flexible and adaptive, since different kinds of sequences have very different statistical patterns (say, Shakespeare's plays or computer source code). It turns out that some of the world's best compression algorithms [for example, the Sequence Memoizer (Wood et al. 2011) and PPM with dynamic parameter updates (Steinruecken et al. 2015)] are equivalent to Bayesian non-parametric models of sequences, and improvements to compression are being made through a better understanding of how to learn the statistical structure of sequences. Future advances in compression will come with advances in probabilistic machine learning, including special compression methods for non-sequence data such as images, graphs and other structured objects. The key distinction between problems in which a probabilistic approach is important and problems that can be solved using non-probabilistic machine-learning approaches is whether uncertainty has a central role. Moreover, most conventional optimization-based machine-learning approaches have probabilistic analogues that handle uncertainty in a more principled manner. For example, Bayesian neural networks represent the parameter uncertainty in neural networks (Neal, 1996), and mixture models are a probabilistic analogue for clustering methods (MacKay, 2003). http://dx.doi.org/10.1111/j.1467-8659.2006.00957.x https://doi.org/10.1016/j.cag.2009.03.019 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.604.8269 Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. https://arxiv.org/abs/1705.07832
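A tiny worked example of the source-coding bound quoted above: the entropy of the symbol distribution lower-bounds the achievable bits per symbol of any lossless code for i.i.d. data drawn from that distribution.

# Shannon entropy in bits, the lossless-compression lower bound per symbol.
import numpy as np

def entropy_bits(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A heavily skewed 4-symbol source needs far fewer than 2 bits per symbol.
print(entropy_bits([0.7, 0.15, 0.1, 0.05]))   # ~1.32 bits per symbol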
  • 126.
Imaging: Fusion with Deep learning and AI. Memristor Image Processor Uses Sparse Coding to See. By Katherine Bourzac - Posted 25 May 2017 | 12:00 GMT http://spectrum.ieee.org/tech-talk/semiconductors/optoelectronics/memristor-camera-chip-uses-sparse-coding-to-see Now researchers led by Wei Lu at the University of Michigan have designed hardware specifically to run brain-like "sparse coding" algorithms. Their system learns and stores visual patterns, and can recognize natural images while using very little power compared to machine learning programs run on GPUs and CPUs. Lu hopes these designs, described this week in the journal Nature Nanotechnology, will be layered on image sensors in self-driving cars. The key, he says, is thinking about hardware and software in tandem. "Most approaches to machine learning are about the algorithm," says Lu. Conventional processors use a lot of energy to run these algorithms, because they are not designed to process large amounts of data, he says. "I want to design efficient hardware that naturally fits with the algorithm," he says. Running a machine-learning algorithm on a powerful processor can require 300 watts of power, says Lu. His prototype uses 20 milliwatts to process video in real time. Lu says that's due to a few years of careful work modifying the hardware and software designs together. Chronocam - A new standard: Bio-inspired vision sensing + processing. http://www.chronocam.com/wp-content/uploads/2016/09/Technology.pdf
  • 127.
Processing: Distributed processing... for drone LiDAR hives? http://www.wired.co.uk/article/improbable-quest-to-build-the-matrix SpatialOS https://improbable.io/ SpatialOS is a cloud-based computational platform that lets you use many servers and engines to power a single world. The platform coordinates a swarm of micro-services called workers, which overlap and dynamically reorganize to power a huge, seamless world. The platform also lets you handle a huge number of concurrent agents across different devices in one world. http://www.wired.co.uk/article/drone-swarms-change-warfare LOCUST - Swarming Navy Drones. In the future, a flying drone or a hive of flying drones could automatically inspect and scan both indoor real estate (think of scaling Google Indoor Maps without having a human go there and move the tripod) and outdoor construction sites in an automated, autonomous fashion. One just needs affordable solid-state LiDARs and/or 360º imaging. R&D on military drones is massive now, for obvious reasons: DragonflEye Project Wants to Turn Insects Into Cyborg Drones. Anti-drone radio wave startup SkySafe secures $11.5M from Andreessen Horowitz. Posted Jul 20, 2017 by Josh Constine (@joshconstine) https://techcrunch.com/2017/07/20/skysafe