Omniphotos Whitepaper | SIGGRAPH Asia 2020
A methodology for extracting parallax from photography and videography, using view synthesis, optical flow, and scene-adaptive proxy geometry.
This document summarizes a research paper on high-quality real-time video inpainting using an approach called PixMix. The paper introduces PixMix as an approach that can generate coherent video streams in real time while handling complex backgrounds and camera movements. This is an improvement over prior approaches that either did not achieve real-time performance or imposed restrictions on camera movement. PixMix uses a combined pixel- and patch-based approach for fast, high-quality image inpainting. It also introduces new tracking and frame-to-frame coherence techniques using homography to achieve real-time video manipulation capabilities. Evaluation results showed PixMix can produce coherent video streams for objects fixed on walls and for hand-held cameras.
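The summary mentions homography-based frame-to-frame coherence. PixMix's actual pipeline is not reproduced here, but the core geometric operation behind it is mapping pixels of a near-planar background between frames through a 3x3 homography, so a region inpainted in one frame can be propagated to the next instead of being recomputed. A minimal numpy sketch (the function name `apply_homography` is mine, not from the paper):

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates through a 3x3 homography.

    A planar background seen by a moving camera maps between frames
    via x' ~ H x in homogeneous coordinates, which is what makes
    propagating an inpainted patch from frame t to frame t+1 cheap.
    """
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])          # lift to homogeneous coords
    mapped = homog @ H.T                    # x' = H x
    return mapped[:, :2] / mapped[:, 2:3]   # perspective divide

# A pure-translation homography shifts every pixel by (tx, ty) = (5, -2).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
moved = apply_homography(H, [[10.0, 10.0]])  # -> [[15., 8.]]
```

In a real pipeline H would be estimated per frame pair from tracked feature correspondences (e.g. with a robust least-squares fit) rather than written down by hand.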
Invited talk on AR/SLAM and IoT in the ILAS Seminar "Introduction to IoT and Security", Kyoto University, 2020.
(https://www.z.k.kyoto-u.ac.jp/freshman-guide/ilas-seminars/)
Speaker: Tomoyuki Mukasa
This document discusses compressive displays and related technologies for reducing the bandwidth requirements of multi-view and light field displays. It describes several technologies including layered 3D displays, polarization field displays, and high-rank 3D displays that decompose 4D light fields into lower dimensional representations. It also discusses using mathematical techniques like non-negative matrix factorization for further compressing display data. The document promotes open collaboration through the proposed Compressive Display Consortium to advance next generation displays.
The document discusses using projectors to augment the real world by controlling light and projecting images onto surfaces. It describes capturing geometry and reflectance information of surfaces using cameras to allow projecting customized images. This could enable applications like annotating real objects or enhancing low-light videos with daytime context.
The document discusses spatially augmented reality (SAR) and using projectors to augment real-world objects by projecting virtual images and textures onto them. It describes key challenges in SAR such as calibration, rendering, and handling shadows and reflections. SAR allows augmentation of wide areas with high resolution and avoids issues of body-worn displays. The document also discusses using photosensing RFID tags and a handheld projector to determine tag locations and enable interaction with augmented real-world objects.
This document discusses techniques for creating accommodation-invariant near-eye displays. Current VR displays cause a vergence-accommodation conflict that produces eye fatigue. The authors investigate using point spread function engineering and multi-plane displays to remove the retinal blur cue and allow accommodation to match stereopsis. They describe a prototype that uses a spatial light modulator and focal sweep to render images with different depth-invariant point spread functions. A user study shows the prototype can drive accommodation with stereopsis alone. Future work includes improving image quality and investigating multifocal lenses and user comfort.
We have built a camera that can look around corners and beyond the line of sight. The camera uses light that travels from the object to the camera indirectly, by reflecting off walls or other obstacles, to reconstruct a 3D shape.
Google Glass, The META and Co. - How to Calibrate your Optical See-Through Head-Mounted Display, by Jens Grubert
Slides from our ISMAR 2014 tutorial http://stctutorial.icg.tugraz.at/
Abstract:
Head Mounted Displays such as Google Glass and the META have the potential to spur consumer-oriented Optical See-Through Augmented Reality applications. A correct spatial registration of those displays relative to a user’s eye(s) is an essential problem for any HMD-based AR application.
At our ISMAR 2014 tutorial we provide an overview of established and novel approaches for the calibration of those displays (OST calibration) including hands on experience in which participants will calibrate such head mounted displays.
The document discusses strategies for visualizing points of interest (POIs) in 3D virtual environments on mobile devices. It presents two approaches: 3D halo projections, which display halos projected onto the screen around off-screen POIs, and 3D halo circles, which show halos as circles around POIs in the 3D world. It evaluates these techniques and discusses challenges like occlusion management and reducing visual clutter with many POIs. The authors implemented prototypes on an iPhone to test performance and explore open issues like landmark occlusion to address in future work.
2008 Brokerage 03: Scalable 3D Models [compatibility mode], by imec.archive
1) There is a trend towards capturing and modeling massive 3D environments and dynamic 4D scenes for applications like virtual worlds, games, and navigation systems.
2) Acquiring and processing large amounts of 3D data poses challenges for technologies related to acquisition, editing, transmission, rendering and presentation as the scale increases.
3) The document discusses various methods for large-scale 3D acquisition including structure from motion, stereo vision, LIDAR, structured light scanning, as well as challenges in editing, streaming, and rendering massive 3D models.
This document summarizes research on compressive light field displays. It begins by introducing the concept of light field displays and their collaborators. It then provides examples of prototypes from layered 3D displays to tensor displays. Finally, it outlines the evolution of display technologies from conventional parallax barriers to the most recent compressive displays that use techniques like nonnegative matrix factorization and computed tomography to achieve compression in time, pixels, and depth for glasses-free 3D viewing.
The document discusses 3D laser scanning, including its process, applications, benefits, and drawbacks. 3D laser scanning uses a laser beam to create a point cloud representation of an object's geometric surface by recording distance values within the scanner's field of view. The laser scanner consists of a laser system and camera that passes a laser line over an object's surface to capture 3D data points, allowing accurate models to be created digitally without touching the physical object. Applications include entertainment, 3D photography, and law enforcement. Benefits are saving time on complex modeling and accurate surface representation, while drawbacks include large file sizes and requiring post-processing and high-end technology.
- The document describes a new technique called content-adaptive parallax barriers for automultiscopic 3D displays.
- Content-adaptive barriers optimize the parallax barrier patterns for each image/video by applying non-negative matrix factorization to increase brightness and refresh rate compared to traditional time-multiplexed barriers.
- An initial prototype was constructed using off-the-shelf LCD panels to demonstrate the concept. The content-adaptive barriers allowed the system to emit higher rank light fields compared to conventional barriers.
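Several of the summaries above (content-adaptive parallax barriers, compressive light field displays) rely on non-negative matrix factorization: the target light field matrix is approximated by a product of non-negative layer patterns, and the rank-1 terms are time-multiplexed. As a hedged illustration, here is a standard multiplicative-update NMF (Lee-Seung style) on a toy non-negative matrix; the displays' actual optimization adds hardware constraints not modelled here:

```python
import numpy as np

def nmf(L, rank, iters=200, eps=1e-9):
    """Rank-constrained non-negative factorization L ~ F @ G.

    For content-adaptive parallax barriers, L plays the role of the
    target light field and F, G the front/rear layer patterns;
    multiplicative updates keep both factors non-negative, matching
    the physical constraint that LCD transmittances cannot be negative.
    """
    rng = np.random.default_rng(0)
    m, n = L.shape
    F = rng.random((m, rank)) + eps
    G = rng.random((rank, n)) + eps
    for _ in range(iters):
        G *= (F.T @ L) / (F.T @ F @ G + eps)   # update rear layer
        F *= (L @ G.T) / (F @ G @ G.T + eps)   # update front layer
    return F, G

# Toy "light field": a non-negative rank-2 matrix is recovered closely.
L = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 0.2, 0.1]) + \
    np.outer([0.5, 0.1, 0.9], [0.2, 1.0, 0.3, 0.4])
F, G = nmf(L, rank=2)
err = np.linalg.norm(L - F @ G) / np.linalg.norm(L)  # small relative error
```

The rank parameter corresponds to the number of time-multiplexed frames: a higher rank reproduces the light field more faithfully at the cost of a higher required refresh rate.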
This document summarizes Toru Tamaki's presentation on scattering in computer graphics (CG) and computer vision (CV). It discusses reflection models including diffuse/specular reflection and bidirectional reflectance distribution functions (BRDFs). It also covers subsurface scattering within materials, models for subsurface scattering including diffuse approximation and plane-parallel approximation, and measuring scattering properties including single and multiple scattering. Examples of subsurface scattering rendering from past CG papers are shown.
This document summarizes scattering in computer graphics and computer vision, including:
- Types of scattering such as diffuse reflection, specular reflection, BRDF, subsurface scattering, single scattering, and multiple scattering.
- Models for subsurface scattering including diffuse approximation, plane-parallel approximation, and Donner's empirical BSSRDF model.
- Techniques for measuring scattering properties like BRDF and rendering effects of scattering in participating media and subsurface scattering.
This document presents a method for adaptive color display using perceptually-driven factored spectral projection. It describes limitations of conventional displays in accurately reproducing colors due to their limited gamuts. The method formulates color reproduction as a nonlinear optimization problem to select multiple color primaries for each image, mapped to a display's gamut, in a way that minimizes perceptual errors in CIELAB color space. Evaluation shows the method significantly reduces color errors compared to legacy methods and allows displays to reproduce colors beyond their native gamut boundaries.
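The objective described above penalises perceptual error, which is why differences are measured in the roughly perceptually uniform CIELAB space rather than in RGB. A minimal sketch of that error metric (CIE76 delta-E) and a deliberately simplified, greedy stand-in for the per-image primary selection; the actual method solves a joint nonlinear optimization, which is not reproduced here:

```python
import numpy as np

def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

def pick_primary(target_lab, candidates_lab):
    """Toy stand-in for primary selection: choose the candidate primary
    whose CIELAB distance to the target colour is smallest.
    (Illustrates only the error metric, not the paper's optimizer.)"""
    return min(range(len(candidates_lab)),
               key=lambda i: delta_e76(target_lab, candidates_lab[i]))

target = [50.0, 20.0, -10.0]
candidates = [[50.0, 25.0, -10.0],   # delta-E = 5.0
              [50.0, 21.0, -9.0],    # delta-E ~ 1.41  <- chosen
              [40.0, 0.0, 0.0]]      # delta-E ~ 24.6
best = pick_primary(target, candidates)  # -> 1
```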
This document introduces 3D digitization technologies. It discusses acquiring visually rich 3D models through either modeling or sampling approaches. Modeling involves manually redrawing objects, while sampling uses semi-automatic processes like 3D scanning to photograph objects. Common 3D scanning devices and technologies are described, including laser scanners that use triangulation or time-of-flight measurements to sample surfaces. Raw scan data requires processing to transform redundant sampled points into a complete 3D model.
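The two measurement principles named above, triangulation and time-of-flight, reduce to simple geometry. Under an idealised pinhole model (no lens distortion, laser parallel to the optical axis; the helper names are mine), similar triangles give depth from the laser spot's pixel offset, while time-of-flight is half the round-trip distance at the speed of light:

```python
def triangulation_depth(baseline_m, focal_px, offset_px):
    """Depth of a laser spot by triangulation (idealised pinhole model).

    With a laser-to-camera baseline b and focal length f in pixels,
    similar triangles give z = f * b / offset, where `offset_px` is the
    spot's displacement on the sensor relative to a point at infinity.
    """
    if offset_px <= 0:
        raise ValueError("offset must be positive for a finite depth")
    return focal_px * baseline_m / offset_px

def time_of_flight_depth(round_trip_s, c=299_792_458.0):
    """Depth from a time-of-flight measurement: half the round-trip
    distance travelled at the speed of light."""
    return 0.5 * c * round_trip_s

z_tri = triangulation_depth(baseline_m=0.1, focal_px=1000.0, offset_px=50.0)  # 2.0 m
z_tof = time_of_flight_depth(round_trip_s=2e-8)  # 20 ns round trip ~ 3.0 m
```

The inverse relationship in triangulation (z proportional to 1/offset) is why triangulation scanners are accurate at close range but degrade quickly with distance, whereas time-of-flight accuracy is set by timing resolution and is nearly independent of range.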
3D Scanners and their Economic Feasibility, by Jeffrey Funk
These slides use concepts from my (Jeff Funk) course, "Analyzing Hi-Tech Opportunities," to analyze how the economic feasibility of 3D scanners is improving through advances in lasers, camera ICs, and processor ICs. 3D scanning is both a complement to 3D printing and a technology with its own unique applications. 3D printing of complex objects can be done from a CAD database or from a 3D scan, where the scan can be done with a laser or other sources of white light such as LEDs.
3D scanning can also be done for other purposes. For example, scientists and engineers are using 3D scanners to survey archeological, construction, crime scene, and engineering sites, to document maintenance and repair of engineered systems, and to customize medical and dental products for humans. Improvements in lasers, LEDs, camera chips, ICs, and other components continue to improve the economic feasibility of 3D scanning. Longer wavelength lasers increase the scanning range, better camera chips improve the scanning resolution, and better lasers, camera chips, and processor ICs reduce the scanning time. For example, third generation scanners from Argon, one leading supplier, have 100 times higher resolution and one tenth the scan times of Argon’s first generation system.
For costs, lasers make up the largest share, followed by camera and processor ICs. For example, lasers make up 80% of the hardware cost for one high-end system, whose hardware currently costs $1346 and which sells for about $3000. As laser costs fall and as volumes enable smaller margins, the price of such systems will fall.
For the same reasons, low-end systems continue to emerge. These include Microsoft’s Kinect and an app for the iPhone. Microsoft’s Kinect was $150 while the app was only $4.99, both in early 2013. As such low-end systems proliferate, and high-end systems continue to get cheaper, 3D scanning will find new applications.
FARO 2014 3D Documentation Presentation by Direct Dimensions, "3D Scanning for 3D Printing, Making Reality Digital, and then Physical Again, Part 2", by Direct Dimensions, Inc.
Presentation at the 2014 FARO 3D Documentation Conference by Direct Dimensions called "3D Scanning for 3D Printing, Making Reality Digital, and then Physical Again, Part 2"
Lecture 8 from a course on Mobile Based Augmented Reality Development taught by Mark Billinghurst and Zi Siang See on November 29th and 30th 2015 at Johor Bahru in Malaysia. This lecture describes how to develop AR panoramas for mobile devices. Look for the other 9 lectures in the course.
Augmented Reality Using High Fidelity Spherical Panorama with HDRI, by Zi Siang See
Zi Siang See, Mark Billinghurst, Adrian David Cheok (2015). Augmented Reality using High Fidelity Spherical Panorama with HDRI. SIGGRAPH ASIA 2015 Mobile Graphics and Interactive Applications.
Publication
http://dx.doi.org/10.1145/2818427.2818445
Video Demo
https://youtu.be/UuYUqDeM9jc
Keep in touch for research, idea exchange and collaboration
http://www.zisiangsee.com/research
METHODS AND ALGORITHMS FOR STITCHING 360-DEGREE VIDEO, by IAEME Publication
The rapid development of virtual reality technologies in recent years has led to increased interest in 360-degree video and, as a consequence, in the production of equipment for shooting it. Shooting 360-degree video differs from regular video shooting in the need to use multiple cameras (lenses) to create panoramic video. Stitching the video from several cameras (lenses) to form panoramic video therefore comes to the fore. Currently, a number of algorithms and software solutions are available for implementing video stitching. The purpose of the paper is to analyze and identify optimal algorithms and tools for 360-degree video stitching. The analysis considers, first of all, the quality of the stitching algorithms, meaning the absence of visible seams in the resulting image. The performance of the stitching methods also plays an important role, since the speed of processing the video footage is critical and ideally should run in real time, which allows broadcasting 360-degree video.
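Whatever stitching algorithm is chosen, pipelines typically warp each camera's image into a shared spherical canvas before blending the seams; the usual output format is the equirectangular projection, where longitude maps linearly to the horizontal axis and latitude to the vertical axis. A minimal sketch of that mapping (function name `ray_to_equirect` is mine):

```python
import math

def ray_to_equirect(x, y, z, width, height):
    """Map a unit-sphere viewing direction to equirectangular pixel coords.

    Longitude (azimuth around the y axis) maps linearly to u in
    [0, width); latitude (elevation) maps linearly to v in [0, height).
    """
    lon = math.atan2(x, z)                    # [-pi, pi)
    lat = math.asin(max(-1.0, min(1.0, y)))   # [-pi/2, pi/2], clamped
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

# The forward direction (+z) lands at the centre of a 4096x2048 canvas.
u, v = ray_to_equirect(0.0, 0.0, 1.0, width=4096, height=2048)  # -> (2048.0, 1024.0)
```

Stitching then amounts to estimating each camera's orientation, mapping its rays through this projection, and blending overlapping pixels so that no seam is visible, which is precisely the quality criterion the paper evaluates.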
This paper proposes a new watermarking scheme for 360-degree VR videos based on spherical wavelet transforms. The watermark is first embedded in the spherical wavelet domain of the VR video to be compatible with different projection formats. A just noticeable difference model is used to control watermark imperceptibility on the viewport based on the user's head-mounted display. The scheme can detect watermarks robustly from both the spherical projection and viewport projection. Experimental results show watermarks can be reliably extracted from spherical and viewport projections, and the scheme is more robust to attacks than planar approaches when detecting from viewports.
Binocular Eye Tracking and Calibration in Head-mounted Displays, by Michael Stengel
Presentation slides from my talk on eye tracking and gaze-contingency for Virtual Reality applications.
In this talk I present the eye-tracking head-mounted display proposed in the paper "An Affordable Solution for Binocular Eye Tracking and Calibration in Head-mounted Displays".
The paper won the "Best Student Paper Award" at ACM Multimedia 2015 in Brisbane, Australia.
iMinds insights - 3D Visualization Technologies, by iMinds insights
Transforming the way we deal with information - from consumption to interaction.
iMinds insights is a quarterly publication providing you with relevant tech updates based on interviews with academic and industry experts. iMinds is a digital research center and incubator based in Belgium.
A major challenge for the next decade is to design virtual and augmented reality systems (VR at large) for real-world use cases such as healthcare, entertainment, e-education, and high-risk missions. This requires VR systems to operate at scale, in a personalized manner, remaining bandwidth-tolerant whilst meeting quality and latency criteria. One key challenge to reach this goal is to fully understand and anticipate user behaviours in these mixed reality settings.
This can be accomplished only by a fundamental revolution in network and VR systems, which have to put the interactive user at the heart of the system rather than at the end of the chain. With this goal in mind, in this talk we describe our current research on user-centric systems. First, we describe our viewport-based streaming strategies for 360-degree video. Then, we present in more detail our research on users' behaviour analysis when users interact with 360-degree content. Specifically, we describe a set of metrics that allows us to identify key behaviours among users and to quantify the level of similarity of these behaviours, presenting our clique-based clustering methodology along with information-theoretic and trajectory-based in-depth analyses. Finally, we conclude with an overview of the extension of this work to navigation within volumetric video sequences.
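The talk above describes metrics that quantify how similarly users explore 360-degree content, then cluster users by those metrics. As one plausible instance (not the talk's actual method), head orientations can be compared by mean angular distance between viewport centres, and users grouped when their pairwise distance stays below a threshold; connected components stand in here for the clique-based clustering, and all names are illustrative:

```python
import math
from itertools import combinations

def angular_dist(a, b):
    """Great-circle distance between two unit viewing directions (radians)."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)

def mean_trajectory_distance(t1, t2):
    """Distance between two head-orientation trajectories sampled at the
    same timestamps: mean angular distance between viewport centres."""
    return sum(angular_dist(a, b) for a, b in zip(t1, t2)) / len(t1)

def cluster(trajs, threshold):
    """Group users whose pairwise trajectory distance is below a threshold,
    via connected components (union-find) of the similarity graph."""
    n = len(trajs)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i, j in combinations(range(n), 2):
        if mean_trajectory_distance(trajs[i], trajs[j]) < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two users staring forward and one staring backward form two clusters.
fwd = [(0.0, 0.0, 1.0)] * 5
back = [(0.0, 0.0, -1.0)] * 5
clusters = cluster([fwd, fwd, back], threshold=0.5)  # -> [[0, 1], [2]]
```

Such clusters are exactly what viewport-based streaming can exploit: users in the same cluster can share prefetching predictions and tile-quality decisions.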
Virtual Reality (VR360): Production Framework for the Combination of High Dyn..., by Zi Siang See
Zi Siang See is researching production frameworks for combining HDRI and spherical panoramas. Their research aims to reduce obstacles and issues with current techniques. For HDRI, they found extending dynamic range from a single RAW digital negative capture avoids visual abnormalities from multiple exposures, requiring less production processes. For spherical panoramas, they are studying multi-row configurations to reduce variables while capturing multiple angles. Their goal is to optimize the techniques towards a leaner production approach.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/09/learning-for-360-vision-a-presentation-from-google/
Yu-Chuan Su, Research Scientist at Google, presents the “Learning for 360° Vision,” tutorial at the May 2023 Embedded Vision Summit.
As a core building block of virtual reality (VR) and augmented reality (AR) technology, and with the rapid growth of VR and AR, 360° cameras are becoming more available and more popular. People now create, share and watch 360° content in their everyday lives, and the amount of 360° content is increasing rapidly.
While 360° cameras offer tremendous new possibilities in various domains, they also introduce new technical challenges. For example, the distortion of 360° content in the planar projection degrades the performance of visual recognition algorithms. In this talk, Su explains how his company enables visual recognition on 360° imagery using existing models. Google’s solution improves 360° video processing and builds a foundation for further exploration of this new media.
1. Augmented reality (AR) supplements reality by combining virtual elements with the real world in real-time, unlike virtual reality which completely replaces the real world.
2. AR has applications in medical surgery by allowing surgeons to see "X-ray vision" through augmented reality head-mounted displays to help with visualization and navigation during minimally invasive surgeries.
3. Accurate tracking and registration of virtual elements with the real world is challenging in AR. Hybrid tracking systems that combine approaches may help cover weaknesses and allow for greater flexibility.
The document discusses strategies for visualizing points of interest (POIs) in 3D virtual environments on mobile devices. It presents two approaches: 3D halo projections, which display halos projected onto the screen around off-screen POIs, and 3D halo circles, which show halos as circles around POIs in the 3D world. It evaluates these techniques and discusses challenges like occlusion management and reducing visual clutter with many POIs. The authors implemented prototypes on an iPhone to test performance and explore open issues like landmark occlusion to address in future work.
2008 brokerage 03 scalable 3 d models [compatibility mode]imec.archive
1) There is a trend towards capturing and modeling massive 3D environments and dynamic 4D scenes for applications like virtual worlds, games, and navigation systems.
2) Acquiring and processing large amounts of 3D data poses challenges for technologies related to acquisition, editing, transmission, rendering and presentation as the scale increases.
3) The document discusses various methods for large-scale 3D acquisition including structure from motion, stereo vision, LIDAR, structured light scanning, as well as challenges in editing, streaming, and rendering massive 3D models.
This document summarizes research on compressive light field displays. It begins by introducing the concept of light field displays and their collaborators. It then provides examples of prototypes from layered 3D displays to tensor displays. Finally, it outlines the evolution of display technologies from conventional parallax barriers to the most recent compressive displays that use techniques like nonnegative matrix factorization and computed tomography to achieve compression in time, pixels, and depth for glasses-free 3D viewing.
The document discusses 3D laser scanning, including its process, applications, benefits, and drawbacks. 3D laser scanning uses a laser beam to create a point cloud representation of an object's geometric surface by recording distance values within the scanner's field of view. The laser scanner consists of a laser system and camera that passes a laser line over an object's surface to capture 3D data points, allowing accurate models to be created digitally without touching the physical object. Applications include entertainment, 3D photography, and law enforcement. Benefits are saving time on complex modeling and accurate surface representation, while drawbacks include large file sizes and requiring post-processing and high-end technology.
- The document describes a new technique called content-adaptive parallax barriers for automultiscopic 3D displays.
- Content-adaptive barriers optimize the parallax barrier patterns for each image/video by applying non-negative matrix factorization to increase brightness and refresh rate compared to traditional time-multiplexed barriers.
- An initial prototype was constructed using off-the-shelf LCD panels to demonstrate the concept. The content-adaptive barriers allowed the system to emit higher rank light fields compared to conventional barriers.
This document summarizes Toru Tamaki's presentation on scattering in computer graphics (CG) and computer vision (CV). It discusses reflection models including diffuse/specular reflection and bidirectional reflectance distribution functions (BRDFs). It also covers subsurface scattering within materials, models for subsurface scattering including diffuse approximation and plane-parallel approximation, and measuring scattering properties including single and multiple scattering. Examples of subsurface scattering rendering from past CG papers are shown.
This document summarizes scattering in computer graphics and computer vision, including:
- Types of scattering such as diffuse reflection, specular reflection, BRDF, subsurface scattering, single scattering, and multiple scattering.
- Models for subsurface scattering including diffuse approximation, plane-parallel approximation, and Donner's empirical BSSRDF model.
- Techniques for measuring scattering properties like BRDF and rendering effects of scattering in participating media and subsurface scattering.
This document presents a method for adaptive color display using perceptually-driven factored spectral projection. It describes limitations of conventional displays in accurately reproducing colors due to their limited gamuts. The method formulates color reproduction as a nonlinear optimization problem to select multiple color primaries for each image, mapped to a display's gamut, in a way that minimizes perceptual errors in CIELAB color space. Evaluation shows the method significantly reduces color errors compared to legacy methods and allows displays to reproduce colors beyond their native gamut boundaries.
This document introduces 3D digitization technologies. It discusses acquiring visually rich 3D models through either modeling or sampling approaches. Modeling involves manually redrawing objects, while sampling uses semi-automatic processes like 3D scanning to photograph objects. Common 3D scanning devices and technologies are described, including laser scanners that use triangulation or time-of-flight measurements to sample surfaces. Raw scan data requires processing to transform redundant sampled points into a complete 3D model.
3D Scanners and their Economic FeasibilityJeffrey Funk
These slides use concepts from my (Jeff Funk) course entitled analyzing hi-tech opportunities to analyze how the economic feasibility of 3D scanners is becoming better through improvements in lasers, camera ICs, and processor ICs. 3D scanning is both a complement to 3D printing and a technology with its own unique applications. 3D printing of complex objects can be done from a CAD database or from a 3D scan where a 3D scan can be done with laser or other sources of white light such as LEDs.
3D scanning can also be done for other purposes. For example, scientists and engineers are using 3D scanners to survey archeological, construction, crime scene, and engineering sites, to document maintenance and repair of engineered systems, and to customize medical and dental products for humans. Improvements in lasers, LEDs, camera chips, ICs, and other components continue to improve the economic feasibility of 3D scanning. Longer wavelength lasers increase the scanning range, better camera chips improve the scanning resolution, and better lasers, camera chips, and processor ICs reduce the scanning time. For example, third generation scanners from Argon, one leading supplier, have 100 times higher resolution and one tenth the scan times of Argon’s first generation system.
For costs, lasers make up the largest percentage followed by camera and processor ICs. For example, lasers make up 80% of the hardware cost for one high-end system with a current cost of $1346 and a price of about $3000. As laser costs fall and as volumes enable smaller margins, the price of such systems will fall.
For the same reasons, low-end systems continue to emerge. These include Microsoft’s Kinect and an app for the iPhone. Microsoft’s Kinect was $150 while the app was only $4.99, both in early 2013. As such low-end systems proliferate, and high-end systems continue to get cheaper, 3D scanning will find new applications.
FARO 2014 3D Documentation Presentation by Direct Dimensions "3D Scanning for...Direct Dimensions, Inc.
Presentation at the 2014 FARO 3D Documentation Conference by Direct Dimensions called "3D Scanning for 3D Printing, Making Reality Digital, and then Physical Again, Part 2"
Lecture 8 from a course on Mobile Based Augmented Reality Development taught by Mark Billinghurst and Zi Siang See on November 29th and 30th 2015 at Johor Bahru in Malaysia. This lecture describes how to develop AR panoramas for mobile devices. Look for the other 9 lectures in the course.
Augmented Reality Using High Fidelity Spherical Panorama with HDRIZi Siang See
Zi Siang See, Mark Billinghurst, Adrian David Cheok (2015). Augmented Reality using High Fidelity Spherical Panorama with HDRI. SIGGRAPH ASIA 2015 Mobile Graphics and Interactive Applications.
Publication
http://dx.doi.org/10.1145/2818427.2818445
Video Demo
https://youtu.be/UuYUqDeM9jc
Keep in touch for research, idea exchange and collaboration
http://www.zisiangsee.com/research
METHODS AND ALGORITHMS FOR STITCHING 360-DEGREE VIDEOIAEME Publication
The rapid development of virtual reality technologies in recent years has led to an increase in interest in 360-degree video and, as a consequence, in the production of equipment for shooting. Shooting 360-degree video differs from regular video shooting by the need to use multiple cameras (lenses) to create panoramicvideo. Stitching the video from several video cameras (lenses) in order to form panoramic video comes to the fore in this case. Currently, there are a number of algorithms and software solutions available for implementing video stitching. The purpose of the paper is to analyze and search for optimal algorithms and tools for 360-degree video stitching. The analysis takes into account, first of all, the quality of the stitching algorithms, which involves the absence of visible seams in the resulting image. The performance of the stitching methods also plays an important role, since the speed of processing the video footage is critical and ideally should be done in the real-timemode, which allows broadcasting 360-degree video.
This paper proposes a new watermarking scheme for 360-degree VR videos based on spherical wavelet transforms. The watermark is first embedded in the spherical wavelet domain of the VR video to be compatible with different projection formats. A just noticeable difference model is used to control watermark imperceptibility on the viewport based on the user's head-mounted display. The scheme can detect watermarks robustly from both the spherical projection and viewport projection. Experimental results show watermarks can be reliably extracted from spherical and viewport projections, and the scheme is more robust to attacks than planar approaches when detecting from viewports.
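The embed/detect idea behind such watermarking schemes can be sketched with a toy additive spread-spectrum watermark and a correlation detector. The sketch below uses hypothetical names and a flat, fixed embedding strength; the paper itself works in a spherical wavelet domain with a JND model, which is not reproduced here:

```python
import random

def embed(signal, key, strength=0.1):
    """Toy additive spread-spectrum watermark: a pseudo-random +/-1
    pattern derived from `key` is added to the host signal."""
    rng = random.Random(key)                   # key-seeded, so detection is repeatable
    pattern = [rng.choice((-1.0, 1.0)) for _ in signal]
    marked = [s + strength * p for s, p in zip(signal, pattern)]
    return marked, pattern

def detect(signal, pattern):
    """Correlation detector: a clearly positive correlation with the
    key-derived pattern indicates the watermark is present."""
    return sum(s * p for s, p in zip(signal, pattern)) / len(signal)

host = [0.0] * 1000                            # flat host signal for illustration
marked, pattern = embed(host, key=42)
```

Here `detect(marked, pattern)` recovers the embedding strength, while correlating the unmarked host with the pattern yields zero.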
Binocular Eye Tracking and Calibration in Head-mounted Displays (Michael Stengel)
Presentation slides from my talk on eye tracking and gaze-contingency for Virtual Reality applications.
In this talk I present the Eye Tracking Head-mounted Display proposed in the paper "An Affordable Solution for Binocular Eye Tracking and Calibration in Head-mounted Displays".
The paper won the "Best Student Paper Award" at ACM Multimedia 2015 in Brisbane, Australia.
iMinds insights - 3D Visualization Technologies (iMinds insights)
Transforming the way we deal with information - from consumption to interaction.
iMinds insights is a quarterly publication providing you with relevant tech updates based on interviews with academic and industry experts. iMinds is a digital research center and incubator based in Belgium.
A major challenge for the next decade is to design virtual and augmented reality systems (VR at large) for real-world use cases such as healthcare, entertainment, e-education, and high-risk missions. This requires VR systems to operate at scale, in a personalized manner, remaining bandwidth-tolerant whilst meeting quality and latency criteria. One key challenge to reach this goal is to fully understand and anticipate user behaviours in these mixed reality settings.
This can be accomplished only by a fundamental revolution of the network and VR systems, which have to put the interactive user at the heart of the system rather than at the end of the chain. With this goal in mind, in this talk we describe our current research on user-centric systems. First, we describe our viewport-based streaming strategies for 360-degree video. Then, we present in more detail our research on users' behaviour analysis when users interact with 360-degree content. Specifically, we describe a set of metrics that allows us to identify key behaviours among users and quantify the level of similarity of these behaviours, and we present our clique-based clustering methodology alongside information-theoretic and trajectory-based in-depth analyses. Finally, we conclude with an overview of the extension of this work to navigation within volumetric video sequences.
Virtual Reality (VR360): Production Framework for the Combination of High Dyn... (Zi Siang See)
Zi Siang See is researching production frameworks for combining HDRI and spherical panoramas. Their research aims to reduce obstacles and issues with current techniques. For HDRI, they found extending dynamic range from a single RAW digital negative capture avoids visual abnormalities from multiple exposures, requiring less production processes. For spherical panoramas, they are studying multi-row configurations to reduce variables while capturing multiple angles. Their goal is to optimize the techniques towards a leaner production approach.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/09/learning-for-360-vision-a-presentation-from-google/
Yu-Chuan Su, Research Scientist at Google, presents the “Learning for 360° Vision,” tutorial at the May 2023 Embedded Vision Summit.
As a core building block of virtual reality (VR) and augmented reality (AR) technology, and with the rapid growth of VR and AR, 360° cameras are becoming more available and more popular. People now create, share and watch 360° content in their everyday lives, and the amount of 360° content is increasing rapidly.
While 360° cameras offer tremendous new possibilities in various domains, they also introduce new technical challenges. For example, the distortion of 360° content in the planar projection degrades the performance of visual recognition algorithms. In this talk, Su explains how his company enables visual recognition on 360° imagery using existing models. Google’s solution improves 360° video processing and builds a foundation for further exploration of this new media.
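One concrete face of the distortion problem is that equirectangular pixels near the poles cover far less of the sphere than pixels at the equator. The sketch below (a hypothetical helper, not Google's code) computes the per-row solid-angle weights that such pipelines commonly use to compensate when pooling or scoring over the full panorama:

```python
import math

def row_solid_angle_weights(height):
    """Per-row solid-angle weights for an equirectangular image:
    a row's pixels cover an amount of the sphere proportional to
    cos(latitude) of the row center, so polar rows are over-represented
    in the planar projection and should be down-weighted."""
    weights = []
    for r in range(height):
        lat = math.pi * ((r + 0.5) / height - 0.5)   # latitude in (-pi/2, pi/2)
        weights.append(math.cos(lat))
    return weights

w = row_solid_angle_weights(4)
```

For a 4-row image the two middle (equatorial) rows receive larger weights than the two polar rows, and the weights are symmetric about the equator.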
1. Augmented reality (AR) supplements reality by combining virtual elements with the real world in real-time, unlike virtual reality which completely replaces the real world.
2. AR has applications in medical surgery by allowing surgeons to see "X-ray vision" through augmented reality head-mounted displays to help with visualization and navigation during minimally invasive surgeries.
3. Accurate tracking and registration of virtual elements with the real world is challenging in AR. Hybrid tracking systems that combine approaches may help cover weaknesses and allow for greater flexibility.
Simulating Presence: BIM to Virtual Reality (Jason Halaby)
Virtual Reality is becoming a powerful tool for architects to validate and communicate their designs. Jason Halaby presents recent experimentation and implementation of VR at WRNS Studio. This presentation was given in San Francisco on February 13, 2015 at an AIA event on "Emerging Technology in Architecture" sponsored by the Bay Area Young Architects (BAYA) group.
www.wrnsstudio.com
360 video allows viewers to pan and rotate their perspective in all directions within a spherical video. Footage is captured using an omnidirectional camera or multiple cameras and stitched together to create a panoramic view. This immersive experience allows viewers to explore and interact with spaces, becoming part of the story. When combined with VR, 360 video can simulate real-life environments. Examples include a tour of the Large Hadron Collider, a 360 cockpit view, and a virtual tour of a cancer research lab.
This document summarizes HCChang's research interests and experience in dense visual simultaneous localization and mapping (SLAM). It begins with an overview of monoSLAM, PTAM, FAB-MAP and DTAM as examples of visual SLAM techniques. It then provides more detail on KinectFusion, the seminal dense visual SLAM method, and extensions like InfiniTAM, ElasticFusion and DynamicFusion. The document outlines HCChang's background and current work using time-of-flight cameras at EZImage to improve depth sensing. It proposes future work on dense visual SLAM, including deploying to Nvidia's TX1 and TK1 platforms, adding loop closures and path optimization, and improving reconstruction.
Photogrammetry is the science of taking measurements from photographs. It has roots in the mid-19th century but recent advances in technology, including inexpensive cameras, drones, and data storage, have advanced the field. Photogrammetry can now be used to create accurate 3D models and point clouds of structures from drone imagery, providing data in areas that were previously difficult or dangerous to access. The company described offers photogrammetry services using drones to capture data for industrial sites, with licensed pilots and experience capturing 3D models, orthomosaics, and aerial photos to document structures.
Augmented Reality Smartglasses 2016 trends and challenges (Alexandre BOUCHET)
This document discusses trends and challenges in augmented reality hardware, specifically smartglasses. It begins by outlining the transition from handheld augmented reality to head-worn augmented reality devices. It then provides an overview of current smartglasses technologies and challenges, including field of view, weight, battery life, rendering latency, tracking accuracy and more. Finally, it argues that while early smartglasses generated hype, the next phase will focus on addressing challenges to enable productive use cases in industries like manufacturing.
This is our presentation from the Immersive Sydney #WebXRWeek event. This provides an overview of web based Mixed Reality and then dives into the specifics of the new #WebXR API. It includes market statistics and information on key release dates. It also includes links to #WebXR demos and other background information.
Building Mobile AR Applications Using the Outdoor AR Library (Part 1), by Mark Billinghurst
The first part of a tutorial given on November 21st at the MGIA symposium at Siggraph Asia 2013. This shows how to build Outdoor AR applications using the HIT Lab NZ's Outdoor AR library. For more information see http://www.hitlabnz.org/index.php/products/mobile-ar-framework/334
This document discusses several research papers on 360-degree 3D display technologies. It describes systems that use multiple projectors, light field regeneration, and holographic optical elements to create floating 3D images viewable from all angles. The papers explore both near-eye and tabletop display designs that do not require glasses or have a large viewing angle. The goal is to develop immersive 3D displays for applications in virtual and augmented reality.
Similar to Omniphotos Whitepaper | SIGGRAPH Asia 2020
Emmy-awarded producer, director, and senior technologist with experience in startups, education, TV, VFX, and Fortune 500 companies. Lumiere-awarded storyteller specializing in AR and VR. Excel in creative partnerships, leading teams, and delivering high-quality content for brands, products, and visions. For a comprehensive review of my roles, with media, including technology research, analysis, and forecasting, please visit my LinkedIn profile.
"State of AI, 2019," from MMC Ventures, in partnership with Barclays.
The State of AI 2019: Divergence
As Artificial Intelligence (AI) proliferates, a divide is emerging. Between nations and within industries, winners and losers are emerging in the race for adoption, the war for talent and the competition for value creation.

The landscape for entrepreneurs is also changing. Europe's ecosystem of 1,600 AI startups is maturing and bringing creative destruction to new industries. While the UK is the powerhouse of European AI, hubs in Germany and France are thriving and may extend their influence in the decade ahead.

As new AI hardware and software make the impossible inevitable, we also face divergent futures. AI offers profound benefits but poses significant risks. Which future will we choose?

Our State of AI report for 2019 empowers entrepreneurs, corporate executives, investors and policy-makers. While jargon-free, our Report draws on unique data and 400 discussions with ecosystem participants to go beyond the hype and explain the reality of AI today, what is to come and how to take advantage. Every chapter includes actionable recommendations.
Artificial intelligence (AI) is a source of both huge excitement and apprehension. What are the real opportunities and threats for your business? Drawing on a detailed analysis of the business impact of AI, we identify the most valuable commercial openings in your market and how to take advantage of them.
Meta: Four Predictions for the Future of Work
Discover the trends shaping the future of hybrid working and work in the metaverse, and how they’ll redefine inclusion in the workplace. We spoke to 2,000 employees and 400 business leaders in the US and UK to understand the impact.
How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as a configurable predictive world model, behavior driven by intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.
This document is not a technical nor scholarly paper in the traditional sense, but a position paper expressing my vision for a path towards intelligent machines that learn more like animals and humans, that can reason and plan, and whose behavior is driven by intrinsic objectives, rather than by hard-wired programs, external supervision, or external rewards.

Many ideas described in this paper (almost all of them) have been formulated by many authors in various contexts and in various forms. The present piece does not claim priority for any of them but presents a proposal for how to assemble them into a consistent whole. In particular, the piece pinpoints the challenges ahead. It also lists a number of avenues that are likely or unlikely to succeed.

The text is written with as little jargon as possible, and with as little mathematical prior knowledge as possible, so as to appeal to readers with a wide variety of backgrounds including neuroscience, cognitive science, and philosophy, in addition to machine learning, robotics, and other fields of engineering. I hope that this piece will help contextualize some of the research in AI whose relevance is sometimes difficult to see.
InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
Abstract. We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view, where this capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene. To achieve this, we propose a novel self-supervised view-generation training paradigm in which we sample and render virtual camera trajectories, including cyclic ones, allowing our model to learn stable view generation from a collection of single views. At test time, despite never seeing a video during training, our approach can take a single image and generate long camera trajectories comprising hundreds of new views with realistic and diverse content. We compare our approach with recent state-of-the-art supervised view generation methods that require posed multi-view videos and demonstrate superior performance and synthesis quality.
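The cyclic trajectories mentioned in the abstract can be illustrated with a toy sampler that places virtual camera positions on a closed circular path, so the final pose returns to the starting view and the rendering there can be compared against the input image for self-supervision. This is an illustrative sketch with hypothetical names, not the paper's actual trajectory sampler:

```python
import math

def cyclic_trajectory(n_steps, radius=1.0):
    """Sample a closed (cyclic) virtual-camera path: n_steps + 1
    positions on a circle, where the last position coincides with the
    first, so a generated fly-through ends back at the original view."""
    poses = []
    for k in range(n_steps + 1):
        t = 2.0 * math.pi * k / n_steps        # angle along the loop
        poses.append((radius * math.cos(t), radius * math.sin(t)))
    return poses

path = cyclic_trajectory(8)
```

An 8-step loop yields 9 positions whose first and last entries agree, which is the property a cycle-consistency loss would exploit.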
Dialing up the danger: Virtual reality for the simulation of risk (Alejandro Franceschi)
There is a growing interest in the use of virtual reality (VR) to simulate unsafe spaces, scenarios, and behaviours. Environments that might be difficult, costly, dangerous, or ethically contentious to achieve in real life can be created in virtual environments designed to give participants a convincing experience of “being there.” There is little consensus in the academic community about the impact of simulating risky content in virtual reality, and a scarcity of evidence to support the various hypotheses, which range from VR being a safe place to rehearse challenging scenarios to calls for such content creation to be halted for fear of irreversible harm to users. Perspectives split along disciplinary lines, with competing ideas emerging from cultural studies and games studies, and from psychology and neuroscience, and with industry reports championing the efficacy of these tools for information retention, time efficiency and cost, with little equivalent information available regarding impact on the wellbeing of participants. In this study we use thematic analysis and close-reading language analysis to investigate the way in which participants in a VR training scenario respond to, encode and relay their own experiences. We find that participants overall demonstrate high levels of “perceptual proximity” to the experience, recounting it as something that happened to them directly and personally. We discuss the impact of particular affordances of VR, as well as a participant's prior experience, on the impact of high-stress simulations. Finally, we consider the ethical mandate for training providers to mitigate the risk of traumatizing or re-traumatizing participants when creating high-risk virtual scenarios.
- The metaverse is still being defined but has the potential to be the next iteration of the internet by seamlessly combining digital and physical lives through immersion, interactivity, and use cases beyond gaming.
- Large technology companies, venture capital, private equity, start-ups, and brands have already invested over $120 billion in the metaverse in 2022 alone, driven by expectations of its economic impact and opportunities.
- The metaverse's potential economic impact is estimated to reach up to $5 trillion by 2030, generating new business models and engagement channels across industries like e-commerce, education, healthcare, and more.
Bank of England Staff Working Paper No 605 The Macroeconomics of Central Bank... (Alejandro Franceschi)
Bank of England Staff Working Paper No 605 The Macroeconomics of Central Bank Issued Digital Currencies (CBDC).
A study on the macroeconomic consequences of issuing a central bank digital currency (CBDC) - a universally accessible and interest-bearing central bank liability, implemented via distributed ledgers, that competes with bank deposits as a medium of exchange. Some summary results: a possible rise in GDP of 3%, and reductions in real interest rates, distortionary taxes, and monetary transaction costs. As a second monetary policy instrument, a CBDC could substantially improve the central bank's ability to stabilize the business cycle.
THE METAVERSE IS POTENTIALLY AN $8 TRILLION TO $13 TRILLION OPPORTUNITY (Citibank):
We believe the Metaverse may be the next generation of the internet — combining the physical and digital world in a persistent and immersive manner — and not purely a Virtual Reality world. A device-agnostic Metaverse accessible via PCs, game consoles, and smartphones could result in a very large ecosystem. Based on our definition, we estimate the total addressable market for the Metaverse economy could grow to between $8 trillion and $13 trillion by 2030.
METAVERSE USE CASES:
Gaming is viewed as a key Metaverse use case for the next several years due to the immersive and multi-player experience the space currently offers. But we believe that the Metaverse will eventually help us find new, enhanced ways to do all of our current activities, including commerce, entertainment and media, education and training, manufacturing and enterprise in general. Enterprise use cases of the Metaverse in the coming years will likely include internal collaboration, client contact, sales and marketing, advertising, events and conferences, engineering and design, and workforce training.
METAVERSE INFRASTRUCTURE BUILDING:
In its current state, the internet infrastructure is unsuitable for building a fully-immersive content-streaming Metaverse environment that enables users to go seamlessly from one experience to another. To make the vision of the Metaverse a reality, we expect significant investment in a confluence of technologies. Low latency — the time it takes a data signal to travel from one point on the internet to another point and then come back — is critical to building a more realistic user experience.
MONEY IN THE METAVERSE:
We expect the next generation of the internet, i.e., the Metaverse, would encapsulate a range of form factors of money, including the existing/traditional forms of money and also upcoming/digitally-native forms — cryptocurrency, stablecoins, central bank digital currencies (CBDCs) — that were out of scope in a pre-blockchain virtual world.
This document is the copyright of its respective holders. It is freely available on the Internet to anyone who searches for it independently. It is provided here under the "Fair Use Doctrine of U.S. Copyright Law."
Vitalik Buterin | Crypto Cities
In this whitepaper, Vitalik outlines some concepts for how a new model of a city running on a blockchain empowers the community to essentially self-govern.
Everyone can vote on the blockchain, decisions about where to put money to use are made as a collective, and the community can operate not unlike a centralized DAO (an oxymoron, but that is the proposal herein).
This is not science fiction. Thanks to a new law in Wyoming that permits these types of collectives, or communities, this radical, or rather evolutionary, step has already begun; and not just in Wyoming, but in Miami, a key location as well.
Mixed Reality: Pose Aware Object Replacement for Alternate Realities (Alejandro Franceschi)
This document explains how the technology involved can semantically replace moving objects, humans, and other visual input, transforming it, using mixed reality, into whatever the viewer would prefer the real world to look like instead.
From videogames to movies, to education, healthcare, commerce, communications, and industrial solutions, this will radically change the way we interact with the world, with others, and ourselves.
MaterialX has been integrated into MayaUSD and ArnoldUSD in the following ways:
- MaterialX definitions can be imported and exported from MayaUSD to enable interoperability via UsdShade and the MaterialX Preview Surface.
- The standard MaterialX shader generator is used to visualize MaterialX materials directly in Maya's viewport without translation, leveraging the same GLSL code generator as hdStorm.
- A plugin architecture allows custom MaterialX to Preview Surface translations to be defined and used for import and export between DCC tools and renderers.
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P... (Alejandro Franceschi)
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight Portraits for Background Replacement
Abstract:
Given a portrait and an arbitrary high dynamic range lighting environment, our framework uses machine learning to composite the subject into a new scene, while accurately modeling their appearance in the target illumination condition. We estimate a high quality alpha matte, foreground element, albedo map, and surface normals, and we propose a novel, per-pixel lighting representation within a deep learning framework.
Intel, Intelligent Systems Lab: Stable View Synthesis Whitepaper (Alejandro Franceschi)
Intel, Intelligent Systems Lab:
Stable View Synthesis Whitepaper
We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point in the input images.

The core of SVS is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view.
The target view is then rendered by a convolutional network from a tensor of features synthesized in this way for all pixels. The method is composed of differentiable modules and is trained end-to-end. It supports spatially-varying view-dependent importance weighting and feature transformation of source images at each point; spatial and temporal stability due to the smooth dependence of on-surface feature aggregation on the target view; and synthesis of view-dependent effects such as specular reflection.
Experimental results demonstrate that SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets, achieving unprecedented levels of realism in free-viewpoint video of challenging large-scale scenes.
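The view-dependent on-surface aggregation at the heart of SVS can be caricatured in a few lines: each source view contributes a feature vector and a viewing direction, and the features are combined with weights that favor source rays aligned with the target ray. In SVS this weighting is learned by a network; the dot-product softmax below is only an illustrative stand-in with hypothetical names:

```python
import math

def aggregate_features(dirs, feats, target_dir):
    """Toy view-dependent feature aggregation at one surface point:
    `dirs` are unit viewing directions of the source views, `feats`
    their feature vectors. Features are softmax-weighted by alignment
    (dot product) between each source direction and the target ray."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(d, target_dir) for d in dirs]
    m = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(feats[0])
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]

# Target ray aligned with the first source view -> its feature dominates.
agg = aggregate_features([(1.0, 0.0), (0.0, 1.0)],
                         [[1.0, 0.0], [0.0, 1.0]],
                         (1.0, 0.0))
```

The aggregated vector leans toward the feature of the best-aligned source view, which is the intuition behind rendering view-dependent effects like specular reflection.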
PWC: Why we believe VR/AR will boost global GDP by $1.5 trillion (Alejandro Franceschi)
VR and AR have the potential to boost the global economy by $1.5 trillion by 2030. The document discusses how these technologies can transform business in areas like product development, training, process improvements, healthcare and retail. It provides examples of how different industries like automotive, energy, and healthcare are already using VR and AR to improve efficiency and customer experiences. The technologies are maturing and will be further enabled by 5G networks, with AR expected to have a bigger economic impact than VR through 2030. For businesses to fully realize the value, they need to address cultural concerns and privacy/security issues around the technologies.
This document provides an overview of 10 UK immersive technology companies that have raised significant funding. It discusses Anything World, a platform for creating 3D worlds using voice controls and AI; Bodyswaps, a VR platform for soft skills training that has worked with organizations like Save the Children; and Anthropic, an AI safety startup that has raised $50 million and works on techniques like constitutional AI. The document highlights the companies' technologies, customers, fundraising journeys, and growth opportunities in areas like enterprise training as immersive tech adoption increases.
1. The document presents an approach to enhance the realism of synthetic images rendered by game engines. A convolutional network is trained to modify rendered images using intermediate representations from the rendering process.
2. The network is trained with an adversarial objective to provide strong supervision at multiple perceptual levels. A new strategy is proposed for sampling image patches during training to address differences in scene layout distributions between datasets.
3. The approach significantly enhances photorealism over recent image-to-image translation methods and baselines, as shown in controlled experiments. It can add realistic details like gloss, vegetation, and road textures while keeping enhancements consistent with the input image content.
Stanford University: Artificial Intelligence Index Report, 2021
The AI Index is an effort to track, collate, distill and visualize data relating to artificial intelligence. It aspires to be a comprehensive resource of data and analysis for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI.
Source: https://aiindex.stanford.edu/wp-content/uploads/2021/03/2021-AI-Index-Report_Master.pdf
HAI Industry Brief: AI & the Future of Work Post Covid
Stanford University, Human-Centered Artificial Intelligence:
Researchers studying how AI can be used to help teams collaborate, improve workplace culture, promote employee well-being, assist humans in dangerous environments, and more.
Source: https://aiindex.stanford.edu/wp-content/uploads/2021/03/2021-AI-Index-Report_Master.pdf
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin... (Tatiana Kojar)
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
266:2 • Tobias Bertel, Mingze Yuan, Reuben Lindroos, and Christian Richardt
seated VR experiences. We further improve the visual fidelity of
the VR viewing experience by automatically and robustly recon-
structing a scene-adaptive proxy geometry that reduces vertical
distortions during image-based view synthesis. We demonstrate the
robustness and quality of our OmniPhotos approach on dozens of
360° VR photographs captured in seven countries across Europe
and Asia. We further perform extensive ablation studies as well as
quantitative and qualitative comparisons to the state of the art.
2 RELATED WORK
Panoramas. The most common type of VR photography today is
360° panoramas stitched from multiple input views [Szeliski 2006].
However, panoramas generally appear flat due to their lack of depth
cues like binocular disparity. This limitation is addressed by omnidi-
rectional stereo techniques [Peleg et al. 2001; Richardt 2020], which
create stereo panoramas from a camera moving on a circular path
[Baker et al. 2020; Richardt et al. 2013], a rotating camera rig for
live video streaming [Konrad et al. 2017], or per-frame from two
360° cameras [Matzen et al. 2017]. The extension of these techniques
to videos using multi-camera rigs [Anderson et al. 2016; Schroers
et al. 2018] is currently the standard format for 360° stereo videos.
While these approaches provide stereo views with binocular dis-
parity, most do not support motion parallax directly – the change
in view as the viewpoint is moved – which is an important depth
cue for human visual perception [Howard and Rogers 2008] and
crucial for feeling immersed in VR [Slater et al. 1994]. Schroers et al.
[2018] first demonstrated parallax interpolation for professionally
captured omnistereoscopic video with a 16-camera rig.
Panoramas with motion parallax. Panoramas can be augmented
by interactively sculpting geometry for projecting the panorama
on [Sayyad et al. 2017]. Similarly, stereo panoramas can be aug-
mented by estimating depth [Bertel et al. 2020; Thatte et al. 2016]
and segmenting the panorama into multiple depth layers [Serrano
et al. 2019; Zhang et al. 2020; Zheng et al. 2007], which enables free-
viewpoint rendering of novel views with motion parallax. The input
images can also be used directly for image-based rendering of novel
views [Bertel et al. 2019; Chaurasia et al. 2013; Hedman et al. 2016;
Lipski et al. 2014]. These approaches are limited to head motion in
the plane of the circular camera trajectory, but using a robot arm
[Luo et al. 2018], a camera gantry [Overbeck et al. 2018], or a spher-
ical 16-camera rig [Parra Pozo et al. 2019], one can capture viewing
directions over the surface of a sphere, which enables 6-degree-of-
freedom (6-DoF) view synthesis using panoramic light fields. These
state-of-the-art capture methods are, however, restricted to profes-
sional usage and not accessible or affordable for casual consumers
interested in practising 360° VR photography. Huang et al. [2017]
present an approach for mesh-based warping of 360° video accord-
ing to sparse scene geometry, but the visual fidelity is limited due
to warping artefacts.
3D reconstruction. Capturing the shape and appearance of objects
or scenes by means of 3D photography has been an active topic of
research for more than 20 years [Curless et al. 2000]; we refer to
Richardt et al. [2020] for an extensive review of the state of the art.
Recent advances exploit the ubiquity of phone cameras for casual
3D photography [Hedman et al. 2017], and use depth maps obtained
from built-in stereo cameras [Hedman and Kopf 2018; Kopf et al.
2019], multi-view stereo [Holynski and Kopf 2018], temporal stereo
[Valentin et al. 2018], or monocular depth estimation [Shih et al.
2020] to reconstruct the scene geometry; similar approaches are
also used to estimate depth maps from 360° images [da Silveira
and Jung 2019; Im et al. 2016; Wang et al. 2020; Zioulis et al. 2019].
Most approaches produce a textured mesh as output, which can be
rendered efficiently even on mobile devices, and supports motion
parallax natively. For 360° VR photography, Hedman et al. [2017]
use fisheye input images, which are stitched into a multilayer, tex-
tured panoramic mesh that can easily be rendered from novel views.
Hedman and Kopf [2018] produce a similar output from narrow
field-of-view RGBD images that are captured with minimal displace-
ment to facilitate their registration into an RGBD panorama. Their
360° panoramic captures take around 100–200 seconds, ten times
slower than our approach. Parra Pozo et al. [2019] estimate per-
view depth maps using a variant of coarse-to-fine PatchMatch with
temporal bilateral and median filtering. All views are rendered as
separate textured meshes and fused together using a weighting
scheme. This pipeline is optimised for 6-DoF video and real-time
playback. However, accurate 3D reconstruction of unconstrained en-
vironments remains challenging, particularly in uniformly coloured
regions like the sky, or for highly detailed geometry such as trees.
We employ image-based rendering to address these limitations and
optimise for the visual fidelity of results without relying on accurate
3D reconstructions, which are hard to obtain for general scenes.
Learned view synthesis. Deep learning is starting to replace parts
of the view synthesis pipeline or even the entire pipeline. Hedman
et al. [2018] learn blending weights for view-dependent texture map-
ping to reduce artefacts in poorly reconstructed regions. Recently,
multiplane images [Zhou et al. 2018] have set a new bar in terms of
the visual quality of synthesised views from just one to four input
views [Flynn et al. 2019; Mildenhall et al. 2019; Srinivasan et al.
2019; Tucker and Snavely 2020]. Concurrent work generalises this
approach to multi-sphere images for rendering novel views from a
360° stereo video [Attal et al. 2020] or 46 input videos [Broxton et al.
2020], respectively. Other approaches use point clouds [Meshry et al.
2019] with deep features [Aliev et al. 2020; Wiles et al. 2020], voxel
grids [Nguyen-Phuoc et al. 2019; Sitzmann et al. 2019a] or implicit
functions [Mildenhall et al. 2020; Sitzmann et al. 2019b] to learn
view synthesis; we refer to Tewari et al. [2020] for a recent survey
on neural rendering. The main limitation of these approaches is
that they do not meet the performance requirements of current VR
headsets (2 views × 2 megapixels × 80 Hz = 320 MP/s), with some
techniques being four orders of magnitude too slow (e.g. NeRF renders
1008×756 pixels in 30 s ≈ 0.025 MP/s). Using shaders for view-dependent tex-
ture mapping with flow-based blending, our approach consistently
exceeds the required performance on an off-the-shelf laptop for a
seamless, high-quality VR experience.
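The performance gap quoted above is easy to verify with back-of-the-envelope arithmetic (a quick sketch using the round figures from the text):

```python
import math

# Required rendering throughput for a current VR headset,
# using the round figures from the text.
views = 2            # one view per eye
megapixels = 2.0     # per view
refresh_hz = 80      # headset refresh rate
required_mps = views * megapixels * refresh_hz  # megapixels per second
print(required_mps)  # 320.0 MP/s

# NeRF-style throughput: one 1008x756 image in about 30 seconds.
nerf_mps = (1008 * 756) / 1e6 / 30
print(round(nerf_mps, 3))  # ~0.025 MP/s

# Gap between requirement and NeRF, in orders of magnitude.
print(round(math.log10(required_mps / nerf_mps), 1))  # ~4.1
```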
3 OMNIPHOTO PIPELINE
Our goal is to enable casual 360° VR photography of mostly static
environments that is fast (less than 10 seconds), easy and robust. Our
approach follows the general structure of the VR capture pipeline
ACM Trans. Graph., Vol. 39, No. 6, Article 266. Publication date: December 2020.
camera motion while keeping vertical lines upright, presumably us-
ing IMU data recorded by the camera. This stabilisation significantly
reduces the average motion magnitude between video frames, which
is beneficial for tracking and optical flow estimation, as argued by
Schroers et al. [2018].
3.2.2 Camera reconstruction. We estimate camera poses for each
frame of the stitched 360° video, and reconstruct a sparse 3D point
cloud of the scene using OpenVSLAM [Sumikura et al. 2019], an
open-source visual SLAM approach that natively supports equirect-
angular 360° video. Features are tracked in an omnidirectional fash-
ion, which helps overcome reconstruction challenges related to
small-baseline normal field-of-view inside-out video inputs [Bertel
et al. 2019; Hedman et al. 2017]. We perform the camera reconstruc-
tion in two passes: we first track the complete video to obtain a
globally consistent 3D point cloud, and then localise all video frames
with respect to the global 3D point cloud in a second pass, to obtain
a globally consistent reconstruction of camera poses (see Figure 2).
3.2.3 Loop selection. We manually select a looping sub-clip of the
video that jointly optimises the following criteria: (1) smooth camera
motion over time to avoid artefacts caused by jerky motion; (2) as-
continuous-as-possible looping, i.e. smooth camera motion across
the cut, to prevent a visible seam in the result; and (3) if a seam
is unavoidable, it should be as hidden as possible to minimise its
impact, e.g. in a less interesting direction of the scene (far away
or uniform textures), not ‘cutting’ through people. The first two
criteria could be optimised automatically, but we found that the
last criterion still requires manual input, so we perform the loop
selection manually. Finally, we scale the global coordinate system
such that the radius of the camera circle matches the measured or
estimated real-world dimensions, and centre the circle at the origin
without loss of generality.
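The rescaling step can be sketched as follows (a hypothetical helper, assuming the reconstructed camera centres form a roughly planar circle in the xy-plane after stabilisation):

```python
import numpy as np

def normalise_camera_circle(centres, target_radius):
    """Recentre and rescale reconstructed camera centres so the camera
    circle sits at the origin with a known real-world radius.

    centres: (N, 3) array of camera positions from SLAM (arbitrary scale).
    target_radius: measured or estimated real-world radius in metres.
    Returns the rescaled centres and the scale factor applied.
    """
    centre = centres.mean(axis=0)                    # centre of the camera circle
    shifted = centres - centre                       # move the circle to the origin
    radii = np.linalg.norm(shifted[:, :2], axis=1)   # circle assumed in the xy-plane
    scale = target_radius / radii.mean()             # metres per SLAM unit
    return shifted * scale, scale
```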
3.2.4 Frame sampling. We observed that videos captured at 50 Hz
with the rotating selfie stick produce loops of 84±14 frames (aver-
aged over 38 videos). However, our handheld videos produce loops
of 300–500 frames, depending on frame rate, as the photographer is
rotating moderately slowly (~10 s per loop). To reduce space require-
ments and computation time in these cases, we select a subset of
around 90 frames with approximately uniform angular spacing. We
evaluate the impact of further downsampling to 45, 30 or 15 frames
in Table 1.
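The angular subsampling can be sketched as follows (a minimal sketch with a hypothetical helper name, assuming per-frame azimuth angles are available from the camera reconstruction):

```python
import numpy as np

def sample_uniform_angles(azimuths, n_samples=90):
    """Pick a subset of frames with approximately uniform angular spacing.

    azimuths: per-frame azimuth angles in radians, in capture order,
              assumed to sweep one full loop (monotonic after unwrapping).
    Returns the indices of the selected frames.
    """
    unwrapped = np.unwrap(azimuths)  # remove 2*pi jumps at the wrap-around
    targets = unwrapped[0] + np.linspace(
        0.0, unwrapped[-1] - unwrapped[0], n_samples, endpoint=False)
    # For each target angle, pick the nearest captured frame.
    idx = np.abs(unwrapped[None, :] - targets[:, None]).argmin(axis=1)
    return np.unique(idx)
```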
3.2.5 Optical flow. Our view synthesis approach in Section 3.3
relies on optical flow between pairs of neighbouring images. We
precompute optical flow fields using FlowNet2 [Ilg et al. 2017] and
DIS flow [Kroeger et al. 2016] directly on the stitched equirectangu-
lar images. Note that these methods were designed for perspective
images. They work well on the pseudo-perspective equatorial re-
gion of equirectangular images, but degrade near the poles due to
the severe distortions. To ensure consistent optical flow across the
azimuth wrap-around, we repeat a vertical strip of the image just be-
yond the left and right edges of the equirectangular projection, and
crop the computed flow fields back to the original size. In practice,
we find that flow fields at half the image resolution are sufficient
for high-quality view synthesis at run time using view-dependent
flow-based blending [Bertel et al. 2019]. Our approach is agnostic to
the specific optical flow technique that is used, and thus automati-
cally benefits from future improvements in optical flow computation
techniques.
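The pad-compute-crop trick for the azimuth wrap-around can be sketched as follows (a minimal sketch; `flow_fn` stands in for any flow estimator such as FlowNet2 or DIS flow, and the strip width is an assumed parameter):

```python
import numpy as np

def flow_with_wraparound(img_a, img_b, flow_fn, pad_frac=0.1):
    """Compute optical flow on equirectangular images with consistent
    flow across the azimuth wrap-around.

    img_a, img_b: (H, W, C) equirectangular frames.
    flow_fn: dense flow estimator taking two images and returning an
             (H, W', 2) flow field (stand-in for FlowNet2 / DIS flow).
    pad_frac: width fraction of the vertical strip repeated on each side.
    """
    w = img_a.shape[1]
    pad = int(round(pad_frac * w))
    # Repeat a vertical strip just beyond the left and right edges.
    wrap = lambda im: np.concatenate([im[:, -pad:], im, im[:, :pad]], axis=1)
    flow = flow_fn(wrap(img_a), wrap(img_b))
    # Crop the computed flow field back to the original width.
    return flow[:, pad:pad + w]
```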
3.2.6 Proxy fitting. We compute a scene-adaptive proxy geometry
by fitting a deformable spherical mesh to the reconstructed 3D world
points in Section 4. This approach is inspired by Lee et al.’s Rich360
video stitching method [2016], which demonstrated improved align-
ment and blending of input videos. Our proxy fitting technique
is specifically tailored for our casually captured OmniPhotos, and
robustly produces scene-adaptive proxy geometry that more ac-
curately represents the geometry of the captured scene than the
simple planar or cylindrical proxy used before [Bertel et al. 2019;
Richardt et al. 2013]. This step noticeably reduces visual distortions,
as shown in our results.
3.3 Rendering 360° VR photographs
Our 360° VR photography viewer generates new viewpoints in real
time given the location and orientation of the user’s headset. Our
rendering approach is based on the MegaParallax image-based ren-
dering method [Bertel et al. 2019], which we extended to equirectan-
gular images (see Figure 3a). Each desired new view I_D is rendered
by first rasterizing the proxy geometry, yielding scene points X, and
then computing the colour of each pixel x_D independently and in
parallel. Specifically, we use the direction of each pixel’s camera ray
r_D in the desired output view to find the optimal input camera pair
to colour the pixel, and then project the proxy 3D point X into both
cameras using equirectangular projection, giving image projections
x_L and x_R for the left and right view, respectively. Finally, we apply
MegaParallax’s view-dependent flow-based blending (see Figure 3b)
using the optical flow fields F̂_LR and F̂_RL, while explicitly handling
the azimuth wrap-around in the flow-based blending computations.
We implement our VR photography viewer using OpenVR, which
at the time of writing supported a variety of consumer headsets
based on SteamVR, Oculus and Windows Mixed Reality VR, with
the same code base. We render stereoscopic views using the eye
transformation matrices provided by OpenVR, which encode the
camera poses for the left- and right-eye cameras.
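The per-pixel projection step can be sketched as follows (a minimal sketch; the function name is hypothetical and the y-up, z-forward axis convention is an assumption, with lens details of the real viewer omitted):

```python
import numpy as np

def project_equirect(X, cam_pos, width, height):
    """Project a 3D proxy point into an equirectangular input image.

    X: 3D scene point (from rasterising the proxy geometry).
    cam_pos: centre of the (stabilised) input camera.
    Returns (x, y) pixel coordinates: azimuth maps to x, polar angle to y.
    """
    d = X - cam_pos                    # ray from the camera to the scene point
    r = np.linalg.norm(d)
    theta = np.arctan2(d[0], d[2])     # azimuth in (-pi, pi], z is forward
    phi = np.arccos(d[1] / r)          # polar angle in [0, pi], y is up
    x = (theta / (2 * np.pi) + 0.5) * width
    y = (phi / np.pi) * height
    return x, y
```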
Fig. 3. Illustration of our rendering approach using equirectangular input
images (shown in blue and orange). (a) Each pixel x_D of the desired image
(in green) is computed using a view-dependent blending of two reprojected
pixel coordinates (small coloured circles) in the nearest two viewpoints. (b)
We compute flow-adjusted pixel coordinates using equirectangular optical
flow (small coloured squares), similar to MegaParallax [Bertel et al. 2019].
OmniPhotos: Casual 360° VR Photography • 266:5
4 SCENE-ADAPTIVE DEFORMABLE PROXY FITTING
We represent the sphere mesh S = (𝑉, 𝐹) in terms of vertices 𝑉 and
triangle faces 𝐹. Given the spherical nature of the mesh, vertices
are naturally defined in spherical coordinates (𝜃, 𝜑, 𝑟). We initialise
the vertices 𝑉 in a regular grid configuration of size 𝑚 × 𝑛, i.e.
$V = \{\mathbf{v}_i\}_{i=1}^{m \times n}$, with uniform spacing along the azimuth and polar
angles, and regularly tessellated triangle faces 𝐹. In the following, we
formulate an energy minimisation that deforms this sphere mesh by
adjusting the vertex radii, while keeping their angular coordinates
and their triangle connectivities fixed to ensure the problem is well-
conditioned and edges are not collapsing. Lee et al. [2016] found that
optimising vertex radii directly may lead to unstable results with
negative or very large values, which they address using additional
1D partial derivative terms. Instead, we parametrise our optimisation
in terms of inverse depth, 𝑑(p) = 1/∥p∥, which helps regularise the
scale of variables in the optimisation [Im et al. 2016], particularly
for far-away points [Civera et al. 2008].
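The fixed angular grid and inverse-depth parametrisation can be sketched as follows (a minimal sketch with hypothetical helper names):

```python
import numpy as np

def init_sphere_mesh(m=160, n=80, radius=1.0):
    """Initialise sphere-mesh vertices on a regular m x n angular grid.

    Azimuth theta and polar angle phi stay fixed during optimisation;
    only the per-vertex radius (via inverse depth d = 1/r) is optimised.
    Returns the (theta, phi) grids and the inverse-depth variables.
    """
    theta = np.linspace(0.0, 2 * np.pi, m, endpoint=False)  # azimuth
    phi = np.linspace(0.0, np.pi, n)                        # polar angle
    T, P = np.meshgrid(theta, phi, indexing='ij')           # (m, n) grids
    inv_depth = np.full((m, n), 1.0 / radius)               # d = 1 / r
    return T, P, inv_depth

def to_cartesian(T, P, inv_depth):
    """Convert spherical-grid vertices with inverse depth to 3D points."""
    r = 1.0 / inv_depth
    return np.stack([r * np.sin(P) * np.cos(T),
                     r * np.sin(P) * np.sin(T),
                     r * np.cos(P)], axis=-1)
```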
Our energy formulation consists of four terms:
\[
\operatorname*{arg\,min}_{V}\; E_{\text{data}}(P, V) + E_{\text{smooth}}(V) + E_{\text{pole}}(V) + E_{\text{prior}}(V), \tag{1}
\]
where 𝑃 is the set of reconstructed 3D world points, and 𝑉 the
vertices of the sphere mesh.
Data term. We would like to deform the sphere mesh to optimally
approximate the set 𝑃 of 3D points, which means minimising the
distance between points and triangles. By construction, as the mesh
is centred at the origin, the ray from the origin through any point p
intersects one or more triangles4, which can be identified based on
the spherical coordinates of the point p and the grid of vertices 𝑉 .
Let us denote the intersected triangle by $f(\mathbf{p}) = \{\mathbf{v}_a, \mathbf{v}_b, \mathbf{v}_c\}$
and the intersection point by $\hat{\mathbf{p}}$, expressed in barycentric coordinates
with respect to the triangle vertices, so we can minimise the distance
between all points $\mathbf{p}$ and their triangle intersections $\hat{\mathbf{p}}$:
\[
E_{\text{data}}(P, V) = \frac{\lambda_{\text{data}}}{|P|} \sum_{\mathbf{p} \in P}
\rho\Biggl( \Bigl\| d(\mathbf{p}) - d\Bigl( \underbrace{\textstyle\sum_{\mathbf{v} \in f(\mathbf{p})} b(\mathbf{p}, \mathbf{v})\, \mathbf{v}}_{\hat{\mathbf{p}}} \Bigr) \Bigr\|^2 \Biggr), \tag{2}
\]
where 𝑏(p, v) is the barycentric coordinate of p with respect to
the vertex v ∈ 𝑓 (p), computed in terms of the spherical angles
$(\theta, \varphi)$, such that $\hat{\mathbf{p}} = \sum_{\mathbf{v} \in f(\mathbf{p})} b(\mathbf{p}, \mathbf{v})\, \mathbf{v}$, and $\lambda_{\text{data}}$ is the weight of
the data term. In addition, we introduce a robust loss function 𝜌(𝑥)
to make the optimisation more robust to outlier 3D points, which
are unavoidable in current SLAM techniques. Specifically, we use a
scaled Huber loss (with scale factor 𝜎):
\[
\rho(x) = \begin{cases} x & x \leq \sigma^2 \\ 2\sigma\sqrt{x} - \sigma^2 & x > \sigma^2 \end{cases} \tag{3}
\]
⁴ If the ray intersects an edge or a vertex, we can pick any adjacent triangle, as the
resulting energy formulation is practically identical: one or two vertices will have
barycentric coordinates of zero and thus not contribute to the energy.
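Locating the triangle f(p) reduces to binning the spherical coordinates of p into the fixed angular grid; a minimal sketch (hypothetical helper, assuming the z-axis is the polar axis and uniform angular spacing):

```python
import numpy as np

def grid_cell_of_point(p, m, n):
    """Find the angular grid cell whose triangles the ray through p
    intersects, from the spherical coordinates of p alone.

    p: 3D point; m, n: azimuth/polar resolution of the sphere mesh.
    Returns integer cell indices (i, j) into the m x n vertex grid.
    """
    theta = np.arctan2(p[1], p[0]) % (2 * np.pi)  # azimuth in [0, 2*pi)
    phi = np.arccos(p[2] / np.linalg.norm(p))     # polar angle in [0, pi]
    i = int(theta / (2 * np.pi) * m) % m          # azimuth bins are uniform
    j = min(int(phi / np.pi * (n - 1)), n - 2)    # n vertices -> n-1 cell rows
    return i, j
```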
Smoothness term. We use a Laplacian smoothness term to encourage
smoothly varying radii within the mesh:
\[
E_{\text{smooth}}(V) = \frac{\lambda_{\text{smooth}}}{|V|} \sum_{\mathbf{v} \in V}
\biggl\| d(\mathbf{v}) - \frac{1}{|N(\mathbf{v})|} \sum_{\mathbf{w} \in N(\mathbf{v})} d(\mathbf{w}) \biggr\|^2, \tag{4}
\]
where 𝑁 (v) denotes the set of vertices neighbouring v: (1) non-
polar vertices have four neighbours, along their azimuth/polar angle
isocontours, and (2) polar vertices have two non-polar neighbours,
on opposite sides of the sphere (same elevation, with Δazimuth = 𝜋).
This results in 2D Laplacian losses everywhere outside the poles,
and 1D Laplacian losses across both poles.
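The neighbourhood structure N(v) can be sketched as follows (one reading of the description above, with hypothetical indexing: i is the azimuth index, j the polar index on the m × n grid):

```python
def neighbours(i, j, m, n):
    """Neighbour set N(v) for vertex (i, j) on the m x n spherical grid.

    Non-polar vertices get four neighbours along their azimuth/polar
    isocontours; polar vertices (j == 0 or j == n-1) get two non-polar
    neighbours on opposite sides of the sphere (azimuth offset of pi).
    """
    if j == 0 or j == n - 1:
        jj = 1 if j == 0 else n - 2              # step off the pole
        return [(i, jj), ((i + m // 2) % m, jj)]  # same azimuth + antipodal azimuth
    return [((i - 1) % m, j), ((i + 1) % m, j),   # around the azimuth isocontour
            (i, j - 1), (i, j + 1)]               # along the polar isocontour
```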
Pole term. In our sphere mesh representation, we have multiple
vertices at the poles (the first and last ‘row’ of vertices correspond to
the North and South pole, respectively). We constrain each pole vertex
$\mathbf{v}$ and its right neighbour $\mathbf{v}'$ to be close to each other using
\[
E_{\text{pole}}(V) = \frac{\lambda_{\text{smooth}}}{|V|} \sum_{\mathbf{v} \in V_{\text{poles}}} \bigl\| d(\mathbf{v}) - d(\mathbf{v}') \bigr\|^2. \tag{5}
\]
Prior term. To handle large regions of the mesh without any 3D
points, we add a weak prior term that attracts each vertex towards
the mean inverse depth $d_{\text{prior}}$ of all points $P$:
\[
E_{\text{prior}}(V) = \frac{\lambda_{\text{prior}}}{|V|} \sum_{\mathbf{v} \in V} \bigl\| d(\mathbf{v}) - d_{\text{prior}} \bigr\|^2. \tag{6}
\]
Implementation. In practice, we replace each residual $\|a - b\|$ in
Equations 2 and 4 to 6 with a normalised residual
\[
\frac{a - b}{a + b} \tag{7}
\]
that cancels out any global scale factor, as $\frac{(ka)-(kb)}{(ka)+(kb)} = \frac{a-b}{a+b}$. This
ensures that the same globally optimal solution is found regardless of
different scale factors due to varying units of length. We implement
this optimisation using the Ceres non-linear least squares solver
[Agarwal et al. 2012], and choose the sparse Cholesky solver to
exploit the sparse structure of the energy with thousands of points.
The optimisation stops when |Δcost|/cost < 10⁻⁶, or after 100 itera-
tions. For the initial solution, we set all vertices to the mean inverse
depth of all points; more sophisticated schemes like a hemisphere
with a ground plane are possible. We evaluate a range of parameter
values in Figure 9 and Table 1, and use the following parameter
values for all our results: 𝑚 = 160, 𝑛 = 80, 𝜆data = 1, 𝜎 = 0.1,
𝜆smooth = 100, 𝜆prior = 0.001.
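The scale invariance of the normalised residual (Equation 7) is easy to demonstrate on scalars (a minimal sketch):

```python
def normalised_residual(a, b):
    """Scale-invariant residual (a - b) / (a + b), used in place of ||a - b||."""
    return (a - b) / (a + b)

# Multiplying both inputs by any global scale factor k leaves the
# residual unchanged, so the optimum is independent of units of length.
for k in (0.01, 1.0, 250.0):
    assert abs(normalised_residual(k * 3.0, k * 2.0)
               - normalised_residual(3.0, 2.0)) < 1e-12
```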
5 RESULTS AND EVALUATION
Figure 4 shows 30 OmniPhotos we captured and processed using
our approach. Three of these were taken handheld (Cathedral,
Shrines 1+2), with the majority (90%) captured using our rotating
selfie stick with an average loop length of 1.2–1.8 seconds. The selfie
stick is telescopic, which allows for capture radii between 33 and
100 cm, with about 63% at 55 cm and 27% at 78 cm.
In this section, we show qualitative results and comparisons,
perform quantitative evaluation and ablation studies, and finally
discuss the computational performance of our approach. Our results
are best appreciated and evaluated in motion, which gives a better
Alley (89 images) Ballintoy (94 images) Beihai Park (80 images) Cathedral (84 images) Circus (113 images) Circus Trees (94 images)
Coast (84 images) Crescent (85 images) Dark Hedges (116 images) Field (80 images) Green (82 images) Hillside (95 images)
Hilltop (126 images) Jingqingzhai (87 images) Krämerbrücke (57 images) Mura del Prato (98 images) Nunobiki 1 (72 images) Nunobiki 2 (81 images)
Parade Gardens (88 images) Secret Garden 1 (77 images) Secret Garden 2 (95 images) Ship (71 images) Shrines 1 (91 images) Shrines 2 (118 images)
Sqare 1 (74 images) Sqare 2 (73 images) Temple 1 (90 images) Temple 2 (86 images) Temple 3 (72 images) Wulongting (96 images)
Fig. 4. Datasets shown in our paper and supplemental material. Slightly cropped for visualisation.
impression of the visual experience. To this end, we include some
animated figures in our paper that can be viewed using Adobe Reader.
We further include extensive visual results and comparisons in our
supplemental material and video.
5.1 Comparative evaluation
The approaches closest to ours, Bertel et al.’s MegaParallax [2019]
and Luo et al.’s Parallax360 [2018], also use image-based rendering
with flow-based blending to synthesise novel views in real time.
However, they rely on basic proxy geometry, which causes vertical
distortion artefacts in nearby regions, as illustrated in Figure 5.
Our scene-adaptive deformable proxy geometry deforms to fit the
scene more closely, which greatly reduces these vertical distortion
artefacts, as visible in Figure 6 and our supplemental material.
We next compare to Casual 3D Photography [Hedman et al. 2017].
Their 360° 3D photos were reconstructed from around 50 fisheye
DSLR photos, which take about one minute to capture, an order
of magnitude slower than our approach. Their 3D reconstruction
approach works well for textured scenes, but fails for fine geometry
like tree branches, or uniformly coloured regions like the sky, for
which accurate depth estimation and 3D reconstruction remain open
problems. As their implementation is not available but their datasets
are, we process one of their two camera circles (about 25 images)
with our approach. To adapt their fisheye images to our approach, we
first undistort them to equirectangular images and then stabilise the
views by rotating them inversely to the camera orientations. Figure 7
shows that our image-based rendering approach does not require a
highly accurate 3D reconstruction for convincing view synthesis
from the same input. Monocular 3D photography approaches [Kopf
et al. 2019; Shih et al. 2020] also tend to fail for complex geometry,
as shown in Figure 8. Our OmniPhotos achieve better visual results
Fig. 5. Coarse proxy geometry (a) introduces vertical distortion as the input
cameras are closer to the object than the viewing location (the eye). The
red face, as seen by the camera, appears vertically stretched (blue face)
when rendered using the coarse proxy geometry for a viewpoint behind the
camera. (b) Our scene-adaptive proxy geometry deforms to fit the scene
better, which strongly reduces vertical distortion.
thanks to multi-view input and the combination of scene-adaptive
proxy geometry and flow-based blending for aligning texture details.
Our next comparison is to Serrano et al.’s approach for adding
motion parallax to 360° videos captured with a static camera [2019].
As their approach takes as input a 360° RGBD video, we render an
equirectangular image and depth map from Hedman et al.’s datasets
using Blender and repeat this 360° RGBD frame to create a (static)
360° RGBD video. The resulting static scene does not play to their
Parallax360 [Luo et al. 2018] MegaParallax [Bertel et al. 2019] Our approach
Fig. 6. Comparison of image-based 360° VR photography techniques for a virtual camera moving on a circular path. Our result reduces vertical distortion
visibly, as can be seen in the table benches in the top row. This is an animated figure, please view with Adobe Reader if it does not play. Parallax360 [Luo
et al. 2018] interpolates views on the capture circle, but not inside of it for the virtual camera path. MegaParallax [Bertel et al. 2019] generates views that
suffer from vertical distortion, which distorts motion parallax. Our results show clear improvements in the quality of view synthesis and motion parallax.
Casual 3D Photography [Hedman et al. 2017] 360° Motion Parallax [Serrano et al. 2019] Our approach
Fig. 7. Comparison to Hedman et al.’s Casual 3D Photography [2017] and Serrano et al.’s Motion Parallax for 360° RGBD Video [2019] on two datasets from
Hedman et al. [2017]. 3D reconstruction works well for the highly textured Library scene (top), but struggles with the thin tree branches and distant clouds in
the BoatShed scene (bottom). Green regions are holes in the textured mesh. For Serrano et al.’s approach, we use colour and depth from Hedman et al.’s
results, which works well for foreground objects with accurate depth, but not for occluded regions that are challenging to fill from the monocular 360° input.
Our approach works well for both datasets, but shows some flow warping artefacts due to the undersampled input views (only 25 views).
Table 1. Quantitative comparison of baseline methods (top) and ablated versions of our approach (bottom). Numbers are mean±standard error; ‘▲’ means
higher is better, ‘▼’ means lower is better. ‘GT’ indicates ground truth, and ‘*’ a modified proxy geometry. Please see Section 5.2 for a detailed description.
Baseline/Ablation Model Images Proxy LPIPS▼ SSIM▲ PSNR▲
MegaParallax [Bertel et al. 2019] 90 cylinder 0.169±0.002 0.750±0.003 21.83±0.12
MegaParallax [Bertel et al. 2019] 90 plane 0.181±0.002 0.737±0.003 21.45±0.12
Parallax360 [Luo et al. 2018] 90 cylinder 0.207±0.003 0.711±0.003 20.75±0.11
Our complete method 90 ours 0.059±0.001 0.867±0.002 28.02±0.09
0) Our method (ground-truth inputs) 90 GT 0.041±0.000 0.905±0.001 30.08±0.11
1) No robust data term 90 ours* 0.062±0.001 0.859±0.002 27.64±0.10
2) No normalised residuals 90 ours* 0.072±0.001 0.854±0.002 27.30±0.10
3) Optimising depth + no normalised residuals 90 ours* 0.073±0.001 0.853±0.002 27.28±0.10
4) Optimising depth (not inverse) 90 ours* 0.059±0.001 0.867±0.002 28.01±0.10
5) DIS flow [Kroeger et al. 2016] 90 ours 0.060±0.001 0.865±0.002 27.98±0.09
6) No flow (linear blending) 90 ours 0.059±0.001 0.868±0.002 28.03±0.09
7a) Low-resolution proxy (𝑚=80, 𝑛=40) 90 ours* 0.067±0.001 0.843±0.002 27.07±0.09
7b) High-resolution proxy (𝑚=240, 𝑛=120) 90 ours* 0.064±0.001 0.867±0.002 27.78±0.10
8a) Less smooth (𝜆smooth =10) 90 ours* 0.068±0.001 0.866±0.002 27.70±0.10
8b) More smooth (𝜆smooth =1000) 90 ours* 0.064±0.001 0.849±0.002 27.31±0.09
9a) Fewer images (1 view per 8°) 45 ours 0.061±0.001 0.864±0.002 27.96±0.09
9b) Fewer images (1 view per 12°) 30 ours 0.063±0.001 0.862±0.002 27.90±0.09
9c) Fewer images (1 view per 24°) 15 ours 0.071±0.001 0.855±0.002 27.44±0.09
method’s strength of propagating background information behind
dynamic objects. Please see Figure 7 and our supplemental video.
5.2 Quantitative evaluation
We quantitatively evaluate and compare our OmniPhotos approach
to the most closely-related baseline methods [Bertel et al. 2019; Luo
et al. 2018], and validate our design choices and parameters using
an extensive ablation study in Table 1. We perform this evaluation
in the spirit of virtual rephotography [Waechter et al. 2017] on a
synthetic test set of five scenes (Apartment0, Hotel0, Office0,
Room0, Room1) from the Replica dataset [Straub et al. 2019]. Specif-
ically, we render synthetic equirectangular images on a camera
circle with a radius of 0.5 m as input for the various methods, and
we evaluate cubemap views generated by each baseline/ablation
at 69 locations inside the capture circle, on a 10 cm Cartesian grid.
We do not evaluate the up/down views to focus our evaluation on
the region near the equator, where viewers tend to fixate when
exploring panoramas [Sitzmann, Serrano et al. 2018]. For each lo-
cation, we render 512×512 cube maps, and compare the generated
view to the ground truth using the structural similarity index (SSIM) [Wang et al. 2004], peak signal-to-noise ratio (PSNR), and the LPIPS perceptual similarity measure [Zhang et al. 2018]. We report the
maximum value within a shiftable window of ±1 pixel. Note that
this evaluation uses indoor spaces whereas our real OmniPhotos
were all captured outdoors (Figure 4).
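The 69 evaluation locations follow directly from the capture geometry: a 10 cm Cartesian grid clipped to the interior of the 0.5 m capture circle. A quick sketch (grid alignment with the circle centre is our assumption; integer centimetres avoid floating-point edge cases):

```python
# 10 cm Cartesian grid inside the 0.5 m capture circle.
# Points on the circle itself are excluded ('inside the capture circle').
radius_cm, step_cm = 50, 10
locations = [
    (i * step_cm, j * step_cm)
    for i in range(-5, 6)
    for j in range(-5, 6)
    if (i * step_cm) ** 2 + (j * step_cm) ** 2 < radius_cm ** 2
]
print(len(locations))  # 69
```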
Our OmniPhotos quantitatively outperform MegaParallax and
Parallax360 by a large margin, in addition to the clear qualitative
improvement visible in Figure 6 and our supplemental material. We
next evaluate our method on ground-truth camera poses and proxy
geometry (0) to test the upper limit of our approach. In the next
Our approach  Shih et al. [2020]
Fig. 8. Current 3D photography approaches, such as Shih et al.’s, struggle
with complex scenes like the pillars (left), as well as fine geometry, like leaves
(centre) or a rope (right). Our approach succeeds thanks to its image-based rendering. Please see the animated figure for full effect.
rows, we replace our robust data term with a plain L2 loss (1), remove our normalised residuals (2), and additionally use depth instead of inverse depth (3), each of which reduces performance. Using depth instead of inverse depth (4), DIS flow (5) or no flow (6) achieves comparable performance to our approach. Row 4 shows that depth and inverse depth perform similarly when using normalised residuals. This suggests that using inverse depth and using normalised residuals are complementary techniques for regularising the scale of variables during the optimisation. The normalised residuals have the additional benefit that one set of parameter values works for both depth
Fig. 9. Evaluation of robustness and parameter choices for different versions of our scene-adaptive deformable proxy fitting on five ground-truth scenes
(Apartment0, Hotel0, Office0, Room0, Room1) from Replica [Straub et al. 2019]. We measure reconstruction accuracy using RMSE in cm, see Section 5.2.1 for
details. The shaded areas indicate the standard error of the mean. We compare Huber versus L2 data loss (Equation 2), optimisation in terms of depth or
inverse depth (disparity), and standard residuals (‘sres.’) versus our normalised residuals (‘nres.’, Equation 7). Left: Our proxy fitting technique (dark green
line) is the most robust to an increasing number of outlier 3D points. The arrow indicates the level of outliers we assume for the following comparisons.
Centre and right: Our chosen smoothness weight of 𝜆smooth = 100 and robust loss scale factor 𝜎 = 0.1 (indicated by arrows) are close to the global minimum
reconstruction errors, and empirically work better for outdoor scenes that have more depth complexity than the indoor rooms of Replica. The light green line
shows that standard residuals do work in practice, but the optimal value of the robust loss scale factor 𝜎 will depend on the scale of the scene.
and inverse depth, despite their scale differences. Changing the reso-
lution (7) or smoothness (8) of the proxy geometry results in a drop
in performance. Reducing the number of input views (9) steadily
reduces performance, with 45 input images almost matching the
performance of 90 input views.
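To illustrate why normalised residuals regularise the scale of the variables, the following sketch compares standard and normalised residuals under a robust Huber loss. The exact residual definition (Equation 7) and data loss (Equation 2) are not reproduced in this excerpt, so both the Huber form and the division by the observation are our illustrative reading, not the paper's exact formulation:

```python
import numpy as np

def huber(r, sigma=0.1):
    """Robust Huber loss with scale parameter sigma (illustrative stand-in
    for the paper's robust data loss)."""
    a = np.abs(r)
    return np.where(a <= sigma, 0.5 * r ** 2, sigma * (a - 0.5 * sigma))

# Observed scene depths from near to far (metres), and estimates carrying
# a uniform 10% relative error.
d_obs = np.array([1.0, 5.0, 50.0])
d_est = 1.1 * d_obs

for name, obs, est in [("depth", d_obs, d_est),
                       ("inverse depth", 1.0 / d_obs, 1.0 / d_est)]:
    standard = huber(est - obs)             # scale depends on the scene
    normalised = huber((est - obs) / obs)   # scale-free relative error
    print(name, standard, normalised)
```

The normalised residuals are constant across near and far points in both parameterisations, whereas the standard residuals span orders of magnitude; this matches the observation that one set of parameter values then works for both depth and inverse depth.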
5.2.1 Proxy accuracy. In addition to the visual quality of generated
views, we also evaluate the accuracy of our deformable proxy fitting
in Figure 9. This experiment evaluates the robustness and parameter
choices for different versions of our scene-adaptive deformable
proxy fitting on five ground-truth scenes from the Replica dataset
[Straub et al. 2019]. We render 1920×960 synthetic equirectangular
depth maps and downsample them using area averaging to 80×40 =
3200 3D points, to approximately match the number of 3D points we
usually obtain from OpenVSLAM [Sumikura et al. 2019]. To simulate
typical SLAM noise and outliers, we add ±2 cm uniform noise to all
3D point locations, and add 25%=800 outlier points sampled from a
10-metre cube centred on the scene. We measure the reconstruction
quality of the proxy geometry using RMSE per vertex of the spherical
depth map, in cm, averaged over 10 runs for each of the five scenes.
Figure 9 shows that our proposed approach, with robust Huber data
loss on inverse depth and normalised residuals, performs best with
increasing number of outliers. Our default parameter values, which
we use for all our OmniPhotos, can also be seen to produce results
close to the global minimum, in terms of reconstruction error, within
the explored design space. We also observed that the quality of the
proxy geometry increases with the number of (inlier) scene points
that can be used to guide the deformation process, for example using
sparse COLMAP reconstructions [Schönberger and Frahm 2016] or
dense multi-view stereo reconstructions [Parra Pozo et al. 2019].
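The noise and outlier injection described above can be sketched as follows; the random inlier cloud is a stand-in for the downsampled Replica depth maps, and the proxy fitting itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# 80x40 = 3200 inlier scene points, approximately matching a typical
# OpenVSLAM map size (point positions here are random placeholders).
points = rng.uniform(-3.0, 3.0, size=(3200, 3))

# +/- 2 cm uniform noise on every 3D point location.
noisy = points + rng.uniform(-0.02, 0.02, size=points.shape)

# 25% outliers (800 points) sampled from a 10-metre cube on the scene centre.
outliers = rng.uniform(-5.0, 5.0, size=(800, 3))
observed = np.concatenate([noisy, outliers])

# The paper measures RMSE per vertex of the fitted spherical depth map;
# here we only report the RMSE of the injected noise itself, which is
# close to the theoretical 2/sqrt(3) ~ 1.15 cm for this noise model.
rmse_cm = 100.0 * np.sqrt(np.mean((noisy - points) ** 2))
print(round(rmse_cm, 2))
```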
5.3 Performance
Freshly captured OmniPhotos can be processed in about 30–40
minutes on a standard computer (3 GHz 8-core CPU, 16 GB RAM,
NVIDIA GeForce RTX 2060). For a typical 9-second 360° video with
3840×1920 at 50 Hz (450 frames total, 90 frame loop), these are the
major preprocessing steps:
• Stabilised 360° video stitching with CUDA: ~12 seconds
• Two-pass OpenVSLAM reconstruction: ~3 minutes
• Blender visualisation import: ~15 minutes
• Manual loop selection: ~5 minutes
• Reading images & other IO: ~20 seconds
• Scene-adaptive proxy fitting: ~10 seconds
• FlowNet2 / DIS flow: ~10 minutes / ~20 seconds
Importantly, the reconstruction with OpenVSLAM is about two
orders of magnitude faster than with COLMAP. The unoptimised
size of preprocessed OmniPhotos is dominated by the precomputed
optical flow fields (14 MB/frame), followed by the input images
(~2 MB/frame) and the proxy geometry (0.8 MB). For a typical dataset
with 90 frames, this sums up to about 1.4 GB all-in. Our viewer loads
such a dataset from SSD into GPU memory in about 20 seconds.
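A back-of-the-envelope check of the quoted storage figures, using the per-frame sizes above:

```python
frames = 90
flow_mb, image_mb = 14.0, 2.0   # per-frame optical flow and input image
proxy_mb = 0.8                  # proxy geometry (total)

total_mb = frames * (flow_mb + image_mb) + proxy_mb
print(total_mb / 1000)  # ~1.44 GB, matching the quoted 'about 1.4 GB'
```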
Rendering of 1920×1080 views consistently takes less than 4.16 ms
(240 Hz), and VR rendering is performed at the 80 Hz display rate of
an Oculus Rift S HMD, for a smooth and immersive VR experience.
6 DISCUSSION
Applications. OmniPhotos are a great new way to reliably cap-
ture immersive 3D environments for casual to ambitious consumers
as well as professional users. OmniPhotos can capture personal
memories, for example on holidays, or group photos on family occa-
sions. It would be interesting to see how people could create stories
by concatenating multiple OmniPhotos. In terms of professional
applications, OmniPhotos are ideal for virtual tourism, which lets
people explore far-away places from the comfort of their own home.
OmniPhotos would also be useful for real estate scenarios to capture
outdoor spaces or individual rooms.
Resolution vs frame rate. As discussed in Section 3.1, we captured
input videos with different resolutions and frame rates to evaluate
the trade-off between spatial resolution and the number of images
per camera circle. We were originally aiming to capture more than
100 views per camera circle, but our new scene-adaptive proxy
geometry has significantly reduced the number of required input
views from 200–400 [Bertel et al. 2019] to 50–100 for our approach
(see Table 1, row 9). Visually, the 5.7K videos produce the highest-
fidelity VR photos, even when downsampled to 4K. The native 4K
resolution tends to be slightly blurry, as it is the result of stitching
two 2K×2K fisheye images into a 4K×2K equirectangular image.
Finally, the 3K videos look noticeably blurry in the final result.
Viewing area analysis. Our rendering approach is modelled after
MegaParallax [Bertel et al. 2019] and we can therefore benefit from
their theoretical analysis of the supported viewing area (aka head
box). They showed that the horizontal translation 𝑥 is limited to 𝑥 < 𝑟 sin(𝛾/2) for a given camera circle radius 𝑟 and camera field of view 𝛾. The field of view of our cameras is effectively 𝛾 = 𝜋, as
they capture the complete outward-facing hemisphere. This yields
the radius of the camera circle as the upper limit of the viewing
space radius. Experiments verify this behaviour: our synthesis works anywhere inside the camera circle, i.e. most of our OmniPhotos provide a head box with a 1-metre diameter (capture radius: 55 cm).
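The head-box bound can be evaluated directly for our capture settings (assuming the MegaParallax limit 𝑥 < 𝑟 sin(𝛾/2) with 𝛾 = 𝜋):

```python
import math

r = 0.55         # capture circle radius in metres
gamma = math.pi  # effective field of view of the 360 camera

x_max = r * math.sin(gamma / 2)  # horizontal translation limit
print(x_max, 2 * x_max)  # 0.55 m radius -> head box of about a metre across
```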
Schroers et al. [2018] also analysed the minimum visible depth
observed by two cameras in a circular configuration. Their formula
is expressed in terms of the field of view 𝛾 = 𝜋 and the angle 𝜃 between the optical axes of adjacent cameras (𝜃 ≈ 2𝜋/𝑁 for 𝑁 cameras):

𝑑 = 𝑟 · sin(𝜋 − 𝛾/2) / sin(𝛾/2 − 𝜃) = 𝑟 / cos(2𝜋/𝑁). (8)

For 𝑁 = 90 cameras, like in our case, this evaluates to 0.24% of the capture circle radius, or 1.3 mm for 𝑟 = 55 cm, which is negligible.
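Equation (8) can be checked numerically; in our reading, the quoted 0.24% is the excess of 𝑑 over the capture radius 𝑟:

```python
import math

r_cm, N = 55.0, 90
theta = 2 * math.pi / N     # angle between adjacent optical axes

d = r_cm / math.cos(theta)  # Equation (8) with gamma = pi
excess = d - r_cm           # minimum visible depth beyond the circle
print(100 * excess / r_cm, excess * 10)  # ~0.24% of the radius, ~1.3 mm
```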
Compression. OmniPhotos can be compressed from 1.4 GB to a
more reasonable 0.25 GB (18%) using off-the-shelf 7-Zip. A further
0.07 GB can be saved if optical flow fields are not transmitted and
instead computed on the local machine (final size: 0.18 GB or 13%).
6.1 Limitations and future work
All approaches have limitations; we discuss the most important ones
here and use them to motivate directions for future work.
Proxy geometry. While deforming a sphere mesh to fit into the
reconstructed point cloud usually works well in practice (see Fig-
ure 6), it clearly has its limitations. Its fixed topology combined
with the enforced smoothness produces a relatively smooth proxy
geometry, which can cause warping artefacts in areas with large
depth differences. Object boundaries of nearby objects, essential for
(dis-)occlusion effects, cannot be fitted tightly enough, leading to
warping artefacts that tend to change as the viewpoint changes (see
Figure 10). These issues could potentially be overcome in different
Proxy warping artefact Flow warping artefact Stitching artefact
Fig. 10. Remaining visual artefacts in our results. Errors in the proxy ge-
ometry or optical flow may produce warping artefacts. We observed proxy
warping artefacts primarily at large depth discontinuities, while most flow
warping artefacts affect objects adjacent to a uniform region like the sky. A
stitching bug in the Insta360 Studio software causes a ‘swimming’ artefact.
ways: (1) Mesh vertices could be moved more freely, not just radi-
ally, e.g. to align to depth edges. (2) Multi-view stereo or optical
flow correspondences would provide more scene points that can
make the proxy geometry more accurate and detailed. (3) Learned
methods like monocular depth estimation [e.g. Wang et al. 2020]
or implicit scene representations [e.g. Mildenhall et al. 2019] could
be used to densify sparse reconstructions, especially in texture-less
regions. As demonstrated by the ground-truth proxy experiment in
Table 1, better proxy geometry improves visual results, as expected.
Optical flow. Even though the quantitative evaluation in Table 1
may suggest otherwise, flow-based blending helps reduce ghosting
artefacts when the scene proxy does not fit the real scene geometry
tightly. Examples for this include detailed geometry, like fences or
thin tree branches (Figure 8), or reflections, for which there is a
mismatch between the real and apparent depth. In some cases, we
observed that FlowNet2 predicted incorrect flow near strong edges,
e.g. a ship vs the blue sky (see Figure 10), which results in view
interpolation artefacts. In these cases, we fall back to DIS flow.
Stitching artefacts. We observed minor to moderate stitching arte-
facts being introduced in some videos, particularly those captured
at 3K/100 Hz. These artefacts are not limited to the overlap region
between the two fisheye lenses and appear to be caused by warping
parts of the video frame incorrectly, probably due to a software
bug.5 Since the artefacts are not consistent over time, they can
cause ‘swimming’ during rendering, as shown in Figure 10. We only
found these artefacts in the stabilised stitch, not the standard stitch.
However, we consider the benefits of the stabilised stitch (improved
camera reconstruction and flow computation) to outweigh these
usually minor artefacts in some of our OmniPhotos.
Vertical motion. Our approach provides compelling 5-degree-of-
freedom (5-DoF) view synthesis by supporting arbitrary head ro-
tations as well as translations in the plane of the capture circle
(see Figure 2). The missing DoF is vertical translation as our cap-
ture approach deliberately captures viewpoints at roughly the same
height and thus cannot plausibly synthesise new viewpoints from
a different height. In practice, this is not a problem for seated VR
experiences, where users naturally keep their heads at a consistent
5This bug in the proprietary software Insta360 Studio 3.4.2 has been fixed in v3.4.10.
height. Capturing camera views on a sphere instead of a circle can
overcome this limitation [Luo et al. 2018; Overbeck et al. 2018].
Memory footprint. Our uncompressed OmniPhotos require more
than one GB of memory, which is manageable for a 360° VR photo
experience, but cannot be easily extended to 360° VR video. By far the
largest contributor to this memory footprint is the precomputed
optical flow fields. Reducing the number of input views can reduce
the memory footprint, and so can discarding the inward-facing
hemisphere of the input images and their flow fields. In many cases,
the proxy geometry aligns the input views sufficiently well without
optical flow. In these regions, no flow needs to be stored, which
could lead to a more compact scene-dependent flow storage format.
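One hypothetical form of such a format is to store flow vectors sparsely, keeping only those above a magnitude threshold; the threshold, function names and layout below are our assumptions, not part of our implementation:

```python
import numpy as np

def compress_flow(flow, min_magnitude=0.5):
    """Keep only flow vectors above min_magnitude (pixels); store sparsely."""
    magnitude = np.linalg.norm(flow, axis=-1)
    ys, xs = np.nonzero(magnitude > min_magnitude)
    return ys, xs, flow[ys, xs]  # pixel indices + retained vectors

def decompress_flow(shape, ys, xs, vectors):
    """Reconstruct a dense field; unstored regions default to zero flow."""
    dtype = vectors.dtype if len(vectors) else np.float32
    flow = np.zeros(shape + (2,), dtype=dtype)
    flow[ys, xs] = vectors
    return flow

# Example: a field that is zero except where the proxy misaligns the views.
flow = np.zeros((4, 6, 2), dtype=np.float32)
flow[1, 2] = (3.0, -1.0)
ys, xs, vecs = compress_flow(flow)
restored = decompress_flow((4, 6), ys, xs, vecs)
print(len(vecs), np.allclose(restored, flow))  # 1 True
```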
Editing. Our OmniPhotos are currently limited to reproducing
the scenes that were captured as is. Virtual objects, such as digital
humans, can easily be rendered on top, but the quality of occlusions
by scene geometry, such as trees or buildings, is limited by the detail
of the proxy geometry. Relighting the captured scene, adding new
objects with consistent lighting, or removing captured objects are
interesting directions for future work.
Combination of proxy and flow. For future work, we would like
to investigate the design space of camera poses, proxy geometry
and optical flow with respect to the observed visual artefacts in the
rendered results. A promising direction might be a differentiable
renderer for jointly optimising scene and camera geometry as well
as flows to maximise the quality of synthesised views.
7 CONCLUSION
We presented OmniPhotos, a new type of 360° VR photography that
enables fast, casual and robust capture of immersive real-world VR
experiences. The key to the fast capture of OmniPhotos is to rotate a
consumer 360° video camera mounted on a rotary selfie stick, which
takes less than 3 seconds per loop or 10 seconds overall, and is
currently the fastest approach for capturing immersive 360° VR pho-
tos. The visual quality of our novel view rendering is significantly
improved by the automatic reconstruction of a scene-adaptive de-
formable proxy geometry, which reduces the number of required
input views by a factor of 4 and strongly reduces vertical distor-
tion compared to the state of the art. Our approach robustly creates
OmniPhotos across a wide range of outdoor scenes, as demonstrated
in our results and supplemental material. We will publicly release
our OmniPhotos implementation in the hope of enabling casual
consumers and professional users to create and experience their
own OmniPhotos.
ACKNOWLEDGMENTS
We thank the reviewers for their thorough feedback that has helped
to improve our paper. We also thank Peter Hedman, Ana Serrano
and Brian Cabral for helpful discussions, and Benjamin Attal for his
layered mesh rendering code.
This work was supported by EU Horizon 2020 MSCA grant FIRE
(665992), the EPSRC Centre for Doctoral Training in Digital Enter-
tainment (EP/L016540/1), RCUK grant CAMERA (EP/M023281/1),
an EPSRC-UKRI Innovation Fellowship (EP/S001050/1), a Rabin Ezra
Scholarship and an NVIDIA Corporation GPU Grant.
REFERENCES
Sameer Agarwal, Keir Mierle, and others. 2012. Ceres Solver. http://ceres-solver.org.
Kara-Ali Aliev, Artem Sevastopolsky, Maria Kolos, Dmitry Ulyanov, and Victor Lempit-
sky. 2020. Neural Point-Based Graphics. In ECCV. doi: 10.1007/978-3-030-58542-
6_42
Robert Anderson, David Gallup, Jonathan T. Barron, Janne Kontkanen, Noah Snavely,
Carlos Hernandez, Sameer Agarwal, and Steven M. Seitz. 2016. Jump: Virtual Reality
Video. ACM Transactions on Graphics 35, 6 (2016), 198:1–13. doi: 10.1145/2980179.
2980257
Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, and James Tompkin.
2020. MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere
Images. In ECCV. doi: 10.1007/978-3-030-58452-8_26
Lewis Baker, Steven Mills, Stefanie Zollmann, and Jonathan Ventura. 2020. CasualStereo:
Casual Capture of Stereo Panoramas with Spherical Structure-from-Motion. In IEEE
VR. doi: 10.1109/VR46266.2020.00102
Tobias Bertel, Neill D. F. Campbell, and Christian Richardt. 2019. MegaParallax: Casual
360° Panoramas with Motion Parallax. IEEE Transactions on Visualization and
Computer Graphics 25, 5 (2019), 1828–1835. doi: 10.1109/TVCG.2019.2898799
Tobias Bertel, Moritz Mühlhausen, Moritz Kappel, Paul Maximilian Bittner, Christian
Richardt, and Marcus Magnor. 2020. Depth Augmented Omnidirectional Stereo for
6-DoF VR Photography. In IEEE VR Posters. doi: 10.1109/VRW50115.2020.00181
Michael Broxton, John Flynn, Ryan Overbeck, Daniel Erickson, Peter Hedman, Matthew
DuVall, Jason Dourgarian, Jay Busch, Matt Whalen, and Paul Debevec. 2020. Im-
mersive Light Field Video with a Layered Mesh Representation. ACM Transactions
on Graphics 39, 4 (2020), 86:1–15. doi: 10.1145/3386569.3392485
Gaurav Chaurasia, Sylvain Duchêne, Olga Sorkine-Hornung, and George Drettakis.
2013. Depth Synthesis and Local Warps for Plausible Image-based Navigation. ACM
Transactions on Graphics 32, 3 (2013), 30:1–12. doi: 10.1145/2487228.2487238
Javier Civera, Andrew J. Davison, and J. M. Martínez Montiel. 2008. Inverse Depth
Parametrization for Monocular SLAM. IEEE Transactions on Robotics 24, 5 (2008),
932–945. doi: 10.1109/TRO.2008.2003276
Brian Curless, Steve Seitz, Jean-Yves Bouguet, Paul Debevec, Marc Levoy, and Shree K.
Nayar. 2000. 3D Photography. In SIGGRAPH Courses. http://www.cs.cmu.edu/
~seitz/course/3DPhoto.html
Thiago Lopes Trugillo da Silveira and Claudio R Jung. 2019. Dense 3D Scene Recon-
struction from Multiple Spherical Images for 3-DoF+ VR Applications. In IEEE VR.
9–18. doi: 10.1109/VR.2019.8798281
John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan
Overbeck, Noah Snavely, and Richard Tucker. 2019. DeepView: View Synthesis With
Learned Gradient Descent. In CVPR. 2367–2376. doi: 10.1109/CVPR.2019.00247
Peter Hedman, Suhib Alsisan, Richard Szeliski, and Johannes Kopf. 2017. Casual
3D Photography. ACM Transactions on Graphics 36, 6 (2017), 234:1–15. doi:
10.1145/3130800.3130828
Peter Hedman and Johannes Kopf. 2018. Instant 3D Photography. ACM Transactions
on Graphics 37, 4 (2018), 101:1–12. doi: 10.1145/3197517.3201384
Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and
Gabriel Brostow. 2018. Deep Blending for Free-Viewpoint Image-Based Rendering.
ACM Transactions on Graphics 37, 6 (2018), 257:1–15. doi: 10.1145/3272127.3275084
Peter Hedman, Tobias Ritschel, George Drettakis, and Gabriel Brostow. 2016. Scalable
Inside-Out Image-Based Rendering. ACM Transactions on Graphics 35, 6 (2016),
231:1–11. doi: 10.1145/2980179.2982420
Aleksander Holynski and Johannes Kopf. 2018. Fast Depth Densification for Occlusion-
aware Augmented Reality. ACM Transactions on Graphics 37, 6 (2018), 194:1–11.
doi: 10.1145/3272127.3275083
Ian P. Howard and Brian J. Rogers. 2008. Seeing in Depth. Oxford University Press. doi:
10.1093/acprof:oso/9780195367607.001.0001
Jingwei Huang, Zhili Chen, Duygu Ceylan, and Hailin Jin. 2017. 6-DOF VR videos with
a single 360-camera. In IEEE VR. 37–44. doi: 10.1109/VR.2017.7892229
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and
Thomas Brox. 2017. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep
Networks. In CVPR. doi: 10.1109/CVPR.2017.179
Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe, and
In So Kweon. 2016. All-around Depth from Small Motion with A Spherical Panoramic
Camera. In ECCV. doi: 10.1007/978-3-319-46487-9_10
Robert Konrad, Donald G. Dansereau, Aniq Masood, and Gordon Wetzstein. 2017.
SpinVR: Towards Live-Streaming 3D Virtual Reality Video. ACM Transactions on
Graphics 36, 6 (2017), 209:1–12. doi: 10.1145/3130800.3130836
Johannes Kopf, Suhib Alsisan, Francis Ge, Yangming Chong, Kevin Matzen, Ocean
Quigley, Josh Patterson, Jossie Tirado, Shu Wu, and Michael F. Cohen. 2019. Practical
3D Photography. In CVPR Workshops.
George Alex Koulieris, Kaan Akşit, Michael Stengel, Rafał K. Mantiuk, Katerina Mania,
and Christian Richardt. 2019. Near-Eye Display and Tracking Technologies for
Virtual and Augmented Reality. Computer Graphics Forum 38, 2 (2019), 493–519.
doi: 10.1111/cgf.13654
Till Kroeger, Radu Timofte, Dengxin Dai, and Luc Van Gool. 2016. Fast Optical Flow
Using Dense Inverse Search. In ECCV. 471–488. doi: 10.1007/978-3-319-46493-0_29
Jungjin Lee, Bumki Kim, Kyehyun Kim, Younghui Kim, and Junyong Noh. 2016. Rich360:
Optimized Spherical Representation from Structured Panoramic Camera Arrays.
ACM Transactions on Graphics 35, 4 (2016), 63:1–11. doi: 10.1145/2897824.2925983
Christian Lipski, Felix Klose, and Marcus Magnor. 2014. Correspondence and Depth-
Image Based Rendering a Hybrid Approach for Free-Viewpoint Video. IEEE Trans-
actions on Circuits and Systems for Video Technology 24, 6 (2014), 942–951. doi:
10.1109/TCSVT.2014.2302379
Bicheng Luo, Feng Xu, Christian Richardt, and Jun-Hai Yong. 2018. Parallax360:
Stereoscopic 360° Scene Representation for Head-Motion Parallax. IEEE Trans-
actions on Visualization and Computer Graphics 24, 4 (2018), 1545–1553. doi:
10.1109/TVCG.2018.2794071
Kevin Matzen, Michael F. Cohen, Bryce Evans, Johannes Kopf, and Richard Szeliski.
2017. Low-cost 360 Stereo Photography and Video Capture. ACM Transactions on
Graphics 36, 4 (2017), 148:1–12. doi: 10.1145/3072959.3073645
Moustafa Meshry, Dan B Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah
Snavely, and Ricardo Martin-Brualla. 2019. Neural Rerendering in the Wild. In CVPR.
doi: 10.1109/CVPR.2019.00704
Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari,
Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. 2019. Local Light Field Fusion:
Practical View Synthesis with Prescriptive Sampling Guidelines. ACM Transactions
on Graphics 38, 4 (2019), 29:1–14. doi: 10.1145/3306346.3322980
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ra-
mamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields
for View Synthesis. In ECCV. doi: 10.1007/978-3-030-58452-8_24
Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, and Yong-Liang Yang.
2019. HoloGAN: Unsupervised learning of 3D representations from natural images.
In ICCV. doi: 10.1109/ICCV.2019.00768
Ryan Styles Overbeck, Daniel Erickson, Daniel Evangelakos, Matt Pharr, and Paul
Debevec. 2018. A System for Acquiring, Compressing, and Rendering Panoramic
Light Field Stills for Virtual Reality. ACM Transactions on Graphics 37, 6 (2018),
197:1–15. doi: 10.1145/3272127.3275031
Albert Parra Pozo, Michael Toksvig, Terry Filiba Schrager, Joyse Hsu, Uday Mathur,
Alexander Sorkine-Hornung, Rick Szeliski, and Brian Cabral. 2019. An Integrated
6DoF Video Camera and System Design. ACM Transactions on Graphics 38, 6 (2019),
216:1–16. doi: 10.1145/3355089.3356555
Shmuel Peleg, Moshe Ben-Ezra, and Yael Pritch. 2001. Omnistereo: Panoramic Stereo
Imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 3 (2001),
279–290. doi: 10.1109/34.910880
Federico Perazzi, Alexander Sorkine-Hornung, Henning Zimmer, Peter Kaufmann,
Oliver Wang, Scott Watson, and Markus Gross. 2015. Panoramic Video from Un-
structured Camera Arrays. Computer Graphics Forum 34, 2 (2015), 57–68. doi:
10.1111/cgf.12541
Christian Richardt. 2020. Omnidirectional Stereo. In Computer Vision: A Reference
Guide. Springer, 1–4. doi: 10.1007/978-3-030-03243-2_808-1
Christian Richardt, Peter Hedman, Ryan S. Overbeck, Brian Cabral, Robert Konrad, and
Steve Sullivan. 2019. Capture4VR: From VR Photography to VR Video. In SIGGRAPH
Courses. 1–319. doi: 10.1145/3305366.3328028
Christian Richardt, Yael Pritch, Henning Zimmer, and Alexander Sorkine-Hornung.
2013. Megastereo: Constructing High-Resolution Stereo Panoramas. In CVPR. 1256–
1263. doi: 10.1109/CVPR.2013.166
Christian Richardt, James Tompkin, and Gordon Wetzstein. 2020. Capture, Recon-
struction, and Representation of the Visual Real World for Virtual Reality. In Real
VR – Immersive Digital Reality: How to Import the Real World into Head-Mounted
Immersive Displays. Springer, 3–32. doi: 10.1007/978-3-030-41816-8_1
Ehsan Sayyad, Pradeep Sen, and Tobias Höllerer. 2017. PanoTrace: Interactive 3D
Modeling of Surround-View Panoramic Images in Virtual Reality. In VRST. doi:
10.1145/3139131.3139158
Johannes L. Schönberger and Jan-Michael Frahm. 2016. Structure-from-Motion Revis-
ited. In CVPR. 4104–4113. doi: 10.1109/CVPR.2016.445
Christopher Schroers, Jean-Charles Bazin, and Alexander Sorkine-Hornung. 2018. An
Omnistereoscopic Video Pipeline for Capture and Display of Real-World VR. ACM
Transactions on Graphics 37, 3 (2018), 37:1–13. doi: 10.1145/3225150
Ana Serrano, Incheol Kim, Zhili Chen, Stephen DiVerdi, Diego Gutierrez, Aaron Hertz-
mann, and Belen Masia. 2019. Motion parallax for 360° RGBD video. IEEE Trans-
actions on Visualization and Computer Graphics 25, 5 (2019), 1817–1827. doi:
10.1109/TVCG.2019.2898757
Meng-Li Shih, Shih-Yang Su, Johannes Kopf, and Jia-Bin Huang. 2020. 3D Photography
using Context-aware Layered Depth Inpainting. In CVPR. doi: 10.1109/CVPR42600.
2020.00805
Heung-Yeung Shum and Li-Wei He. 1999. Rendering with concentric mosaics. In
SIGGRAPH. 299–306. doi: 10.1145/311535.311573
Vincent Sitzmann, Ana Serrano, Amy Pavel, Maneesh Agrawala, Diego Gutierrez, Belen
Masia, and Gordon Wetzstein. 2018. How do people explore virtual environments?
IEEE Transactions on Visualization and Computer Graphics 24, 4 (2018), 1633–1642.
doi: 10.1109/TVCG.2018.2793599
Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, and
Michael Zollhöfer. 2019a. DeepVoxels: Learning Persistent 3D Feature Embeddings.
In CVPR. 2437–2446. doi: 10.1109/CVPR.2019.00254
Vincent Sitzmann, Michael Zollhöfer, and Gordon Wetzstein. 2019b. Scene Representa-
tion Networks: Continuous 3D-Structure-Aware Neural Scene Representations. In
NeurIPS.
Mel Slater, Martin Usoh, and Anthony Steed. 1994. Depth of Presence in Virtual
Environments. Presence: Teleoperators and Virtual Environments 3, 2 (1994), 130–144.
doi: 10.1162/pres.1994.3.2.130
Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng,
and Noah Snavely. 2019. Pushing the Boundaries of View Extrapolation With
Multiplane Images. In CVPR. 175–184. doi: 10.1109/CVPR.2019.00026
Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green,
Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei
Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon,
Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis
Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven
Lovegrove, and Richard Newcombe. 2019. The Replica Dataset: A Digital Replica
of Indoor Spaces. (2019). https://github.com/facebookresearch/Replica-Dataset
arXiv:1906.05797.
Shinya Sumikura, Mikiya Shibuya, and Ken Sakurada. 2019. OpenVSLAM: a Versatile
Visual SLAM Framework. In International Conference on Multimedia. doi: 10.1145/
3343031.3350539
Richard Szeliski. 2006. Image alignment and stitching: a tutorial. Foundations and
Trends in Computer Graphics and Vision 2, 1 (2006), 1–104. doi: 10.1561/0600000009
Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan
Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner,
Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt,
Maneesh Agrawala, Eli Shechtman, Dan B Goldman, and Michael Zollhöfer. 2020.
State of the Art on Neural Rendering. Computer Graphics Forum 39, 2 (2020), 701–727.
doi: 10.1111/cgf.14022
Jayant Thatte, Jean-Baptiste Boin, Haricharan Lakshman, and Bernd Girod. 2016. Depth
augmented stereo panorama for cinematic virtual reality with head-motion parallax.
In ICME. doi: 10.1109/ICME.2016.7552858
Richard Tucker and Noah Snavely. 2020. Single-View View Synthesis with Multiplane
Images. In CVPR. doi: 10.1109/CVPR42600.2020.00063
Julien Valentin, Adarsh Kowdle, Jonathan T. Barron, Neal Wadhwa, Max Dzitsiuk,
Michael Schoenberg, Vivek Verma, Ambrus Csaszar, Eric Turner, Ivan Dryanovski,
Joao Afonso, Jose Pascoal, Konstantine Tsotsos, Mira Leung, Mirko Schmidt, Onur
Guleryuz, Sameh Khamis, Vladimir Tankovitch, Sean Fanello, Shahram Izadi, and
Christoph Rhemann. 2018. Depth from Motion for Smartphone AR. ACM Transac-
tions on Graphics 37, 6 (2018), 193:1–19. doi: 10.1145/3272127.3275041
Michael Waechter, Mate Beljan, Simon Fuhrmann, Nils Moehrle, Johannes Kopf, and
Michael Goesele. 2017. Virtual Rephotography: Novel View Prediction Error for
3D Reconstruction. ACM Transactions on Graphics 36, 1 (2017), 8:1–11. doi:
10.1145/2999533
Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, and Yi-Hsuan Tsai. 2020. BiFuse:
Monocular 360 Depth Estimation via Bi-Projection Fusion. In CVPR. 462–471. doi:
10.1109/CVPR42600.2020.00054
Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. 2004. Image
quality assessment: from error visibility to structural similarity. IEEE Transactions
on Image Processing 13, 4 (2004), 600–612. doi: 10.1109/TIP.2003.819861
Olivia Wiles, Georgia Gkioxari, Richard Szeliski, and Justin Johnson. 2020. SynSin:
End-to-end View Synthesis from a Single Image. In CVPR. doi: 10.1109/CVPR42600.
2020.00749
Jianing Zhang, Tianyi Zhu, Anke Zhang, Xiaoyun Yuan, Zihan Wang, Sebastian
Beetschen, Lan Xu, Xing Lin, Qionghai Dai, and Lu Fang. 2020. Multiscale-VR:
Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality. In ICCP. doi:
10.1109/ICCP48838.2020.9105244
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018.
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR.
doi: 10.1109/CVPR.2018.00068
Ke Colin Zheng, Sing Bing Kang, Michael F. Cohen, and Richard Szeliski. 2007. Layered
Depth Panoramas. In CVPR. doi: 10.1109/CVPR.2007.383295
Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. 2018.
Stereo Magnification: Learning View Synthesis using Multiplane Images. ACM
Transactions on Graphics 37, 4 (2018), 65:1–12. doi: 10.1145/3197517.3201323
Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Federico Alvarez, and Petros
Daras. 2019. Spherical View Synthesis for Self-Supervised 360° Depth Estimation.
In 3DV. 690–699. doi: 10.1109/3DV.2019.00081