# Montage4D: Interactive Seamless Fusion of Multiview Video Textures

Project Site: http://montage4d.com

The commoditization of virtual and augmented reality devices and the availability of inexpensive consumer depth cameras have catalyzed a resurgence of interest in spatiotemporal performance capture. Recent systems like Fusion4D and Holoportation address several crucial problems in the real-time fusion of multiview depth maps into volumetric and deformable representations. Nonetheless, stitching multiview video textures onto dynamic meshes remains challenging due to imprecise geometries, occlusion seams, and critical time constraints. In this paper, we present a practical solution towards real-time seamless texture montage for dynamic multiview reconstruction. We build on the ideas of dilated depth discontinuities and majority voting from Holoportation to reduce ghosting effects when blending textures. In contrast to their approach, we determine the appropriate blend of textures per vertex using view-dependent rendering techniques, so as to avert fuzziness caused by the ubiquitous normal-weighted blending. By leveraging geodesics-guided diffusion and temporal texture fields, our algorithm mitigates spatial occlusion seams while preserving temporal consistency. Experiments demonstrate significant enhancement in rendering quality, especially in detailed regions such as faces. We envision a wide range of applications for Montage4D, including immersive telepresence for business, training, and live entertainment.

### Montage4D: Interactive Seamless Fusion of Multiview Video Textures

1. Montage4D: Interactive Seamless Fusion of Multiview Video Textures. Ruofei Du†‡, Ming Chuang‡¶, Wayne Chang‡, Hugues Hoppe‡§, and Amitabh Varshney†. †Augmentarium and GVIL | UMIACS | University of Maryland, College Park; ‡Microsoft Research, Redmond; ¶Apple Inc.; §Google Inc.
2. Motivation: Popularity of VR and AR devices
3. Motivation: Popularity of VR and AR devices
4. Motivation: Popularity of VR and AR devices
5. Motivation: Popularity of VR and AR devices
6. Motivation: Popularity of VR and AR devices
7. Motivation: Assorted VR and AR applications
8. Motivation: Assorted VR and AR applications
9. Motivation: Assorted VR and AR applications
10. These VR/AR applications have huge data requirements
11. These VR/AR applications have huge data requirements. Where is the 3D data going to come from?
12. Introduction: Fusion4D and Holoportation. Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016). [Pipeline diagram, Site A to Site B: capture with 8 pods (RGB, depth, mask); depth estimation and segmentation; volumetric fusion; color rendering; mesh, color, and audio streams sent over the network; remote rendering.]
13. Fusing multiview video textures onto dynamic meshes under real-time constraints remains a challenging task
14. Screen recorded with OBS. Computed in real time from 8 videos.
15. About 30% of the participants do not believe the 3D reconstructed person looks real. Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
16. About 30% of the participants do not believe the 3D reconstructed person looks real. Screen recorded with OBS. Computed in real time from 8 videos.
17. Related Work: 3D Texture Montage
18. Related Work: 3D Texture Montage
19. Related Work: 3D Texture Montage
20. Related Work: 3D Texture Montage
21. Related Work: 3D Texture Montage
22. Related Work: 3D Texture Montage
23. Related Work: 3D Texture Montage. Screen-space optical flow could fix many misregistration issues, but it relies heavily on RGB features and screen resolution, and fails when the viewpoint changes rapidly.
24. Related Work: 3D Texture Montage. Up to now, few systems other than Holoportation could fuse dynamic meshes from multiple cameras in real time. This recent SIGGRAPH 2017 paper produces excellent dynamic reconstruction results, but uses a single RGBD camera, which may result in a lot of occlusion.
25. Related Work: 3D Texture Montage. Normal-weighted blending (200 FPS): $w_i = V \cdot \max(0, n \cdot v_i)^{\alpha}$, where $V$ is the visibility test, $n$ is the normal vector, and $v_i$ is the view direction of texture camera $i$. Majority voting for color correction: for each vertex and for each texture, test whether the projected color agrees with more than half of the other textures; if not, set the texture weight field to 0.
26. Motivation: Visual Quality Matters. Holoportation (Escolano et al. 2016); Montage4D (Du et al. 2018)
27. Motivation: Visual Quality Matters. Montage4D (Du et al. 2018); Holoportation (Escolano et al. 2016)
28. What is our approach for real-time seamless texture fusion?
29. Workflow: Identify and diffuse the seams
30. Workflow: Identify and diffuse the seams
31. Workflow: Identify and diffuse the seams
32. Workflow: Identify and diffuse the seams
33. What are the causes for the seams?
34. Motivation: Causes for Seams. Self-occlusion (depth seams); field of view (background seams); majority voting (color seams)
35. Seams Causes: Self-occlusion (depth seams). One or two vertices of the triangle are occluded in the depth map while the others are not.
36. Seams Causes. [Figure: triangle whose vertices have depths 1.3, 1.4, and 30]
37. Seams Causes. Raw projection mapping results; seams after occlusion test
38. Seams Causes: Majority voting (color seams). Each vertex is assigned 8 colors coming from the 8 cameras. These colors are classified into clusters in LAB color space with a 0.1 threshold. The mean color of the largest cluster is named the majority voting color.
39. Seams Causes: Majority voting (color seams). The triangle vertices have different majority voting colors, which may be caused by either misregistration or self-occlusion.
40. Seams Causes. [Figure: triangle with two vertices inside the cluster of the mean LAB color and one vertex outside it]
41. Seams Causes. Raw projection mapping results; seams after occlusion test; seams after majority voting test
42. Seams Causes: Field of view (background seams). One or two triangle vertices lie outside the camera's field of view or in the subtracted background region while the rest do not.
43. Seams Causes. [Figure: triangle with two foreground vertices and one background vertex]
44. Seams Causes. Raw projection mapping results; seams after field-of-view test
45. Seams Causes
46. Less than 1% of the million-level triangles are seams (for each view). Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
47. For a static frame, how can we get rid of the annoying seams at an interactive frame rate?
48. How can we spatially smooth the texture (weight) field near the seams so that no seams are visible in the results?
49. Workflow: Identify and diffuse the seams
50. Geodesics: For diffusing the seams. A geodesic is the shortest route between two points on the surface.
51. Geodesics: For diffusing the seams. On triangle meshes, this is challenging because of the computation of tangent directions, and because shortest paths are defined on edges instead of vertices.
52. Geodesics: For diffusing the seams. We use the algorithm by Surazhsky and Hoppe for computing approximate geodesics. The idea is to maintain only 2~3 shortest paths along each edge to reduce the computational cost.
53. (no text)
54. What are the causes for the blurring?
55. Motivation: Causes for blurring. Texture projection errors (careful calibration + bundle adjustment); normal-weighted blending (imprecise geometries / noisy point clouds / different specular highlights), $w_i = V \cdot \max(0, n \cdot v_i)^{\alpha}$
56. Motivation: Causes for blurring. Texture projection errors; normal-weighted blending, $w_i = V \cdot \max(0, n \cdot v_i)^{\alpha}$; view-dependent rendering, $w_i = V \cdot \max(0, v \cdot v_i)^{\alpha}$
57. For the frame at time $t$, for each camera $i$, and for each vertex $v$, we define the desired texture field $D_v^i(t)$. [The equation itself is not preserved in the transcript. Its labeled terms are: visibility test, visibility percentage of view $i$, user camera's view direction, texture camera view direction, and geodesics.]
58. (no text)
59. (no text)
60. Workflow: Identify and diffuse the seams
61. Temporal Texture Field: Temporally smooth the texture fields. For the frame at time $t$, for each camera $i$, and for each vertex $v$, we define the temporal texture field (exponential smoothing): $T_v^i(t) = (1 - \lambda)\, T_v^i(t - 1) + \lambda\, D_v^i(t)$, where $T_v^i(t - 1)$ is the texture field of the previous frame and $\lambda$ is the temporal smoothing factor ($1/\mathrm{FPS}$).
62. Temporal Texture Fields: Transition between views
63. Holoportation; Montage4D
64. Holoportation; Montage4D
65. (no text)
66. (no text)
67. Note: Results use the default parameters of Floating Textures, which may not be optimal for our datasets. Still, for an optical-flow-based approach, it would be better if seam vertices were assigned lower texture weights.
68. (no text)
69. (no text)
70. With additional computation for seams, geodesics, and temporal texture fields, is our approach still real time?
71. Experiment: Cross-validation. Montage4D achieves better quality at over 90 FPS on an NVIDIA GTX 1080. • Root mean square error (RMSE) ↓ • Structural similarity (SSIM) ↑ • Peak signal-to-noise ratio (PSNR) ↑
72. [Percentage not preserved] of the participants do not believe the 3D reconstructed person looks real
73. Experiment: Break-down of a typical frame. Most of the time is spent on communication between CPU and GPU
74. In conclusion, Montage4D uses seam identification, geodesic fields, and temporal texture fields to provide a practical texturing solution for real-time 3D reconstructions. In the future, we envision that Montage4D will be useful for fusing massive multi-view video data into VR applications like remote business meetings, remote training, and live broadcasting.
75. Thank you. With a Starry Night Stylization. Ruofei Du ruofei@cs.umd.edu, Amitabh Varshney varshney@cs.umd.edu, Wayne Chang wechang@microsoft.com, Ming Chuang mingchuang82@gmail.com, Hugues Hoppe hhoppe@gmail.com • www.montage4d.com • www.duruofei.com • shadertoy.com/user/starea • github.com/ruofeidu • I am graduating December 2018.
76. Introduction: Fusion4D and Holoportation. Image courtesy: Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
77. Limitations: Holoportation. Image courtesy: afterpsychotherapy.com. 1.6 Gbps
78. Introduction: Mobile Holoportation. Image courtesy: Wayne Chang and Spencer Fowers
79. Introduction: Mobile Holoportation. Image courtesy: Jeff Kramer. 30-50 Mbps
80. (no text)

### Editor's Notes

• Good afternoon everyone,
My name is Ruofei Du. I am a Ph.D. candidate advised by Prof. Amitabh Varshney at the University of Maryland, College Park.
Today I am going to present Montage4D: Interactive Seamless Fusion of Multiview Video Textures.
This work is a collaboration with Ming Chuang, Wayne Chang, and Hugues Hoppe at Microsoft Research, Redmond.
• Recently, consumer-level virtual and augmented reality displays have exploded in popularity.
• For example, the Oculus Rift,
• the HTC Vive,
• and the PlayStation VR, which has sold over 3 million units in the past years.
• Microsoft HoloLens has shipped over a few thousand units all over the world.
• These advances in VR and AR technologies have catalyzed many potential applications across many disciplines, such as remote education
• and immersive entertainment.
• These VR and AR applications have huge data requirements, so the question becomes: <click>
• where is this massive amount of 3D data going to come from?
• One potential source of 3D data in VR and AR will be real-time multi-view reconstruction.
• In 2016, Microsoft Research presented Holoportation, the first-ever high-fidelity real-time reconstruction system.
This system uses 8 pods of color and depth cameras. It computes volumetric fusion in real time and transfers the meshes, colors, and audio streams to the renderer for remote rendering.

• However, fusing multiview video textures onto dynamic meshes under real-time constraints is a challenging task.
• For example, here is a short demo computed and recorded in real time.
• According to the user study,
about 30% of the participants do not believe that
the 3D reconstructed person looks real compared with a real person.
• This is mostly due to imprecise reconstructed geometries, texture seams, and normal-weighted blending.
So in this paper, we present Montage4D, a practical solution for blending the video textures on dynamic meshes in real time.
• Previous research has addressed the problem of fusing multiple textures from static 3D models,
• But compared with Holoportation,
• their methods usually take 10 minutes to process a single frame, because they try to solve an optimization problem on millions of vertices.
• Or 30 seconds according to a recent SIGGRAPH paper.
• Or 30 seconds per frame with a texture atlas.
• In addition, Eisemann et al. invented a novel view-dependent rendering technique, named Floating Textures (FT), running at $5$-$24$ frames per second. They found that screen-space optical flow could fix many misregistration issues, but it relies heavily on RGB features and fails when the viewpoint changes rapidly.

While FT solves a similar problem to our paper, the core ideas and rendering results of FT and Montage4D differ significantly.
First, Montage4D uses Temporal Texture Fields to achieve both spatial and temporal consistency during the visual navigation of dynamic meshes. FT chooses two or three of the closest input images for sparse camera arrangements, which works well for static models and static viewpoints, but suffers from spatial artifacts when the user quickly changes the viewpoint or when the mesh changes incoherently (occlusion changes can, for example, lead to such artifacts). This leads to the future work section of FT, which states that "one source for artefacts remaining in rendering dynamic scenes is that of temporally incoherent geometry".
Second, as FT is an optical-flow-based approach, surface specularity and poor color calibration of cameras can cause challenges in its optical flow estimation. Additionally, a screen-based optical flow approach may also lead to visual inconsistencies (due to changes in specular lighting or occlusion) when moving the viewpoint. These are some of the reasons we chose not to use the optical-flow-based approach.
Third, the diffusion processes are fundamentally different. FT diffuses the binary visibility map outwards in screen space, while Montage4D diffuses the seams inwards on the 3D geometry. Since the diffusion radius in FT is fixed in the fragment shader, its value is not consistent when the user zooms in or out. See the paper's results figure for a comparison of different texturing schemes. Although FT does not work well for our datasets, applying real-time optical flow over the mesh or within the screen space has the potential to address some of the misalignment errors in small patches.
• So our main comparison is with the Holoportation system published last year.

First, Holoportation uses normal-weighted blending to compute a texture field for each vertex.
They first conduct a visibility test for occlusion, then use the dot product of the normal vector and the view direction of the texture cameras to compute the scalar field of the texture weights.
Next, they use a majority voting algorithm for color correction.
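The majority-voting rule stated on the Related Work slide (a texture's weight is set to 0 unless its projected color agrees with more than half of the other textures) can be sketched for a single vertex as follows. This is only an illustration under assumptions, not the Holoportation implementation: the function name `majority_vote_weights`, the Euclidean color distance, and the use of the slides' 0.1 LAB clustering threshold as the agreement threshold are all hypothetical choices.

```python
import math

def majority_vote_weights(colors, weights, threshold=0.1):
    """Zero the weight of each texture whose color disagrees with the majority.

    colors:    one (L, a, b) tuple per texture camera, for a single vertex
    weights:   the blending weight of each texture before voting
    threshold: assumed agreement distance in LAB space (hypothetical default)
    """
    voted = []
    for i, (color, weight) in enumerate(zip(colors, weights)):
        others = [c for j, c in enumerate(colors) if j != i]
        # Count the other textures whose color lies within the threshold.
        agree = sum(1 for c in others if math.dist(color, c) <= threshold)
        # Keep the weight only if more than half of the other textures agree.
        voted.append(weight if 2 * agree > len(others) else 0.0)
    return voted
```

In this sketch, a camera that sees a different color from the rest (for example, a color projected from an occluding object) loses its vote and no longer contributes to the blend at that vertex.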
• However, as we showed before, it suffers from blurring
• and visible seams, especially in human faces.
• So, what is our approach for real-time seamless texture fusion?
• Our pipeline takes the input triangle meshes from a real-time reconstruction system called Fusion4D.
• First, our system receives the input triangle meshes from Fusion4D via the network.
• Then we compute the rasterized depth maps and texture maps with a background subtraction model.
• Next, we carefully identify the seams caused by misregistration and occlusion.
• Our first major research question is: what are the causes for the blurring and seams?
• The seams mainly come from three sources: self-occlusion, majority voting, and field of view.
I will explain them one by one.
• First, we classify a triangle as a self-occlusion seam if and only if
• one or two vertices of the triangle are occluded in the depth map while the others are not.
• Here is an example: the person is holding a toy in front of the body, which results in self-occlusion seams on his T-shirt.
• Next, we classify a triangle as a majority-voting seam if and only if
the triangle vertices have different results in the majority voting process, which may be caused by either misregistration or self-occlusion.
• The triangle vertices have different results near the mean color.
• Here is an example: the occlusion test cannot remove some of the brown color projected from the toy.
While the majority voting test can eliminate it, it may also introduce visible seams.
• The last category of seams is caused by the camera's limited field of view.
These seam triangles have one or two vertices lying outside the camera's field of view while the rest do not.
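The occlusion and field-of-view seam tests share one pattern: a triangle is a seam for a given view when one or two of its vertices pass a per-vertex test and the rest fail. A minimal sketch of that pattern, assuming the per-vertex test results are already available as booleans (the function name `find_seam_triangles` and the data layout are hypothetical, not taken from the talk):

```python
def find_seam_triangles(triangles, vertex_ok):
    """Return indices of triangles whose vertices disagree on a per-view test.

    triangles: list of (a, b, c) vertex-index triples
    vertex_ok: vertex_ok[v] is True if vertex v passes the test for this view
               (e.g. visible in the depth map, or inside the field of view)
    """
    seams = []
    for t, (a, b, c) in enumerate(triangles):
        passed = int(vertex_ok[a]) + int(vertex_ok[b]) + int(vertex_ok[c])
        # One or two passing vertices means the triangle straddles a boundary.
        if 0 < passed < 3:
            seams.append(t)
    return seams
```

Majority-voting seams could plausibly reuse the same check by treating "kept by the vote" as the per-vertex test, though the talk does not spell out that detail.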

• Here is an illustration.
• For example, there is a toy lying on the ground, where each camera only observes a small portion of the toy.
• Here is an example of the seam triangles identified on this person's face.
• Typically, less than 1% of the million-level triangles are seams.
• Our next questions are, for a static frame,
how can we get rid of the annoying seams at an interactive frame rate?
• Moreover, how can we spatially smooth the texture weight field near the seams, so that no seams are visible in the results?
• In Montage4D, we use discrete geodesics to diffuse the texture fields from the seams.
• Basically, a geodesic is the shortest route between two points on the surface.
• On triangle meshes, computing geodesics is very challenging because of the computation of tangent directions,
and because shortest paths are defined on edges instead of vertices.
• We have adopted the algorithm from Hugues Hoppe's group for computing approximate geodesics.
The main idea is to maintain only 2~3 shortest paths along each edge to reduce the computational cost.
• Here are the results of estimating geodesics on the triangle meshes.
Using 20 iterations on the GPU, we can achieve a good diffusion field for the rendering.
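To give a feel for iterative distance propagation from the seams, here is a deliberately simplified sketch. It is not the Surazhsky and Hoppe algorithm: it only relaxes distances along mesh edges, so it overestimates true geodesic distances that cut across triangle faces. The function name `approx_geodesics`, the data layout, and the sweep structure are assumptions; the default of 20 iterations simply echoes the figure quoted above.

```python
import math

def approx_geodesics(positions, edges, seam_vertices, iterations=20):
    """Approximate the distance from every vertex to the nearest seam vertex.

    positions:     list of (x, y, z) vertex positions
    edges:         list of (u, v) vertex-index pairs
    seam_vertices: indices of vertices that lie on seam triangles
    """
    dist = [math.inf] * len(positions)
    for s in seam_vertices:
        dist[s] = 0.0
    lengths = [math.dist(positions[u], positions[v]) for u, v in edges]
    for _ in range(iterations):
        # Each sweep reads the previous distances and writes new ones,
        # similar in spirit to a parallel pass that updates all vertices at once.
        new_dist = list(dist)
        for (u, v), length in zip(edges, lengths):
            new_dist[v] = min(new_dist[v], dist[u] + length)
            new_dist[u] = min(new_dist[u], dist[v] + length)
        dist = new_dist
    return dist
```

With a fixed iteration count, distances only reach vertices within that many edges of a seam. For diffusing weights in a narrow band around the seams that is arguably acceptable, since vertices far from any seam do not need attenuation.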
• Another research question is: what are the causes for the blurring?
• On one hand, the blurring comes from the texture projection errors.
• On the other hand, normal-weighted blending may accumulate errors from every camera, resulting in blurring on people's faces.
To solve this, we replace it with view-dependent rendering.
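The two weighting schemes on the slides differ only in which direction is compared against the texture camera's view direction: the vertex normal $n$ for normal-weighted blending, or the user camera's view direction $v$ for view-dependent rendering. A minimal sketch, assuming unit vectors oriented so that a larger dot product means better alignment; the function names and the exponent `alpha=2.0` are illustrative assumptions, not values from the talk.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def blend_weight(visible, direction, camera_dir, alpha=2.0):
    """w_i = V * max(0, d . v_i) ** alpha for one vertex and one texture camera.

    direction = vertex normal n        -> normal-weighted blending
    direction = user view direction v  -> view-dependent rendering
    All directions are assumed to be unit vectors; alpha is a hypothetical value.
    """
    v = 1.0 if visible else 0.0
    return v * max(0.0, dot(direction, camera_dir)) ** alpha
```

Because the view-dependent variant depends on where the user is looking rather than on the reconstructed normal, noisy normals no longer spread weight across many cameras, which is the blurring mechanism described above.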
• With the geodesics, we also use view-dependent rendering techniques instead of normal-based blending.
• However, using only the desired texture fields may lead to temporal inconsistency if you move the camera rapidly or suddenly add another video stream to the reconstruction pipeline.
• So we use a temporal smoothing technique to update the temporal texture fields.
We calculate the gradient of the change in the texture fields, then multiply it by a factor lambda of 0.02 to change the texture fields slowly.
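The exponential smoothing on the Temporal Texture Field slide, $T(t) = (1 - \lambda)\,T(t-1) + \lambda\,D(t)$, is the same as moving the previous field toward the desired field by a fraction $\lambda$ of their difference. A minimal sketch for one camera's per-vertex weights, using the $\lambda = 0.02$ mentioned in these notes; the function name and the flat-list layout are assumptions.

```python
def update_temporal_field(previous, desired, lam=0.02):
    """T(t) = (1 - lam) * T(t - 1) + lam * D(t), applied per vertex.

    previous: temporal texture field of the previous frame, T(t - 1)
    desired:  desired texture field of the current frame, D(t)
    """
    return [(1.0 - lam) * t_prev + lam * d
            for t_prev, d in zip(previous, desired)]
```

Applied every frame, a sudden jump in the desired field (for example, after a rapid camera move) is absorbed gradually: with $\lambda = 0.02$, roughly 98% of the remaining gap persists after each frame.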
• Here I show the animation by applying temporal texture fields.
• Here are the results after using the view-dependent rendering, and the temporal texture fields.

Row 2 and row 4 are the results after applying the view-dependent rendering with the temporal texture fields.
• Here are some more results comparing Holoportation and Montage4D.
• In addition to Holoportation, we also compared results with other rendering algorithms such as Floating Textures.
• Without identification of the seam triangles, Floating Textures sometimes amplifies the errors alongside the seams.
• Next you may wonder:
With additional computation for seams, geodesics, and temporal texture fields, is our approach still real time?
• In our cross-validation experiments on 5 datasets with an NVIDIA GTX 1080, we can see that Montage4D achieves better quality at over 90 FPS, which is fast enough for VR and AR applications.
• Here are some more examples.
• We also evaluate the break-down of time consumption for a particular input mesh.
Most of the time is spent on communication between CPU and GPU when computing the geodesics.
• In conclusion, Montage4D provides a practical texturing solution for
real-time 3D reconstructions. In the future, we envision that Montage4D is useful
for fusing the massive multi-view video data into VR applications like