
Fusing Multimedia Data Into Dynamic Virtual Environments


In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments.
First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatio-temporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos.
We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. We conduct a user study with 20 participants to qualitatively evaluate Social Street View and Geollery. The user study identified several use cases for these systems, including immersive social storytelling, experiencing culture, and crowd-sourced tourism.
We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present VRSurus and ARCrypt projects to explore the applications of gestures, haptic feedback, and visual cryptography in virtual and augmented reality environments.
Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.


  1. 1. Fusing Multimedia Data Into Dynamic Virtual Environments Ruofei Du ruofei@cs.umd.edu Dissertation Committee: Dr. Varshney, Dr. Zwicker, Dr. Huang, Dr. JaJa, and Dr. Chuang THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  2. 2. Motivation Popularity of VR and AR devices 2
  3. 3. Motivation Popularity of VR and AR devices 3
  4. 4. Motivation Popularity of VR and AR devices 4
  5. 5. Motivation Popularity of VR and AR devices 5
  6. 6. Motivation Popularity of VR and AR devices 6 4.5 million VR and AR devices shipped worldwide https://www.statista.com/statistics/671403/global-virtual-reality-device-shipments-by-vendor
  7. 7. Motivation Assorted VR and AR applications Remote Education 7
  8. 8. Motivation Assorted VR and AR applications Immersive Entertainment 8
  9. 9. Motivation Assorted VR and AR applications Business Meetings 9
  10. 10. Where is the 3D data going to come from?
  11. 11. Motivation Lack of content in VR and AR 12
  12. 12. Motivation Lack of content in VR and AR 13
  13. 13. Automatically fusing multimedia data into dynamic virtual environments
  14. 14. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES 15
  15. 15. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Montage4D ACM I3D’18 To appear in JCGT’18 Social Street View Best Paper Award ACM Web3D’16 Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 16
  16. 16. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Video Fields ACM Web3D’16 Montage4D ACM I3D’18 To appear in JCGT’18 Social Street View Best Paper Award ACM Web3D’16 Geollery Under Review, ACM CHI’18 Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 17
  17. 17. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Video Fields ACM Web3D’16 Montage4D ACM I3D’18 JCGT’18 Haptics and Gestures • VRSurus (CHI EA’16) • HandSight (ECCVW’14, TACCESS’16, SIGACCESS’16) Foveated Rendering • Kernel Foveated Rendering (I3D’18) Learning Sketchy Scenes • Sketchy Scenes (ECCV’18) • LUCCS (Under Review, TOG’18) Visualization and Cryptography • AtmoSPHERE (CHI EA’15) • TopicFields • ARCrypt (Under Review, VR’19) Social Street View Best Paper Award ACM Web3D’16 Geollery Under Review, ACM CHI’19 Under Review, IEEE VR’19 Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 HAPTICS IN VR, VISUAL CRYPTOGRAPHY IN AR, VISUALIZATION, ETC. 18
  18. 18. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Social Street View Best Paper Award ACM Web3D’16 19
  19. 19. Social Street View: Blending Immersive Street Views with Geotagged Social Media Ruofei Du and Amitabh Varshney {ruofei, varshney} @ cs.umd.edu www.SocialStreetView.com In Proceedings of the International Conference on 3D Web Technology (Web3D), 2016. [Best Paper Award] THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  20. 20. image courtesy: plannedparenthood.org Introduction Social Media 21
  21. 21. image courtesy: huffingtonpost.com Introduction Social Media – Reflect real life 23
  22. 22. image courtesy: forbes.com Introduction Social Media - Versatility 24
  23. 23. 25 Metadata are useful for understanding spatial relevance, temporal impact, relationships among users, and the associated events.
  24. 24. EXIF image courtesy: B&T Magazine 26
  25. 25. Introduction Metadata - Time of Creation image courtesy: Brian Clegg 27
  26. 26. Introduction Metadata - Location of Creation image courtesy: squarespace.com 28
  27. 27. Introduction Metadata - Camera Information image courtesy: naldzgraphics.net 29
  28. 28. image courtesy: instagram.com, facebook.com, twitter.com Related Work Linear narrative visualization 30
  29. 29. image courtesy: pinterest.com 31 Related Work 2D layout
  30. 30. Related Work Natural Immersive Virtual Reality? image courtesy: drivenhealthy.com 32
  31. 31. Related Work Karnath et al. and Loomis et al. 33
  32. 32. Related Work VR and Visual Memories 34
  33. 33. Related Work VR and Visual Memories 35
  34. 34. Related Work VR and Visual Memories 36
  35. 35. Related Work VR and Visual Memories 37 Immersive virtual environments can improve social interactions, and spatial awareness is related to visual memories.
  36. 36. 38 Very little work has been carried out in designing immersive interfaces that interleave visual navigation of our surroundings with social media content
  37. 37. 39 Visualizing social media in a 2D world map
  38. 38. Related Work Visualization of Geo-tagged Information 40
  39. 39. image courtesy: twitterstand.umiacs.umd.edu Related Work TwitterStand: News in Tweets Sankaranarayanan et al. SIGGIS 2009 41
  40. 40. Related Work Visualization of Geo-tagged Information 42
  41. 41. image courtesy: flickr.com Related Work Flickr - World Map Serdyukov et al. SIGIR 2009 43
  42. 42. Related Work Visualization of Geo-tagged Information 44
  43. 43. image courtesy: panoramio.com Related Work Panoramio - Photos of the World Cuenca and Manchón, 2005-2015 45
  44. 44. Related Work Visualization of Geo-tagged Information 46
  45. 45. image courtesy: photostand.umiacs.umd.edu Related Work PhotoStand for News Photos Samet et al. VLDB 2013 47 All these systems are limited by the 2D world map.
  46. 46. Related Work Visualization of Geo-tagged Information (cont.) 48 What about 3D solutions?
  47. 47. Related Work Visualization of Geo-tagged Information (cont.) 49Registers user-annotated text and images to a particular point in 3D space.
  48. 48. Related Work Visualization of Geo-tagged Information (cont.) 50However, both the 3D model and the annotation are predefined for rendering such a scene.
  49. 49. Related Work Visualization of Geo-tagged Information (cont.) 51
  50. 50. Related Work Visualization of Geo-tagged Information (cont.) 52
  51. 51. Related Work Visualization of Geo-tagged Information (cont.) 53
  52. 52. Related Work Visualization of Geo-tagged Information (cont.) 54 They use offline 3D reconstruction algorithms or existing 3D models to register the photos onto the meshes. Such methods usually suffer from HOURS of processing time for a single location.
  53. 53. So what about our approach?
  54. 54. Social Street View
  55. 55. Social Street View
  56. 56. Demonstration The Augmentarium, UMIACS 6000 x 3000 pixels 58
  57. 57. 59
  58. 58. 60
  59. 59. Conception, architecting & implementation Social Street View A mixed reality system that can depict geo-tagged social media in immersive 3D web environments 61
  60. 60. Blending multiple modalities of Street View + Social Media Depth maps, normal maps, and road orientation GPS coordinates and time creation 62
  61. 61. Enhancing visual augmentation Maximal Poisson-disk sampling Evaluated by image saliency metrics 63
  62. 62. Achieving cross-platform compatibility with WebGL + Three.js smartphones, tablets, desktops, high-resolution large-area wide-field-of-view tiled display walls, as well as head-mounted displays. 64
  63. 63. Technical Challenges? 65
  64. 64. 66
  65. 65. Architecture Social Street View System Flowchart 67
  66. 66. Street View Cars - Cameras, LIDARs and GPS Image courtesy from Google Street View image courtesy: Google 68
  67. 67. Tiles of Panoramic Images Image courtesy from Google Street View image courtesy: Google 69
  68. 68. image courtesy: Google 70
  69. 69. 71
  70. 70. Depth Map Decoded from Google Maps API v3 72
  71. 71. Normal Map Decoded from Google Maps API v3 73
  72. 72. Road Orientations Decoded from Google Maps API v3 74
  73. 73. image courtesy: Instagram 75
  74. 74. Multi-level Strategy for Streaming Social Media Image courtesy from Instagram users in public domains image courtesy: Instagram 76
  75. 75. image courtesy: Instagram 77 Data Mining Multi-level Strategy … … 8 METERS 16 METERS 1024 METERS
  76. 76. SQL Database Indexed by B+ Tree logarithmic insertion & query 78
  77. 77. SQL Database Indexed by B+ Tree logarithmic insertion & query 79 Longitude, latitude, timestamps are indexed by B+ trees.
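As a sketch of this indexing scheme, the snippet below uses Python's sqlite3, whose indexes are B-tree variants with the same logarithmic insertion and range-query behavior as the B+ trees described here; the table and column names are illustrative, not the system's actual schema.

```python
import sqlite3

# Illustrative schema: index longitude, latitude, and timestamps so that
# spatial bounding-box and temporal range queries use index scans.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE social_media (
    id INTEGER PRIMARY KEY,
    latitude REAL, longitude REAL, timestamp INTEGER, url TEXT)""")
conn.execute("CREATE INDEX idx_lat ON social_media(latitude)")
conn.execute("CREATE INDEX idx_lng ON social_media(longitude)")
conn.execute("CREATE INDEX idx_time ON social_media(timestamp)")

conn.execute("INSERT INTO social_media VALUES (1, 38.99, -76.94, 1404000000, 'a.jpg')")
conn.execute("INSERT INTO social_media VALUES (2, 40.75, -73.99, 1405000000, 'b.jpg')")

# Range query over an indexed bounding box (logarithmic lookup, not a full scan).
rows = conn.execute("""SELECT id FROM social_media
    WHERE latitude BETWEEN 38.0 AND 39.5
      AND longitude BETWEEN -77.5 AND -76.0""").fetchall()
```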
  78. 78. Haversine Formula Andrew, 1805 80 The distance between a social media post and a street view panorama is computed with the haversine formula, which defines the great-circle distance on the surface of a sphere.
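The haversine distance can be sketched as follows; the mean Earth radius constant and function name are illustrative choices.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)

def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

One degree of longitude at the equator is roughly 111 km, which is a quick sanity check for the formula.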
  79. 79. 81 geotagged social media spatiotemporal filters custom location query immersive 360° map 2D mini-map
  80. 80. 82
  81. 81. 83 How can we render and layout the social media?
  82. 82. Baseline Random Uniform Sampling 84
  83. 83. Baseline Random Uniform Sampling 85 Without uniform random sampling
  84. 84. Without uniform sampling Accumulation 86
  85. 85. Baseline Random Uniform Sampling 87 With uniform random sampling
  86. 86. Not preferred Overlays high saliency regions 88
  87. 87. Add depth map Remove sky and ground (most) 89 Sky: Distance Limit:
  88. 88. Add depth map Remove sky and ground (most) 90
  89. 89. 91 Can we ensure each image aligns with the building geometries?
  90. 90. Add normal map Remove all ground, align images 92 Define the ground:
  91. 91. 93
  92. 92. 94 Can we further reduce visual clutter and occlusion?
  93. 93. Maximal Poisson-disk Sampling Gamito et al. Remove visual clutter and occlusion 95 The Poisson-disk distribution is uniform. The minimum distance between each pair of social media items is greater than R. The sampling terminates when it reaches maximal coverage.
  94. 94. Dart-throwing Algorithm PixelPie by Ip et al. using vertex and fragment shaders image courtesy: PeterPan23@Wikipedia 96
  95. 95. Pixel-Pie Algorithm Remove when occlusion occurs 97 PixelPie by Ip et al. uses the GPU's depth-testing feature for efficient sampling.
  96. 96. Project Social Media Pictures By Maximal Poisson-disc Sampling 98 Sample in regions that are neither sky nor ground, within a limit on depth values; place social media as billboards upon the building geometries; cast soft shadows as if the lighting were at 45 degrees to the normal vectors.
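The placement constraint can be illustrated with a minimal CPU dart-throwing sketch (the PixelPie pipeline referenced above runs on the GPU instead); the panorama size, disk radius, and validity mask below are hypothetical.

```python
import random

def poisson_disk_placement(width, height, radius, valid, attempts=3000, seed=1):
    """Dart throwing: accept uniformly random candidates that are at least
    `radius` away from every accepted sample and pass the `valid` mask
    (e.g. not sky or ground, and within the depth limit)."""
    rng = random.Random(seed)
    samples = []
    r2 = radius * radius
    for _ in range(attempts):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if not valid(x, y):
            continue
        if all((x - sx) ** 2 + (y - sy) ** 2 >= r2 for sx, sy in samples):
            samples.append((x, y))
    return samples

# Place billboards on a 1000x500 panorama, masking out the top rows as "sky".
pts = poisson_disk_placement(1000, 500, 50, valid=lambda x, y: y > 100)
```

A fixed attempt budget approximates "maximal" coverage; a true maximal sampler would continue until no empty disk-free region remains.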
  97. 97. Sampling Comparison Remove visual clutter and occlusion 99
  98. 98. Results for Urban Areas 100
  99. 99. 101 Results for Urban Areas
  100. 100. Results for Urban Areas 102
  101. 101. 103 The algorithm works well in dense urban areas such as the Manhattan District
  102. 102. 104 What if there are no buildings in the scene?
  103. 103. Scenic Landscapes Using orientation of the road 105 We use only the orientations of the road.
  104. 104. Scenic Landscapes Using the road orientations 106
  105. 105. Scenic Landscapes Using the road orientations 107
  106. 106. Scenic Landscapes Using the road orientations 108
  107. 107. Scenic Landscapes Using the road orientations 109
  108. 108. What if … Temporal information is used for filtering and rendering? image courtesy: mindtheproduct.com 110
  109. 109. 111
  110. 110. Experiment 100 panoramas, 80,000+ social media 112
  111. 111. Experiment Nvidia Quadro K6000 113
  112. 112. Experiment Conditions of panoramic images Immersive high-resolution screens Common Consumer-level Displays 114
  113. 113. Initialization Time Panorama takes a while to load 115 Panoramas take much more time than social media to initialize a new view.
  114. 114. After Prefetching ¾–⅘ of the time reduced 116 Preload the panoramic texture into GPU memory: ~250 ms
  115. 115. Initialization Time Panorama takes a while to load 117 Stitching and shipping panoramas to GPU is the most time-consuming part.
  116. 116. Rendering Time Almost 58~60 FPS 118 After initialization, the system runs in real time at about 58~60 FPS.
  117. 117. How effective is the Maximal Poisson-disk Sampling for reducing the visual clutter?
  118. 118. Saliency Map Hou et al. TPAMI 2011 120 How much social media covers the salient regions of the panorama?
  119. 119. Saliency Map Hou et al. TPAMI 2011 121 Saliency maps use color, intensity, and orientation contrasts to estimate where humans will look.
  120. 120. Saliency Map Hou et al. TPAMI 2011 122 We prefer the social media to cover less of the high-saliency regions.
  121. 121. Social Media Coverage 100 panoramas for each algorithm 123
  122. 122. Social Media Coverage 100 panoramas for each algorithm 124 The first two algorithms occupy 30% - 50% of the high-saliency regions.
  123. 123. Social Media Coverage 100 panoramas for each algorithm 125 In the last algorithm, all the images are separated from each other and uniformly distributed on the building surfaces. The last two algorithms largely reduce the coverage of the saliency map to less than 20%.
  124. 124. What could be the potential applications of Social Street View?
  125. 125. 127
  126. 126. Stuck in traffic on our way to Cabo with this awesome view #roadtrip #cabo #view #mexico Daniela on Instagram July 12, 2014 128
  127. 127. 129
  128. 128. 130
  129. 129. Business Advertising Museum, restaurant, real-estate ... 131
  130. 130. ... dinner started off with amazing oysters paired with my favorite Ruinart blanc de blancs champagne By frankiextah on Instagram 132
  131. 131. Business Advertising Museum, restaurant, real-estate ... 133 Social Street View enhances future consumers’ visual memories and makes it easier for them to seek the “amazing oysters” around this place.
  132. 132. Business Advertising Museum, restaurant ... 134
  133. 133. Learning Culture Taierzhuang, Chinese Spring Festivals 135
  134. 134. Crowd-sourced Tourism 136
  135. 135. In conclusion, Social Street View provides a novel solution for automatically fusing geotagged social media into immersive 3D environments.
  136. 136. We envision that social media and panoramic videos will be significant parts of the big data requirements for VR and AR applications
  137. 137. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 140
  138. 138. Spherical Harmonics for Saliency Computation and Virtual Cinematography in 360° Videos Ruofei Du and Amitabh Varshney {ruofei, varshney} @ cs.umd.edu Best Student Poster Award at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2018. Under Review, IEEE Transactions on Visualization and Computer Graphics. THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  139. 139. Motivation Spherical 360 degree videos 142 image courtesy: https://www.wareable.com/media/images/2016/04/nokia-ozo-1461975451-F4jp-column-width-inline.jpg
  140. 140. Motivation Spherical 360 degree videos 143image courtesy: http://www.newsshooter.com/wp-content/uploads/2016/11/Screen-Shot-2016-11-01-at-11.23.46-PM.png
  141. 141. Motivation Spherical 360 degree videos 144image courtesy: http://www.newsshooter.com/wp-content/uploads/2016/11/Screen-Shot-2016-11-01-at-11.23.46-PM.png
  142. 142. Motivation Spherical 360 degree videos 145image courtesy: http://danielschristian.com/learning-ecosystems/wp-content/uploads/2016/07/bubl-in-classroom-july2016.jpg
  143. 143. Motivation Spherical 360 degree videos 146
  144. 144. Motivation Almost 90% of the pixels are outside the binocular field of view 147 Visual Medium — Binocular Field of View (Horizontal × Vertical): Human Eyes 114° × 135°; HTC Vive, Oculus Rift 85° × 95°; Samsung Gear VR 75° × 85°; Google Cardboard 65° × 75°
  145. 145. Motivation Almost 90% of the pixels are outside the binocular field of view 148 Visual Medium — FoV (Horizontal × Vertical), Ratio Beyond FoV: Human Eyes 114° × 135°, 76.25%; HTC Vive, Oculus Rift 85° × 95°, 87.53%; Samsung Gear VR 75° × 85°, 90.16%; Google Cardboard 65° × 75°, 92.48%. Almost 90% of the pixels are outside a typical HMD's field of view (FoV)
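For an equirectangular panorama, the fraction of pixels beyond a rectangular field of view reduces to 1 − (H/360°)(V/180°), which reproduces the table above (to within a hundredth of a percent, from rounding):

```python
def ratio_beyond_fov(h_deg, v_deg):
    """Percentage of an equirectangular panorama's pixels outside a
    horizontal-by-vertical field of view."""
    return 100.0 * (1.0 - (h_deg / 360.0) * (v_deg / 180.0))

# The four rows of the table:
human = ratio_beyond_fov(114, 135)      # about 76.25
vive = ratio_beyond_fov(85, 95)         # about 87.5
gear = ratio_beyond_fov(75, 85)         # about 90.16
cardboard = ratio_beyond_fov(65, 75)    # about 92.48
```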
  146. 146. Motivation Understanding where people are likely to look at 149
  147. 147. Background Saliency Maps 150 Saliency maps represent the conspicuity at every location in the visual field by a scalar quantity.
  148. 148. Background Saliency Maps, Itti et al., PAMI 1998 151 The goal of saliency detection is to model the visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system.
  149. 149. Motivation Saliency Prediction 152 Conventional saliency models can hardly deal with horizontal clipping and spherical rotations
  150. 150. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 153 The classic Itti et al. model treats the input as a rectilinear image.
  151. 151. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 154 What if the spherical image is horizontally translated?
  152. 152. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 155 Will the results be the same?
  153. 153. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 156 A simple horizontal clipping causes a false negative result.
  154. 154. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 157 What about a simple spherical rotation?
  155. 155. Motivation Classic Saliency Approaches are not consistent (Itti et al.’s model) 158 A spherical rotation may cause both false negative and false positive.
  156. 156. Motivation How can we build an efficient and consistent saliency model in the spherical space? 159 [Figure] The input frame (source, horizontal clipping, spherical rotation) under our model vs. Itti et al.’s model. How can we build an efficient and consistent saliency model in the spherical space?
  157. 157. 161
  158. 158. Spherical Harmonics Video demo 162 Spherical Harmonics (SH) coefficients can be used for summarizing the feature maps and computing saliency maps in the SO(2) space
  159. 159. Algorithm Evaluating SH Functions 163 Pre-compute SH functions for every pair of (θ, φ): O(NML²). Load to the GPU memory: several minutes
  160. 160. Algorithm Extracting Static Features 164 Compute SH coefficients using prefix sums on the GPU in real time: O(L² log NM). Here, f(i,j) is the feature value and H(i,j) is the precomputed SH function value at each pixel
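A CPU sketch of this projection step: each SH coefficient is a solid-angle-weighted sum of the feature map against a basis function (the pipeline above computes the same sums with prefix sums on the GPU). The real-SH normalization and grid layout below are standard textbook choices, not necessarily the exact implementation.

```python
import numpy as np
from scipy.special import lpmv
from math import factorial, pi, sqrt

def real_sh(l, m, theta, phi):
    """Real spherical harmonic Y_{l,m}; theta is the polar angle, phi the azimuth."""
    k = sqrt((2 * l + 1) / (4 * pi) * factorial(l - abs(m)) / factorial(l + abs(m)))
    p = lpmv(abs(m), l, np.cos(theta))  # associated Legendre function
    if m > 0:
        return sqrt(2) * k * p * np.cos(m * phi)
    if m < 0:
        return sqrt(2) * k * p * np.sin(-m * phi)
    return k * p

def sh_coefficients(feature, bands):
    """Project an equirectangular feature map (N rows x M cols) onto the first
    `bands` SH bands by solid-angle-weighted quadrature."""
    n_rows, n_cols = feature.shape
    theta = (np.arange(n_rows) + 0.5) * pi / n_rows      # polar angle per row
    phi = (np.arange(n_cols) + 0.5) * 2 * pi / n_cols    # azimuth per column
    T, P = np.meshgrid(theta, phi, indexing="ij")
    dA = np.sin(T) * (pi / n_rows) * (2 * pi / n_cols)   # per-pixel solid angle
    return {(l, m): float(np.sum(feature * real_sh(l, m, T, P) * dA))
            for l in range(bands) for m in range(-l, l + 1)}
```

As a sanity check, a feature map equal to cos(θ) is proportional to Y₁,₀, so its (1, 0) coefficient should be √(4π/3) and its (0, 0) coefficient should vanish.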
  161. 161. Algorithm Extracting Static Features 165 Intensity Red-Green Contrast Blue-Yellow Contrast
  162. 162. Algorithm Evaluating SH coefficients 166 For a single feature map and L = 15 bands of spherical harmonics, it produces 15² = 225 SH coefficients in real time.
  163. 163. Reconstruction From 15 bands of SH coefficients 167 Clouds Mountains Humans Caves Landscape
  164. 164. 168 [Figure] The high band Q vs. the low band P (SH bands 1–15) and their difference
  165. 165. Algorithm Spherical spectral residual model 169 Our spherical spectral residual (SSR) model only takes Q² − P² = 176 multiplications and summations
  166. 166. Algorithm Nonlinear normalization 170 We use the non-linear normalization technique (Itti et al.) to combine different spatial and temporal saliency maps.
  167. 167. 171
  168. 168. Results Visual Comparison Between Itti et al’s model and our SSR model 172
  169. 169. Results Visual Comparison Between Itti et al’s model and our SSR model 173
  170. 170. Results Timing Comparison Between Itti et al’s model and our SSR model 174 Average timing per frame — 1920×1080: Itti et al. (CPU) 104.46 ms, SSR (CPU) 21.34 ms, SSR (GPU) 10.81 ms; 4096×2048: 314.94 ms, 48.18 ms, 13.20 ms; 7680×3840: 934.26 ms, 69.53 ms, 26.58 ms
  171. 171. Virtual Cinematography How can saliency maps be used? 175 How can saliency maps guide automatic camera control for 360° videos?
  172. 172. 176 The most salient objects may vary from frame to frame, due to the varying occlusions, colors, and self-movement. Virtual Cinematography Most salient objects are not reliable
  173. 173. 177 An approach that relies on just tracking the most salient objects may incur rapid motion of the camera, and worse still, may induce motion sickness in virtual reality. Virtual Cinematography Camera jittering
  174. 174. 178
  175. 175. SpatioTemporal Model Virtual Cinematography 179 Ratio of the observed sum of saliency values The virtual camera distance between adjacent frames
  176. 176. Evaluation Compute 2048 potential camera positions 180 32×64 camera positions uniformly over the sphere. The optimal position considers both saliency coverage and temporal movement.
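A simplified sketch of this evaluation: enumerate camera cells on a coarse spherical grid and score each by saliency coverage inside a window minus a temporal-movement penalty. The window size, penalty weight, and scoring form are illustrative stand-ins for the actual spatiotemporal optimization.

```python
import numpy as np

def best_camera(saliency, prev_row, prev_col, fov_rows=8, fov_cols=16, beta=0.5):
    """Pick the camera cell on a coarse (rows x cols) spherical grid that
    maximizes the saliency covered by a window around it, minus a penalty
    proportional to the grid distance from the previous camera position.
    The grid resolution and `beta` are illustrative, not the paper's values."""
    rows, cols = saliency.shape
    best, best_score = (prev_row, prev_col), -np.inf
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - fov_rows // 2), min(rows, r + fov_rows // 2)
            # Wrap the azimuthal (column) axis, since longitude is periodic.
            window = saliency[r0:r1].take(
                range(c - fov_cols // 2, c + fov_cols // 2), axis=1, mode="wrap")
            coverage = window.sum() / saliency.sum()
            dc = min(abs(c - prev_col), cols - abs(c - prev_col))
            move = np.hypot(r - prev_row, dc) / rows
            score = coverage - beta * move
            if score > best_score:
                best_score, best = score, (r, c)
    return best
```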
  177. 177. Evaluation Compute 2048 potential camera positions 181 The video frame rate is much smaller than the rendering frame rate
  178. 178. 182 Convert the spherical coordinates to quaternions: Spherical interpolation using the spline curves: Interpolation Quaternion + Spline curves
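A minimal sketch of the interpolation step above: convert a (θ, φ) camera direction to a quaternion and spherically interpolate (slerp) between keyframes. The axis convention and yaw-then-pitch factorization are assumptions, and the spline fitting through multiple keyframes is omitted.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def direction_to_quat(theta, phi):
    """Quaternion for a camera direction given by polar angle theta and
    azimuth phi (illustrative yaw-about-y then pitch-about-x convention)."""
    qy = np.array([np.cos(phi / 2), 0.0, np.sin(phi / 2), 0.0])
    qx = np.array([np.cos(theta / 2), np.sin(theta / 2), 0.0, 0.0])
    return quat_mul(qy, qx)

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = float(np.dot(q0, q1))
    if dot < 0.0:            # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    omega = np.arccos(dot)
    return (np.sin((1 - t) * omega) * q0 + np.sin(t * omega) * q1) / np.sin(omega)
```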
  179. 179. Spline Interpolation Yellow: Optimal camera position Blue: Interpolated path 183
  180. 180. 184
  181. 181. 185 [Chart] Temporal motion (diagonal degrees) vs. frame number: comparison of temporal artifacts between the MaxCoverage and SpatioTemporal models
  182. 182. Conclusion Spherical Harmonics Saliency 186 In conclusion, we have presented a novel GPU-driven pipeline which employs spherical harmonics to directly compute the saliency maps for 360° videos in the SO(2) space.
  183. 183. Conclusion Spherical Harmonics Saliency 187 Our method remains consistent for challenging cases like horizontal clipping, spherical rotations, and equatorial bias, and can be computed on the GPU in real time.
  184. 184. Conclusion Spherical Harmonics Saliency 188 We demonstrate the application of using spherical harmonics saliency for virtual cinematography in 360° videos.
  185. 185. Conclusion Spherical Harmonics Saliency 189 We present a novel spatiotemporal optimization model to ensure large spatial saliency coverage and reduce temporal jitters of the camera motion.
  186. 186. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Montage4D ACM I3D’18 To appear in JCGT’18 190
  187. 187. Montage4D: Interactive Seamless Fusion of Multiview Video Textures Ruofei Du†‡, Ming Chuang‡¶ , Wayne Chang‡, Hugues Hoppe‡§ , and Amitabh Varshney† †Augmentarium and GVIL | UMIACS | University of Maryland, College Park ‡Microsoft Research, Redmond ¶ Apple Inc. § Google Inc. THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND
  188. 188. Introduction Fusion4D and Holoportation Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016) RGB Depth RGB Depth Mask RGB Depth Mask RGB Depth Mask Depth estimation & segmentation 8 Pods Capture SITE A SITE B Volumetric fusion Color Rendering Remote rendering Mesh, color, audio streams Network 192
  189. 189. Fusing multiview video textures onto dynamic meshes under real-time constraints is a challenging task
  190. 190. 194Screen recorded with OBS. Computed in real-time from 8 videos.
  191. 191. 195 of the participants do not believe the 3D reconstructed person looks real
  192. 192. 196 of the participants do not believe the 3D reconstructed person looks real Screen recorded with OBS. Computed in real-time from 8 videos.
  193. 193. Related Work 3D Texture Montage 197
  194. 194. Related Work 3D Texture Montage 198
  195. 195. Related Work 3D Texture Montage 199
  196. 196. Related Work 3D Texture Montage 200
  197. 197. Related Work 3D Texture Montage 201
  198. 198. Related Work 3D Texture Montage 202
  199. 199. Related Work 3D Texture Montage 203 Normal Weighted Blending (200 FPS): w_i = V · max(0, n̂ · v̂_i)^α, where V is the visibility test, n̂ the normal vector, and v̂_i the texture camera view direction
  200. 200. Motivation Visual Quality Matters 204 Holoportation (Escolano et al. 2016) Montage4D (Du et al. 2018)
  201. 201. Motivation Visual Quality Matters Montage4D (Du et al. 2018) 205 Holoportation (Escolano et al. 2016)
  202. 202. What is our approach for real-time seamless texture fusion?
  203. 203. Workflow Identify and diffuse the seams 207
  204. 204. Workflow Identify and diffuse the seams 208
  205. 205. Workflow Identify and diffuse the seams 209
  206. 206. Workflow Identify and diffuse the seams 210
  207. 207. What are the causes for the seams?
  208. 208. Motivation Causes for Seams 212 Self-occlusion (depth seams) Field-of-View (background seams) Majority-voting (color seams)
  209. 209. Self-occlusion (Depth Seams) One or two vertices of the triangle are occluded in the depth map while the others are not. Seams Causes 213
  210. 210. Seams Causes 214 Depth: 1.3 Depth: 1.4 Depth: 30
  211. 211. Seams Causes 215 Raw projection mapping results Seams after occlusion test
  212. 212. Majority Voting (Color Seams) Each vertex is assigned 8 colors coming from the 8 cameras. These colors are classified into different clusters in LAB color space with a 0.1 threshold. The mean color of the largest cluster is called the majority-voting color. Seams Causes 216
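The majority-voting step can be sketched as greedy clustering in LAB space; the single-link heuristic and the normalized [0, 1] color range below are assumptions for illustration.

```python
def majority_vote_color(colors, threshold=0.1):
    """Greedily cluster per-camera LAB colors (components assumed normalized
    to [0, 1]) with the given distance threshold, then return the mean color
    of the largest cluster (the majority-voting color)."""
    clusters = []  # each cluster is a list of colors
    for c in colors:
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's first member
            if sum((a - b) ** 2 for a, b in zip(c, rep)) ** 0.5 < threshold:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    largest = max(clusters, key=len)
    n = len(largest)
    return tuple(sum(col[i] for col in largest) / n for i in range(3))
```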
  213. 213. Majority Voting (Color Seams) The triangle vertices have different majority-voting colors, which may be caused by either mis-registration or self-occlusion. Seams Causes 217
  214. 214. Seams Causes 218 Mean L*A*B Color Mean L*A*B Color Mean L*A*B Color
  215. 215. Seams Causes 219 Raw projection mapping results Seams after occlusion test Seams after majority voting test
  216. 216. Field of View (Background Seams) One or two triangle vertices lie outside the camera’s field of view or in the subtracted background region while the rest do not. Seams Causes 220
  217. 217. Seams Causes 221 Inside Field-of-View Inside Field-of-View Outside Field-of-View
  218. 218. Seams Causes 222 Raw projection mapping results Seams after field-of-view test
  219. 219. Seams Causes 223
  220. 220. 224 of the million-level triangles are seams (for each view) Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
  221. 221. For a static frame, how can we get rid of the annoying seams at interactive frame rates?
  222. 222. How can we spatially smooth the texture (weight) field near the seams so that we cannot see visible seams in the results?
  223. 223. Workflow Identify and diffuse the seams 227
  224. 224. Geodesics For diffusing the seams 228 A geodesic is the shortest route between two points on a surface.
  225. 225. Geodesics For diffusing the seams 229 On triangle meshes, this is challenging because of the computation of tangent directions, and because shortest paths are defined on edges instead of vertices.
  226. 226. Geodesics For diffusing the seams 230 We use the algorithm by Surazhsky and Hoppe to compute approximate geodesics. The idea is to maintain only 2–3 shortest paths along each edge to reduce the computational cost.
  227. 227. 231
  228. 228. What are the causes for the blurring?
  229. 229. Motivation Causes for blurring 233 Texture projection errors: careful calibration + bundle adjustment. Normal-weighted blending: w_i = V · max(0, n̂ · v̂_i)^α. Imprecise geometries / noisy point clouds / different specular highlights
  230. 230. Motivation Causes for blurring 234 Texture projection errors. Normal-weighted blending: w_i = V · max(0, n̂ · v̂_i)^α. View-dependent rendering: w_i = V · max(0, v̂ · v̂_i)^α
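The two weighting schemes differ only in which reference direction is dotted with each texture camera: the surface normal n̂ for normal-weighted blending, or the user camera's view direction v̂ for view-dependent rendering. A sketch, with an illustrative sharpness exponent α:

```python
import numpy as np

def texture_weight(visible, reference_dir, camera_dir, alpha=2.0):
    """Texture weight w_i = V * max(0, r̂ · v̂_i)^alpha.
    `reference_dir` is the surface normal (normal-weighted blending) or the
    user camera's view direction (view-dependent rendering); `camera_dir` is
    the texture camera's view direction. `alpha` is an assumed exponent."""
    r = np.asarray(reference_dir, dtype=float)
    v = np.asarray(camera_dir, dtype=float)
    r, v = r / np.linalg.norm(r), v / np.linalg.norm(v)
    return float(visible) * max(0.0, float(np.dot(r, v))) ** alpha
```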
  231. 231. 235 For a frame at time t, for each camera i, and each vertex v, we define the Desired Texture Field D_v^i(t) from the visibility test, the visibility percentage of view i, the user camera’s view direction, the texture camera’s view direction, and the geodesics.
  232. 232. 236
  233. 233. 237
  234. 234. Workflow Identify and diffuse the seams 238
  235. 235. Temporal Texture Field Temporally smooth the texture fields 239 For a frame at time t, for each camera i, and each vertex v, we define the Temporal Texture Field (exponential smoothing): T_v^i(t) = (1 − λ) · T_v^i(t − 1) + λ · D_v^i(t), where T_v^i(t − 1) is the texture field of the previous frame and the temporal smoothing factor is λ = 1/FPS
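The exponential smoothing above, as a minimal sketch (per-vertex weights stored as plain lists for clarity):

```python
def update_temporal_field(prev_field, desired_field, fps=30.0):
    """Exponentially smooth per-vertex texture weights toward the desired
    texture field D with smoothing factor lambda = 1 / FPS:
        T(t) = (1 - lambda) * T(t - 1) + lambda * D(t)."""
    lam = 1.0 / fps
    return [(1.0 - lam) * t + lam * d for t, d in zip(prev_field, desired_field)]

# Repeated updates converge toward the desired field, smoothing view transitions.
field = [0.0, 1.0]
for _ in range(300):
    field = update_temporal_field(field, [1.0, 0.0])
```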
  236. 236. Temporal Texture Fields Transition between views 240
  237. 237. 241 Holoportation Montage4D
  238. 238. 242 Holoportation Montage4D
  239. 239. 243
  240. 240. 244
  241. 241. 245 Note: Results use default parameters of Floating Textures, which may not be optimal for our datasets. Still, for an optical-flow-based approach, it would be better if seam vertices were assigned smaller texture weights.
  242. 242. 246
  243. 243. With additional computation for seams, geodesics, and temporal texture fields, is our approach still in real time?
  244. 244. Experiment Cross-validation 249 Montage4D achieves better quality at over 90 FPS on an NVIDIA GTX 1080 • Root mean square error (RMSE) ↓ • Structural similarity (SSIM) ↑ • Peak signal-to-noise ratio (PSNR) ↑
  245. 245. 250 of the participants do not believe the 3D reconstructed person looks real
  246. 246. Experiment Breakdown of a typical frame 251 Most of the time is spent on communication between the CPU and GPU
  247. 247. In conclusion, Montage4D uses seam identification, geodesic fields, and temporal texture fields to provide a practical texturing solution for real-time 3D reconstruction. In the future, we envision Montage4D being useful for fusing massive multi-view video data into VR applications such as remote business meetings, remote training, and live broadcasting.
  248. 248. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Video Fields ACM Web3D’16 Montage4D ACM I3D’18 To appear in JCGT’18 Haptics • VRSurus (CHI EA’16) • HandSight (ECCVW’14, TACCESS’16, SIGACCESS’16) Foveated Rendering • Kernel Foveated Rendering (I3D’18) Learning Sketchy Scenes • Sketchy Scenes (ECCV’18) • LUCCS (Under Review, TOG’18) Visualization and Cryptography • AtmoSPHERE (CHI EA’15) • TopicFields (In Submission to VR’18) • ARCrypt (In Submission to VR’18) Social Street View Best Paper Award ACM Web3D’16 Geollery Under Review, ACM CHI’18 Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 HAPTICS IN VR, VISUAL CRYPTOGRAPHY IN AR, VISUALIZATION, ETC. 253
  249. 249. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Social Street View Best Paper Award ACM Web3D’16 Geollery Under Review, ACM CHI’18 254
  250. 250. Geollery: An Interactive Social Platform in Mixed Reality Ruofei Du, David Li, and Amitabh Varshney {ruofei, dli7319, varshney} @ umd.edu www.Geollery.com Under Review, ACM CHI Conference on Human Factors in Computing Systems, 2018. THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  251. 251. Research Question Social Street View Navigation 256 How can we enable users to virtually walk in Social Street View?
  252. 252. 257 How do we create a 3D world in real time where street views are not available? Challenges: registration of billboards
  253. 253. 258
  254. 254. 259
  255. 255. 260
  256. 256. Outline Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Video Fields ACM Web3D’16 261
  257. 257. 262
  258. 258. VRSurus: Enhancing Interactivity and Tangibility of Puppets in Virtual Reality Ruofei Du and Liang He University of Maryland, College Park ACM CHI Conference on Human Factors in Computing Systems, 2016. 4th place in the UIST 2014 Student Innovation Contest. THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  259. 259. Demo VRSurus 264
  260. 260. Conclusion Global market constraint for VR and AR: Weak Content 265
  261. 261. Conclusion Fusing Multimedia Data Into Dynamic Virtual Environments PANORAMAS MULTIVIEW VIDEOS SOCIAL MEDIA MESHES Video Fields ACM Web3D’16 Montage4D ACM I3D’18 To appear in JCGT’18 Haptics and Gestures • VRSurus (CHI EA’16) • HandSight (ECCVW’14, TACCESS’16, SIGACCESS’16) Foveated Rendering • Kernel Foveated Rendering (I3D’18) Learning Sketchy Scenes • Sketchy Scenes (ECCV’18) • LUCCS (Under Review, TOG’18) Visualization and Cryptography • AtmoSPHERE (CHI EA’15) • TopicFields (In Submission to VR’18) • ARCrypt (In Submission to VR’18) Social Street View Best Paper Award ACM Web3D’16 Geollery Under Review, ACM CHI’18 Spherical Harmonics Saliency Best Student Poster Award ACM I3D’18 Under Review, IEEE TVCG’18 HAPTICS IN VR, VISUAL CRYPTOGRAPHY IN AR, VISUALIZATION, ETC. 266
  262. 262. Conclusion Application: Mixed-reality social platforms
  263. 263. Conclusion Application: Immersive surveillance
  264. 264. Conclusion Application: Immersive surveillance + digital city
  265. 265. Image courtesy: https://www.docwirenews.com/docwire-pick/future-of-medicine-picks/use-of-holograms-in-medical-training/ Conclusion Application: Telepresence
  266. 266. Future Directions The ultimate XR system
  267. 267. Future Directions Fusing past events
  268. 268. Future Directions With the present
  269. 269. Future Directions And look into the future
  270. 270. Future Directions Change the way we communicate in 3D and consume the information
  271. 271. Future Directions Change the way we communicate in 3D and consume the information
  272. 272. Peer-reviewed Publications 277 1. Ruofei Du, Ming Chuang, Wayne Chang, Hugues Hoppe, and Amitabh Varshney. Montage4D: Interactive Seamless Fusion of Multiview Video Textures. Proceedings of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), pp. 124–133, 2018. 2. Ruofei Du, Ming Chuang, Wayne Chang, Hugues Hoppe, and Amitabh Varshney. Montage4D: Real-time Seamless Fusion and Stylization of Multiview Video Textures. Under Review. Journal of Computer Graphics Techniques, 2018. 3. Ruofei Du and Amitabh Varshney. Spherical Harmonics for Saliency Computation and Virtual Cinematography in 360° Videos. [Best Student Poster Award] ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2018. Under Review, IEEE Transactions on Visualization and Computer Graphics, 2018. 4. Ruofei Du and Amitabh Varshney. Social Street View: Blending Immersive Street Views with Geo-Tagged Social Media. [Best Paper Award] In Proceedings of the 21st International Conference on Web3D Technology, pp. 77–85, 2016. 5. Ruofei Du, David Li, and Amitabh Varshney. Geollery: Designing an Interactive Mirrored World with Geo-tagged Social Media. Under Review, ACM CHI Conference on Human Factors in Computing Systems, 2018. 6. Xiaoxu Meng, Ruofei Du, Matthias Zwicker, and Amitabh Varshney. Kernel Foveated Rendering. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(5), pp. 1–20, 2018. 7. Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, and Hao Zhang. SketchyScene: Richly-Annotated Scene Sketches. In European Conference of Computer Vision (ECCV), pp. 421–436, 2018.
  273. 273. Peer-reviewed Publications 278 8. Ruofei Du, Sujal Bista, and Amitabh Varshney. Video Fields: Fusing Multiple Surveillance Videos into a Dynamic Virtual Environment. Proceedings of the 21st International Conference on Web3D Technology, pp. 165–172, 2016. 9. Ruofei Du and Liang He. VRSurus: Enhancing Interactivity and Tangibility of Puppets in Virtual Reality. Proceedings of the 34th Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (CHI), pp. 2454–2461, 2016. 10. Ruofei Du, Kent Wills, Max Potasznik, and Jon Froehlich. AtmoSPHERE: Representing Space and Movement Using Sand Traces in an Interactive Zen Garden. Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1627–1632, 2015. 11. Changqing Zou, Haoran Mo, Ruofei Du, Xing Wu, Chengying Gao, and Hongbo Fu. LUCCS: Language-based User-customized Colorization of Scene Sketches. arXiv preprint, arXiv:1808.10544, Under Review, ACM Transactions on Graphics, 2018. 12. Lee Stearns, Ruofei Du, Uran Oh, Catherine Jou, Leah Findlater, David A. Ross, Rama Chellappa, and Jon E. Froehlich. Evaluating Haptic and Auditory Directional Guidance to Assist Blind People in Reading Printed Text Using Finger-Mounted Cameras. ACM Transactions on Accessible Computing (TACCESS), 9(1):1, 2016. 13. Lee Stearns, Ruofei Du, Uran Oh, Yumeng Wang, Leah Findlater, Rama Chellappa, and Jon E. Froehlich. The Design and Preliminary Evaluation of a Finger-Mounted Camera and Feedback System to Enable Reading of Printed Text for the Blind. European Conference on Computer Vision (ECCV), Workshop on Assistive Computer Vision and Robotics, pp. 615–631, 2014. 14. Leah Findlater, Lee Stearns, Ruofei Du, Uran Oh, David Ross, Rama Chellappa, and Jon E. Froehlich. Supporting Everyday Activities for Persons with Visual Impairments Through Computer Vision-Augmented Touch. Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (ASSETS 2015), pp. 383–384, 2015.
  274. 274. Acknowledgement NSF | Nvidia | MPower | UMIACS 279
  275. 275. Acknowledgement Collaborators, advisors, labmates, and committee members 280 Ben Cutler bcutler@microsoft.com Spencer Fowers sfowers@microsoft.com Wayne Chang wechang@microsoft.com Sameh Khamis sameh.khamis@gmail.com Shahram Izadi shahram.izadi.uk@gmail.com Sujal Bista Sujal@cs.umd.edu Amitabh Varshney varshney@cs.umd.edu Eric Krokos ekrokos@cs.umd.edu Hugues Hoppe hhoppe@gmail.com Ming Chuang mingchuang82@gmail.com Matthias Zwicker zwicker@cs.umd.edu Furong Huang furongh@cs.umd.edu Committee Augmentarium / Family MSR Hsueh-Chien Cheng hccheng@cs.umd.edu Mentors / Collaborators Sai Yuan syuan@umd.edu Xuetong Sun xtsunl@cs.umd.edu Xiaoxu Meng xmeng525@umiacs.umd.edu Jon Froehlich jonf@cs.washington.edu Leah Findlater leahkf@uw.edu Sida Li sidali@umiacs.umd.edu Eric Lee ericlee@umiacs.umd.edu Liang He edigahe@gmail.com Joseph F. JaJa josephj@umd.edu David Li ericlee@umiacs.umd.edu Gregory Kramida algomorph@gmail.com Jonathan Heagerty ericlee@umiacs.umd.edu Shuo Li li3shuo1@gmail.com Alexander Rowden alrowden@teprmail.umd.edu
  276. 276. Thank you Ruofei Du ruofei@cs.umd.edu Committee: Dr. Varshney, Dr. Zwicker, Dr. Huang, Dr. JaJa, and Dr. Chuang THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  277. 277. Fusing Multimedia Data Into Dynamic Virtual Environments Ruofei Du ruofei@cs.umd.edu Committee: Dr. Varshney, Dr. Zwicker, Dr. Huang, Dr. JaJa, and Dr. Chuang THE AUGMENTARIUM VIRTUAL AND AUGMENTED REALITY LABORATORY AT THE UNIVERSITY OF MARYLAND COMPUTER SCIENCE UNIVERSITY OF MARYLAND | UMIACS | GVIL
  278. 278. 283
  279. 279. 284
  280. 280. 285
  281. 281. 286
  282. 282. 287
  283. 283. 288
  284. 284. 289
  285. 285. 290
  286. 286. Thank you With a Starry Night Stylization Ruofei Du ruofei@cs.umd.edu Amitabh Varshney varshney@cs.umd.edu Wayne Chang wechang@microsoft.com Ming Chuang mingchuang82@gmail.com Hugues Hoppe hhoppe@gmail.com • www.montage4d.com • www.duruofei.com • shadertoy.com/user/starea • github.com/ruofeidu • I am graduating in December 2018.
  287. 287. Introduction Fusion4D and Holoportation image courtesy: Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
  288. 288. Limitations Holoportation image courtesy: afterpsychotherapy.com 1.6 Gbps
  289. 289. Introduction Mobile Holoportation image courtesy: Wayne Chang and Spencer Fowers
  290. 290. Introduction Mobile Holoportation image courtesy: Jeff Kramer 30-50 Mbps
  291. 291. 298
  292. 292. Geollery: An Interactive Social Platform in Mixed Reality Ruofei Du, David Li, and Amitabh Varshney The Augmentarium| UMIACS | University of Maryland, College Park {ruofei, dli7319, varshney} @ umd.edu www.Geollery.com
