
Neural Inverse Rendering for General Reflectance Photometric Stereo (ICML 2018)


ICML 2018 short oral paper.



  1. Neural Inverse Rendering for General Reflectance Photometric Stereo. Short oral presentation, ICML 2018, July 11, 2018. Tatsunori Taniai (RIKEN AIP), Takanori Maehara (RIKEN AIP).
  2. Photometric stereo: shape from varying shading [Woodham, 80]. From scene observations under varying illuminations, recover 3D surface normals (surface orientations). Photometric stereo (PS) is an essential technique for highly detailed 3D shape recovery in combination with multi-view stereo (MVS). [Figure: MVS only [Park+ 13] vs. MVS + PS.]
  3. Photometric stereo: shape from varying shading [Woodham, 80]. Challenges: • Real-world objects have various complex reflectance properties (BRDFs). Using deep learning to model various BRDFs seems promising, but this direction has seen little activity because… • there is not much training data: accurately measuring ground-truth surface normals is difficult.
  4. ML perspective: physics-based unsupervised learning. An estimator maps observed data 𝑿 to hidden data 𝒀 and a disentangled representation 𝒁; a physical generative model 𝑿 = 𝑓(𝒀, 𝒁) synthesizes data 𝑿′, trained with a reconstruction loss ‖𝑿 − 𝑿′‖_𝑾. The hidden data are not directly observable or annotatable, so there is no ground truth for training data; we use physics to bypass the issue of lacking training data.
  5. Talk Overview: • Introduction • Basics of photometric stereo • Our approach • Experimental results
  6. Photometric stereo as inverse imaging process. Setup: a point light source with direction ℓ, an object surface with normal 𝒏 and BRDF ρ, and a camera with view direction 𝒗. Known: 𝐼 (image intensity), ℓ (light direction & intensity), 𝒗 (view direction). Unknown: 𝒏 (surface normal), ρ (BRDF).
  7. Photometric stereo as inverse imaging process. The reflectance (rendering) equation relates an observed pixel to a shading term and a reflectance (BRDF) term: 𝐼 = max(0, ℓᵀ𝒏) ⊙ ρ(𝒏, ℓ, 𝒗). Known: 𝐼 (image intensity), ℓ (light direction & intensity), 𝒗 (view direction). Unknown: 𝒏 (surface normal), ρ (BRDF). Goal: estimate 𝒏 from intensities observed while changing illuminations ℓ.
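The rendering equation on this slide can be sketched for a single pixel; this minimal NumPy example assumes a simple constant (Lambertian) BRDF, so `rho` is just a scalar albedo rather than a full BRDF function:

```python
import numpy as np

# Sketch of the rendering equation I = max(0, l^T n) * rho(n, l, v)
# for one pixel, assuming a constant Lambertian BRDF rho.
def render_pixel(n, l, rho=1.0):
    """Image intensity for unit surface normal n under light direction l."""
    shading = max(0.0, float(l @ n))  # attached-shadow term max(0, l^T n)
    return rho * shading

n = np.array([0.0, 0.0, 1.0])        # normal facing the camera
l_front = np.array([0.0, 0.0, 1.0])  # light from the front: full shading
l_back = np.array([0.0, 0.0, -1.0])  # light from behind: shading clamps to 0
```

The `max(0, ·)` clamp is what makes the inverse problem nonlinear: pixels in attached shadow carry no information about 𝒏.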
  8. Least squares solution for diffuse surfaces [Woodham, 80]. A closed-form solution exists if ρ is constant (uniform diffuse reflectance, ρ = ρ₀).
  9. Least squares solution for diffuse surfaces [Woodham, 80]. Lambertian diffuse model: 𝐼 = ρ₀ max(0, ℓᵀ𝒏) = ρ₀ ℓᵀ𝒏 (for 𝐼 > 0). Multiple observations under varying illuminations, 𝐼₁ = ρ₀ℓ₁ᵀ𝒏, 𝐼₂ = ρ₀ℓ₂ᵀ𝒏, …, 𝐼_M = ρ₀ℓ_Mᵀ𝒏, form a linear system 𝑰 = 𝑳ᵀ(ρ₀𝒏) over a set of bright pixels, solvable in closed form.
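The linear system above can be solved per pixel with ordinary least squares; this NumPy sketch (with a made-up normal, albedo, and random light directions) recovers ρ₀ as the norm of the solution vector and 𝒏 as its direction:

```python
import numpy as np

# Classical least-squares photometric stereo [Woodham, 80], assuming a
# Lambertian surface: I_j = rho0 * l_j^T n for bright pixels (I_j > 0).
# Stacking M observations gives I = L^T (rho0 * n), solved by least squares.
rng = np.random.default_rng(0)
n_true = np.array([0.3, 0.2, 0.933])
n_true /= np.linalg.norm(n_true)            # unit surface normal
rho0 = 0.8                                  # constant albedo

L = rng.uniform(0.2, 1.0, size=(3, 8))      # 8 light directions as columns
I = rho0 * (L.T @ n_true)                   # observed intensities (all > 0)

b, *_ = np.linalg.lstsq(L.T, I, rcond=None) # b = rho0 * n
rho0_est = np.linalg.norm(b)                # albedo is the magnitude of b
n_est = b / rho0_est                        # normal is the direction of b
```

Note the decomposition is only unique up to this magnitude/direction split because ρ₀ and 𝒏 always appear as the product ρ₀𝒏.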
  10. Our goal: general reflectance photometric stereo. Can we determine 𝒏 from intensities when • ρ is unknown and spatially varying, and • there is no training data with ground truth of 𝒏 and ρ? We are given multiple intensity observations under known illumination patterns, 𝐼ᵢ = max(0, ℓᵢᵀ𝒏) ⊙ ρ(𝒏, ℓᵢ, 𝒗) for i = 1, …, M, on surfaces with unknown and spatially-varying BRDFs.
  11. Talk Overview: • Introduction • Basics of photometric stereo • Our approach – Physics-embedded auto-encoder network – Reconstruction loss – Test-time learning algorithm • Experimental results
  12. Our physics-embedded auto-encoder network (simplified). A two-stream network that 1) produces a normal map and 2) re-renders images. The photometric stereo network (PSNet) analyzes all M observations 𝑰₁, …, 𝑰_M (concatenated channel-wise, MC × H × W) through 384 × H × W features to produce a single surface normal map 𝑵 (3 × H × W). The image reconstruction network (IRNet) processes each observation 𝑰ᵢ individually (as a batch, M × C × H × W, through M × 16 × H × W features) to disentangle a reflectance map 𝑹ᵢ and reconstruct each image 𝑰̂ᵢ from 𝑵 and 𝑹ᵢ via the rendering equation.
  13. Physics-embedded auto-encoder network (full). PSNet: f_ps1 (3×3 Conv + BatchNorm + ReLU, ×3) followed by f_ps2 (3×3 Conv + L2 normalization) produces the normal map 𝑵. IRNet: a specular component 𝑺ᵢ is computed from 𝑵, ℓᵢ, 𝒗 and concatenated with the observation (M × C+1 × H × W); then f_ir1 (3×3 Conv + BatchNorm + ReLU, ×3), f_ir2 (1×1 Conv + BatchNorm + ReLU), and f_ir3 (3×3 Conv + BatchNorm + ReLU + 3×3 Conv) produce the reflectance 𝑹ᵢ, which is combined with 𝑵 through the rendering equation to synthesize 𝑰̂ᵢ.
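As one concrete reading of the PSNet stream, here is a minimal PyTorch sketch following the layer list on this slide (f_ps1: three 3×3 Conv + BatchNorm + ReLU blocks; f_ps2: a 3×3 Conv followed by per-pixel L2 normalization). The IRNet stream and the rendering-equation decoder are omitted, and exact strides, paddings, and widths other than the stated 384 are assumptions:

```python
import torch
import torch.nn as nn

class PSNet(nn.Module):
    """All M observations, concatenated channel-wise, -> one normal map N."""
    def __init__(self, m_images, c=3, width=384):
        super().__init__()
        layers, in_ch = [], m_images * c
        for _ in range(3):  # f_ps1: (3x3 Conv -> BatchNorm -> ReLU) x 3
            layers += [nn.Conv2d(in_ch, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU()]
            in_ch = width
        self.f_ps1 = nn.Sequential(*layers)
        self.f_ps2 = nn.Conv2d(width, 3, 3, padding=1)  # f_ps2: 3x3 Conv

    def forward(self, x):                 # x: (1, M*C, H, W)
        n = self.f_ps2(self.f_ps1(x))
        # Per-pixel L2 normalization yields unit-length surface normals.
        return nn.functional.normalize(n, dim=1)
```

The final L2 normalization is what lets the network output valid unit normals without any explicit constraint in the loss.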
  14. Loss function with early-stage weak supervision: 𝐿 = (1/𝑀) Σᵢ₌₁ᴹ ‖𝑰̂ᵢ − 𝑰ᵢ‖₁ + λ_t ‖𝑵 − 𝑵′‖₂². The image reconstruction loss minimizes intensity differences between synthesized images 𝑰̂ᵢ and observed images 𝑰ᵢ. The least squares (LS) prior constrains the output normals 𝑵 to be close to prior normals 𝑵′ obtained by the LS method. Early-stage weak supervision: • the LS prior 𝑵′ has low accuracy, so it is used only in an early stage of the learning process (i.e., λ_t ← 0 after some SGD iterations); • it stabilizes learning of randomly initialized network parameters.
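The loss on this slide is straightforward to write down; this NumPy sketch takes the synthesized and observed image stacks plus the two normal maps (array shapes here are illustrative assumptions):

```python
import numpy as np

# Loss from slide 14: an L1 image-reconstruction term averaged over the M
# observations, plus an L2 prior pulling predicted normals N toward the
# least-squares normals N'. lambda_t is annealed to 0 after the early stage.
def loss(I_hat, I_obs, N, N_prior, lambda_t):
    """I_hat, I_obs: (M, C, H, W) arrays; N, N_prior: (3, H, W) arrays."""
    m = I_obs.shape[0]
    recon = np.abs(I_hat - I_obs).sum() / m   # (1/M) sum_i ||I_hat_i - I_i||_1
    prior = ((N - N_prior) ** 2).sum()        # ||N - N'||_2^2
    return recon + lambda_t * prior
```

Setting `lambda_t = 0` recovers the purely unsupervised reconstruction objective used after the early stage.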
  15. Test-time learning algorithm. Without any pre-training, we directly fit the network to a given test scene. Input: pairs of an image and its corresponding lighting (𝑰ᵢ, ℓᵢ) for a test scene. Output: a surface normal map 𝑵 of the test scene. Initialize network parameters randomly and compute the LS solution 𝑵′; then repeat Adam iterations until convergence (1000 iterations): • run PSNet to produce a normal map 𝑵; • run IRNet to reconstruct all input images as 𝑰̂ᵢ; • compute the loss and update the network parameters; • terminate the prior (λ_t ← 0) once iterations > 50.
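To make the test-time fitting loop concrete without the full PSNet/IRNet, this toy PyTorch sketch optimizes a single per-pixel normal directly (a stand-in for the network parameters) with Adam, using a Lambertian renderer and the annealed LS-style prior; the prior weight 0.1, learning rate, iteration counts, and the synthetic lights are all illustrative assumptions:

```python
import torch

# Toy version of the test-time learning algorithm: fit directly to one
# "scene" (here, one pixel seen under 8 lights), annealing the prior at
# iteration 50 as on the slide.
torch.manual_seed(0)
n_true = torch.nn.functional.normalize(torch.tensor([0.3, 0.2, 0.9]), dim=0)
L = torch.nn.functional.normalize(torch.randn(8, 3).abs(), dim=1)  # 8 lights
I_obs = torch.clamp(L @ n_true, min=0)          # observed intensities

n_prior = torch.tensor([0.0, 0.0, 1.0])         # crude LS-style prior N'
param = torch.zeros(3, requires_grad=True)      # "network parameters"
opt = torch.optim.Adam([param], lr=0.05)

for it in range(300):
    lam = 0.1 if it < 50 else 0.0               # early-stage supervision only
    n = torch.nn.functional.normalize(param + n_prior, dim=0)
    I_hat = torch.clamp(L @ n, min=0)           # Lambertian re-rendering
    step_loss = (I_hat - I_obs).abs().mean() + lam * ((n - n_prior) ** 2).sum()
    opt.zero_grad()
    step_loss.backward()
    opt.step()
```

The structure mirrors the slide: render, reconstruct, compute the loss, update, and drop the prior after the early stage, with only the reconstruction term driving the remaining iterations.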
  16. Talk Overview: • Introduction • Basics of photometric stereo • Our approach • Experimental results
  17. Benchmark on real-world scenes [Shi+ 18]. 10 scenes in total, each providing 96 images; evaluated by mean angular errors (degrees). Our method outperformed the deep learning based method [Santo+ 17], a supervised DNN pre-trained on synthetic data, as well as other classical physics-based methods.
  18. Visual comparison.
  19. Convergence analysis with early-stage supervision. [Plots of the loss and mean angular errors over iterations, marking where supervision is terminated.] Early-stage supervision is stable & accurate; no supervision is unstable; all-stage supervision is inaccurate.
  21. Summary. We demonstrated: • a physics-based unsupervised learning approach to general-BRDF photometric stereo; • that the use of physics can bypass the issue of lacking annotated training data; • state-of-the-art results, outperforming a supervised deep learning method and other classical unsupervised methods. Come to our poster for more details about our network architecture and experiments.
