- 1. Neural Inverse Rendering for General Reflectance Photometric Stereo. Short oral presentation, ICML 2018, July 11, 2018. Tatsunori Taniai (RIKEN AIP) and Takanori Maehara (RIKEN AIP). ICML 2018 paper.
- 2. Photometric stereo: shape from varying shading [Woodham, 80]. From scene observations under varying illuminations, photometric stereo (PS) recovers 3D surface normals (surface orientations). In combination with multiview stereo (MVS), PS is an essential technique for recovering highly detailed 3D shape. Figure: MVS only vs. MVS + PS [Park+ 13].
- 3. Photometric stereo: shape from varying shading [Woodham, 80]. Challenges: • Real-world objects have various complex reflectance properties (BRDFs). → Using deep learning to model these diverse BRDFs seems promising, but research in this direction is actually very inactive because… • There is not much training data: accurately measuring surface normals is difficult.
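- 4. ML perspective: physics-based unsupervised learning. An estimator with parameters $\mathbf{W}$ maps observed data $\mathbf{X}$ to hidden data $\mathbf{Y}$. The hidden data is not directly observable or annotatable, so there is no ground truth for training. Instead, a physical generative model $\mathbf{X} = f(\mathbf{Y}, \mathbf{Z})$ synthesizes data $\mathbf{X}'$ from $\mathbf{Y}$ and additional factors $\mathbf{Z}$ (a disentangled representation), and the estimator is trained with the reconstruction loss $\|\mathbf{X} - \mathbf{X}'\|$. Using physics in this way bypasses the issue of lacking training data.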
- 5. Talk overview: • Introduction • Basics of photometric stereo • Our approach • Experimental results
- 6. Photometric stereo as an inverse imaging process. A point light source illuminates the object surface, which is observed by a camera. Known: image intensity $I$, light direction and intensity $\boldsymbol{\ell}$, view direction $\mathbf{v}$. Unknown: surface normal $\mathbf{n}$, BRDF $\rho$.
- 7. Photometric stereo as an inverse imaging process. The reflectance (rendering) equation expresses each observed pixel as shading times reflectance (BRDF): $I = \max(0, \boldsymbol{\ell}^\top \mathbf{n}) \odot \rho(\mathbf{n}, \boldsymbol{\ell}, \mathbf{v})$. Here $I$, $\boldsymbol{\ell}$ (light direction and intensity), and $\mathbf{v}$ are known, while the surface normal $\mathbf{n}$ and the BRDF $\rho$ are unknown. The task is to estimate $\mathbf{n}$ from the intensities observed while changing the illumination $\boldsymbol{\ell}$.
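
As a rough illustration of the rendering equation above, the per-pixel sketch below evaluates $I = \max(0, \boldsymbol{\ell}^\top \mathbf{n}) \cdot \rho(\mathbf{n}, \boldsymbol{\ell}, \mathbf{v})$ for a single grayscale channel, so $\odot$ reduces to an ordinary product. The function names and the constant (Lambertian) BRDF are illustrative assumptions, not code from the paper; any callable `brdf(n, light, v)` can be plugged in.

```python
import numpy as np


def render_intensity(n, light, v, brdf):
    """Evaluate I = max(0, l^T n) * rho(n, l, v) for one pixel and one light.

    `light` is l: its direction is the light direction and its norm is the
    light intensity. `brdf` is any callable rho(n, l, v) returning a scalar.
    """
    shading = max(0.0, float(np.dot(light, n)))  # attached shadow when l^T n < 0
    return shading * brdf(n, light, v)


def lambertian(albedo):
    """Hypothetical helper: a constant BRDF rho = albedo (diffuse surface)."""
    return lambda n, light, v: albedo


n = np.array([0.0, 0.0, 1.0])      # normal facing the camera
v = np.array([0.0, 0.0, 1.0])      # view direction
light = np.array([0.0, 0.6, 0.8])  # unit-intensity light, about 37 degrees off-axis
print(render_intensity(n, light, v, lambertian(0.5)))  # 0.4
```

A light behind the surface ($\boldsymbol{\ell}^\top \mathbf{n} < 0$) yields zero intensity through the $\max$, whatever $\rho$ is.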
- 8. Least squares solution for diffuse surfaces [Woodham, 80]. A closed-form solution exists if $\rho$ is constant ($\rho = \rho_0$, i.e., uniform over directions).
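- 9. Least squares solution for diffuse surfaces [Woodham, 80]. A closed-form solution exists if $\rho$ is constant ($\rho = \rho_0$). Under the Lambertian diffuse model $I = \rho_0 \max(0, \boldsymbol{\ell}^\top \mathbf{n})$, every bright observation ($I > 0$) satisfies $I = \rho_0 \boldsymbol{\ell}^\top \mathbf{n}$. Multiple observations under varying illuminations, $I_1 = \rho_0 \boldsymbol{\ell}_1^\top \mathbf{n}$, $I_2 = \rho_0 \boldsymbol{\ell}_2^\top \mathbf{n}$, …, $I_M = \rho_0 \boldsymbol{\ell}_M^\top \mathbf{n}$, therefore form the linear system $\mathbf{I} = \mathbf{L}^\top (\rho_0 \mathbf{n})$ over the bright pixels, where $\mathbf{L} = [\boldsymbol{\ell}_1, \dots, \boldsymbol{\ell}_M]$. Solving it by least squares gives $\rho_0 \mathbf{n}$, whose direction is $\mathbf{n}$ and whose norm is $\rho_0$.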
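
The least-squares step above can be sketched for a single pixel as follows. This is a minimal NumPy version written for illustration (the function name and the `threshold` parameter are assumptions): it keeps only bright observations, solves $\mathbf{L}^\top \mathbf{b} = \mathbf{I}$ for $\mathbf{b} = \rho_0 \mathbf{n}$ with `np.linalg.lstsq`, and splits $\mathbf{b}$ into its norm $\rho_0$ and its direction $\mathbf{n}$.

```python
import numpy as np


def lambertian_ls_normal(intensities, lights, threshold=0.0):
    """Least-squares photometric stereo for one Lambertian pixel.

    intensities: shape (M,), the observations I_1..I_M.
    lights: shape (M, 3), row i is l_i, i.e. this array is L^T.
    Returns (unit normal n, albedo rho0).
    """
    intensities = np.asarray(intensities, dtype=float)
    lights = np.asarray(lights, dtype=float)
    # Keep bright observations only: there max(0, l^T n) = l^T n, so the
    # model I = L^T (rho0 n) is linear in b = rho0 n.
    bright = intensities > threshold
    b, *_ = np.linalg.lstsq(lights[bright], intensities[bright], rcond=None)
    rho0 = np.linalg.norm(b)
    return b / rho0, rho0


# Synthesize observations of a known pixel, then recover it.
true_n = np.array([0.2, -0.3, 0.9])
true_n /= np.linalg.norm(true_n)
lights = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.866],
    [0.0, 0.5, 0.866],
    [-0.5, -0.5, 0.707],
    [0.0, 0.0, -1.0],  # light from behind: I = 0, excluded as not bright
])
I = 0.7 * np.maximum(0.0, lights @ true_n)
n_hat, rho_hat = lambertian_ls_normal(I, lights)
```

At least three bright, non-coplanar lights are needed for a unique solution. Specular highlights violate the constant-$\rho$ assumption, so this estimate degrades on non-Lambertian surfaces.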
- 10. Our goal: general reflectance photometric stereo. Can we determine $\mathbf{n}$ from intensities when • $\rho$ is unknown and spatially varying, and • there is no training data with ground truth of $\mathbf{n}$ and $\rho$? We have multiple intensity observations under known illumination patterns, $I_i = \max(0, \boldsymbol{\ell}_i^\top \mathbf{n}) \odot \rho(\mathbf{n}, \boldsymbol{\ell}_i, \mathbf{v})$ for $i = 1, \dots, M$, of surfaces with unknown and spatially varying BRDFs.
- 11. Talk overview: • Introduction • Basics of photometric stereo • Our approach (physics-embedded auto-encoder network; reconstruction loss; test-time learning algorithm) • Experimental results
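- 12. Our physics-embedded auto-encoder network (simplified). A two-stream network that 1) produces a normal map and 2) re-renders the images. • Photometric stereo network (PSNet): analyzes all observations to produce a single normal map. The observed images $\mathbf{I}_1, \dots, \mathbf{I}_M$ ($M \times C \times H \times W$) are concatenated into an $MC \times H \times W$ input, mapped to a $384 \times H \times W$ feature map $\boldsymbol{\Phi}$, and then to the surface normal map $\mathbf{N}$ ($3 \times H \times W$). • Image reconstruction network (IRNet): processes each observation $\mathbf{I}_i$ individually, with the $M$ images arranged as a batch ($M \times C \times H \times W$; intermediate features $M \times 16 \times H \times W$), to disentangle and reconstruct an image. It estimates a reflectance $\mathbf{R}_i$, and the rendering equation combines $\mathbf{N}$ and $\mathbf{R}_i$ into the synthesized image $\tilde{\mathbf{I}}_i$. Other diagram labels: $\mathbf{X}_i$, $\mathbf{Y}_i$, $\mathbf{Z}_i$.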
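
The "Concat" and "Batch" arrangements and the final rendering step can be illustrated with array shapes alone. The NumPy sketch below is an illustration under assumptions, not the authors' code: random arrays stand in for images and network outputs, the sizes are arbitrary, and `render_batch` (a hypothetical name) applies $\tilde{\mathbf{I}}_i = \max(0, \boldsymbol{\ell}_i^\top \mathbf{N}) \odot \mathbf{R}_i$ to all $M$ observations at once.

```python
import numpy as np

M, C, H, W = 4, 3, 8, 8  # arbitrary sizes: M observations with C channels each
rng = np.random.default_rng(0)
observed = rng.random((M, C, H, W))

# PSNet ("Concat"): fold the M observations into channels, so one network
# sees every observation of a pixel at once -> (M*C, H, W).
psnet_input = observed.reshape(M * C, H, W)

# IRNet ("Batch"): keep observations on the batch axis, so shared weights
# process each image independently -> (M, C, H, W).
irnet_input = observed


def render_batch(normals, lights, reflectance):
    """Rendering equation for all observations.

    normals: (3, H, W) normal map N; lights: (M, 3), row i is l_i;
    reflectance: (M, C, H, W), R_i for each observation.
    Returns synthesized images of shape (M, C, H, W).
    """
    shading = np.maximum(0.0, np.einsum("mk,khw->mhw", lights, normals))
    return shading[:, None] * reflectance  # broadcast shading over channels


normals = np.zeros((3, H, W))
normals[2] = 1.0  # flat surface facing the camera
lights = np.tile([0.0, 0.0, 1.0], (M, 1))  # all lights along the view axis
reflectance = rng.random((M, C, H, W))
synth = render_batch(normals, lights, reflectance)
```

With every light head-on the shading is 1, so the synthesized images equal the reflectance maps, which makes the example easy to check.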
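- 13. Physics-embedded auto-encoder network (full). • PSNet: $f_{\text{ps1}}$ (3×3 Conv, BatchNorm, ReLU; ×3) produces the $384 \times H \times W$ feature map $\boldsymbol{\Phi}$ from the concatenated $MC \times H \times W$ input, and $f_{\text{ps2}}$ (3×3 Conv, $L_2$ normalization) outputs the $3 \times H \times W$ surface normal map $\mathbf{N}$. • IRNet: a specular component $\mathbf{S}_i$ is computed for each observation from $\mathbf{N}$, $\boldsymbol{\ell}_i$, and $\mathbf{v}$, turning the $M \times C \times H \times W$ batch of observed images into an $M \times (C+1) \times H \times W$ input. Its layers are $f_{\text{ir1}}$ (3×3 Conv, BatchNorm, ReLU; ×3), $f_{\text{ir2}}$ (1×1 Conv, BatchNorm, ReLU), and $f_{\text{ir3}}$ (3×3 Conv, BatchNorm, ReLU, followed by a 3×3 Conv), with $M \times 16 \times H \times W$ intermediate features, and they output the reflectance $\mathbf{R}_i$. • The rendering equation combines $\mathbf{N}$ and $\mathbf{R}_i$ into the synthesized images $\tilde{\mathbf{I}}_i$.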
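
The slide only states that the specular component $\mathbf{S}_i$ is computed from $\mathbf{N}$, $\boldsymbol{\ell}_i$, and $\mathbf{v}$. One natural choice, assumed here rather than taken from the slide, is the cosine between the view direction and the mirror-reflection direction $\mathbf{s}_i = 2(\boldsymbol{\ell}_i^\top \mathbf{n})\mathbf{n} - \boldsymbol{\ell}_i$, i.e. $S_i = \mathbf{v}^\top \mathbf{s}_i$ per pixel. The sketch computes this map and appends it to one observation to form a $(C+1) \times H \times W$ slice of the IRNet input; `specular_map` is a hypothetical name, and unit-length $\boldsymbol{\ell}_i$ and $\mathbf{v}$ are assumed.

```python
import numpy as np


def specular_map(normals, light, v):
    """Assumed specular cue: S = v^T (2 (l^T n) n - l) at every pixel.

    normals: (3, H, W) unit normal map N; light, v: unit 3-vectors.
    Returns an array of shape (1, H, W).
    """
    l_dot_n = np.einsum("k,khw->hw", light, normals)
    mirror = 2.0 * l_dot_n[None] * normals - light[:, None, None]
    return np.einsum("k,khw->hw", v, mirror)[None]


H, W = 2, 2
normals = np.zeros((3, H, W))
normals[2] = 1.0                     # flat surface facing the camera
v = np.array([0.0, 0.0, 1.0])
s = 1.0 / np.sqrt(2.0)
light = np.array([s, 0.0, s])        # 45 degrees off the view axis
S = specular_map(normals, light, v)  # mirror direction (-s, 0, s), so S = s
image = np.ones((3, H, W))           # one observation with C = 3
irnet_slice = np.concatenate([image, S], axis=0)  # (C + 1, H, W)
```

The cue peaks at 1 when the mirror direction coincides with the view direction, which is where a specular highlight would appear.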
- 14. Loss function with early-stage weak supervision: $L = \frac{1}{M} \sum_{i=1}^{M} \|\tilde{\mathbf{I}}_i - \mathbf{I}_i\|_1 + \lambda_t \|\mathbf{N} - \mathbf{N}'\|_2^2$. • Image reconstruction loss (first term): minimizes intensity differences between the synthesized images $\tilde{\mathbf{I}}_i$ and the observed images $\mathbf{I}_i$. • Least squares (LS) prior (second term): constrains the output normals $\mathbf{N}$ to be close to the prior normals $\mathbf{N}'$ obtained by the LS method. Early-stage weak supervision: • The LS prior $\mathbf{N}'$ has low accuracy, so it is used only in an early stage of learning (i.e., $\lambda_t \leftarrow 0$ after some SGD iterations). • It can stabilize the learning of randomly initialized network parameters.
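
A direct NumPy transcription of the loss $L$ above may help pin down the indices. Assumptions: the $L_1$ norm sums absolute differences over all channels and pixels of each image, and $\|\mathbf{N} - \mathbf{N}'\|_2^2$ sums squared differences over the whole normal map; the paper may normalize these terms differently (for example per pixel). `ps_loss` is an illustrative name.

```python
import numpy as np


def ps_loss(synth, observed, normals, prior_normals, lam_t):
    """L = (1/M) sum_i ||synth_i - observed_i||_1 + lam_t ||N - N'||_2^2.

    synth, observed: (M, C, H, W) synthesized and observed images.
    normals, prior_normals: (3, H, W) network output N and LS prior N'.
    """
    M = observed.shape[0]
    # Per-image L1 norm, then the average over the M observations.
    recon = np.abs(synth - observed).reshape(M, -1).sum(axis=1).mean()
    # Squared L2 distance between the output normals and the LS prior.
    prior = np.sum((normals - prior_normals) ** 2)
    return recon + lam_t * prior
```

Setting `lam_t = 0` leaves only the reconstruction term, which corresponds to switching the prior off after the early stage.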
- 15. Test-time learning algorithm. Input: pairs of an image and its lighting, $(\mathbf{I}_i, \boldsymbol{\ell}_i)$, of a test scene. Output: a surface normal map $\mathbf{N}$ of the test scene. Without any pre-training, we directly fit the network to the given test scene: 1) Initialize the network parameters randomly. 2) Compute the LS solution $\mathbf{N}'$. 3) Repeat Adam iterations until convergence (1000 iterations): • run PSNet to produce a normal map $\mathbf{N}$; • run IRNet to reconstruct all input images as $\tilde{\mathbf{I}}_i$; • compute the loss and update the network parameters; • terminate the prior ($\lambda_t \leftarrow 0$) once the iteration count exceeds 50.
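
The control flow of the test-time learning loop (Adam updates, with the prior weight switched off after 50 iterations) can be sketched without a network. In this toy version, which is an assumption for illustration only, the "network parameters" are a plain vector `x`, and the reconstruction and prior losses are replaced by quadratics whose gradients are passed in as callables. Adam is written out by hand with its usual default betas and epsilon; the learning rate is an arbitrary choice.

```python
import numpy as np


def adam_test_time_fit(grad_recon, grad_prior, x0, lam0=1.0,
                       prior_iters=50, n_iters=1000, lr=0.05,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit parameters x to one test scene with Adam.

    The loss is recon(x) + lam_t * prior(x), where lam_t = lam0 for the
    first `prior_iters` iterations and 0 afterwards (early-stage weak
    supervision).
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (mean) estimate of the gradient
    v = np.zeros_like(x)  # second-moment estimate of the gradient
    for t in range(1, n_iters + 1):
        lam_t = lam0 if t <= prior_iters else 0.0  # terminate the prior
        g = grad_recon(x) + lam_t * grad_prior(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


target = np.array([1.0, -2.0, 0.5])  # minimizer of the toy "reconstruction" loss
prior = np.array([0.5, -1.0, 0.0])   # toy stand-in for the LS prior N'
x = adam_test_time_fit(
    grad_recon=lambda x: 2.0 * (x - target),  # gradient of ||x - target||^2
    grad_prior=lambda x: 2.0 * (x - prior),   # gradient of ||x - prior||^2
    x0=np.zeros(3),
)
```

During the first 50 iterations the toy optimum is pulled halfway toward the prior; once $\lambda_t$ drops to 0, `x` settles at the reconstruction minimizer, mirroring how the prior only guides early learning.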
- 17. Benchmark on real-world scenes [Shi+ 18]. Our method outperformed the deep-learning-based method [Santo+ 17] and other classical, physics-based methods. • 10 scenes in total, each with 96 images; evaluated by mean angular error (degrees). • [Santo+ 17] is a supervised DNN method pre-trained on synthetic data.
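
The benchmark metric, the mean angular error in degrees between estimated and ground-truth normals, can be computed as below. This is a generic sketch; the optional `mask` argument for restricting the average to object pixels is an assumption, not something the slide specifies.

```python
import numpy as np


def mean_angular_error_deg(pred, gt, mask=None):
    """Mean angle (degrees) between two (3, H, W) normal maps.

    mask: optional (H, W) boolean array selecting the pixels to evaluate.
    """
    pred = pred / np.linalg.norm(pred, axis=0, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=0, keepdims=True)
    # Clip guards arccos against dot products slightly outside [-1, 1].
    cos = np.clip(np.sum(pred * gt, axis=0), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    return err[mask].mean() if mask is not None else err.mean()


gt = np.zeros((3, 1, 2))
gt[2] = 1.0              # both pixels face the camera
pred = np.zeros((3, 1, 2))
pred[2, 0, 0] = 1.0      # pixel 0: exact
pred[0, 0, 1] = 1.0      # pixel 1: off by 90 degrees
```

Here `mean_angular_error_deg(pred, gt)` averages 0 and 90 degrees to 45, while masking out the first pixel leaves 90.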
- 18. Visual comparison
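- 19. Convergence analysis with early-stage supervision. Figure panels: loss and mean angular errors over iterations, comparing early-stage supervision (stable and accurate), no supervision (unstable), and all-stage supervision (inaccurate); the point of terminating supervision is marked.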
- 21. Summary. We demonstrated • a physics-based unsupervised learning approach to general-BRDF photometric stereo; • that the use of physics can bypass the issue of lacking annotated training data; • state-of-the-art results, outperforming a supervised deep learning method and other classical unsupervised methods. Come to our poster for more details about our network architecture and experiments.