Neural Inverse Rendering for General Reflectance Photometric Stereo (ICML 2018)

1
Neural Inverse Rendering for
General Reflectance Photometric Stereo
Short oral presentation
ICML 2018
July 11, 2018
Tatsunori Taniai
RIKEN AIP
Takanori Maehara
RIKEN AIP
ICML 2018 Paper

2
Photometric stereo: shape from varying shading [Woodham, 80]
Scene observations under varying illuminations → 3D surface normals (surface orientations)
PS is an essential technique for highly detailed 3D shape recovery in combination with multiview stereo (MVS).
[Figure: MVS only vs. MVS + PS [Park+ 13]]

3
Photometric stereo: shape from varying shading [Woodham, 80]
Challenges
• Real-world objects have various complex reflectance properties (BRDFs).
→ Using deep learning to model various BRDFs seems promising, but this direction has seen little activity because:
• Training data is scarce. Accurately measuring surface normals is difficult.
Scene observations under varying illuminations → 3D surface normals (surface orientations)

4
ML perspective: physics-based unsupervised learning
Observed data 𝑿 → Estimator (𝑾) → Hidden data 𝒀, 𝒁 (disentangled representation)
Hidden data 𝒀, 𝒁 → Physical generative model 𝑿 = 𝑓(𝒀, 𝒁) → Synthesized data 𝑿′
Reconstruction loss: ‖𝑿 − 𝑿′‖
Hidden data:
• Not directly observable or annotatable.
• No ground truth available as training data.
Use physics to bypass the issue of lacking training data.

5
Talk Overview
• Introduction
• Basics of photometric stereo
• Our approach
• Experimental results

6
Photometric stereo as inverse imaging process
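[Figure: a point light source illuminates an object surface observed by a camera; ℓ is the light direction, 𝒗 the view direction, 𝒏 the surface normal, and 𝜌 the surface reflectance]
𝐼: Image intensity (known)
ℓ: Light direction & intensity (known)
𝒗: View direction (known)
𝒏: Surface normal (unknown)
𝜌: BRDF (unknown)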

7
Photometric stereo as inverse imaging process
[Figure: point light source, object surface, and camera, with ℓ, 𝒗, 𝒏, and 𝜌 labeled]
Reflectance (rendering) equation
$I = \max(0, \boldsymbol{\ell}^\top \boldsymbol{n}) \odot \rho(\boldsymbol{n}, \boldsymbol{\ell}, \boldsymbol{v})$
Observed pixel = Shading ⊙ Reflectance (BRDF)
𝐼: Image intensity (known)
ℓ: Light direction & intensity (known)
𝒗: View direction (known)
𝒏: Surface normal (unknown)
𝜌: BRDF (unknown)
Estimate 𝒏 from intensities when changing illuminations ℓ
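As a minimal sketch of the rendering equation on this slide, the Python below evaluates one pixel intensity from a normal, a light vector, and a BRDF. The names `render_pixel` and `lambertian_brdf` are illustrative assumptions, not from the talk, and a single-channel intensity is assumed.

```python
def dot(a, b):
    """Inner product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def render_pixel(normal, light, view, brdf):
    """I = max(0, l^T n) * rho(n, l, v) for one pixel and one light.

    `light` encodes both direction and intensity, as on the slide.
    `brdf` is any callable rho(n, l, v); its form is an assumption here.
    """
    shading = max(0.0, dot(light, normal))
    return shading * brdf(normal, light, view)


def lambertian_brdf(albedo):
    """Constant (diffuse) reflectance, used here only as a simple stand-in."""
    return lambda n, l, v: albedo


n = (0.0, 0.0, 1.0)  # surface facing the camera
v = (0.0, 0.0, 1.0)  # view direction
lit = render_pixel(n, (0.0, 0.6, 0.8), v, lambertian_brdf(0.5))
shadowed = render_pixel(n, (0.0, 0.6, -0.8), v, lambertian_brdf(0.5))
```

With the light behind the surface, the `max(0, ·)` term clamps the shading to zero, which is why only bright pixels carry information about 𝒏.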

8
Least squares solution for diffuse surfaces [Woodham, 80]
[Figure: a point light source ℓ illuminating an object surface with normal 𝒏 and reflectance 𝜌0]
A closed-form solution exists if 𝝆 is a constant 𝜌0 (uniform distribution)

9
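Least squares solution for diffuse surfaces [Woodham, 80]
[Figure: a point light source ℓ illuminating an object surface with normal 𝒏 and reflectance 𝜌0]
A closed-form solution exists if 𝝆 is a constant 𝜌0 (uniform distribution)
Lambertian diffuse model
$I = \rho_0 \max(0, \boldsymbol{\ell}^\top \boldsymbol{n}) = \rho_0 \boldsymbol{\ell}^\top \boldsymbol{n}$ (for $I > 0$)
Multiple observations by varying illuminations
$I_1 = \rho_0 \boldsymbol{\ell}_1^\top \boldsymbol{n}, \quad I_2 = \rho_0 \boldsymbol{\ell}_2^\top \boldsymbol{n}, \quad \cdots, \quad I_M = \rho_0 \boldsymbol{\ell}_M^\top \boldsymbol{n}$
Linear system for a set of bright pixels
$\boldsymbol{I} = \boldsymbol{L}^\top (\rho_0 \boldsymbol{n})$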
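The least squares solution for the Lambertian case can be sketched per pixel as follows. This is an illustrative pure-Python version, not the authors' code: it solves the 3x3 normal equations with Cramer's rule, and the function names and the `threshold` parameter are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def lambertian_least_squares(lights, intensities, threshold=0.0):
    """Recover (rho0, n) for one pixel from M lights and M intensities."""
    # Keep only bright observations, where max(0, .) is inactive.
    pairs = [(l, i) for l, i in zip(lights, intensities) if i > threshold]
    if len(pairs) < 3:
        raise ValueError("need at least three bright observations")
    # Normal equations: (sum l l^T) b = sum I l, with b = rho0 * n.
    a = [[sum(l[r] * l[c] for l, _ in pairs) for c in range(3)]
         for r in range(3)]
    rhs = [sum(i * l[r] for l, i in pairs) for r in range(3)]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate")
    # Cramer's rule: replace column k of `a` by `rhs`.
    b = []
    for k in range(3):
        ak = [[rhs[r] if c == k else a[r][c] for c in range(3)]
              for r in range(3)]
        b.append(det3(ak) / d)
    rho0 = math.sqrt(sum(x * x for x in b))
    return rho0, [x / rho0 for x in b]


# Synthetic check: render with a known normal and albedo, then recover them.
true_n, true_rho0 = (0.0, 0.6, 0.8), 0.5
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8),
          (-0.6, 0.0, 0.8), (0.0, -0.6, 0.8)]
intensities = [true_rho0 * max(0.0, sum(a * b for a, b in zip(l, true_n)))
               for l in lights]
rho0_est, n_est = lambertian_least_squares(lights, intensities)
```

The magnitude of the solved vector gives 𝜌0 and its direction gives 𝒏, which is why the unknown in the linear system is the product 𝜌0𝒏.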

10
Our goal: general reflectance photometric stereo
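Can we determine 𝒏 from intensities when
• 𝝆 is unknown and spatially varying, and
• no training data with ground truth of 𝒏 and 𝝆 is available?
Multiple intensity observations under known illumination patterns
$I_1 = \max(0, \boldsymbol{\ell}_1^\top \boldsymbol{n}) \odot \rho(\boldsymbol{n}, \boldsymbol{\ell}_1, \boldsymbol{v})$
$I_2 = \max(0, \boldsymbol{\ell}_2^\top \boldsymbol{n}) \odot \rho(\boldsymbol{n}, \boldsymbol{\ell}_2, \boldsymbol{v})$
$\cdots$
$I_M = \max(0, \boldsymbol{\ell}_M^\top \boldsymbol{n}) \odot \rho(\boldsymbol{n}, \boldsymbol{\ell}_M, \boldsymbol{v})$
[Figure: surfaces with unknown and spatially-varying BRDFs 𝜌 under light ℓ]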

11
Talk Overview
• Introduction
• Our approach
– Physics-embedded auto-encoder network
– Reconstruction loss
– Test-time learning algorithm

12
Our physics-embedded auto-encoder network (simplified)
Two-stream network to 1) produce a normal map and 2) re-render images
• Photometric stereo network (PSNet): analyzes all observations to produce a single normal map.
Observed images 𝑰1, 𝑰2, …, 𝑰𝑀 → Concat (𝑀𝐶 x 𝐻 x 𝑊) → features (384 x 𝐻 x 𝑊) → surface normal map 𝑵 (3 x 𝐻 x 𝑊)
• Image reconstruction network (IRNet): processes each observation individually to disentangle and reconstruct an image.
Observed images 𝑰𝑖 → Batch (𝑀 x 𝐶 x 𝐻 x 𝑊) → features (𝑀 x 16 x 𝐻 x 𝑊) → reflectance 𝑹𝑖 → rendering equation with 𝑵 → synthesized images
Other tensors labeled in the diagram: 𝚽, 𝑿𝑖, 𝒀𝑖, 𝒁𝑖

13
Physics-embedded auto-encoder network (full)
Photometric stereo network (PSNet)
• Input: observed images 𝑰1, 𝑰2, …, 𝑰𝑀, Concat into 𝑀𝐶 x 𝐻 x 𝑊
• 𝑓ps1: (3x3 Conv, BatchNorm, ReLU) x 3
• 𝑓ps2: 3x3 Conv, 𝐿2 Norm
• Features: 384 x 𝐻 x 𝑊
• Output: surface normal map 𝑵, 3 x 𝐻 x 𝑊
Image reconstruction network (IRNet)
• Input: observed images 𝑰𝑖 as a Batch, 𝑀 x (𝐶+1) x 𝐻 x 𝑊, including the specular component 𝑺𝑖 computed using 𝑵, ℓ𝑖, 𝒗
• 𝑓ir1: (3x3 Conv, BatchNorm, ReLU) x 3
• 𝑓ir2: 1x1 Conv, BatchNorm, ReLU
• 𝑓ir3: 3x3 Conv, BatchNorm, ReLU + 3x3 Conv
• Features: 𝑀 x 16 x 𝐻 x 𝑊
• Output: reflectance 𝑹𝑖, combined with 𝑵 by the rendering equation to give synthesized images
Other tensors labeled in the diagram: 𝚽, 𝑿𝑖, 𝒀𝑖, 𝒁𝑖
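The 𝐿2 Norm step at the end of 𝑓ps2 presumably turns the raw 3-channel output into unit-length surface normals. A minimal sketch of that per-pixel normalization is shown below; the function names, the `eps` guard, and the H x W x 3 nested-list layout (chosen for readability instead of the slide's 3 x 𝐻 x 𝑊 layout) are assumptions.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a 3-vector to unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def normalize_normal_map(raw):
    """Apply L2 normalization to every pixel of an H x W grid of 3-vectors."""
    return [[l2_normalize(pixel) for pixel in row] for row in raw]


# A 1 x 2 "image" of raw network outputs.
normal_map = normalize_normal_map([[(3.0, 0.0, 4.0), (0.0, 0.0, 2.0)]])
```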

14
Loss function with early-stage weak supervision
$L = \frac{1}{M} \sum_{i=1}^{M} \| \hat{\boldsymbol{I}}_i - \boldsymbol{I}_i \|_1 + \lambda_t \| \boldsymbol{N} - \boldsymbol{N}' \|_2^2$
Image reconstruction loss (first term)
Minimize intensity differences between synthesized $\hat{\boldsymbol{I}}_i$ and observed $\boldsymbol{I}_i$ images.
Least squares (LS) prior (second term)
Constrain the output normals 𝑵 to be close to prior normals 𝑵′ obtained by the LS method.
Early-stage weak supervision
• The LS prior 𝑵′ has low accuracy, so it is used only in the early stage of the learning process (i.e., 𝜆𝑡 ← 0 after some SGD iterations).
• It can stabilize learning of randomly initialized network parameters.
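The loss on this slide can be sketched in plain Python as below. Images are assumed to be flattened lists of floats and normals a list of 3-vectors. The initial weight `lam0=0.1` and the function names are illustrative assumptions; the cutoff of 50 iterations follows the test-time algorithm slide.

```python
def reconstruction_loss(synth, observed):
    """Mean over the M images of the L1 difference (summed over pixels)."""
    m = len(observed)
    return sum(
        sum(abs(s - o) for s, o in zip(s_img, o_img))
        for s_img, o_img in zip(synth, observed)
    ) / m


def ls_prior(normals, prior_normals):
    """Squared L2 distance between output normals N and LS normals N'."""
    return sum(
        (a - b) ** 2
        for n, p in zip(normals, prior_normals)
        for a, b in zip(n, p)
    )


def prior_weight(iteration, lam0=0.1, cutoff=50):
    """lambda_t: active in the early stage, then set to 0."""
    return lam0 if iteration <= cutoff else 0.0


def total_loss(synth, observed, normals, prior_normals, iteration):
    """Reconstruction term plus the weighted LS prior term."""
    return (reconstruction_loss(synth, observed)
            + prior_weight(iteration) * ls_prior(normals, prior_normals))


synth = [[0.5, 0.2], [0.1, 0.4]]       # two synthesized 2-pixel images
observed = [[0.4, 0.2], [0.3, 0.4]]    # the matching observations
normals = [(0.0, 0.0, 1.0)]
prior = [(0.0, 0.6, 0.8)]
early = total_loss(synth, observed, normals, prior, iteration=10)
late = total_loss(synth, observed, normals, prior, iteration=51)
```

Once the prior is switched off, the loss depends only on how well the re-rendered images match the observations.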

15
Test-time learning algorithm
Without any pre-training, we directly fit the network to a given test scene.
Input: Pairs of an image and corresponding lighting (𝑰𝑖, ℓ𝑖) of a test scene.
Output: A surface normal map 𝑵 of a test scene.
Initialize network parameters randomly.
Compute LS solution 𝑵′.
Repeat Adam’s iterations
• Run PSNet to produce a normal map 𝑵.
• Run IRNet to reconstruct all input images as $\hat{\boldsymbol{I}}_i$.
• Compute the loss and update the network parameters.
• Terminate the prior (𝜆𝑡 ← 0) if iterations > 50.
Until convergence (1000 iterations)
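The control flow of the test-time learning algorithm can be sketched as a small driver loop. The callables `init_params`, `ls_normals`, and `step` are hypothetical placeholders: `step` stands for one forward pass of PSNet and IRNet, the loss computation, and one Adam update. The default `lam0=0.1` is likewise an assumption.

```python
def fit_to_scene(images, lights, init_params, ls_normals, step,
                 max_iters=1000, prior_iters=50, lam0=0.1):
    """Fit a randomly initialized model to one test scene.

    init_params(): returns freshly initialized parameters (hypothetical).
    ls_normals(images, lights): returns the LS prior normals N'.
    step(params, images, lights, n_prior, lam): one update, returns params.
    """
    params = init_params()                 # no pre-training
    n_prior = ls_normals(images, lights)   # LS solution N'
    for t in range(1, max_iters + 1):
        # The prior is active only for the first `prior_iters` iterations.
        lam = lam0 if t <= prior_iters else 0.0
        params = step(params, images, lights, n_prior, lam)
    return params
```

Passing the update as a callable keeps the sketch independent of any particular deep learning framework.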

16
Talk Overview
• Introduction
• Our approach

17
Benchmark on real-world scenes [Shi+ 18]
Outperformed the deep-learning-based method [Santo+ 17] and other classical methods
• 10 scenes in total, each providing 96 images. Evaluated by mean angular errors (degrees).
• [Santo+ 17] is a supervised DNN method pre-trained on synthetic data.
[Results table; row-group label: Classical physics-based]
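The evaluation metric, mean angular error in degrees, can be computed as in the following sketch. It assumes both inputs are lists of unit-length normals paired by pixel; the function name is illustrative.

```python
import math


def mean_angular_error_deg(estimated, ground_truth):
    """Mean angle in degrees between paired unit normals."""
    total = 0.0
    for n, g in zip(estimated, ground_truth):
        cos = sum(a * b for a, b in zip(n, g))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        total += math.degrees(math.acos(cos))
    return total / len(estimated)


est = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
gt = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
mae = mean_angular_error_deg(est, gt)  # one exact match, one 90-degree miss
```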

19
Convergence analysis with early-stage supervision
[Plots: mean angular errors and loss for each setting, with the point of terminating supervision marked]
• Early-stage sup.: stable & accurate
• No sup.: unstable
• All-stage sup.: inaccurate

21
Summary
We demonstrated
• Physics-based unsupervised learning approach
to general BRDF photometric stereo.
• Use of physics can bypass the issue of lacking
annotated training data.
• SOTA results, outperforming a supervised
deep learning method and other classical
unsupervised methods.
Come to our poster for more details about
our network architecture and experiments.
