Neural Inverse Rendering for General Reflectance Photometric Stereo (ICML 2018)
Slide 1
Neural Inverse Rendering for General Reflectance Photometric Stereo
Short oral presentation, ICML 2018, July 11, 2018
Tatsunori Taniai (RIKEN AIP)
Takanori Maehara (RIKEN AIP)
Slide 2
Photometric stereo: shape from varying shading [Woodham, 80]
[Figure: scene observations under varying illuminations → 3D surface normals (surface orientations)]
PS is an essential technique for highly detailed 3D shape recovery in combination with multi-view stereo (MVS).
[Figure: reconstruction comparison of MVS only vs. MVS + PS, from Park+ 13]
Slide 3
Photometric stereo: shape from varying shading [Woodham, 80]
[Figure: scene observations under varying illuminations → 3D surface normals (surface orientations)]
Challenges
• Real-world objects have various complex reflectance properties (BRDFs).
  → Using deep learning to model various BRDFs seems promising, but this direction has remained largely inactive because…
• There is not much training data: accurately measuring ground-truth surface normals is difficult.
Slide 4
ML perspective: physics-based unsupervised learning
[Diagram: an estimator with parameters 𝑾 maps observed data 𝑿 to hidden data 𝒀 and a disentangled representation 𝒁; applying the physical generative model 𝑿 = 𝑓(𝒀, 𝒁) to these estimates yields synthesized data 𝑿′, which is compared to 𝑿 by a reconstruction loss ‖𝑿 − 𝑿′‖.]
• The hidden data 𝒀 is not directly observable or annotatable, so there is no ground truth for training data.
• Use physics to bypass the issue of lacking training data.
Slide 6
Photometric stereo as inverse imaging process
[Figure: a point light source in direction ℓ illuminates an object surface with normal 𝒏 and BRDF 𝜌; a camera observes it from view direction 𝒗.]
• 𝐼: image intensity (known)
• ℓ: light direction & intensity (known)
• 𝒗: view direction (known)
• 𝒏: surface normal (unknown)
• 𝜌: BRDF (unknown)
Slide 7
Photometric stereo as inverse imaging process
[Figure: the same setup as above; point light source ℓ, object surface with normal 𝒏 and BRDF 𝜌, camera with view direction 𝒗.]
Reflectance (rendering) equation:
  𝐼 = max(0, ℓᵀ𝒏) ⊙ 𝜌(𝒏, ℓ, 𝒗)
  observed pixel = shading ⊙ reflectance (BRDF)
• 𝐼: image intensity (known)
• ℓ: light direction & intensity (known)
• 𝒗: view direction (known)
• 𝒏: surface normal (unknown)
• 𝜌: BRDF (unknown)
Estimate 𝒏 from the intensities observed while changing the illumination ℓ.
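A minimal numerical sketch of this rendering equation (illustrative only, not the authors' code; `render_pixel` and the precomputed reflectance value `rho` are assumptions):

```python
import numpy as np

def render_pixel(n, light, rho):
    """Evaluate the rendering equation I = rho(n, l, v) * max(0, l^T n) for one pixel.

    n     : (3,) unit surface normal
    light : (3,) light direction scaled by light intensity
    rho   : reflectance value (scalar or per-channel array), assumed to be
            already evaluated for this normal / light / view configuration
    """
    shading = max(0.0, float(light @ n))  # max(0, l^T n)
    return rho * shading                  # element-wise product over color channels

# Example: a pixel with reflectance 0.8 and normal facing the light
print(render_pixel(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]), 0.8))  # -> 0.8
```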
Slide 8
Least squares solution for diffuse surfaces [Woodham, 80]
[Figure: point light source ℓ, object surface with normal 𝒏 and constant albedo 𝜌0.]
A closed-form solution exists if 𝝆 is constant (uniformly diffuse).
Slide 9
Least squares solution for diffuse surfaces [Woodham, 80]
[Figure: point light source ℓ, object surface with normal 𝒏 and constant albedo 𝜌0.]
A closed-form solution exists if 𝝆 is constant (uniformly diffuse).

Lambertian diffuse model:
  𝐼 = 𝜌0 max(0, ℓᵀ𝒏) = 𝜌0 ℓᵀ𝒏 (for 𝐼 > 0)
Multiple observations by varying illuminations:
  𝐼1 = 𝜌0 ℓ1ᵀ𝒏,  𝐼2 = 𝜌0 ℓ2ᵀ𝒏,  …,  𝐼𝑀 = 𝜌0 ℓ𝑀ᵀ𝒏
Linear system for a set of bright pixels:
  𝑰 = 𝑳ᵀ(𝜌0 𝒏)
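A minimal NumPy sketch of this least squares solution for a single pixel (illustrative only; `lambertian_ls` is a hypothetical helper name, and the rows of L here are the light directions ℓᵢᵀ, i.e., 𝑳ᵀ in the slide's notation):

```python
import numpy as np

def lambertian_ls(I, L):
    """Least squares photometric stereo for one Lambertian pixel.

    I : (M,) observed intensities of the pixel (bright observations, I > 0)
    L : (M, 3) light directions, one l_i^T per row

    Solves I = L (rho0 * n) in the least squares sense, then splits the
    solution into the albedo rho0 (its norm) and the unit normal n.
    """
    b, *_ = np.linalg.lstsq(L, I, rcond=None)  # b = rho0 * n
    rho0 = np.linalg.norm(b)
    return b / rho0, rho0

# Example: 3 lights, ground-truth normal (0, 0, 1), albedo 0.5
L = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866]])
I = 0.5 * (L @ np.array([0.0, 0.0, 1.0]))
print(lambertian_ls(I, L))  # -> (approximately [0, 0, 1], 0.5)
```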
Slide 10
Our goal: general reflectance photometric stereo
Can we determine 𝒏 from intensities when
• 𝝆 is unknown and spatially varying, and
• there is no training data with ground truth of 𝒏 and 𝝆?

Multiple intensity observations under known illumination patterns:
  𝐼1 = max(0, ℓ1ᵀ𝒏) ⊙ 𝜌(𝒏, ℓ1, 𝒗)
  𝐼2 = max(0, ℓ2ᵀ𝒏) ⊙ 𝜌(𝒏, ℓ2, 𝒗)
  ⋯
  𝐼𝑀 = max(0, ℓ𝑀ᵀ𝒏) ⊙ 𝜌(𝒏, ℓ𝑀, 𝒗)
[Figure: surfaces with unknown and spatially-varying BRDFs 𝜌 under varying light ℓ.]
Slide 12
Our physics-embedded auto-encoder network (simplified)…
A two-stream network that 1) produces a normal map and 2) re-renders the input images:
• Photometric stereo network (PSNet) analyzes all observations jointly to produce a single normal map: the M observed images 𝑰1, …, 𝑰𝑀 are concatenated along channels (MC × H × W) and mapped, via an intermediate feature map (384 × H × W), to the surface normal map 𝑵 (3 × H × W).
• Image reconstruction network (IRNet) processes each observation individually to disentangle and reconstruct an image: the observations are treated as a batch (M × C × H × W), mapped to per-image features (M × 16 × H × W) and reflectance images 𝑹ᵢ, and combined with 𝑵 through the rendering equation to yield the synthesized images Îᵢ (M × C × H × W).
Slide 13
Physics-embedded auto-encoder network (full)…
Photometric stereo network (PSNet):
• Input: all M observed images 𝑰1, …, 𝑰𝑀 concatenated along channels (MC × H × W).
• 𝑓ps1: [3×3 Conv + BatchNorm + ReLU] × 3 → feature map 𝚽 (384 × H × W).
• 𝑓ps2: 3×3 Conv + L2 normalization → surface normal map 𝑵 (3 × H × W).

Image reconstruction network (IRNet):
• Input: the M observed images processed as a batch (M × C × H × W); a specular component 𝑺ᵢ computed from 𝑵, ℓᵢ, 𝒗 is concatenated to each image (M × (C+1) × H × W).
• 𝑓ir1: [3×3 Conv + BatchNorm + ReLU] × 3.
• 𝑓ir2: 1×1 Conv + BatchNorm + ReLU → per-image features 𝒁ᵢ (M × 16 × H × W).
• 𝑓ir3: 3×3 Conv + BatchNorm + ReLU + 3×3 Conv → reflectance images 𝑹ᵢ.
• The rendering equation combines 𝑹ᵢ with 𝑵 to produce the synthesized images Îᵢ (M × C × H × W), which are compared against the observed images.
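A rough PyTorch sketch of these two networks, for illustration only: the layer counts and channel sizes follow the slide, but padding, hidden widths, and the exact wiring of the specular channel and reflectance output are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, k):
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class PSNet(nn.Module):
    """f_ps1: three 3x3 Conv+BN+ReLU blocks; f_ps2: 3x3 Conv + L2 normalization."""
    def __init__(self, M, C):
        super().__init__()
        self.f_ps1 = nn.Sequential(conv_bn_relu(M * C, 384, 3),
                                   conv_bn_relu(384, 384, 3),
                                   conv_bn_relu(384, 384, 3))
        self.f_ps2 = nn.Conv2d(384, 3, 3, padding=1)

    def forward(self, images_concat):                # (1, M*C, H, W): concatenated observations
        phi = self.f_ps1(images_concat)              # (1, 384, H, W)
        return F.normalize(self.f_ps2(phi), dim=1)   # unit-norm normal map N, (1, 3, H, W)

class IRNet(nn.Module):
    """f_ir1: three 3x3 Conv+BN+ReLU; f_ir2: 1x1 Conv+BN+ReLU; f_ir3: 3x3 Conv+BN+ReLU + 3x3 Conv."""
    def __init__(self, C):
        super().__init__()
        self.f_ir1 = nn.Sequential(conv_bn_relu(C + 1, 16, 3),
                                   conv_bn_relu(16, 16, 3),
                                   conv_bn_relu(16, 16, 3))
        self.f_ir2 = conv_bn_relu(16, 16, 1)
        self.f_ir3 = nn.Sequential(conv_bn_relu(16, 16, 3),
                                   nn.Conv2d(16, C, 3, padding=1))

    def forward(self, images, specular, shading):
        # images: (M, C, H, W) processed as a batch; specular: (M, 1, H, W) from N, l_i, v;
        # shading: (M, 1, H, W) = max(0, l_i^T N) computed from the normal map and lights.
        x = torch.cat([images, specular], dim=1)     # (M, C+1, H, W)
        z = self.f_ir2(self.f_ir1(x))                # (M, 16, H, W)
        R = self.f_ir3(z)                            # reflectance images R_i
        return R * shading                           # rendering equation -> synthesized images
```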
Slide 14
Loss function with early-stage weak supervision
Image reconstruction loss + least squares (LS) prior:
  𝐿 = (1/𝑀) Σᵢ ‖Îᵢ − 𝑰ᵢ‖₁ + 𝜆ₜ ‖𝑵 − 𝑵′‖₂²   (sum over i = 1, …, M)
• Image reconstruction loss: minimize intensity differences between the synthesized images Îᵢ and the observed images 𝑰ᵢ.
• LS prior: constrain the output normals 𝑵 to be close to the prior normals 𝑵′ obtained by the LS method.

Early-stage weak supervision
• The LS prior 𝑵′ has low accuracy, so it is used only during an early stage of the learning process (i.e., 𝜆ₜ ← 0 after some SGD iterations).
• It stabilizes the learning of randomly initialized network parameters.
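A sketch of this loss in PyTorch, assuming the tensor shapes from the previous slides (how the norms are normalized over pixels is an assumption):

```python
import torch

def loss_fn(I_syn, I_obs, N, N_prior, lambda_t):
    """L = (1/M) * sum_i ||I_syn_i - I_obs_i||_1 + lambda_t * ||N - N_prior||_2^2.

    I_syn, I_obs : (M, C, H, W) synthesized and observed images
    N, N_prior   : (3, H, W) predicted normal map and LS prior normal map
    lambda_t     : prior weight; set to 0 after the early stage of learning
    """
    M = I_obs.shape[0]
    recon = (I_syn - I_obs).abs().sum() / M    # image reconstruction loss (L1)
    prior = ((N - N_prior) ** 2).sum()         # least squares prior (squared L2)
    return recon + lambda_t * prior
```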
Slide 15
Test-time learning algorithm
Input: pairs of an image and its corresponding lighting (𝑰ᵢ, ℓᵢ) for a test scene.
Output: a surface normal map 𝑵 of the test scene.

Without any pre-training, we directly fit the network to the given test scene:
1. Initialize the network parameters randomly.
2. Compute the LS solution 𝑵′.
3. Repeat Adam iterations until convergence (1000 iterations):
   • Run PSNet to produce a normal map 𝑵.
   • Run IRNet to reconstruct all input images as Îᵢ.
   • Compute the loss and update the network parameters.
   • Terminate the prior (𝜆ₜ ← 0) once the iteration count exceeds 50.
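Put together, the test-time fitting loop might look like the following rough sketch; `least_squares_normals` and `compute_shading_and_specular` are hypothetical helpers standing in for steps described on earlier slides, `loss_fn` is the loss sketched above, and the prior weight value is an assumption.

```python
import torch

def fit_test_scene(images, lights, psnet, irnet, iters=1000, prior_iters=50, lam=0.1):
    """Directly fit randomly initialized PSNet/IRNet to one test scene (no pre-training).

    images : (M, C, H, W) observations; lights : (M, 3) light directions.
    """
    N_prior = least_squares_normals(images, lights)    # hypothetical helper: LS solution N'
    optim = torch.optim.Adam(list(psnet.parameters()) + list(irnet.parameters()))
    M, C, H, W = images.shape
    x_concat = images.reshape(1, M * C, H, W)           # PSNet input

    for t in range(iters):
        N = psnet(x_concat)                              # normal map, (1, 3, H, W)
        # hypothetical helper: shading max(0, l_i^T N) and specular component from N, l_i, v
        shading, specular = compute_shading_and_specular(N, lights)
        I_syn = irnet(images, specular, shading)         # reconstructed images
        lambda_t = lam if t < prior_iters else 0.0       # early-stage weak supervision only
        loss = loss_fn(I_syn, images, N[0], N_prior, lambda_t)
        optim.zero_grad()
        loss.backward()
        optim.step()
    return psnet(x_concat)
```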
Slide 17
Benchmark on real-world scenes [Shi+ 18]
Outperformed the deep-learning-based method [Santo+ 17] and classical methods.
• In total 10 scenes, each providing 96 images; evaluated by mean angular error (degrees).
• [Santo+ 17] is a supervised DNN method pre-trained on synthetic data.
[Table: mean angular errors of classical physics-based methods, [Santo+ 17], and ours on the 10 scenes.]
Slide 19
Convergence analysis with early-stage supervision
[Plots: training loss and mean angular error over iterations for three settings: early-stage supervision (stable & accurate), no supervision (unstable), and all-stage supervision (inaccurate). The point where supervision is terminated is marked.]
Slide 21
Summary
We demonstrated:
• A physics-based unsupervised learning approach to general-BRDF photometric stereo.
• That the use of physics can bypass the issue of lacking annotated training data.
• State-of-the-art results, outperforming a supervised deep learning method as well as classical unsupervised methods.
Come to our poster for more details about
our network architecture and experiments.