Human Body Extraction from Images by Gül Varol, PhD Student @INRIA (Willow Team)

Human Body Analysis from Visual Data
Gul Varol
Inria

Early days
(1) 3D geometric primitives: cylinders, spheres
O’Rourke 1980, Hogg 1983, Marr 1978

Early days
(2) 2D structures
Lee 1985, Ronfard 2002, Hinton 1976

Human Body Models Now
Figure 4: Sample registrations from the multipose dataset.
body shapes and poses are created and animated by varying ~ and
SCAPE (Anguelov 2005) SMPL (Loper 2015)

SMPL Body Model
pose variation
shape (identity) variation

Human Body Analysis: From Coarse to Fine
Input → Image Classification → Person Detection → Person Segmentation → Pose Estimation → Part Segmentation → DensePose
Slide credit: Riza Alp Guler

BodyNet: Volumetric Inference of
3D Human Body Shapes
Gul Varol¹, Duygu Ceylan², Bryan Russell², Jimei Yang², Ersin Yumer², Ivan Laptev¹, Cordelia Schmid¹
¹Inria, ²Adobe Research
BodyNet, ECCV 2018

Why?
• Virtual try-on (Pictofit, ToFit)
• Creation of digital avatars (mPort)
• Healthcare

Challenges - Difficult without Scanner
• Existing methods for recovering 3D body shape rely mostly on complex scanners.
Max Planck Institute, Germany

Challenges - Lack of Data
• Full 3D shape data is available either in constrained settings…
Ionescu et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments, TPAMI 2014

Challenges - Lack of Data
• Full 3D shape data is available either in constrained settings or as synthetic data.
Varol et al. Learning from Synthetic Humans, CVPR 2017

• Most methods focus on predicting a skeleton.
CMU MoCap Database

Challenges - Representation
• Voxel representation (Tatarchenko et al. [2]): 64³ = 262144
• Point cloud representation (Loper et al. [1]): 6890 x 3 = 20670
• Parametric representation (Loper et al. [1]): 10 + 72 = 82
[1] Loper et al. SMPL: A Skinned Multi-Person Linear Model, SIGGRAPH Asia 2015
[2] Tatarchenko et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs, ICCV 2017
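The output sizes quoted on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic. The constant names are illustrative, and reading "10 + 72" as 10 SMPL shape coefficients plus 72 pose parameters is an interpretation of the slide, not something it states.

```python
# Number of scalar values a network must output for each body representation.
# Constant names are illustrative; the numbers are the ones shown on the slide.
VOXEL_GRID_SIDE = 64       # occupancy grid of 64 x 64 x 64 voxels
SMPL_NUM_VERTICES = 6890   # mesh vertices, each with (x, y, z) coordinates
SMPL_SHAPE_PARAMS = 10     # assumed: shape (identity) coefficients
SMPL_POSE_PARAMS = 72      # assumed: pose parameters


def representation_sizes():
    """Return the number of output values for each representation."""
    return {
        "voxel": VOXEL_GRID_SIDE ** 3,
        "point_cloud": SMPL_NUM_VERTICES * 3,
        "parametric": SMPL_SHAPE_PARAMS + SMPL_POSE_PARAMS,
    }


sizes = representation_sizes()
for name, count in sizes.items():
    print(f"{name}: {count}")
```

The voxel grid is by far the largest output, which is the trade-off this slide highlights against the compact parametric form.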

Proposed method: BodyNet

RGB input
Subnetwork 1: 2D segmentation

RGB input
Subnetworks 1 & 2: 2D segmentation & 2D pose predictions

RGB input
Subnetwork 3: 3D pose prediction

RGB input
Subnetwork 4: Volumetric shape prediction

RGB input
(Optional) Fitting: SMPL

RGB input
Output: 3D body parts prediction

BodyNet is an end-to-end trainable network that benefits from:
• a multi-view re-projection loss,
• a volumetric 3D loss,
• intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose.
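The two shape losses listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the BodyNet implementation. It assumes both losses are binary cross-entropies, that silhouettes are obtained by an orthographic max-projection of the voxel grid, that the volume is indexed as [x, y, z], and that the front view projects along z and the side view along x. The function names and the weights `w_front` and `w_side` are hypothetical.

```python
import numpy as np


def binary_cross_entropy(target, pred, eps=1e-7):
    """Mean binary cross-entropy between a binary target and predicted probabilities."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def project(volume, axis):
    """Orthographic silhouette: a pixel is occupied if any voxel along `axis` is."""
    return volume.max(axis=axis)


def combined_shape_loss(gt_voxels, pred_voxels, w_front=1.0, w_side=1.0):
    """Volumetric loss plus front-view and side-view re-projection losses.

    Assumed axis convention: volumes are indexed [x, y, z]; the front view
    projects along z (axis 2) and the side view along x (axis 0).
    """
    loss_v = binary_cross_entropy(gt_voxels, pred_voxels)
    loss_fv = binary_cross_entropy(project(gt_voxels, 2), project(pred_voxels, 2))
    loss_sv = binary_cross_entropy(project(gt_voxels, 0), project(pred_voxels, 0))
    return loss_v + w_front * loss_fv + w_side * loss_sv


# Toy example: a 2 x 2 x 2 cube inside a 4 x 4 x 4 grid.
gt = np.zeros((4, 4, 4))
gt[1:3, 1:3, 1:3] = 1.0
confident_pred = np.where(gt == 1.0, 0.9, 0.1)  # close to the ground truth
uniform_pred = np.full_like(gt, 0.5)            # uninformative prediction

loss_confident = combined_shape_loss(gt, confident_pred)
loss_uniform = combined_shape_loss(gt, uniform_pred)
```

In this toy case the confident prediction gets a lower combined loss than the uniform one, because both the per-voxel term and the silhouette terms reward occupancies close to 0 or 1 in the right places.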

Experiments - Datasets
• SURREAL (Varol et al. CVPR 2017)
  • synthetically rendered images
  • perfect ground truth
• Unite the People (Lassner et al. CVPR 2017)
  • real images
  • noisy 3D ground truth
  • manual 2D segmentation annotation

Learning from synthetic humans
Gul Varol¹, Javier Romero², Xavier Martin¹, Naureen Mahmood², Michael J. Black², Ivan Laptev¹, Cordelia Schmid¹
¹Inria, ²MPI
SURREAL Dataset
Synthetic hUmans foR REAL tasks
SURREAL, CVPR 2017

SURREAL Dataset
available at: www.di.ens.fr/willow/research/surreal/

Experiments
• Effect of additional inputs
[Figure columns: input image, 2D predictions, 3D pose prediction, 3D voxels prediction, SMPL fit, ground truth]

Experiments
• Effect of multi-view re-projection
[Figure: for each input image, predictions are shown from the original view and another view, trained without re-projection (L_v) and with re-projection (L_v + L_p^FV + L_p^SV)]

Experiments
• Comparison with alternative methods
[Figure: input images compared across SMPLify++, shape parameter regression, BodyNet, and ground truth]

Potential to capture clothing
If the re-projection loss is supervised with 2D clothed segmentation, the volumetric output captures clothing.
[Figure: RGB, GT silhouette, predicted silhouette, and predicted voxels (front view and other view), for a model trained w/ clothed segmentation and a model trained w/ SMPL projection]

3D Body Part Segmentation
• Initialize the part segmentation network with the weights of the foreground segmentation network, copying the last layer as many times as there are parts.
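The initialization described above can be sketched as tiling a single-output layer into a multi-output one. This is an illustration in NumPy, not the authors' code. It assumes the last layer can be treated as a weight matrix of shape (1, in_channels) with a bias of shape (1,). The function name is hypothetical, and whether the part count includes a background class is left open here.

```python
import numpy as np


def replicate_last_layer(fg_weight, fg_bias, num_parts):
    """Build a multi-part output layer by repeating a single-output layer.

    fg_weight: array of shape (1, in_channels), foreground layer weights
    fg_bias:   array of shape (1,), foreground layer bias
    Returns (weights, bias) of shapes (num_parts, in_channels) and (num_parts,).
    """
    part_weight = np.repeat(fg_weight, num_parts, axis=0)
    part_bias = np.repeat(fg_bias, num_parts, axis=0)
    return part_weight, part_bias


# Toy foreground layer with 4 input channels, expanded to 6 parts.
fg_w = np.array([[0.1, -0.2, 0.3, 0.4]])
fg_b = np.array([0.5])
part_w, part_b = replicate_last_layer(fg_w, fg_b, num_parts=6)
```

Every part output starts from the same foreground response, so training only has to learn how to split the foreground into parts instead of learning occupancy from scratch.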

Results - Volumetric Shape

Results - SMPL fit

Conclusions
• Volumetric human body shape representation is a flexible and effective alternative to deformable model parameters.
• The re-projection loss is critical for obtaining a confident body surface.
• Multi-task training on related tasks such as 2D/3D pose and 2D segmentation helps 3D shape estimation.

Code and data available at:
www.di.ens.fr/~varol
Thanks
