Unsupervised Learning of Object Landmarks through Conditional Image Generation

Tomas Jakab1∗ Ankush Gupta1∗ Hakan Bilen2 Andrea Vedaldi1
1 Visual Geometry Group, University of Oxford
2 School of Informatics, University of Edinburgh
Advances in Neural Information Processing Systems (NeurIPS) 2018
Bingwen hu
2019-01-20

Goal
Learn semantically meaningful landmarks without any manual annotations.
The method learns automatically from images or videos and works across different datasets of faces, humans, and 3D objects.
Why learn landmarks?
Low dimensional object representation
Interpretable
Why unsupervised?
Reduce dependency on expensive manual annotations
Leverage the vast number of videos available online

Architecture
[Figure: model overview. The source image goes through appearance encoding; the target image goes through unsupervised keypoint extraction, which yields a heatmap for each keypoint; both feed the image reconstruction.]

(1) Heatmaps bottleneck
Each heatmap is then replaced with a Gaussian-like function centred at u*_k with a small fixed standard deviation.
This bottleneck has two effects:
 it provides a differentiable and distributed representation of the landmark locations;
 it restricts the information from the target image to spatial locations only.
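
A minimal NumPy sketch of this bottleneck may help make the step concrete. It rests on assumptions that the slide does not state: the centre u*_k is taken as the softmax-weighted expected location of each raw score map, pixel coordinates are normalised to [-1, 1], and the function name `landmark_bottleneck` is hypothetical.

```python
import numpy as np

def landmark_bottleneck(scores, sigma=0.1):
    """Turn raw score maps (H, W, K) into K landmark centres and Gaussian maps.

    Assumptions (not stated on the slide): centres are softmax-weighted
    expected locations, and coordinates are normalised to [-1, 1].
    """
    H, W, K = scores.shape
    ys = np.linspace(-1.0, 1.0, H)  # row coordinates
    xs = np.linspace(-1.0, 1.0, W)  # column coordinates

    # Spatial softmax per channel, shifted by the max for numerical stability.
    flat = scores.reshape(H * W, K)
    flat = flat - flat.max(axis=0, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=0, keepdims=True)
    probs = probs.reshape(H, W, K)

    # Expected location of each landmark: a differentiable "soft argmax".
    mu_y = (probs.sum(axis=1) * ys[:, None]).sum(axis=0)  # shape (K,)
    mu_x = (probs.sum(axis=0) * xs[:, None]).sum(axis=0)  # shape (K,)

    # Replace each heatmap with a Gaussian-like bump centred at (mu_x, mu_y).
    dy2 = (ys[:, None] - mu_y[None, :]) ** 2  # shape (H, K)
    dx2 = (xs[:, None] - mu_x[None, :]) ** 2  # shape (W, K)
    maps = np.exp(-(dy2[:, None, :] + dx2[None, :, :]) / (2.0 * sigma ** 2))

    coords = np.stack([mu_x, mu_y], axis=1)  # shape (K, 2), ordered (x, y)
    return coords, maps
```

Because the rendered maps depend on the input only through the K centres, anything else the score maps encoded about the target image is discarded, which is the restriction to spatial locations described above.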

(2) Generator network using a perceptual loss
Here Γ(x) is an off-the-shelf pre-trained neural network, for example VGG-19, and Γ_l denotes the output of its l-th sub-network.
 The perceptual loss compares activations extracted from multiple layers of a deep network for the reference and the generated images, instead of only the raw pixel values.
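
The idea of comparing activations at several layers can be sketched as follows. This is only an illustration under stated assumptions: `toy_features` is a hypothetical stand-in for Γ (it is not VGG-19), the per-layer weights are placeholders, and the squared differences are averaged per layer for scale, which may differ from the normalisation used in the paper.

```python
import numpy as np

def toy_features(image):
    """Hypothetical stand-in for the pre-trained network: three 'layers'."""
    a1 = np.maximum(image - image.mean(), 0.0)  # ReLU-like activation
    a2 = a1[::2, ::2]                           # coarser resolution
    a3 = a2[::2, ::2]                           # coarser still
    return [a1, a2, a3]

def perceptual_loss(target, generated, features=toy_features, weights=None):
    """Weighted sum over layers of mean squared activation differences."""
    f_target = features(target)
    f_generated = features(generated)
    if weights is None:
        weights = [1.0] * len(f_target)  # placeholder layer weights
    return sum(
        w * np.mean((a - b) ** 2)
        for w, a, b in zip(weights, f_target, f_generated)
    )
```

In practice `features` would return the activations of the chosen layers of the pre-trained network for a given image; the structure of the loss stays the same.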

Model details
• Landmark detection network: ingests the image x' to produce K landmark heatmaps y'.
It is composed of sequential blocks, each consisting of two convolutional layers.
The spatial size of the final output, which gives the heatmaps, is set to 16×16.
These K feature channels are then used to render 16×16×K 2D Gaussian maps y' (with σ = 0.1).
• Image generation network: takes the image x and the landmarks y' = Φ(x') as input and reconstructs x'.
First, the image x is encoded as a feature tensor z.
Next, the features z and the landmarks y' are stacked together and fed to a regressor that reconstructs the target frame x'.
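
The stacking step can be illustrated with a short shape-level sketch. It assumes that the feature tensor z has the same 16×16 spatial size as the landmark maps and that "stacked" means concatenation along the channel axis; the function name and the channel count in the usage note are hypothetical.

```python
import numpy as np

def stack_features_and_landmarks(z, y):
    """Concatenate appearance features z (H, W, C) of the source image x
    with landmark maps y (H, W, K) of the target image x' along channels."""
    if z.shape[:2] != y.shape[:2]:
        raise ValueError("z and y must share the same spatial size")
    return np.concatenate([z, y], axis=-1)  # shape (H, W, C + K)
```

For example, with z of shape (16, 16, 32) and y' of shape (16, 16, 10), the regressor would receive a (16, 16, 42) tensor: appearance comes from the source image and geometry from the target landmarks.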

Experiments: Learning facial landmarks

Experiments: Learning human body landmarks

Experiments: Learning 3D object landmarks

Experiments: Disentangling appearance and geometry
