Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Unsupervised Learning of Object Landmarks through Conditional Image Generation


Published on

Bingwen Hu

Published in: Technology
  • Login to see the comments

  • Be the first to like this

Unsupervised Learning of Object Landmarks through Conditional Image Generation

  1. 1. Unsupervised Learning of Object Landmarks through Conditional Image Generation Tomas Jakab1∗ Ankush Gupta1∗ Hakan Bilen2 Andrea Vedaldi1 1 Visual Geometry Group, University of Oxford 2 School of Informatics, University of Edinburgh Advances in Neural Information Processing Systems (NeurIPS) 2018 Bingwen hu 2019-01-20
  2. 2. Goal Learn semantically meaningful landmarks without any manual annotations. It automatically learns from images or videos and works across different datasets of faces, humans, and 3D objects. Why to learn landmarks? Low dimensional object representation Interpretable Why unsupervised? Reduce dependency on expensive manual annotations Leverage vast amount of videos available online
  3. 3. Architecture Source image Target image appearance encoding unsupervised keypoint extraction image reconstruction heatmap for each keypoint
  4. 4. Method
  5. 5. (1) Heatmaps bottleneck Then, each heatmap is replaced with Gaussian-like function centred at u*k with a small fixed standard deviation
  6. 6. it provides a differentiable and distributed representation of the location of landmarks.  it restricts the information from the target image to spatial locations only
  7. 7. (2) Generator network using a perceptual loss Where Γ(x) is an off-the-shelf pre-trained neural network, for example VGG-19. Γl denotes the output of the l-th sub-network  The perceptual loss compares a set of the activations extracted from multiple layers of a deep network for both the reference and the generated images, instead of the only raw pixel values.
  8. 8. Model details • Landmark detection network: ingests the image x' to produce K landmark heatmaps y' It is composed of sequential blocks consisting of two convolutional. The spatial size of the final output, outputting the heatmaps, is set to 16×16. These K feature channels are then used to render 16×16×K 2D-Gaussian maps y' (with σ = 0:1) • Image generation network: input the image x and the landmarks y' = Φ(x'), reconstructe x' First, the image x is encoded as a feature tensor Z Next, the features z and the landmarks y' are stacked to gether and fed to a regressor that reconstructs the target frame x'.
  9. 9.  Experiments
  10. 10. Experiments——Learning facial landmarks
  11. 11. Experiments——Learning human body landmarks
  12. 12. Experiments——Learning 3D object landmarks
  13. 13. Experiments——Disentangling appearance and geometry