This document summarizes a method for learning object landmarks from images without manual annotations. A generator network takes an input image and learns to generate a target image conditioned on heatmaps produced by a landmark detection network. A perceptual loss between the generated and target images supplies the training signal for the landmark detection network, so no landmark labels are required. The trained model learns semantically meaningful landmarks for faces, human bodies, and 3D objects across several datasets, demonstrating that it can disentangle object appearance from geometry.
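
To make the idea of conditioning on landmark heatmaps concrete, the sketch below shows one common way such a heatmap can act as a geometry-only bottleneck: a raw detector score map is reduced to a single (x, y) location with a spatial softmax, then re-rendered as a Gaussian blob, so only position survives. This is an illustrative assumption, not necessarily the exact design of the summarized method. The names `soft_argmax` and `render_gaussian`, the `sigma` parameter, and the 5x5 score map are all hypothetical, and the code uses plain Python rather than a deep learning framework to stay dependency-free.

```python
import math


def soft_argmax(scores):
    """Return the expected (x, y) location under a spatial softmax of a 2D score map."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in scores)
    weights = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in weights)
    # Expected column (x) and row (y) index under the softmax distribution.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def render_gaussian(x, y, height, width, sigma=1.0):
    """Render a Gaussian heatmap centred on (x, y); only the location is kept."""
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


# Hypothetical 5x5 detector output with a strong response at column 3, row 1.
scores = [[0.0] * 5 for _ in range(5)]
scores[1][3] = 20.0

x, y = soft_argmax(scores)
heatmap = render_gaussian(x, y, height=5, width=5)
```

In a full model under these assumptions, one such heatmap per landmark would be passed to the generator alongside appearance features from the input image, and the loss on the generated image would be backpropagated through the differentiable `soft_argmax` step into the detector.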