Abstract: Human shape estimation is an important, but challenging task for video editing, animation and fashion industry. To address this problem, we generate synthetic images of people, the SURREAL dataset, to serve as training data. Then, we propose the BodyNet framework that infers 2D body part segmentation, 2D/3D pose estimation, and volumetric shape prediction in a multi-task neural network architecture. Our results and the dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.