2. Goal
Problem:
The model focuses too heavily on discriminative parts
rather than the entire object area
Solution:
Retrieve videos and generate segmentation
labels from the retrieved videos without human
intervention, simulating strong supervision for
semantic segmentation
3. Video
Estimate the shape and extent of objects in
video from motion
Generate segmentation labels
9. Section 3.1
Remove the fully-connected layers
Place a new convolutional layer after the last
convolutional layer of VGG-16
For better adaptation to our task
Global average pooling followed by a
fully-connected layer at the bottom
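The classification head described above (global average pooling followed by a fully-connected layer) can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function name `gap_fc_head` and the sizes used (512 channels, a 14x14 feature map, 20 classes) are assumptions chosen for the example, and the backbone and the new convolutional layer are replaced by a random feature map.

```python
import numpy as np


def gap_fc_head(features, weight, bias):
    """Global average pooling followed by a fully-connected layer.

    features: (C, H, W) output of the last convolutional layer
    weight:   (K, C) fully-connected weights for K classes
    bias:     (K,) fully-connected bias
    returns:  (K,) class scores
    """
    # Average each channel over its spatial positions: (C, H, W) -> (C,)
    pooled = features.mean(axis=(1, 2))
    # Linear layer on the pooled vector: (K, C) @ (C,) + (K,) -> (K,)
    return weight @ pooled + bias


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
features = rng.standard_normal((512, 14, 14))   # stand-in for conv features
weight = rng.standard_normal((20, 512)) * 0.01  # stand-in for FC weights
bias = np.zeros(20)

scores = gap_fc_head(features, weight, bias)
print(scores.shape)
```

Because pooling removes the spatial dimensions before the linear layer, the head works for any feature-map size and has only K x C weights plus K biases.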