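R-FCN (Region-based Fully Convolutional Network) is a two-stage object detector that addresses the tension between translation invariance, which benefits image classification, and translation variance, which object localization requires. It classifies each region of interest (RoI) using position-sensitive score maps together with position-sensitive RoI pooling. The score maps are produced by a final convolutional layer, and each group of maps is specialized for one relative location within an object (for example, the top-left cell of a k × k grid). Position-sensitive RoI pooling divides each RoI into k × k bins and pools each bin only from the score map group assigned to that bin; the bin responses are then averaged into per-class scores. Bounding box regression is performed in the same position-sensitive manner. R-FCN achieves accuracy competitive with Faster R-CNN while running faster, because it removes the costly per-RoI subnetwork that follows RoI pooling, so nearly all computation is shared across the whole image.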
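
To make the pooling step concrete, here is a minimal NumPy sketch of position-sensitive RoI pooling for a single RoI. It is an illustration under stated assumptions, not the reference implementation: the function name `ps_roi_pool`, the grid size `k`, the channel layout (all class maps for bin 0 first, then bin 1, and so on), and the use of average pooling with a final average vote are choices made for this example. The RoI is assumed to be given in feature-map coordinates and to lie fully inside the map.

```python
import numpy as np


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling for one RoI (illustrative sketch).

    score_maps: array of shape (k*k*(num_classes+1), H, W).
        Assumed layout: channels [g*(C+1), (g+1)*(C+1)) belong to bin g,
        where g = row*k + col and C+1 counts the classes plus background.
    roi: (x0, y0, x1, y1) in feature-map coordinates, x1/y1 exclusive.
    Returns an array of shape (num_classes+1,) with one score per class.
    """
    c1 = num_classes + 1
    assert score_maps.shape[0] == k * k * c1, "unexpected channel count"

    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k

    pooled = np.zeros((c1, k, k))
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            # Spatial extent of bin (i, j) inside the RoI.
            ys = int(np.floor(y0 + i * bin_h))
            ye = int(np.ceil(y0 + (i + 1) * bin_h))
            xs = int(np.floor(x0 + j * bin_w))
            xe = int(np.ceil(x0 + (j + 1) * bin_w))

            # Pool only from the channel group dedicated to this bin.
            g = i * k + j
            group = score_maps[g * c1:(g + 1) * c1, ys:ye, xs:xe]
            pooled[:, i, j] = group.mean(axis=(1, 2))

    # Average the k*k bin responses into one score per class.
    return pooled.mean(axis=(1, 2))


if __name__ == "__main__":
    k, num_classes = 3, 20
    rng = np.random.default_rng(0)
    maps = rng.standard_normal((k * k * (num_classes + 1), 32, 32))
    scores = ps_roi_pool(maps, roi=(4, 6, 22, 27), k=k, num_classes=num_classes)
    print(scores.shape)  # one score per class, including background
```

Because each bin reads from its own group of maps, a class score is high only when every part of the object responds in the expected relative position. The per-RoI work is just pooling and averaging, with no learned layers applied after it.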