The document covers keypoint detection on depth images using Mask R-CNN and other neural network architectures. It describes Mask R-CNN with a ResNet feature extractor performing instance segmentation and keypoint detection on depth images, reported at 30 frames per second on a CPU. It also describes real-time keypoint detection with a Chainer model accelerated by iDeep (Intel's CPU acceleration backend for Chainer), reported at 15-18 frames per second on an Intel Core i5 CPU.
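
Frame rates like those quoted above depend on how throughput is measured. The following is a minimal sketch of one common approach, not the measurement procedure used in the document: it averages wall-clock time over a batch of frames after a few untimed warm-up calls. The function `measure_fps` and the `infer` callable (a stand-in for a model's per-frame forward pass) are hypothetical names introduced here for illustration.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(infer: Callable[[Any], Any],
                frames: Iterable[Any],
                warmup: int = 2) -> float:
    """Average frames per second of `infer` over `frames`.

    `infer` is a hypothetical stand-in for one forward pass of a
    keypoint detector. The first `warmup` frames are run but not
    timed, so one-off setup costs do not skew the result.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warmup iterations")

    # Untimed warm-up calls.
    for frame in frames[:warmup]:
        infer(frame)

    # Timed calls: wall-clock time over the remaining frames.
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading for trivially fast callables.
    return len(timed) / max(elapsed, 1e-9)
```

In use, `infer` would wrap the actual model call and `frames` would be a list of preloaded depth images, so that disk I/O is excluded from the timing.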