The document explores contrastive self-supervised learning, covering methods that reduce human-annotation cost while learning general-purpose representations. It highlights frameworks such as MoCo and SimCLR, emphasizing how they learn to discriminate between instances and the role played by both positive and negative samples. Finally, it reports that the proposed inter-intra contrastive learning framework yields significant improvements on video tasks.
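To make the instance-discrimination idea concrete, below is a minimal sketch of an InfoNCE-style contrastive objective of the kind that underlies frameworks such as MoCo and SimCLR: the anchor is pulled toward a positive view of the same instance and pushed away from negatives drawn from other instances. The function name, array shapes, and temperature value are illustrative assumptions, not code from the summarized work.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: pull the positive close, push negatives away."""
    # Cosine similarity via L2-normalized dot products.
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos_sim = np.dot(a, p) / temperature      # scalar: anchor vs. positive
    neg_sims = np.dot(n, a) / temperature     # (num_negatives,): anchor vs. negatives
    logits = np.concatenate([[pos_sim], neg_sims])
    # Cross-entropy with the positive at index 0 (log-sum-exp form).
    return -pos_sim + np.log(np.sum(np.exp(logits)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.01 * rng.normal(size=8)   # augmented view of the same instance
negatives = rng.normal(size=(16, 8))            # embeddings of other instances
loss = info_nce(anchor, positive, negatives)
```

A well-aligned positive drives the loss toward zero, while a positive that looks no different from the negatives leaves it high, which is exactly the pressure that makes the learned features instance-discriminative.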