The document presents ConvNeXt, a pure convolutional neural network architecture that achieves performance on par with state-of-the-art vision transformers. The authors modernized ResNet architectures with techniques from vision transformers like increased training epochs, data augmentation, and larger kernel sizes. This resulted in ConvNeXt surpassing Swin transformers on various tasks like image classification, object detection, and segmentation while retaining the simplicity and efficiency of convolutional networks. The study challenges beliefs that vision transformers are inherently more accurate and efficient than convolutional networks.