This paper presents InstructGPT, a family of language models fine-tuned to follow a broad range of written instructions using human feedback. The researchers first collected a dataset of human-written demonstrations for a variety of prompts and used it to fine-tune GPT-3 with supervised learning. They then collected human rankings of model outputs, trained a reward model on these comparisons, and further fine-tuned the supervised model with reinforcement learning (PPO) to maximize the reward model's score. In human evaluations, outputs from the fine-tuned models were preferred over those of GPT-3 for instruction following, while largely maintaining performance on public NLP tasks.
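
To make the reward-modeling step concrete, below is a minimal sketch of the pairwise ranking loss commonly used to train a reward model from human preference comparisons. The function name, tensor values, and batching are illustrative assumptions, not the paper's actual code; the idea is simply that the preferred output should receive a higher scalar reward than the dispreferred one.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_rewards: torch.Tensor,
                        rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)),
    averaged over a batch of (prompt, preferred output, dispreferred output) pairs."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar rewards the reward model assigned to 4 preferred and
# 4 dispreferred completions (values are made up for illustration).
chosen = torch.tensor([1.2, 0.3, 0.8, -0.1])
rejected = torch.tensor([0.4, -0.5, 0.2, -0.7])
print(reward_ranking_loss(chosen, rejected))
```

In the final stage described above, the trained reward model's scalar score serves as the reward signal when the supervised model is further fine-tuned with reinforcement learning.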