The document presents Visual Prompt Tuning (VPT), a parameter-efficient method for adapting large pre-trained vision transformers to downstream tasks: a small set of task-specific learnable prompt tokens is introduced into the input sequence while the pre-trained backbone remains frozen. VPT is shown to outperform traditional adaptation methods, including full fine-tuning, while requiring only a small fraction of the per-task storage. The document also compares VPT against other adaptation strategies, including side tuning, bias tuning, and lightweight adapter modules.
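The core mechanism — prepending learnable prompt tokens to the frozen backbone's token sequence — can be sketched as follows. This is an illustrative shape-level sketch, not the paper's implementation; the function name, dimensions, and NumPy usage are all assumptions for exposition.

```python
import numpy as np

def insert_prompts(patch_embeddings, cls_token, prompts):
    """Prepend task-specific prompt tokens to a frozen transformer's input.

    patch_embeddings: (B, N, D) patch tokens from the frozen embedding layer
    cls_token:        (1, 1, D) class token (frozen)
    prompts:          (P, D)    learnable prompts -- in VPT-style tuning these
                                are the only parameters updated for the task
    returns:          (B, 1 + P + N, D) sequence fed to the frozen encoder
    """
    B, _, D = patch_embeddings.shape
    cls = np.broadcast_to(cls_token, (B, 1, D))
    p = np.broadcast_to(prompts[None], (B,) + prompts.shape)
    return np.concatenate([cls, p, patch_embeddings], axis=1)

# Example: batch of 2 images, 196 patches, embedding dim 768, 5 prompts
x = np.zeros((2, 196, 768))
cls = np.zeros((1, 1, 768))
prompts = np.random.randn(5, 768) * 0.02
seq = insert_prompts(x, cls, prompts)
print(seq.shape)  # (2, 202, 768)
```

Because only the prompt tokens (and typically a task head) are trained, each new task adds just `P × D` prompt parameters on top of the shared frozen backbone, which is what drives the storage savings the document highlights.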