For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2025/08/customizing-vision-language-models-for-real-world-applications-a-presentation-from-nvidia/
Monika Jhuria, Technical Marketing Engineer at NVIDIA, presents the “Customizing Vision-language Models for Real-world Applications” tutorial at the May 2025 Embedded Vision Summit.
Vision-language models (VLMs) have the potential to revolutionize various applications, and their performance can be improved through fine-tuning and customization. In this presentation, Jhuria explores the concept and shares insights on domain adaptation for VLMs. She discusses the factors to consider when fine-tuning a VLM, including dataset requirements and the resources available to developers.
Jhuria explores two key approaches for customization: VLM fine-tuning, encompassing memory-efficient fine-tuning methods such as low-rank adaptation (LoRA) and full fine-tuning, and retrieval-augmented generation (RAG) for enhanced adaptability. Finally, she discusses metrics for validating the performance of VLMs and best practices for testing domain-adapted VLMs in real-world applications. You will gain a practical understanding of VLM fine-tuning and customization and will be equipped to make informed decisions about how to unlock the full potential of these models in your own projects.