Advanced hardware such as NVIDIA GPUs lowers the technical barriers to model size and scope, but challenges remain in model performance and training-infrastructure management. We'll discuss the operational challenges of training models at scale, with a particular focus on how training management and hyperparameter tuning can inform each other to accomplish specific goals. We'll explore and compare techniques such as parallelism and scheduling, discuss their impact on model optimization, and evaluate the results of this approach. In particular, we'll focus on how new tools that automate training orchestration accelerate model development and increase the volume and quality of models in production.
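To give a flavor of how scheduling and hyperparameter tuning can inform each other, here is a minimal sketch of successive halving, one common scheduling technique for tuning at scale. Everything here is a hypothetical illustration, not any specific vendor tool: `train_step` is a toy stand-in for a real training run, and its mock loss surface (minimized near a learning rate of 0.1) is an assumption made purely for demonstration.

```python
import random

def train_step(lr, epochs, seed=0):
    """Toy stand-in for a real training run: returns a mock
    validation loss that improves with more epochs and is
    sensitive to the learning rate (hypothetical objective)."""
    rng = random.Random(seed)
    base = (lr - 0.1) ** 2          # loss minimized near lr = 0.1
    noise = rng.uniform(0, 1.0 / epochs)  # more epochs -> less noise
    return base + noise

def successive_halving(candidates, rungs=3, start_epochs=1):
    """Allocate training budget adaptively: train all candidates
    briefly, then repeatedly keep the best half and double the
    epoch budget for the survivors."""
    epochs = start_epochs
    pool = list(candidates)
    for _ in range(rungs):
        scored = sorted(pool, key=lambda lr: train_step(lr, epochs))
        pool = scored[: max(1, len(scored) // 2)]
        epochs *= 2
    return pool[0]

best = successive_halving([0.001, 0.01, 0.05, 0.1, 0.5, 1.0])
print(best)  # → 0.1
```

The scheduling side (how long each trial runs, which trials survive) and the tuning side (which hyperparameters are evaluated) feed into each other: early, cheap results prune the search so the remaining budget concentrates on promising configurations, which is the interaction the talk examines at production scale.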