Dhruva - Deploying models at scale.pptx
1. Dhruva
Deep-Dive Session
A Standard Interface to Deploy and Collaborate on Open Language AI
Presented by: Gokul NC, Project Officer, AI4Bharat
2. Overview
01. Why Dhruva?: The current deployment of open AI models has many limitations
02. What is Dhruva: Dhruva as a standard to deploy and collaborate on open AI models
03. Model Registry: Adding support for different AI tasks and model types
3. Overview
04. Frontend Demo: Live demo of available AI services and other features
05. Tech Stack: Dhruva architecture, design, and implementation
06. Performance: Model optimization and performance evaluation
5. What does it take to do open AI?
● Open Data: open data to train & benchmark models
● Open Models: open foundational and domain models
● Deployment: efficient & scalable deployed models
● Finetuning: improving models with deployment data
6. What does it take to do open AI?
● Open Data (Work-in-progress): open data to train & benchmark models; the Bhashini project has created a head-start for open data across languages, tasks, and domains
● Open Models (Work-in-progress): open foundational and domain models; AI4Bharat and academic groups have trained open models across languages, tasks, and domains
● Deployment and Finetuning (To-be-done): efficient & scalable deployed models, and improving models with deployment data
7. Currently, Models are Deployed in Isolation
(Diagram: three separate stacks, each pairing its own infra, open model, and use case)
Challenges:
● Developer overhead: each infra layer requires custom optimizations for efficiency at both the hardware and cloud-infrastructure levels
● Cost overhead: deployments across multiple isolated instances do not allow for economies of scale in fully utilizing hardware
● Accuracy overhead: isolated instances for each use case make it harder to share data for fine-tuning models
9. The Dhruva Standard
(Diagram: multiple use cases served from a directory of open, optimized models, deployable across Infra 1, Infra 2, …)
Dhruva is an open standard to manage the lifecycle of open AI models, with the following:
● Recipes for optimizing open models for efficient deployment on target hardware
● Standardized APIs for managing a model repository, deploying models, autoscaling deployments, monitoring deployments, providing metered access to deployments, logging model data, …
● Graphical and command-line interfaces for the above APIs, including a front-end for viewing and correcting model logs
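To make the "standardized APIs" idea concrete, here is a minimal sketch of what a model-registry lifecycle (register, deploy, look up) might look like. All class, method, and field names here are hypothetical illustrations, not the actual Dhruva API:

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    # Minimal metadata a registry entry might carry (hypothetical fields)
    model_id: str
    task: str                # e.g. "asr", "translation", "tts"
    languages: list = field(default_factory=list)
    endpoint: str = ""       # filled in once the model is deployed

class ModelRegistry:
    """Toy in-memory registry illustrating the lifecycle:
    register -> deploy -> look up for inference."""

    def __init__(self):
        self._models = {}

    def register(self, entry: ModelEntry):
        self._models[entry.model_id] = entry

    def deploy(self, model_id: str, endpoint: str):
        # A real system would provision infra, autoscaling, and metering;
        # here we only record the serving endpoint.
        self._models[model_id].endpoint = endpoint

    def lookup(self, task: str, language: str):
        return [m for m in self._models.values()
                if m.task == task and language in m.languages]

registry = ModelRegistry()
registry.register(ModelEntry("ai4bharat/indictrans2", "translation",
                             ["hi", "ta", "bn"]))
registry.deploy("ai4bharat/indictrans2", "https://example.org/infer/indictrans2")
print([m.model_id for m in registry.lookup("translation", "hi")])
```

The point of the sketch is only the shape of the interface: one registry entry per model, with deployment details attached after registration.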
10. Dhruva Features
● Deploy any language-AI model
○ (provided it is packaged in the Dhruva standard format)
● ULCA standard supported out-of-the-box
● Support for real-time streaming via sockets
● Support for task pipelines (like speech-to-speech)
● API-key-based access for users
● Precise monitoring and metering of usage
● Auto-scalable deployments
…and much more
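The task-pipeline feature (like speech-to-speech) amounts to chaining single-task services, each stage feeding the next. A minimal sketch with stub functions standing in for the real deployed ASR/NMT/TTS services:

```python
# Stub services; real ones would call the deployed ASR/NMT/TTS endpoints.
def asr(audio: bytes) -> str:
    return "hello world"             # speech -> text (stub)

def nmt(text: str, tgt_lang: str) -> str:
    return f"[{tgt_lang}] {text}"    # text -> translated text (stub)

def tts(text: str) -> bytes:
    return text.encode("utf-8")      # text -> speech (stub)

def speech_to_speech(audio: bytes, tgt_lang: str) -> bytes:
    """Pipeline: ASR -> NMT -> TTS."""
    return tts(nmt(asr(audio), tgt_lang))

out = speech_to_speech(b"\x00\x01", "hi")
print(out)  # -> b'[hi] hello world'
```

In a real deployment each stage is a separate inference service, so the pipeline layer mainly handles request routing and format conversion between stages.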
11. Supported Tasks
As of now, we support:
● Speech-to-Text (🗣️🎙️ → 📃)
○ also called Automatic Speech Recognition (ASR)
● Translation (India → भारत)
○ also called Neural Machine Translation (NMT)
● Text-to-Speech (TTS) (📝 → 🔊)
○ also called Voice Synthesis
● Transliteration
14. Currently Deployed Models
Models from AI4Bharat
Task-type | Model name | Indian languages supported
ASR | Indic-Conformer | 12
Translation | IndicTrans2 | 22
TTS | Indic-FastTTS | 14
Transliteration | Indic-Xlit | 21
15. Currently Deployed Models
Models from other open-source researchers
Task-type | Model owner | Model name | Languages supported
ASR | OpenAI | Whisper | English
Translation | IIT-Bombay | v1 | English, Hindi, Marathi
16. On-boarding a Model: Stage-1
1. Open-source the models
a. These models will later be used for the Dhruva deployment as well
2. Submit the models to the Bhashini-ULCA registry with all details
a. This generates a unique Model-ID
3. Connect the ULCA-registered model information to Dhruva
a. …under your organization name and a unique model name
17. On-boarding a Model: Stage-2
1. Optimize the models (optional)
a. This enables faster inference
2. Package them for deployment
a. …as per the Dhruva standard format
3. Deploy them on infrastructure with authentication
4. Register the endpoint as an inference service on Dhruva
a. …with reference to the Model-ID from Stage-1
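The Dhruva packaging format itself is not spelled out in these slides, but since the stack serves models with the Triton inference server, a packaged model plausibly resembles a Triton model repository. A sketch, with every name and value illustrative only:

```text
model_repository/
└── indictrans2/
    ├── config.pbtxt
    └── 1/
        └── model.onnx

# config.pbtxt (illustrative values)
name: "indictrans2"
platform: "onnxruntime_onnx"
max_batch_size: 64
dynamic_batching { max_queue_delay_microseconds: 1000 }
instance_group [ { count: 2, kind: KIND_GPU } ]
```

The `dynamic_batching` and `instance_group` settings correspond to two of the optimizations discussed later: batching inputs across requests, and running multiple model replicas per GPU.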
19. Walkthrough features
● Inference services and pipelines
○ Demo, documentation and feedback
○ Monitoring recent usage and usage tracking
● Generating API keys
○ via Dhruva
○ via ULCA
● Sample code usage
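As a flavor of the "sample code usage" above, here is a hedged sketch of calling a translation inference service with an API key. The endpoint URL and the payload field names are illustrative (loosely ULCA-inspired), not the authoritative schema:

```python
import json
import urllib.request

API_BASE = "https://example.org/services/inference"   # hypothetical endpoint
API_KEY = "<your-dhruva-api-key>"

def build_translation_request(text: str, src: str, tgt: str,
                              service_id: str) -> dict:
    """Build an inference payload. Field names are illustrative."""
    return {
        "config": {
            "serviceId": service_id,
            "language": {"sourceLanguage": src, "targetLanguage": tgt},
        },
        "input": [{"source": text}],
    }

def translate(text: str, src: str, tgt: str, service_id: str) -> dict:
    payload = build_translation_request(text, src, tgt, service_id)
    req = urllib.request.Request(
        f"{API_BASE}/translation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": API_KEY},
    )
    # Requires a live endpoint and a valid key to actually run.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_translation_request("India", "en", "hi",
                                    "ai4bharat/indictrans2")
print(json.dumps(payload, ensure_ascii=False))
```

The real request and response schemas should be taken from the Dhruva/ULCA documentation shown in the walkthrough.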
22. Dhruva Tech Architecture
(Diagram) Main components: Frontend, Main Server, DB, Cache, Queue with job-processing workers for async tasks, Model deployments handling inference calls, Metrics logging with dashboards, Data dumps, and the Marketplace.
23. Dhruva Tech Stack Implementation
Component | Technology | Deployment
Main server | FastAPI & Uvicorn | Azure App Service
Database | MongoDB | Azure Cosmos DB
Cache | Redis | Azure Cache for Redis
Frontend | NextJS & ChakraUI | Azure CDN Pages
Model deployments | Triton Inference Server | Azure Machine Learning
Data dumps | Object storage | Azure Blob Storage
Queue & job workers | RabbitMQ & Celery | Azure Virtual Machine
Metrics and dashboards | Prometheus & Grafana | Azure Virtual Machine
Marketplace | ULCA | -
25. Model Optimization
● Multilingual models
○ to reduce the number of deployments compared to monolingual ones
● Efficient inference framework
○ instead of reusing the framework used for training
● Batching
○ processing multiple inputs at a time
● Dynamic batching
○ batching inputs from different requests together
● Multiple replicas of models per GPU
○ to make full use of each machine
● Production-grade inference server
○ instead of writing our own server
● Other graph optimizations: quantization, kernel fusion, …
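Dynamic batching, in particular, groups requests that arrive independently into a single model call. A toy sketch of the batching rule (no real timers or concurrency; a flush stands in for Triton's queue-delay timeout):

```python
from collections import deque

class DynamicBatcher:
    """Toy dynamic batcher: flush a batch when it reaches max_batch_size,
    or when the caller signals a (simulated) queue-delay timeout."""

    def __init__(self, max_batch_size: int = 4):
        self.max_batch_size = max_batch_size
        self.queue = deque()
        self.batches = []   # each entry = one model invocation

    def submit(self, request):
        self.queue.append(request)
        if len(self.queue) >= self.max_batch_size:
            self.flush()

    def flush(self):
        # In a real server this fires after a max queue delay;
        # here the caller triggers it explicitly.
        if self.queue:
            self.batches.append(list(self.queue))
            self.queue.clear()

batcher = DynamicBatcher(max_batch_size=4)
for i in range(6):          # six independent requests arrive
    batcher.submit(f"req-{i}")
batcher.flush()             # timeout drains the stragglers
print([len(b) for b in batcher.batches])   # -> [4, 2]
```

Six separate requests become two model invocations instead of six, which is where the GPU-utilization win comes from.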
26. AI4Bharat Models Case Study
● Inference engine: NVIDIA Triton Inference Server
○ Fully open-source
○ Best performing, especially on NVIDIA GPUs
● Example optimized models:
Model | Training framework | Inference framework
AI4Bharat ASR | NeMo | NVIDIA TensorRT
AI4Bharat IndicTrans2 | Fairseq | CTranslate2
AI4Bharat TTS | Coqui | ONNX
27. Cost and Performance
AI4Bharat model | Requests per sec | Throughput | Raw cost
Indic-ASR | 40 | 1 hr of audio in 13 s | ₹ 0.18
IndicTrans2 | 60 | 1 lakh (100k) chars in 30 s | ₹ 0.41
Indic-TTS | 11 | 1 hr of audio in 50 s | ₹ 0.67
(On a single machine, with an NVIDIA T4 GPU)
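A quick sanity check on the throughput figures above, converting "1 hr of audio in N seconds" into a real-time factor (seconds of audio processed per second of compute):

```python
# Real-time factor = seconds of audio processed per second of compute.
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    return audio_seconds / compute_seconds

asr_rtf = real_time_factor(3600, 13)   # Indic-ASR: 1 hr of audio in 13 s
tts_rtf = real_time_factor(3600, 50)   # Indic-TTS: 1 hr of audio in 50 s

print(f"ASR: ~{asr_rtf:.0f}x real time")   # -> ASR: ~277x real time
print(f"TTS: ~{tts_rtf:.0f}x real time")   # -> TTS: ~72x real time
```

So a single T4 machine transcribes roughly 277 hours of audio per hour of compute at the quoted rate, which is what makes the per-hour raw costs on the slide so low.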
Editor's Notes
Upcoming: Automate step-3
Upcoming: Automate step-3 and step-4
Standardizes inter- and intra-communication protocols