Dhruva
Deep-Dive Session
A Standard Interface to Deploy and Collaborate on Open Language AI
Presented by: Gokul NC, Project Officer, AI4Bharat
Overview
01 Why Dhruva? The current deployment of open AI models has many limitations.
02 What is Dhruva? Dhruva as a standard for deploying and collaborating on open AI models.
03 Model Registry: adding support for different AI tasks and model types.
04 Frontend Demo: live demo of the available AI services and other features.
05 Tech Stack: Dhruva architecture, design and implementation.
06 Performance: model optimization and performance evaluation.
Why Dhruva?
01
What does it take to do open AI?
● Open data: open data to train & benchmark models
○ Work-in-progress: the Bhashini project has created a head-start for open data across languages, tasks, and domains
● Open models: open foundational and domain models
○ Work-in-progress: AI4Bharat/academic groups have trained open models across languages, tasks, and domains
● Deployment: efficient & scalable deployed models
○ To-be-done
● Fine-tuning: improving models with deployment data
Currently, Models Are Deployed in Isolation
(Diagram: three separate stacks, each pairing its own Infra with an Open model and a Use case)
Challenges:
● Developer overhead: each infra layer requires custom optimizations for efficiency at both the hardware and cloud-infrastructure levels
● Cost overhead: deployments across multiple isolated instances forgo economies of scale in fully utilizing hardware
● Accuracy overhead: isolated instances for each use case make it harder to share data for fine-tuning models
What is Dhruva?
02
Open-source platform for deploying language-AI services at scale
The Dhruva Standard
(Diagram: several use cases served through the Dhruva Standard from an open optimized model directory, spanning Infra 1 and Infra 2)
Dhruva is an open standard to manage the lifecycle of open AI models, comprising:
● Recipes for optimizing open models for efficient deployment on target hardware
● Standardized APIs for managing a model repository, deploying models, autoscaling deployments, monitoring deployments, providing metered access to deployments, logging model data, …
● Graphical and command-line interfaces for the above APIs, including a front-end for viewing and correcting model logs
Dhruva Features
● Deploy any language AI model
○ (that is packaged in Dhruva standard)
● ULCA standard out-of-the-box
● Support for real-time streaming via socket
● Support for task-pipelines (like speech-to-speech)
● API-key-based access for users
● Precise usage monitoring and metering
● Auto-scalable deployments
…much more
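Two of the features above, API-key-based access and precise usage metering, can be illustrated together. The sketch below is a minimal, in-memory illustration under stated assumptions: the class `UsageMeter`, its methods, and the idea of metering in per-service units (audio seconds, characters) are hypothetical and are not Dhruva's actual implementation, which persists keys and usage in its database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageMeter:
    """Hypothetical meter: API keys gate access, and usage is summed per (key, service)."""
    valid_keys: set[str]
    usage: dict[tuple[str, str], float] = field(default_factory=lambda: defaultdict(float))

    def record(self, api_key: str, service: str, units: float) -> None:
        # Reject any call that does not carry a known API key.
        if api_key not in self.valid_keys:
            raise PermissionError("unknown API key")
        # Accumulate usage in the service's own unit (e.g. audio seconds for ASR).
        self.usage[(api_key, service)] += units

    def total(self, api_key: str, service: str) -> float:
        # .get() avoids inserting empty entries into the defaultdict.
        return self.usage.get((api_key, service), 0.0)
```

For example, recording 12.5 and then 7.5 audio seconds of ASR usage for `"key-123"` makes `meter.total("key-123", "asr")` return `20.0`. Keying usage by both API key and service keeps per-user billing and per-service monitoring queries cheap.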
Supported Tasks
As of now, we support:
● Speech-to-Text
🗣️🎙️→📃
○ called Automatic Speech Recognition (ASR)
● Translation
India → भारत
○ called Neural Machine Translation (NMT)
● Text-To-Speech (TTS)
📝→🔊
○ called Voice Synthesis
● Transliteration
Pipeline: Speech-to-Speech Translation
(Diagram: a request in ULCA format, e.g. from Bhashaverse, flows through Automatic Speech Recognition → Neural Machine Translation → Text to Speech, and the response is returned in ULCA format)
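A task pipeline such as speech-to-speech translation feeds each stage's output into the next. The sketch below shows only that chaining idea under stated assumptions: `run_pipeline` is a hypothetical helper, and the `fake_asr`, `fake_nmt` and `fake_tts` stand-ins replace calls to the deployed ASR, NMT and TTS services so the example runs offline.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], payload: Any) -> Any:
    """Feed each stage's output into the next stage, e.g. ASR -> NMT -> TTS."""
    for stage in stages:
        payload = stage(payload)
    return payload


# Hypothetical stand-ins; real stages would call the deployed model services.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes already hold a transcript


def fake_nmt(text: str) -> str:
    return {"India": "भारत"}.get(text, text)  # tiny lookup instead of a translation model


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is synthesized audio


speech_to_speech = [fake_asr, fake_nmt, fake_tts]
```

Calling `run_pipeline(speech_to_speech, b"India")` returns the bytes of `"भारत"`. Treating a pipeline as an ordered list of stages means a new task chain is just a different list of services.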
Model Registry
03
Currently deployed models: models from AI4Bharat
Task type | Model name | Indian languages supported
ASR | Indic-Conformer | 12
Translation | IndicTrans2 | 22
TTS | Indic-FastTTS | 14
Transliteration | Indic-Xlit | 21
Currently deployed models: models from other open-source researchers
Task type | Model owner | Model name | Languages supported
ASR | OpenAI | Whisper | English
Translation | IIT-Bombay | v1 | English, Hindi, Marathi
On-boarding a model: Stage-1
1. Open-source the models
a. These models will later be used for the Dhruva deployment as well
2. Submit the models to the Bhashini-ULCA registry with all details
a. This generates a unique Model-ID
3. Connect the ULCA-registered model information to Dhruva
a. under your organization name and a unique model name
On-boarding a model: Stage-2
1. Optimize the models (optional)
a. This enables faster inference
2. Package them for deployment
a. as per the Dhruva standard format
3. Deploy them on infrastructure with authentication
4. Register the endpoint as an inference service on Dhruva
a. with reference to the Model-ID from Stage-1
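The key link between the two on-boarding stages is that an inference service registered in Stage-2 must point back to a Model-ID obtained in Stage-1. The sketch below models that constraint in memory. `InferenceService`, `ServiceRegistry` and all field names are hypothetical illustrations, not Dhruva's actual schema, and the example IDs and URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceService:
    """Hypothetical record tying a deployed endpoint back to its ULCA Model-ID."""
    service_id: str
    ulca_model_id: str
    organization: str
    model_name: str
    endpoint_url: str


class ServiceRegistry:
    def __init__(self) -> None:
        self._known_models: set[str] = set()
        self._services: dict[str, InferenceService] = {}

    def connect_model(self, ulca_model_id: str) -> None:
        # Stage-1, step 3: make a ULCA-registered model known to the platform.
        self._known_models.add(ulca_model_id)

    def register_service(self, service: InferenceService) -> None:
        # Stage-2, step 4: only accept endpoints that reference a connected Model-ID.
        if service.ulca_model_id not in self._known_models:
            raise ValueError(f"unknown Model-ID: {service.ulca_model_id}")
        if service.service_id in self._services:
            raise ValueError(f"duplicate service: {service.service_id}")
        self._services[service.service_id] = service

    def services_for_model(self, ulca_model_id: str) -> list[InferenceService]:
        return [s for s in self._services.values() if s.ulca_model_id == ulca_model_id]
```

Rejecting services whose Model-ID was never connected keeps every deployed endpoint traceable to a registered, open-sourced model.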
Dhruva Frontend Demo
04
Walkthrough features
● Inference services and pipelines
○ Demo, documentation and feedback
○ Monitoring recent usage and usage tracking
● Generating API keys
○ via Dhruva
○ via ULCA
● Sample code usage
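As a flavour of "sample code usage", the sketch below builds (but does not send) an authenticated translation request. The payload layout (`config.language.sourceLanguage` / `targetLanguage`, `input[].source`) is an assumption modeled on the ULCA request format, the `Authorization` header carrying the API key is also an assumption, and the endpoint URL is a placeholder; check the Dhruva documentation for the exact schema and URL.

```python
import json
import urllib.request


def build_translation_request(endpoint: str, api_key: str, text: str,
                              source_lang: str, target_lang: str) -> urllib.request.Request:
    """Build a POST request with an assumed ULCA-style translation payload."""
    payload = {
        "config": {"language": {"sourceLanguage": source_lang, "targetLanguage": target_lang}},
        "input": [{"source": text}],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

Against a real endpoint, the request would be sent with `urllib.request.urlopen(req)` and the JSON response parsed with `json.load`.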
Tech Stack
05
Dhruva Layers
(Diagram of Dhruva's layers: External Applications (Jugalbandi, Bhashaverse, Anuvaad); User Interface / CLI; Inference Management Layer; ULCA Standardization Layer; Model Layer (Triton); Infrastructure Layer (e.g. Azure ML); annotated with Dhruva Frontend, Dhruva Backend and Dhruva Protocol)
Dhruva Tech Architecture
(Diagram of components: Main Server, DB, Cache, Queue for async tasks handled by Job processing workers, Metrics Logging and Dashboards, Data dumps, Frontend, Model deployments receiving Inference Calls, and Marketplace)
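The architecture routes async tasks through a queue to job-processing workers (RabbitMQ and Celery in the stack table). The sketch below shows that producer/worker pattern with the standard library only: an in-process `queue.Queue` stands in for RabbitMQ and threads stand in for Celery workers. `process_async` and the handler are hypothetical; this is not Dhruva's code.

```python
import queue
import threading
from typing import Callable


def process_async(tasks: list[str], handler: Callable[[str], str], n_workers: int = 2) -> dict[str, str]:
    """Push tasks onto a queue drained by worker threads; return task -> result."""
    jobs: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            task = jobs.get()
            if task is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            output = handler(task)
            with lock:  # guard the shared results dict
                results[task] = output
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        jobs.put(task)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Decoupling producers from workers this way lets the main server answer requests quickly while slower jobs run in the background; a real broker adds persistence and lets workers run on separate machines.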
Dhruva Tech Stack Implementation
Component | Technology | Deployment
Main server | FastAPI & Uvicorn | Azure App Service
Database | MongoDB | Azure Cosmos DB
Cache | Redis | Azure Cache for Redis
Frontend | Next.js & Chakra UI | Azure CDN Pages
Model deployments | Triton server | Azure Machine Learning
Data dumps | Object storage | Azure Blobs
Queue & job workers | RabbitMQ & Celery | Azure Virtual Machine
Metrics and dashboards | Prometheus & Grafana | Azure Virtual Machine
Marketplace | ULCA |
Dhruva Performance
06
Models Optimization
● Multilingual models
○ to reduce the number of deployments compared with monolingual models
● Efficient inference framework
○ instead of reusing the training framework
● Batching
○ processing multiple inputs at a time
● Dynamic batching
○ batching inputs from different requests together
● Multiple replicas of models per GPU
○ to fully utilize each machine
● Production-grade inference server
○ instead of writing our own server
● Other graph optimizations: Quantization, Kernel Fusion, …
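Dynamic batching groups inputs that arrive in separate requests into one batch, trading a small bounded wait for better GPU utilization. Triton provides this natively through the `dynamic_batching` section of a model's configuration; the standard-library sketch below only illustrates the idea. `collect_batch` and its parameters are hypothetical: it blocks for the first request, then keeps collecting until the batch is full or `max_delay_s` has elapsed.

```python
import queue
import time
from typing import Any


def collect_batch(requests: "queue.Queue[Any]", max_batch_size: int, max_delay_s: float) -> list[Any]:
    """Gather items from concurrent requests into a single batch."""
    batch = [requests.get()]  # wait for the first request
    deadline = time.monotonic() + max_delay_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; ship a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

The two knobs pull against each other: a larger `max_batch_size` improves throughput, while a smaller `max_delay_s` caps the extra latency a lone request can pay while waiting for companions.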
AI4Bharat Models Case-study
● Inference engine: NVIDIA Triton Inference Server
○ Fully open-source
○ High performance, especially on NVIDIA GPUs
● Example optimized models:
Model | Training framework | Inference framework
AI4Bharat ASR | NeMo | NVIDIA TensorRT
AI4Bharat IndicTrans2 | Fairseq | CTranslate2
AI4Bharat TTS | Coqui | ONNX
Cost and Performance
AI4Bharat model | Requests per sec | Processing throughput | Raw cost
Indic-ASR | 40 | 1 hr of audio in 13 s | ₹0.18
IndicTrans2 | 60 | 1 lakh (100,000) characters in 30 s | ₹0.41
Indic-TTS | 11 | 1 hr of audio in 50 s | ₹0.67
(On a single machine with an NVIDIA T4 GPU)
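The processing-power figures in the table above can be restated as speed relative to real time. The worked arithmetic below assumes that "1hr" refers to one hour of audio for both ASR and TTS, and that "1L" means one lakh (100,000) characters.

```python
# Figures from the cost-and-performance table (single machine, NVIDIA T4 GPU).
SECONDS_PER_HOUR = 3600


def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds


asr_speedup = speedup_over_real_time(SECONDS_PER_HOUR, 13)  # ~276.9x faster than real time
tts_speedup = speedup_over_real_time(SECONDS_PER_HOUR, 50)  # 72.0x faster than real time
nmt_chars_per_s = 100_000 / 30                              # ~3,333 characters per second
```

On these numbers, one T4 machine transcribes roughly 277 hours of audio per hour of compute and synthesizes 72 hours of speech per hour of compute.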

Dhruva - Deploying models at scale.pptx

Editor's Notes

  1. Upcoming: Automate step-3
  2. Upcoming: Automate step-3 and step-4
  3. Standardizes inter and intra communication protocol