Bhashini Tools
BhashaDaan & ULCA
Agenda
● Bhashini Mission
● Need for Digital Infrastructure
● NLTM Architecture
● Datasets & Models
● BhashaDaan
● ULCA
● Contributing Datasets & Models to ULCA
● Roadmap
Bhashini Mission Statement
Create a knowledge-based society by transcending language barriers and providing content and services to citizens in their own language.
Digital Infrastructure
● Educational eBooks
● Digital Web Content
● Tele Services
● Communication Services
● Knowledge Base
● Search
● Datasets
● AI/ML Models
NLTM Architecture
In simple words…
● Contributors of Datasets
● Development of AI Models
● Development of End-user Applications
Datasets & Contributors
Contributors
● Public: crowdsourced, free
● Dedicated Teams: language experts, specific tasks, paid
Datasets
● Parallel
● Monolingual
● ASR
● TTS
● OCR & more…
Data Collection
Help to build an open repository of data to digitally enrich your language
ASR Datasets
TTS Datasets
Parallel Datasets
OCR Datasets
BhashaDaan - A short video
AI Models
Task Types
● Translation
● ASR
● TTS
● Transliteration
● OCR
● and more…
Contributors
● EkStep
● AI4Bharat
● IITs
● IIITs
● CDAC
● and more…
Models
● IndicTrans
● Vakyansh
● IndicXlit
● IndicTTS
● Anuvaad
● and more…
ULCA
ULCA stands for Universal Language Contribution APIs.
ULCA is a standard API and an open, scalable data platform for Indian language datasets and models, supporting various types of datasets.
World's largest Indic language data and models platform for open AI innovation
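As a rough illustration of what "supporting various types of datasets" could mean for a contributor, the sketch below models a single parallel-corpus record and a basic sanity check on it. The class name, field names and language codes are assumptions made for this example; they are not taken from the ULCA schema.

```python
from dataclasses import dataclass


# Hypothetical record layout, for illustration only; not the ULCA schema.
@dataclass(frozen=True)
class ParallelRecord:
    source_language: str  # e.g. "en"
    target_language: str  # e.g. "hi"
    source_text: str
    target_text: str


def is_valid(record: ParallelRecord) -> bool:
    """Return True if both texts are non-empty and the two languages differ."""
    return (
        bool(record.source_text.strip())
        and bool(record.target_text.strip())
        and record.source_language != record.target_language
    )


record = ParallelRecord("en", "hi", "Thank you", "धन्यवाद")
print(is_valid(record))
```

A check of this kind would typically run before submission so that empty or same-language pairs never reach the shared repository.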
ULCA - Components
Open and scalable data platform
● Parallel text corpus in two or more languages
● Monolingual text corpus
● Automatic Speech Recognition (ASR) corpus
● Text to Speech (TTS) corpus
● Optical Character Recognition (OCR) corpus
● Natural Language Understanding (NLU) datasets
Inclusive Indian language Models
● Machine Translation (MT)
● Automatic Speech Recognition (ASR)
● Text to Speech (TTS)
● Optical Character Recognition (OCR)
● Transliteration
Automated Transparent Benchmarking
● Large, diverse and task specific benchmarks
● Research community approved metric system
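The slides do not name the metrics used for benchmarking. As one example of a widely used ASR metric, the sketch below computes word error rate (WER) as the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name and the whitespace tokenisation are assumptions for this example, not ULCA's implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))
```

Lower is better: a score of 0 means the output matches the reference word for word, and insertions can push the score above 1.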
ULCA - Current Status
World's largest Indic language data and models platform for open AI innovation
Datasets
● 215 million parallel sentences in 13 languages
● 14k hours of audio recordings in 14 languages
● 2.5 million images for OCR in 12 languages
● 10 million transliteration pairs in 19 languages
Models
● 240 state-of-the-art models in 21 Indian languages across translation, speech (ASR/TTS), OCR & transliteration
Benchmarks
● 135 open benchmarks across translation, ASR & transliteration in 20 Indian languages
ULCA - Actions
Datasets
● Submission
● My Contribution
● Search & Download
● My Searches
Models
● Submission
● My Contribution
● Explore Models
● Try Model
Benchmarking
● Metrics
● Benchmark Dataset
● Explore Models
● Try Model
● Model Feedback
● Model Leaderboard
Contributing Datasets to ULCA
Contributing Models to ULCA
ULCA - Language AI Models Demo
ULCA - Roadmap
Datasets
● POS, NER
● Multi-lingual Multi-speaker
Mobile APK
Models
● POS, NER
Benchmark
● OCR Benchmark dataset
User Analytics
Readymade Datasets (e.g. En-Hi Legal)
Realtime Inference for Models
ULCA - Roadmap (Contd.)
ULCA
Automated ingestion of verified content from external sources into ULCA
Thank you!
Questions?


Editor's Notes

  1. Robot head vector created by pch.vector - www.freepik.com (https://www.freepik.com/vectors/robot-head)