2. Agenda
● Bhashini Mission
● Need for Digital Infrastructure
● NLTM Architecture
● Datasets & Models
● BhashaDaan
● ULCA
● Contributing Datasets & Models to ULCA
● Roadmap
3. Bhashini Mission Statement
Create a knowledge-based society by transcending language barriers; providing content and services to citizens in their own language.
10. AI Models
Task Types
● Translation
● ASR
● TTS
● Transliteration
● OCR
● and more…
Contributors
● EkStep
● AI4Bharat
● IITs
● IIITs
● CDAC
● and more…
Models
● IndicTrans
● Vakyansh
● IndicXlit
● IndicTTS
● Anuvaad
● and more…
11. ULCA stands for Universal Language Contribution APIs
ULCA is a standard API and an open, scalable data platform (supporting various types of datasets) for Indian language datasets and models.
World’s largest Indic language data and models platform for open AI innovation
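To make "various types of datasets" concrete, here is a minimal sketch of what one parallel-corpus record and a basic sanity check could look like. The class name `ParallelSentence`, its fields, and the `validate` helper are hypothetical, chosen only for illustration; they are not ULCA's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelSentence:
    """One hypothetical parallel-corpus record (NOT ULCA's real schema)."""
    source_language: str  # language code of the source text, e.g. "en"
    target_language: str  # language code of the target text, e.g. "hi"
    source_text: str
    target_text: str


def validate(record: ParallelSentence) -> list[str]:
    """Return a list of problems found in the record (empty if none)."""
    problems = []
    if record.source_language == record.target_language:
        problems.append("source and target languages must differ")
    if not record.source_text.strip():
        problems.append("source text is empty")
    if not record.target_text.strip():
        problems.append("target text is empty")
    return problems
```

A record such as `ParallelSentence("en", "hi", "Hello", "नमस्ते")` passes this check with no problems reported; a real submission pipeline would likely enforce many more rules.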
12. ULCA - Components
Open and scalable data platform
● Parallel text corpus in two or more languages
● Monolingual text corpus
● Automatic Speech Recognition (ASR) corpus
● Text to Speech (TTS) corpus
● Optical Character Recognition (OCR) corpus
● Natural Language Understanding (NLU) datasets
Inclusive Indian language Models
● Machine Translation (MT)
● Automatic Speech Recognition (ASR)
● Text to Speech (TTS)
● Optical Character Recognition (OCR)
● Transliteration
Automated Transparent Benchmarking
● Large, diverse and task-specific benchmarks
● Research community approved metric system
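As an illustration of the kind of metric an ASR benchmark might report, below is a minimal sketch of word error rate (WER). The slides do not list which metrics ULCA uses, so the choice of WER is an assumption, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c" gives one substitution and one deletion over four reference words, i.e. a WER of 0.5.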
13. ULCA - Current Status
Datasets
● 215 Million Parallel sentences in 13 languages
● 14k Hours of Audio recording in 14 languages
● 2.5 Million Images for OCR in 12 languages
● 10 Million Transliteration pairs in 19 languages
Models
● 240 state-of-the-art models in 21 Indian languages across Translation, Speech (ASR/TTS), OCR & Transliteration
Benchmarks
● 135 open benchmarks across Translation, ASR & Transliteration in 20 Indian languages
14. ULCA - Actions
Datasets
● Submission
● My Contribution
● Search & Download
● My Searches
Models
● Submission
● My Contribution
● Explore Models
● Try Model
Benchmarking
● Metrics
● Benchmark Dataset
● Explore Models
● Try Model
● Model Feedback
● Model Leaderboard
18. ULCA - Roadmap
Datasets
POS, NER
Multi-lingual Multi-speaker
Mobile APK
Models
POS, NER
Benchmark
OCR Benchmark dataset
User Analytics
Readymade Datasets (e.g. En-Hi Legal)
Realtime Inference for Models
19. ULCA - Roadmap (Contd.)
Automated ingestion of verified content from external sources into ULCA