Building an Odia Wordlist for Everyone
Jnanaranjan Sahu
Mrutyunjaya Kar
Good Wordlist
Should contain both new and old words.
Should be updated regularly.
Including translations is a plus.
Where can it be used?
Predictive text input
Spell checker
Machine translation
Text-to-speech engine
Optical character recognition (OCR)
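
As an illustration of the spell-checker use case, here is a minimal sketch in Python. It assumes the wordlist has already been loaded into a set of strings; the function name `check_word` and the 0.6 similarity cutoff are hypothetical choices made for this example, and this is not how the Mozilla add-on mentioned later is implemented.

```python
import difflib

def check_word(word, wordlist, max_suggestions=3):
    """Return (is_known, suggestions) for a single word.

    wordlist is assumed to be a set of correctly spelled words.
    Suggestions come from difflib's similarity ratio, which is a
    simple stand-in for a real spell-checking engine.
    """
    if word in wordlist:
        return True, []
    # Sort the candidates so the result does not depend on set ordering.
    suggestions = difflib.get_close_matches(
        word, sorted(wordlist), n=max_suggestions, cutoff=0.6
    )
    return False, suggestions
```

A word that is in the list is accepted as is; an unknown word gets up to three near matches from the list. The larger and cleaner the wordlist, the more useful both answers become.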
Odia Wordlist: Our Collection
The word dump from Odia Wikipedia
A wordlist created by Dr. Debiprasanna Pattanayak, which was later cleaned up by Srujanika
A collection of words from the digital version of the Purnachandra Odia Bhashakosha
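
Combining several sources like these into one list can be sketched as follows. This is only an illustration under stated assumptions: each source is available as plain Unicode text, a word is any run of characters from the Odia Unicode block (U+0B00 to U+0B7F), and tokens containing digits are dropped. The function names `extract_words` and `merge_sources` are hypothetical, and the actual cleanup done for the collection is not described in these slides.

```python
import re
import unicodedata

# A run of characters from the Odia Unicode block is treated as one word.
ODIA_WORD = re.compile(r"[\u0B00-\u0B7F]+")

def extract_words(text):
    """Return the set of Odia words found in a piece of text."""
    # Normalize so that equivalent character sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    words = set()
    for token in ODIA_WORD.findall(text):
        # Skip numbers and tokens that mix letters with Odia digits.
        if not any(ch.isdigit() for ch in token):
            words.add(token)
    return words

def merge_sources(*sources):
    """Merge several texts into one sorted, de-duplicated wordlist."""
    merged = set()
    for text in sources:
        merged |= extract_words(text)
    return sorted(merged)
```

Calling `merge_sources(wikipedia_text, dictionary_text)` with two hypothetical input strings returns each distinct word once, sorted by code point, which keeps the output stable and easy to diff when the list is updated.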
Where are we lacking?
Words that have been added to Odia are missing from the list
New scientific and technical words are yet to be added to make the wordlist more exhaustive
Many words in the list need to be improved
The list needs proofreading by an experienced person
What we have done with the wordlist
Created a spell-check add-on for Mozilla Firefox*
Used it to achieve a 10-20% improvement in Odia OCR
Planning to create a text-to-speech engine
More...
*https://addons.mozilla.org/en-US/firefox/addon/odia-spelling-checker/
Aim
Our aim is to make a wordlist and keep it open, under a free license, so that anyone can use it.
We also want to keep it on an open sharing platform like GitHub so that anyone can suggest improvements.
Let’s make the Odia language bigger and better
Thank You
Contact: gyana111@gmail.com
