Challenges in Building NLP Applications in Nepali Language

This presentation gives an overview of the challenges in building Natural Language Processing applications for the Nepali language and why Python is a good fit for NLP development.

Published in: Technology


  1. CHALLENGES IN BUILDING NATURAL LANGUAGE PROCESSING APPLICATIONS FOR नेपाली LANGUAGE - Chandan Goopta. Unicode number: U+0915, HTML code: &#2325; (क)
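The title slide pairs a Unicode number with its HTML code. As a minimal sketch (the variable names here are illustrative and not from the slides), the same values can be derived in Python for the Devanagari letter at U+0915:

```python
import unicodedata

# The Devanagari letter referenced on the title slide (U+0915).
ka = "\u0915"

code_point = f"U+{ord(ka):04X}"    # hexadecimal code point notation
html_entity = f"&#{ord(ka)};"      # decimal HTML character reference
name = unicodedata.name(ka)        # official Unicode character name
utf8_bytes = ka.encode("utf-8")    # bytes as stored in a UTF-8 file

print(code_point, html_entity, name, utf8_bytes)
```

The decimal value in the HTML reference is simply the code point converted from hexadecimal, so either form identifies the same character.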
  2. NATURAL LANGUAGE PROCESSING
  3. What So Far?
     NLP Task | English | Indic Languages | Nepali
     Machine Translation | Very Good | Good | Very Poor (Google/Microsoft)
     Named Entity Recognition | Very Good | Fair | None (little groundwork)
     Optical Character Recognition | Very Good | Poor | Very Poor
     POS Tagging | Good | Poor | Very Poor
     Sentiment Analysis | Very Good | Fair | Poor (work ongoing)
     Speech Recognition | Good | Poor | None (Google's work in progress)
  4. SENTIMENT ANALYSIS • Chunking | Sentence Chunker • Tagging | POS Tagger • Resources | SentiWordNet, Subjectivity WordList • Machine Learning | Corpus, Tagged Samples
  5. Build Everything from Scratch
  6. OR I CAN USE ENGLISH LANGUAGE RESOURCES FOR NEPALI
  7. SENTIMENT ANALYSIS • Chunking | Sentence Chunker • Tagging | POS Tagger • Resources | SentiWordNet, Subjectivity WordList • Machine Learning | Corpus, Tagged Samples
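One way to read the idea of reusing English resources for Nepali is a dictionary-lookup pipeline: map each Nepali token to an English word, then look that word up in an English polarity lexicon. The sketch below is only an illustration under that assumption. The two dictionaries are tiny hypothetical stand-ins (a real system might draw on a bilingual dictionary and a lexicon such as SentiWordNet), the scores are made up, and whitespace splitting stands in for a proper chunker and tagger.

```python
import unicodedata

# Hypothetical toy bilingual dictionary (Nepali -> English); not from the slides.
NEPALI_TO_ENGLISH = {
    "राम्रो": "good",
    "नराम्रो": "bad",
    "खुसी": "happy",
}

# Hypothetical toy English polarity lexicon; scores are illustrative only.
ENGLISH_POLARITY = {"good": 1.0, "happy": 1.0, "bad": -1.0}


def sentence_polarity(sentence):
    """Sum the English polarity of every Nepali token that can be translated."""
    score = 0.0
    # Normalize so visually identical strings compare equal, then split on spaces.
    for token in unicodedata.normalize("NFC", sentence).split():
        english = NEPALI_TO_ENGLISH.get(token)
        if english is not None:
            score += ENGLISH_POLARITY.get(english, 0.0)
    return score


def label(sentence):
    """Turn the numeric score into a coarse sentiment label."""
    score = sentence_polarity(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("यो फिल्म राम्रो छ"))    # "this film is good"
print(label("यो फिल्म नराम्रो छ"))  # "this film is bad"
```

Words missing from either dictionary contribute nothing to the score, which is where the lack of Nepali resources shows up in practice: coverage of the bilingual dictionary bounds what the English lexicon can contribute.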
  8. I am like / Others are like / Professors are like
  9. BACK TO CHALLENGES • Unicode Rendering in Dev-tools • Lack of Resources • Very Little Previous Work/Research
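When an editor or terminal does not render Devanagari properly, it can help to inspect the underlying code points instead of the glyphs. The following is a small sketch of that workaround using only the standard library; the helper name `describe` is an assumption made for this example.

```python
import unicodedata

# "Nepali" written in Devanagari, spelled with escapes so the source file
# stays readable even in a tool that cannot render the script.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


# ASCII-only representation that survives any tool or log file.
escaped = word.encode("unicode_escape").decode("ascii")

for row in describe(word):
    print(*row)
print(escaped)
```

The word is made of base letters plus dependent vowel signs (categories beginning with `M`), so the number of code points is larger than the number of visible units, which is one reason rendering and cursor movement can misbehave in tools without proper Indic script support.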
  10. WHY PYTHON?
  11. Prof. James A. Hendler, University of Maryland: “I have the students learn Python in our undergraduate and graduate Semantic Web courses. Why? Because basically there's nothing else with the flexibility and as many web libraries.”
  12. WHY PYTHON? • NLTK, although not the most efficient implementation, provides a lot of awesome tools to quickly prototype a hypothesis. Source: Quora
  13. WHY PYTHON? • Scipy + Numpy: Everything that isn't in NLTK is definitely in these libraries. If you want to use more advanced algorithms like Latent Semantic Indexing or Latent Dirichlet Allocation, Python has libraries to do that. Source: Quora
  14. WHY PYTHON? • Python has really great XML/HTML parsing libraries such as Beautiful Soup and Scrape.py. You can use these libraries to quickly scrape the web and generate large data sets to improve the performance of your models (because let's face it, big data trumps complexity). Source: Quora
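To make the scraping point concrete, here is a minimal sketch of pulling paragraph text out of an HTML page to build a small text data set. It deliberately uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra installs; the class name, function name, and sample page are all assumptions made for this example.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is still part of the paragraph.
        if self._in_paragraph:
            self.paragraphs[-1] += data


# Hypothetical page content; a real scraper would download this first.
SAMPLE_PAGE = (
    "<html><body><h1>News</h1>"
    "<p>First <b>bold</b> line.</p><p>Second line.</p>"
    "</body></html>"
)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs]


print(extract_paragraphs(SAMPLE_PAGE))
```

A dedicated library such as Beautiful Soup is more forgiving of malformed markup, which matters on real web pages; this sketch only shows the shape of the task.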
  15. WHY PYTHON? • Python has great web frameworks like Django/Pylons/Tornado. If you invent a revolutionary sarcasm detector that can predict trends in the stock market, you can quickly integrate it into a web service, make millions, and buy a large island in a third-world country. Source: Quora
  16. WHY PYTHON? • Consider your other options: It would not make sense to use a compiled language like C++/Java for this type of work unless you needed to increase performance (computational speed, not model accuracy). As far as I can tell, Ruby is completely useless for any Machine Learning, Data Mining, or Natural Language Processing task. Maybe you could use Lisp, but at this point, Python has a larger ecosystem. Source: Quora
  17. THANK YOU
