
MoM2010: Arabic natural language processing



Published in: Technology, Education

  1. Arabic Natural Language Processing (ANLP) and KACST research efforts in this field. Iman ALOdah, King Abdul-Aziz City for Science and Technology (KACST).
  2. Outline: Introduction; NLP cross-language challenges; ANLP challenges; KACST efforts in ANLP; Conclusion; References.
  3. Vision (HAL 9000) [1]
  4. NLP definition: Natural language processing (NLP) is defined as the software or hardware components in a computer system that analyze or synthesize spoken or written language [3].
  5. NLP cross-language challenges: Spoken or written text is processed by applying several levels of analysis [2]. Each level is meant to resolve a particular kind of ambiguity.
  6. Levels of analysis in NLP systems
  7. Levels of analysis in NLP systems, e.g. الطالبات ذهبن إلى المحاضرة ("The female students went to the lecture").
  8. Levels of analysis in NLP systems, e.g. Ich heisse Iman. (German: "My name is Iman.")
  9. Levels of analysis in NLP systems, e.g. أعطيتها الضوء الأخضر ("I gave her the green light").
  10. Levels of analysis in NLP systems, e.g. اشتكت ريم من تأخر المشروع فأعطيتها الضوء الأخضر لتبدأ العمل. ("Reem complained about the project's delay, so I gave her the green light to start work.")
  11. Arabic language challenges [4]: Arabic has features of its own that add extra challenges when dealing with its written text. Some problems facing Arabic language processing are listed here and in items 12 and 13. First, Arabic has many translated and transliterated named entities.
  12. Second, large corpora are lacking.
  13. Third, existing implemented solutions often cannot be adapted to Arabic because of features specific to the language.
  14. Arabic language features: 1- Diglossia: two or more varieties of the same language exist side by side in the same speech community [4]. Arabic has three varieties, the first being Traditional (Classical) Arabic.
  15. The second variety is Modern Standard Arabic.
  16. The third is dialectal Arabic.
  17. Arabic language features: 2- The Arabic script [4]: Arabic has no capitalization. Arabic has no letters to represent short vowels. Arabic uses minimal punctuation. Letters change form depending on their position in the word. Normalization helps recognition but increases ambiguity.
  18. Examples: وهم و هُم و هَمّ وَهمْ و هَمٌ. إن من يفلح في مخاطبتها ويلوح لها بطوق النجاة ويعدها بحل مشكلاتها ويهيمن على عقولها وقلوبها يمتلك زمامها فتذعن له وتندفع في الطريق الذي يشير إليه، وقناعتها في الأشخاص والرموز تسبق قناعاتها الموضوعية المبنية على الدراسة والتأمل والتفكير وهذه مسألة يجب أن يفقهها العاملون للإسلام ويدركوها من شأن العامة في كل زمان ومكان، فهم لن يتحولوا إلى علماء أو مفكرين أو دعاة ولكنهم " أدوات " صالحة نافعة مثمرة متى أحسن استثمارها وأحسن صقلها .
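The ambiguity that missing short vowels create can be sketched in a few lines of Python. This is only an illustration, not a KACST tool: the three diacritized readings and their English glosses are assumptions chosen for the demonstration, and `strip_diacritics` is a hypothetical helper that removes the marks in the Unicode range U+064B to U+0652.

```python
import re

# Arabic diacritics: tanwin, short vowels, shadda, sukun (U+064B to U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Return the text as it is usually written, without diacritic marks."""
    return DIACRITICS.sub("", text)

# Letters and marks spelled out by code point so the composition is explicit.
FATHA, DAMMA, SHADDA, SUKUN = "\u064E", "\u064F", "\u0651", "\u0652"
WAW, HA, MEEM = "\u0648", "\u0647", "\u0645"

# Illustrative readings (assumed glosses) that share one undiacritized spelling.
readings = {
    "wahm (illusion)": WAW + FATHA + HA + SUKUN + MEEM,
    "wa-hum (and they)": WAW + FATHA + HA + DAMMA + MEEM,
    "wa-hamm (and worry)": WAW + FATHA + HA + FATHA + MEEM + SHADDA,
}

# All readings collapse to the same three-letter written form.
undiacritized = {strip_diacritics(form) for form in readings.values()}
print(len(readings), "diacritized readings ->", len(undiacritized), "written form")
```

A reader, or an analyzer, has to recover the intended reading from context, which is why removing or normalizing marks makes matching easier but increases ambiguity.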
  19. Arabic language features: 4- Arabic morphology [4]: Agglutinative language, e.g. أعطيته الكتاب ("I gave him the book"). Nonconcatenative language, e.g. كتب – كتاب – كاتب ("he wrote", "book", "writer"). Pro-drop language, e.g. ناول الكتاب ("[he] handed over the book").
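Nonconcatenative (root-and-pattern) morphology can be illustrated with a small sketch. It assumes a simplified pattern notation in which the digits 1, 2, and 3 stand for the three root consonants and any other character is copied as-is; the function name `apply_pattern` and this notation are hypothetical and ignore short vowels entirely.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Interleave a three-consonant root with a pattern.

    Digits 1-3 in the pattern are replaced by the corresponding root
    consonant; every other character is kept unchanged.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)

root = "كتب"  # k-t-b, the root associated with writing

# Simplified, unvocalized patterns (assumed notation).
patterns = {
    "he wrote": "123",
    "book": "12ا3",
    "writer": "1ا23",
}

for gloss, pattern in patterns.items():
    print(gloss, "->", apply_pattern(root, pattern))
```

The three outputs are the words on the slide, which shows why a stemmer that only strips prefixes and suffixes cannot recover the shared root.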
  20. Arabic language features: 5- Syntax [4]: Free word order (VSO, SVO, or OVS): يلعب محمد بالكرة / محمد يلعب بالكرة / بالكرة يلعب محمد (all meaning "Mohammed plays with the ball"). Strict agreement system: تلعب محمد بالكرة ؟! (ungrammatical, because the feminine verb form does not agree with the masculine subject). Question formation. Traditional grammar doesn't serve ANLP goals.
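The interaction of free word order and strict agreement can be sketched with a toy example. Everything here is an assumption made for illustration: the two small lexicons are hand-written, تلعب is treated only as a third-person feminine form, and real agreement checking needs a morphological analyzer rather than a lookup table.

```python
# Hypothetical toy lexicons: gender of a few verb forms and proper names.
VERB_GENDER = {"يلعب": "masc", "تلعب": "fem"}
NOUN_GENDER = {"محمد": "masc", "ريم": "fem"}

def agrees(verb: str, subject: str) -> bool:
    """Return True when the verb form and the subject share the same gender."""
    return VERB_GENDER[verb] == NOUN_GENDER[subject]

def word_orders(verb: str, subject: str, obj: str) -> dict:
    """Build the three orders listed on the slide from the same constituents."""
    return {
        "VSO": f"{verb} {subject} {obj}",
        "SVO": f"{subject} {verb} {obj}",
        "OVS": f"{obj} {verb} {subject}",
    }

orders = word_orders("يلعب", "محمد", "بالكرة")
for name, sentence in orders.items():
    print(name, sentence)

print(agrees("يلعب", "محمد"))  # masculine verb, masculine subject
print(agrees("تلعب", "محمد"))  # feminine verb, masculine subject
```

The order of the constituents can change freely, but the agreement check has to hold in every order.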
  21. KACST research efforts in ANLP: KACST Computer Research Institute departments [5]: Acoustics and Linguistics; Scientific Computing; Computer Networks & Systems; Software Engineering & Innovated Systems.
  22. Some of KACST projects in ANLP: Arabic Text to Speech (ATTS); Arabic Language Morphological Analyzer; The Saudi Voice Bank; Speech Recognition for telephony application; Arabic Phonetic Database; Arabic Diacritizer; Arabic Names Romanization; Arabic Stemmer; Arabic Name Translator.
  23. Some of KACST projects in ANLP: Arabic Text Recognition Using HMMs; Machine classification of Arabic texts; Arabic IVR System; Sound Source Recognition; Machine Translation (from and to the Arabic Language); Language Identification; Online Arabic Handwritten OCR; Arabic Parser; Arabic Numerical Analyzer.
  24. Conclusion: NLP is a step towards human-centric computing. The rapid growth of information calls for a fast way to translate it into our language. Achieving good results in ANLP requires collaboration between linguists, Arabic language specialists, and computer scientists. "We made a big mistake 300 years ago when we separated technology and humanism. ... It's time to put the two back together." (Michael Dertouzos, Scientific American, July 1997)
  25. References: Daniel Jurafsky and James H. Martin, Speech and Language Processing, New Jersey: Prentice Hall, 2008. Peter Jackson and Isabelle Moulinier, Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization, Amsterdam / Philadelphia: John Benjamins Publishing Company, 2002. A. Farghaly and K. Shaalan, Arabic Natural Language Processing: Challenges and Solutions, ACM Transactions on Asian Language Information Processing (TALIP) 8, 4 (Dec. 2009), 1-22. DOI=
  26. Thank you for your kind attention.