"Computing support for Pakistani Languages, Challenges & Practices" by Dr. Sarmad Hussain

This presentation was shared at the PAS Digital Marketing Conference "Dig-It 2.0".
Session name: Urdu Internet - Leveraging Technologies
Presentation: Computing support for Pakistani Languages, Challenges & Practices
Speaker: Dr. Sarmad Hussain, Professor and Head, Center for Language Engineering, University of Engineering and Technology, Pakistan

  1. Computing Support for Pakistani Languages: Challenges and Practice. Unlocking Information for Human Development. Sarmad Hussain, Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore. sarmad@cantab.net, www.cle.org.pk
  2. Need: ICTs promise significant socio-economic impact, but that impact depends on the size of the population that can use ICTs. 180 million citizens need access; they speak 66+ languages; 10% understand English; 58% are literate; 11% have access to computers; 70% have access to mobile phones. ITU ICT Development Index (IDI): Pakistan ranked 127 of 155 nations. Human language technology is necessary to bridge the gap.
  3. Languages of Pakistan: Percent Population of Pakistan by Mother Tongue
       Mother tongue    Total   Rural   Urban
       Urdu              7.57    1.48   20.22
       Punjabi          44.15   42.51   47.56
       Sindhi           14.10   16.46    9.20
       Pushto           15.42   18.06    9.94
       Balochi           3.57    3.99    2.69
       Saraiki          10.53   12.97    5.46
       Others (60)       4.66    4.53    4.93
  4. Languages of Pakistan: the mother-tongue table from slide 3, annotated Sociocultural and Economic
  5. Languages of Pakistan: the mother-tongue table from slide 3, annotated Sociocultural and Economic, with Languages of Pakistan in Danger (UNESCO) graded as vulnerable, definitely endangered, or severely endangered
  6. How? Human Language Technology (Linguistic Research, Standards, Applications) • Materials • Training • Adoption • Use: Relevant Content Access and Relevant Content Generation
  7. Human Language Technology: Bridging Barriers • Interfacing • Assisting • Enabling • Empowering
  8. Interfacing
       Language: Character Set • Input Methods • Writing • Collation
       Standards: National • International (ISO 639, ISO 3166, ISO 10646/Unicode) • Terminology Translation
       Technology (Applications): Fonts • Keyboards, Keypads and Other Input Methods • Collation Methods • Localized Platform
       Technology (Platforms: Computers and Phones): Linux/Unix and Symbian • Microsoft Windows and Phone • iOS (iPad, iPhone, MacBook, …) • Google (Gmail, Docs, …Android)
  9. Software Localization: SeaMonkey Navigator • OpenOffice.org Writer
  10. Terminology and Content
  11. Assisting
       Text: Assistive Input/Auto-complete Methods • Thesaurus, Spelling and Grammar Checking • Machine Translation, Language Identification, Text Summarization, …
       Speech: Speech Recognition • Text to Speech • Emotion Detection, …
       Image: Optical Character Recognition (www.UrduOCR.net) • Handwriting Recognition
  14. Enabling
       Hybrid: Online Content Sharing Tools (CMS, Social Networks) • Screen Readers • Book Readers • Text-based Search Engines • Dialogue Systems • Speech-to-Speech Translation • Multi-modal Search Engines
  15. Dialogue System
  16. Empowering • ICT for ICT: focused on infrastructure • ICT for Development: focused on content and applications • ICT for Human Development: focused on the participatory process
  18. Language and ICT Training: [Charts of the percentage of teachers preferring Urdu and preferring English, for software and for training material, before and after training.]
  19. Language and ICT Training: Icon Identification by Students
       Icons identified                      F      M   Subtotal   Share
       Urdu icons                           691    656     1347      64%
       English icons                        132    198      330      16%
       English transliterated into Urdu     150    183      333      16%
       Didn't recognize                      49     40       89       4%
       Total                                                2099
  20. Accessing Info Online
       Language preference for searching on the Internet (students):
       Language used    Female   Male   Total
       Urdu               44      45      89
       English             2       2       4
       Total              46      47      93
       Preferred language for setting a homepage:
       Participant      English   Urdu
       Students             0      138
       Teachers             5       13
       Total                5      151
  21. Language in Online Communication: language of 1467 emails and 363 chats. Urdu accounted for 89%; the remaining 9%, 2% and 1% were split among English, Punjabi and other languages.
  22. Language for Content Development: Website Competition (number of websites by language)
       Category                                               Urdu   English   Total
       School Website (by 10 school teacher teams)              9       1        10
       Local Village Website (by 10 school student teams)[1]    8       0         8
       Open Category (individual students)                     38       0        38
       Total                                                   55       1        56
       [1] One school did not participate, and one school website was disqualified because the team took significant external assistance.
  23. Content
  24. Development Process of Human Language Technology
       Select Language • Linguistic Data Collection • Core Linguistic Analysis and Definition • Publishing Language Computing Standards • Development of Localization Utilities • Detailed Linguistic Analysis • Publishing Data Annotations Schema • Annotation of Linguistic Data • Development of Linguistic Utilities • Publishing Annotated Linguistic Resources • Localization of Existing Applications • Development of Advanced HLT Application • Extension of Localization Applications
  25. Status of Human Language Technology: Urdu. [Chart rating each stage of the development process in slide 24 as Reasonable Support, Some Support, or Minimal Support.]
  26. Status of Human Language Technology: Sindhi. [Same chart for Sindhi.]
  27. Status of Human Language Technology: Pushto. [Same chart for Pushto.]
  28. Status of Human Language Technology: Punjabi. [Same chart for Punjabi.]
  29. Status of Human Language Technology: Balochi. [Same chart for Balochi.]
  30. Status of Human Language Technology: Saraiki. [Same chart for Saraiki.]
  31. Status of Human Language Technology: Others. [Same chart for the other languages.]
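Slide 8 lists ISO 10646/Unicode as the character-set standard for interfacing. One practical consequence is that Urdu text typed with Arabic or Persian input methods often contains look-alike Arabic-block code points (for example ARABIC LETTER KAF, U+0643, instead of KEHEH, U+06A9), so search and comparison fail unless the text is normalized first. The Python sketch below shows one way to do this; the mapping table and the name `normalize_urdu` are hypothetical illustrations, not a CLE or Unicode standard. Only `unicodedata.normalize` and `str.translate` are standard Python.

```python
import unicodedata

# Hypothetical normalization table (an illustrative assumption, not a published
# standard): Arabic-block code points that look like Urdu letters, mapped to
# the code points Urdu text is expected to use.
ARABIC_TO_URDU = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH          -> ARABIC LETTER HEH GOAL
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic digits (U+06F0..U+06F9)
ARABIC_TO_URDU.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})

_TRANSLATION = str.maketrans(ARABIC_TO_URDU)


def normalize_urdu(text: str) -> str:
    """Apply Unicode NFC, then replace look-alike Arabic code points."""
    # NFC composes canonical sequences, e.g. ALEF (U+0627) + MADDA ABOVE
    # (U+0653) becomes ALEF WITH MADDA ABOVE (U+0622).
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TRANSLATION)


if __name__ == "__main__":
    # "kitab" typed with ARABIC LETTER KAF instead of KEHEH
    typed = "\u0643\u062A\u0627\u0628"
    print([hex(ord(c)) for c in normalize_urdu(typed)])
    # prints ['0x6a9', '0x62a', '0x627', '0x628']
```

After normalization the word matches a dictionary entry stored with KEHEH. Mapping every HEH (U+0647) to HEH GOAL (U+06C1) is a simplification: Urdu uses more than one heh form, so a production table would need review against an agreed standard.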
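Slide 8 also lists collation. Sorting Urdu by raw Unicode code point gives the wrong alphabetical order, because letters that Arabic does not use, such as پ (U+067E) and ٹ (U+0679), sit later in the Unicode Arabic block than the basic Arabic letters. The Python sketch below sorts with a custom key built from one commonly taught Urdu letter order; that order, the `URDU_ORDER` table, and `urdu_sort_key` are assumptions for illustration. A production system would follow a published collation standard with diacritic and multi-level handling, which this sketch omits.

```python
# One commonly taught Urdu letter order, written as code points so the list
# reads left to right. This order is an assumption for the sketch; it is not
# taken from the slides or from a published collation standard.
URDU_ORDER = [
    "\u0627", "\u0622", "\u0628", "\u067E", "\u062A", "\u0679", "\u062B",  # ا آ ب پ ت ٹ ث
    "\u062C", "\u0686", "\u062D", "\u062E", "\u062F", "\u0688", "\u0630",  # ج چ ح خ د ڈ ذ
    "\u0631", "\u0691", "\u0632", "\u0698", "\u0633", "\u0634", "\u0635",  # ر ڑ ز ژ س ش ص
    "\u0636", "\u0637", "\u0638", "\u0639", "\u063A", "\u0641", "\u0642",  # ض ط ظ ع غ ف ق
    "\u06A9", "\u06AF", "\u0644", "\u0645", "\u0646", "\u06BA", "\u0648",  # ک گ ل م ن ں و
    "\u06C1", "\u06BE", "\u0621", "\u06CC", "\u06D2",                      # ہ ھ ء ی ے
]
RANK = {ch: i for i, ch in enumerate(URDU_ORDER)}


def urdu_sort_key(word: str) -> tuple[int, ...]:
    """Map each character to its alphabet position for use with sorted()."""
    # Characters outside the table (digits, diacritics, Latin, ...) sort after
    # every Urdu letter, in code point order among themselves.
    return tuple(RANK.get(ch, len(URDU_ORDER) + ord(ch)) for ch in word)


if __name__ == "__main__":
    letters = ["\u062B", "\u0679", "\u062A", "\u067E", "\u0628"]  # ث ٹ ت پ ب
    print([hex(ord(c)) for c in sorted(letters)])                      # code point order
    print([hex(ord(c)) for c in sorted(letters, key=urdu_sort_key)])   # Urdu order
```

With the default `sorted`, ٹ (U+0679) and پ (U+067E) land after ث (U+062B); with `key=urdu_sort_key` the order becomes ب پ ت ٹ ث. Because the key is a tuple, a word that is a prefix of another (ا before اب) still sorts first.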
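Slide 11 lists language identification among the assisting technologies. Because Urdu, Sindhi and Pushto are all written in Arabic-derived scripts, a crude first pass can look for letters that only one of these orthographies uses, for instance Pushto ړ (U+0693) or Sindhi ڪ (U+06AA). The Python sketch below is that heuristic and nothing more: the marker sets and `guess_language` are hypothetical, it cannot reliably separate Urdu from other languages written in Urdu-style orthography such as Punjabi (Shahmukhi) and Saraiki, and practical identifiers are usually trained statistical models over character n-grams.

```python
from collections import Counter

# Hypothetical marker sets: letters assumed to distinguish these three
# orthographies from one another. Punjabi (Shahmukhi) and Saraiki also use the
# "urdu" letters, so "urdu" really means "Urdu-style orthography".
MARKERS = {
    "urdu": set("\u0679\u0688\u0691\u06D2"),    # TTEH, DDAL, RREH, YEH BARREE
    "sindhi": set("\u067D\u068A\u0699\u06AA"),  # Sindhi retroflex letters, SWASH KAF
    "pushto": set("\u067C\u0689\u0693\u069A\u0696\u0681\u0685\u06BC"),  # ring and dotted letters
}


def guess_language(text: str) -> str:
    """Return the language whose marker letters occur most often, or "unknown"."""
    counts = Counter()
    for ch in text:
        for lang, letters in MARKERS.items():
            if ch in letters:
                counts[lang] += 1
    top = counts.most_common(2)
    if not top:
        return "unknown"  # no distinctive letters found
    if len(top) == 2 and top[0][1] == top[1][1]:
        return "unknown"  # tie between two orthographies
    return top[0][0]


if __name__ == "__main__":
    print(guess_language("\u0628\u0691\u0627"))        # بڑا (Urdu "big")
    print(guess_language("\u06AA\u062A\u0627\u0628"))  # ڪتاب (Sindhi "book")
    print(guess_language("\u069A\u0647"))              # ښه (Pushto "good")
```

Text containing none of the marker letters, such as کتاب, comes back as "unknown"; a real system would fall back to a statistical model in that case.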
