Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

My talk on Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages at JCDL 2015.

  1. 1. Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages. Sawood Alam (Computer Science Department, Old Dominion University, Norfolk, Virginia 23529); Fateh ud din B Mehmood (National University of Sciences and Technology, Islamabad, Pakistan); Michael L. Nelson (Computer Science Department, Old Dominion University, Norfolk, Virginia 23529)
  2. 2. The Time Travel
  3. 3. OK Google, Define Dictionary: "a book or electronic resource that lists the words of a language (typically in alphabetical order) and gives their meaning, or gives the equivalent words in a different language, often also providing information about pronunciation, origin, and usage."
  4. 4. Dictionaries Are Different: Read: random access; Write: maintain sort order; The most compact mode to preserve a language
  5. 5. Problem: English Dictionary (Johnson's English dictionary)
  6. 6. Problem: Urdu Dictionary (Farhang-e-Asifiyah)
  7. 7. Related Work
  8. 8. Unicode Collation: Ordered assembly of written information; Unicode values != natural collation; Arabic script: U+0600 to U+06FF; Out-of-order alphabets in derived languages; Common Locale Data Repository (CLDR)
  9. 9. Collation Discrepancies: Compound letters; Diacritical marks; Half letters; Prefixes
  10. 10. Nested Ordering: Root word sorting (Arabic); Morphological derivation; Derived word simplification; Radicals and strokes (Chinese)
  11. 11. Indexing: Ordered Pages
  12. 12. Indexing: Sparse Index
  13. 13. Indexing: Full Index
  14. 14. Indexing: Location Index
  15. 15. Indexing State Transition
  16. 16. Annotation
  17. 17. Digitization
  18. 18. Dictionary Explorer: Multilingual Multi-dictionary Lookup; Searching and Exploring; Annotation and digitization; User Contribution and Feedback; Open Source => GitHub:/urduweb/DictionaryExplorer
  19. 19. Dictionary Explorer: English
  20. 20. Dictionary Explorer: Urdu
  21. 21. Indexing Time

      | Dictionary              | Pages | Index | Mode              | Time       |
      |-------------------------|-------|-------|-------------------|------------|
      | English to Urdu         | 180   | Sparse | Manual and Script | 10 minutes |
      | Monolingual Urdu        | 2,500 | Sparse | Manual            | 2 hours    |
      | Monolingual Classic Urdu | 3,200 | Full*  | Crowdsource**     | 60 days    |

      * 75,000 words, phrases, proverbs, and idioms
      ** 13 contributors
  22. 22. Prefix Permutations
  23. 23. Prefix: One
  24. 24. Prefix: Two
  25. 25. Prefix: Three
  26. 26. Prefix: Four
  27. 27. Prefix: Five
  28. 28. Prefix: Six
  29. 29. Conclusions and Future Work: Identified issues: Too many matches; Lack of fielded searching; Lack of OCR support; No input method assistance; Collation challenges. Accessibility levels: Ordered Pages, Sparse, Full, and Location indexes, annotation, and digitization. Implemented a multi-lingual multi-dictionary explorer. Effort and prefix evaluation. In future: elastic index and automatic region estimate. GitHub:/urduweb/DictionaryExplorer
  30. 30. Sawood Alam @ibnesayeed
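
The "Unicode values != natural collation" point on the Unicode Collation slide can be illustrated with a small sketch. It is not taken from the talk or from Dictionary Explorer: the names `URDU_ORDER`, `RANK`, and `collation_key` are hypothetical, and only the first eight letters of the Urdu alphabet are covered. Urdu letters such as pe (U+067E) were added to the Arabic block after the original Arabic letters, so sorting by code point places them after te (U+062A) even though pe comes first in the alphabet.

```python
# Hypothetical illustration: the first eight Urdu letters in dictionary order,
# written as escapes so the code points are visible.
URDU_ORDER = [
    "\u0627",  # alif
    "\u0628",  # be
    "\u067E",  # pe
    "\u062A",  # te
    "\u0679",  # tte
    "\u062B",  # se
    "\u062C",  # jeem
    "\u0686",  # che
]

# Rank of each letter in the natural alphabet (an assumed, partial table).
RANK = {letter: i for i, letter in enumerate(URDU_ORDER)}


def collation_key(word):
    """Map each character to (alphabet rank, character).

    Characters missing from RANK share the last rank and fall back
    to code-point order among themselves.
    """
    return [(RANK.get(ch, len(RANK)), ch) for ch in word]


# Plain sorting compares code points, which breaks the alphabet order.
by_code_point = sorted(URDU_ORDER)

# Sorting with the rank-based key restores the dictionary order.
by_alphabet = sorted(URDU_ORDER, key=collation_key)

print([f"U+{ord(ch):04X}" for ch in by_code_point])
print([f"U+{ord(ch):04X}" for ch in by_alphabet])
```

A hand-built rank table like this only handles single letters. The compound letters, diacritical marks, and prefixes listed on the Collation Discrepancies slide would need locale-specific tailoring rules such as those published in CLDR.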
