Chinese Minority Language Support in OpenOffice.org

Transcript of "Chinese Minority Language Support in OpenOffice.org"

  1. 1. Chinese Minority Language Support in OpenOffice.org Institute of Software, Chinese Academy of Sciences Lead / Tibetan Native-lang Project Yanmin Jia 贾彦民 [email_address]
  2. 2. Agenda <ul><li>Status quo </li></ul><ul><li>Fund support from government </li></ul><ul><li>Language computing features </li></ul><ul><li>Character set and encoding </li></ul><ul><li>Support in OpenOffice.org </li></ul><ul><li>Demo </li></ul><ul><li>Problem & future work </li></ul><ul><li>Conclusion </li></ul>
  3. 3. Status quo <ul><li>China is a multi-lingual and multi-cultural country </li></ul><ul><ul><li>29 minorities have their own languages/scripts </li></ul></ul><ul><ul><ul><li>Including Tibetan, Mongolian, Uighur, Kazak, Khalkhas, Yi, and Tai Le </li></ul></ul></ul><ul><ul><li>Population of native speakers </li></ul></ul><ul><ul><ul><li>Tibetan: more than 2 million in China </li></ul></ul></ul><ul><ul><ul><ul><li>Also spoken in Ladakh, Nepal, Bhutan, and the northern areas of India bordering Tibet </li></ul></ul></ul></ul><ul><ul><ul><li>Mongolian: more than 3 million </li></ul></ul></ul><ul><ul><ul><ul><li>Inner Mongolia & Outer Mongolia </li></ul></ul></ul></ul><ul><ul><ul><li>Uighur: more than 8 million </li></ul></ul></ul><ul><ul><ul><ul><li>Sinkiang </li></ul></ul></ul></ul><ul><li>Language computing </li></ul><ul><ul><ul><li>Microsoft platform </li></ul></ul></ul><ul><ul><ul><ul><li>Uniscribe in Vista </li></ul></ul></ul></ul><ul><ul><ul><li>Institute of Software, Chinese Academy of Sciences </li></ul></ul></ul><ul><ul><ul><ul><li>Red Flag & OpenOffice.org </li></ul></ul></ul></ul>
  4. 4. Funding support from government <ul><li>863 Hi-Tech Research and Development Program of China </li></ul><ul><ul><li>“Linux Operating System and Office Suite for Minority Scripts” (2003AA1Z2110) </li></ul></ul><ul><li>Knowledge Innovation Project sponsored by the Chinese Academy of Sciences </li></ul><ul><ul><li>“Platform-Independent Tibetan Information Processing System Based on Linux” (KGCX2-SW-504) </li></ul></ul><ul><li>Electronic Product Development Fund sponsored by the Ministry of Information Industry </li></ul><ul><ul><li>Cross-platform Tibetan office suite </li></ul></ul>
  5. 5. Language computing features <ul><li>Writing Direction & formatting style </li></ul><ul><ul><li>Uighur: bidirectional text </li></ul></ul><ul><ul><ul><li>Right to left horizontally </li></ul></ul></ul><ul><ul><li>Mongolian </li></ul></ul><ul><ul><ul><li>From left to right vertically </li></ul></ul></ul><ul><ul><li>Tibetan </li></ul></ul><ul><ul><ul><li>From left to right horizontally </li></ul></ul></ul><ul><ul><ul><li>Special Line breaking behavior </li></ul></ul></ul>
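The "special line breaking behavior" of Tibetan can be illustrated with a small sketch. It assumes the common rule that a line may break after the tsheg (U+0F0B, the mark between syllables) and not inside a syllable. The function names and the greedy wrapping strategy are illustrative only, and widths are counted in code points rather than real glyph widths.

```python
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG; a line may break after it


def tibetan_break_opportunities(text):
    """Return the indices at which a line break is allowed (just after a tsheg)."""
    return [i + 1 for i, ch in enumerate(text) if ch == TSHEG and i + 1 < len(text)]


def wrap_tibetan(text, max_chars):
    """Greedy wrap that only breaks after a tsheg.

    Simplification: length is measured in code points, not rendered width,
    and a single syllable longer than max_chars is left unbroken.
    """
    breaks = tibetan_break_opportunities(text) + [len(text)]
    lines, start, prev = [], 0, 0
    for pos in breaks:
        if pos - start > max_chars and prev > start:
            lines.append(text[start:prev])
            start = prev
        prev = pos
    lines.append(text[start:])
    return lines


# Three Tibetan syllables of three code points each, separated by tsheg.
sample = "\u0F56\u0F7C\u0F51\u0F0B\u0F61\u0F72\u0F42\u0F0B\u0F66\u0F90\u0F51"
wrapped = wrap_tibetan(sample, 5)
```

With a limit of 5 code points, the sample is split after each tsheg, so no syllable is cut in the middle.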
  6. 6. Language computing features (Cont.) <ul><li>Complex Script </li></ul><ul><ul><li>Character shaping </li></ul></ul><ul><ul><ul><li>The same character takes different shapes depending on the context </li></ul></ul></ul><ul><ul><li>Ligature </li></ul></ul><ul><ul><ul><li>Certain character sequences are rendered as one single shape </li></ul></ul></ul><ul><ul><li>Character positioning </li></ul></ul><ul><ul><ul><li>Grapheme & pre-composed character </li></ul></ul></ul>
  7. 7. Character set and encoding <ul><li>In the 1980s </li></ul><ul><ul><li>Software providers used their own proprietary character sets </li></ul></ul><ul><li>Since 1997 </li></ul><ul><ul><li>Unicode standard </li></ul></ul><ul><ul><ul><li>Tibetan 1997 (U+0F00~U+0FFF) </li></ul></ul></ul><ul><ul><ul><li>Mongolian 2000 (U+1800~U+18AF) </li></ul></ul></ul><ul><ul><ul><li>Uighur uses the Arabic script as its writing system </li></ul></ul></ul><ul><li>Tibetan pre-composed character set standard </li></ul><ul><ul><li>Tibetan coded character set Extension A (GB/T20542-2006) </li></ul></ul><ul><ul><ul><li>Commonly used Tibetan BrdaRten (pre-composed characters) </li></ul></ul></ul><ul><ul><li>Tibetan coded character set Extension B </li></ul></ul><ul><ul><ul><li>Devanagari transliteration of Tibetan </li></ul></ul></ul><ul><ul><ul><li>Non-BMP </li></ul></ul></ul><ul><li>Tibetan and Himalayan Digital Library </li></ul><ul><ul><li>Jomolhari (Tibetan font) </li></ul></ul>
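A minimal sketch of how the Unicode blocks listed on this slide can be used to decide which script a character belongs to. The Tibetan block (U+0F00 to U+0FFF) and the Mongolian block (U+1800 to U+18AF) come from the slide. The base Arabic block (U+0600 to U+06FF) is an assumption added here to cover Uighur, and the table and function names are illustrative.

```python
# Block ranges are inclusive. "Arabic" is the base Arabic block, assumed here
# as the home of the letters used to write Uighur.
SCRIPT_BLOCKS = {
    "Tibetan": (0x0F00, 0x0FFF),
    "Mongolian": (0x1800, 0x18AF),
    "Arabic": (0x0600, 0x06FF),
}


def script_of(ch):
    """Return the name of the block containing `ch`, or None if it is in none of them."""
    cp = ord(ch)
    for name, (low, high) in SCRIPT_BLOCKS.items():
        if low <= cp <= high:
            return name
    return None
```

A real implementation would more likely consult the Unicode Script property than raw block ranges, since blocks also contain punctuation and unassigned code points.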
  8. 8. Support in OpenOffice.org <ul><li>Rendering </li></ul><ul><ul><li>Smart font — OpenType </li></ul></ul><ul><ul><ul><li>GSUB </li></ul></ul></ul><ul><ul><ul><li>GPOS </li></ul></ul></ul><ul><ul><li>Complex Layout Engine </li></ul></ul><ul><ul><ul><li>Input </li></ul></ul></ul><ul><ul><ul><ul><li>an array of Unicode characters in logical order </li></ul></ul></ul></ul><ul><ul><ul><li>Output </li></ul></ul></ul><ul><ul><ul><ul><li>an array of glyph indices </li></ul></ul></ul></ul><ul><ul><ul><ul><li>an array of character indices for the glyphs </li></ul></ul></ul></ul><ul><ul><ul><ul><li>an array of glyph positions </li></ul></ul></ul></ul><ul><ul><li>Vcl </li></ul></ul><ul><ul><ul><li>ICU LayoutEngine (Linux) </li></ul></ul></ul><ul><ul><ul><li>Uniscribe (Windows) </li></ul></ul></ul>
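The input and output arrays of a complex layout engine listed on this slide can be illustrated with a toy sketch. It is not the ICU LayoutEngine or Uniscribe API: the character map, ligature table, and advance widths are made-up stand-ins for what a real OpenType font would supply through its cmap, GSUB, and GPOS tables.

```python
def toy_layout(text, cmap, ligatures, advances):
    """Map characters in logical order to glyph indices, source character
    indices, and glyph positions.

    cmap:      character -> glyph id (hypothetical font data)
    ligatures: two-character string -> single glyph id (stands in for GSUB)
    advances:  glyph id -> horizontal advance (stands in for font metrics)
    """
    glyphs, char_indices, positions = [], [], []
    x = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ligatures:
            glyph, consumed = ligatures[pair], 2  # two characters, one shape
        else:
            glyph, consumed = cmap[text[i]], 1
        glyphs.append(glyph)
        char_indices.append(i)    # first character this glyph came from
        positions.append((x, 0))  # pen position before drawing the glyph
        x += advances[glyph]
        i += consumed
    return glyphs, char_indices, positions


# Made-up font data with a single "fi" ligature.
cmap = {"f": 1, "i": 2, "g": 3}
ligatures = {"fi": 10}
advances = {1: 5, 2: 3, 3: 6, 10: 7}
result = toy_layout("fig", cmap, ligatures, advances)
```

The character-index array is what lets the application map a glyph back to the text, for example when placing the cursor after a ligature.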
  9. 9. Support in OpenOffice.org(Cont.) <ul><li>Rendering </li></ul><ul><ul><li>Mongolian support in ICU LayoutEngine </li></ul></ul><ul><ul><ul><li>MonglianOpenTypeLayoutEngine </li></ul></ul></ul><ul><ul><ul><li>Mongolian text layout process goes through the following steps </li></ul></ul></ul><ul><ul><ul><ul><li>Subdivide string of characters into runs </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Further divide each run into clusters </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Each cluster is labeled by feature tag </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Apply feature tag information to each cluster </li></ul></ul></ul></ul>
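The layout steps on this slide can be sketched roughly as follows. This is not the code of the layout engine named above. It assumes that a run is a maximal sequence of Mongolian letters, that each letter is its own cluster, and that the feature tags are the standard OpenType positional-form tags `isol`, `init`, `medi`, and `fina`. Control characters such as variation selectors and the vowel separator are ignored, and all function names are hypothetical.

```python
import unicodedata


def is_mongolian_letter(ch):
    """True for letters in the Mongolian block (U+1800 to U+18AF)."""
    return 0x1800 <= ord(ch) <= 0x18AF and unicodedata.category(ch).startswith("L")


def split_runs(text):
    """Step 1: subdivide the string into runs of consecutive Mongolian letters."""
    runs, current = [], ""
    for ch in text:
        if is_mongolian_letter(ch):
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


def tag_run(run):
    """Steps 2 and 3: treat each letter as a cluster and label it with a
    positional feature tag that a font's GSUB table can then apply."""
    if len(run) == 1:
        return [(run, "isol")]
    tags = ["init"] + ["medi"] * (len(run) - 2) + ["fina"]
    return list(zip(run, tags))


runs = split_runs("\u182E\u1823\u1829 ab \u1820")
tagged = [tag_run(run) for run in runs]
```

The final step, applying the tagged features, is left to the font: the engine looks up each tag in the GSUB table and substitutes the matching glyph form.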
  10. 10. Support in OpenOffice.org(Cont.) <ul><li>Rendering </li></ul><ul><ul><li>Bypass Uniscribe on Windows </li></ul></ul><ul><ul><ul><li>Most minority language speakers are Windows users </li></ul></ul></ul><ul><ul><ul><li>UniscribeLayout is replaced by IcuLayoutEngine </li></ul></ul></ul><ul><ul><ul><li>Mongolian, Uighur and Tibetan can then be rendered correctly </li></ul></ul></ul>
  11. 11. Support in OpenOffice.org(Cont.) <ul><li>Vertical Text Formatting for Mongolian </li></ul><ul><ul><li>Map the vertical text frame to a horizontal text frame by rotation </li></ul></ul><ul><ul><li>Normal horizontal text formatting is performed </li></ul></ul><ul><ul><li>The text frame is mapped back to its vertical origin </li></ul></ul><ul><ul><li>The mapping between frames of different directions is reversible </li></ul></ul>
  12. 12. Support in OpenOffice.org(Cont.) <ul><li>Vertical text formatting for Mongolian </li></ul><ul><ul><li>Three functions in sw determine the location and exchange the width and height of the embedded frames </li></ul></ul><ul><ul><ul><li>SwitchVerticalToHorizontal </li></ul></ul></ul><ul><ul><ul><li>SwitchHorizontalToVertical </li></ul></ul></ul><ul><ul><ul><li>SwapWidthAndHeight </li></ul></ul></ul>
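The rotate, format, and rotate-back idea can be sketched with plain rectangle arithmetic. This is not the code of the three `sw` functions named above. It assumes a plain 90-degree rotation of a rectangle inside a vertical frame of known width, and the snake_case names are hypothetical. How the real code treats the left-to-right column order of Mongolian is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int       # left edge, relative to the frame origin
    y: int       # top edge, relative to the frame origin
    width: int
    height: int


def swap_width_and_height(rect):
    """Exchange the two dimensions and leave the position alone."""
    return Rect(rect.x, rect.y, rect.height, rect.width)


def vertical_to_horizontal(rect, frame_width):
    """Rotate `rect` out of a vertical frame that is `frame_width` wide.

    Distance down the vertical line becomes distance along the horizontal
    line, and the rectangle's dimensions are swapped.
    """
    moved = Rect(rect.y, frame_width - rect.x - rect.width, rect.width, rect.height)
    return swap_width_and_height(moved)


def horizontal_to_vertical(rect, frame_width):
    """Inverse mapping, where `frame_width` is the width of the vertical frame."""
    moved = Rect(frame_width - rect.y - rect.height, rect.x, rect.width, rect.height)
    return swap_width_and_height(moved)


embedded = Rect(10, 20, 30, 40)
horizontal = vertical_to_horizontal(embedded, 200)
restored = horizontal_to_vertical(horizontal, 200)
```

Mapping there and back returns the original rectangle, which is the reversibility the previous slide relies on.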
  13. 13. Support in OpenOffice.org(Cont.) <ul><li>Locale </li></ul><ul><ul><li>ICU </li></ul></ul><ul><ul><li>I18n/l10n </li></ul></ul>
  14. 14. Support in OpenOffice.org(Cont.) <ul><li>GUI translation </li></ul><ul><ul><li>Step 1: Add the New Language to the Resource System; </li></ul></ul><ul><ul><li>Step 2: Add the New Language to the Build Environment; </li></ul></ul><ul><ul><li>Step 3: Add the New Language to the Localization Tools; </li></ul></ul><ul><ul><li>Step 4: Extract Strings and Messages from the Source Code; </li></ul></ul><ul><ul><li>Step 5: Translate Extracted Strings and Messages to the New Language; </li></ul></ul><ul><ul><li>Step 6: Merge Translated Strings and Messages into the Source Code; </li></ul></ul><ul><ul><li>Step 7: Add the New Language to the Installation Set Project; </li></ul></ul><ul><ul><li>Step 8: Add the New Language to the Modules “helpcontent” and “readlicense_oo”. </li></ul></ul><ul><ul><li>(http://l10n.openoffice.org/adding_language.html) </li></ul></ul>
  15. 15. Demo
  16. 16. Demo (Cont.)
  17. 17. Demo (Cont.)
  18. 18. Problems & future work <ul><li>Problems </li></ul><ul><ul><li>It is hard to make a profit from developing software for minority languages </li></ul></ul><ul><ul><li>Translation is not easy for Chinese minority languages </li></ul></ul><ul><ul><ul><li>No uniform glossary </li></ul></ul></ul><ul><ul><ul><li>Not enough people skilled in both programming and a minority language </li></ul></ul></ul><ul><ul><li>More funding support is needed </li></ul></ul><ul><li>Future work </li></ul><ul><ul><li>More features </li></ul></ul><ul><ul><ul><li>Transliteration & sorting </li></ul></ul></ul><ul><ul><li>Training & application </li></ul></ul><ul><ul><ul><li>Money </li></ul></ul></ul><ul><ul><li>Work together with the OpenOffice.org community </li></ul></ul><ul><ul><li>Strong collaboration with software providers </li></ul></ul>
  19. 19. Conclusion <ul><li>Minority languages are an amazing world </li></ul><ul><ul><li>The languages will be lost if they aren't saved </li></ul></ul><ul><li>OpenOffice.org is very valuable for minority languages </li></ul><ul><ul><li>OpenOffice.org should pay more attention to minority languages </li></ul></ul><ul><li>Software corporations are welcome to keep an eye on Chinese minority languages </li></ul><ul><ul><li>SUN, Chinese 2000, Novell and so on </li></ul></ul><ul><li>More developer documentation on OpenOffice.org </li></ul><ul><li>Establish a Chinese minority language federation based on OpenOffice.org </li></ul>
  20. 20. <ul><li>Thanks! </li></ul>