Uyghur language processing on the Web

Dr. Waris Abdukerim Janbaz, Prof. Imad Saleh
Paragraphe Laboratory, University of Paris VIII, France
warisabdukerim@yahoo.com, isaleh@wanadoo.fr
http://paragraphe.univ-paris8.fr

Abstract
In this paper, we discuss some important issues related to the web processing of Uyghur, an agglutinative Turkic language. In particular, we discuss:
• the development of Uyghur Unicode fonts;
• the display of Uyghur characters;
• font embedding;
• a method for inputting Uyghur characters in environments without Uyghur support.
We also introduce a multiscript conversion application that extends the use of the Unicode standard for Uyghur language processing.

Keywords: Unicode, Font, Turkic Language, multiscript, transliteration, Arabic-Script Uyghur, Cyrillic-Script Uyghur, Latin-Script Uyghur.

1. Introduction
The Uyghurs are a Turkic-speaking ethnic group, officially numbering about nine million. They live in Central Asia, including today's Xinjiang Uyghur Autonomous Region (hereafter: XUAR, also called Chinese Turkistan), parts of Kazakhstan and urban regions of the Ferghana valley.

The official writing system of the XUAR Uyghurs is Arabic-Script Uyghur¹ (hereafter: ASU). Cyrillic-Script Uyghur² (hereafter: CSU) is still used by the Uyghurs of the former Soviet republics. The newly introduced transliteration³, Latin-Script Uyghur⁴ (hereafter: LSU), has become widely accepted among Uyghurs and Uyghurologists. It is a commonly used standard for transliterating both ASU and CSU.

Web publishing has begun to influence Uyghur society over the last 10 years. Existing platforms supply neither a Uyghur input method nor any font that includes all the glyphs of the ASU alphabet. As a result, inputting Uyghur text into interactive web pages (in web browsers) and displaying Uyghur characters correctly presented huge difficulties.

Government authorities took a fairly passive attitude towards the development of Uyghur information technology. Despite this, many individuals started grassroots efforts to create Uyghur websites in the three scripts mentioned above. ASU, which is used by the most populous segment of the XUAR Uyghurs, caused special coding problems because it uses a non-standard set of Arabic-based glyphs.

2. Background
Before 2002, two methods were very common for web publishing in ASU: 1) font downloading and 2) images. The inconvenience of the second method needs no explanation. The first raised more interesting, but also more complex, problems.

The major problem was that every website owner created and named his or her own fonts. Users therefore had to download a specific font (or several fonts) for almost every website. No one accepted anyone else's font names and encodings, and no common standard emerged. Most of the fonts created during this period replaced Uyghur characters for either the ASCII characters or the Unicode Arabic characters (0x600-0x6FF), with no agreement on which replacements to make. The code range 0x600-0x6FF contains more Arabic letters than the ASU alphabet has letters, so people made different choices about which Arabic characters to replace. The multiplication of font names and the growing encoding differences among fonts (for the same glyphs) became an obstacle to ASU computer processing and web publishing.

Individual computer scientists addressed the many issues raised by these non-standard fonts in many different ways. Meanwhile, many of the problems were circumvented with methods unrelated to the Unicode standard. As a result, website creators eventually expressed a strong desire to use the Unicode standard for Uyghur language processing.

In June 2002, the author developed the first Uyghur Unicode font and implemented both system-level and browser-level Input Method Editors for Windows.

¹ See annex 2.
² See annex 1.
³ Using one writing system to represent the words of another is called transliteration.
⁴ Called Uyghur Kompyutér Yéziqi (UKY) or Uyghur Latin Yéziqi (ULY) in Uyghur, meaning "Uyghur Computer Writing" or "Latin-Script Uyghur". See http://www.ukij.org/teshwiq/UKY_Heqqide(KonaYeziq).htm
This was a revolutionary accomplishment, owing mostly to the new methods and applications, which are fully Unicode-compliant (as opposed to occasionally compatible). A campaign was therefore launched to popularize the Unicode standard and adapt it for Uyghur fonts. In this paper, we present the entire process that we have followed and developed over three years. The following sections cover the four major parts of the implementation.

3. Uyghur Unicode font development
The Uyghur (ASU) letters were derived from the Arabic alphabet. The ASU alphabet has 8 vowels⁵ and 24 consonants (see annex 1). Uyghur, like Arabic, is written from right to left, and each letter takes a different shape depending on its position in a word. Uyghur letters have initial, median, final and isolated forms, and some letters also have conjunct forms⁶. In total, the Uyghur alphabet has 126 different glyphs.

The 108 basic glyphs⁷ of the Uyghur letters have already been accepted by the Unicode Consortium/ISO. In addition, 18⁸ of the 20 glyphs for composed forms were added in 1998. Unfortunately, two conjunct median forms are still absent¹¹ from the Arabic Presentation Forms-A table¹² of the Unicode Standard: ﺌﯧ⁹ (of the Uyghur letter ﺋﯥ) and ﺌﯩ¹⁰ (of the Uyghur letter ﺋﻰ). This gap leaves the Unicode/ISO repertoire incomplete for Uyghur. It has forced people to supplement the repertoire by borrowing the "supported hamze" from FBD1 and FBD2 and combining it with the median′ forms of ﺋﯥ and ﺋﻰ, which produces two synthetic conjunct letters.

The 20 conjunct glyphs can also be expressed as sequences of two existing Unicode characters, as is now the case for the two missing conjunct glyphs. However, this usage may cause problems such as slower text input, redundant data storage and more complicated sorting.

Existing platforms do not supply any Uyghur font, so a Unicode-based Uyghur font became a necessity for the progress of Uyghur information processing. Existing fonts, both Arabic fonts and other fonts that include Arabic letters, lack some of the shapes that Uyghur letters require (see annex 2). Consequently, some character sequences are displayed incorrectly. For example:

1. ﺋﺎﻟەﻣﺪىﻜﻰ هەﻣﻤە ﺋىﻨﺴﺎن ﻗەﺑىﻪ ﺋەﻣەس
2. ﺋﺎﻟﻪﻣﺪﯨﻜﻰ ھﻪﻣﻤﻪ ﺋﯩﻨﺴﺎﻥ ﻗﻪﺑﯩﻬ ﺋﻪﻣﻪﺱ
(Not all human beings in the world are evil.)

With existing fonts (e.g. Times New Roman, Traditional Arabic), the first sentence is rendered as an illegal character combination. This is because the cursive shapes of ﺋﻪ (e), ھ (h) and ﻯ (i) are incorrect according to the ASU alphabet (see annex 2). The sentence should appear as in sentence 2, whose letters are rendered with a dedicated font, UKIJ Tuz Tom.

To create correct cursive connection forms for Uyghur, special measures were needed during font creation for three problem letters, ﺋﻪ, ھ and ﻯ, and for the two "glottal stop signs" ﺉ and ﺌ (supported hamze). Without such measures, browsers and other application software could not display the cursive forms of these three letters correctly.

ﻯ¹³: the Uyghur letter i, as in ishik (ﺋﯩﺸﯩﻚ, door). Its 8 different forms are listed in Table 1 below.
• For the initial′ and median′ forms (ﯨ, ﯩ), we use the initial and median forms of the Arabic letter ى (0649).
• For the final′ and isolated′ forms (ﻰ, ﻯ), we use the final and isolated forms of the Farsi letter ی (06CC), respectively.

ﺋﻪ¹⁴: the Uyghur letter e, as in eyneklerde (ﺋﻪﻳﻨﻪﻛﻠﻪﺭﺩە, in the mirrors). As in Persian, this letter uses the final and isolated glyphs (ﻪ, ﻩ) of the Arabic letter ه (0647, h). This causes a special problem. The initial and median glyphs (ھ, ﻬ) of Arabic 0647¹⁵ correspond to the Uyghur letter ھ (h), as in:
• ھﯧﻠﯩﻬﻪﻡ hélihem, even now;
• ﮔﯘﻧﺎھ gunah, sin or offense;
• ﻗﻪﺑﯩﻬ qebih, odious.
Uyghur ھ in turn has its own final and isolated glyphs. To resolve this inconsistency, we have chosen to use 06D5 for the Uyghur letter ﺋﻪ and 06BE for the Uyghur letter ھ.

⁵ The Arabic alphabet uses only three letters, ﺍ, ﻭ and ﻱ, for long vowels. The other vowels are not written in normal writing. Given its phonetic characteristics, Uyghur writes all vowels (ﺋﺎ، ﺋﻪ، ﺋﻮ، ﺋﯘ، ﺋﯚ، ﺋﯜ، ﺋﯥ، ﺋﻰ) using derivatives of traditional Arabic letters.
⁶ The initial form and, in some circumstances, the median form of every vowel is preceded by a "glottal stop sign" ﺉ or ﺌ (supported hamze). Together they form a conjunct letter, which Uyghur treats as a single letter (see annex 2). ﻝ followed by ﺍ forms ﻼ or ﻻ depending on its position.
⁷ See http://www.oyghan.com/images/UyghurUnicodeTable.gif
⁸ See Arabic Presentation Forms-A, glyph code range FBEA to FBFB. See also Table 1.
⁹ Character name for the Unicode Standard: ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E MEDIAN FORM. Example: ﺑﺎﻏﺌﯧﺮﯨﻖ (Baghériq).
¹⁰ Character name for the Unicode Standard: ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA MEDIAN FORM. Example: ﻗﻪﺗﺌﯩﻲ (certainly, doubtlessly).
¹¹ Prof. Hoshur Islam and Yasin Imin, the XUAR delegation members who submitted the proposal, also acknowledge this omission. See also Arabic Presentation Forms-A (code range FBEA to FBFB) and http://www.unicode.org/standard/where/
¹² http://www.unicode.org/charts/PDF/UFB50.pdf
¹³ Character name for the Unicode Standard: ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA (represents YEH-shaped letter with no dots in any positional form), 0649.
¹⁴ Character name for the Unicode Standard: ARABIC LETTER AE (Uighur, Kazakh, Kirghiz), 06D5 (isolated form is ە).
¹⁵ Variant shapes of the Arabic character heh.
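The statement that each conjunct glyph can be written as a sequence of two existing Unicode characters can be checked with Python's standard `unicodedata` module. Below is a minimal sketch that walks the Arabic Presentation Forms-A range FBEA to FBFB cited above and prints the NFKC form of each ligature. The function name `conjunct_ligatures` is ours; it is not part of any tool described in this paper. With a current Unicode database, the sketch reports 18 ligatures. Each folds to U+0626 (YEH WITH HAMZA ABOVE) followed by one of the 8 vowel letters, and none is a medial form, which matches the two missing median conjuncts.

```python
import unicodedata

# Code range of the Uyghur hamza+vowel ligatures cited in the text
# (Arabic Presentation Forms-A).
FIRST, LAST = 0xFBEA, 0xFBFB


def conjunct_ligatures():
    """Return (code point, Unicode name, NFKC form) for each ligature in range."""
    rows = []
    for cp in range(FIRST, LAST + 1):
        ch = chr(cp)
        rows.append((cp, unicodedata.name(ch), unicodedata.normalize("NFKC", ch)))
    return rows


if __name__ == "__main__":
    rows = conjunct_ligatures()
    for cp, name, nfkc in rows:
        # NFKC maps each presentation-form ligature to U+0626 + a vowel letter.
        seq = " ".join(f"U+{ord(c):04X}" for c in nfkc)
        print(f"U+{cp:04X} -> {seq}  {name}")
    medial = [name for _, name, _ in rows if "MEDIAL" in name]
    print(f"{len(rows)} ligatures, {len(medial)} medial forms")
```

NFKC folds these presentation forms back to logical two-character sequences. Search or sorting code can therefore treat text stored either way alike once the text is normalized.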
Table 1 lists the forms of the Uyghur vowels and of the three problem letters. Unprimed forms carry the supported hamze; primed forms are the vowel without it. Empty positions are omitted.

Table 1. Uyghur vowels and the three problem letters

| Letter | Forms with hamze | Forms without hamze |
|---|---|---|
| a (ﺋﺎ) | ini. ﯪ; med. ﯫ | fin.′ ﺎ; iso.′ ﺍ |
| e (ﺋﻪ) | ini. ﯬ; med. ﯭ | fin.′ ﻪ; iso.′ ﻩ |
| o (ﺋﻮ) | ini. ﯮ; med. ﯯ | fin.′ ﻮ; iso.′ ﻭ |
| u (ﺋﯘ) | ini. ﯰ; med. ﯱ | fin.′ ﯘ; iso.′ ﯗ |
| ö (ﺋﯚ) | ini. ﯲ; med. ﯳ | fin.′ ﯚ; iso.′ ﯙ |
| ü (ﺋﯜ) | ini. ﯴ; med. ﯵ | fin.′ ﯜ; iso.′ ﯛ |
| é (ﺋﯥ) | ini. ﯸ; med. ﺌﯧ (synthetic, not in Unicode); fin. ﯷ; iso. ﯶ | ini.′ ﯦ; med.′ ﯧ; fin.′ ﯥ; iso.′ ې |
| i (ﺋﻰ) | ini. ﯻ; med. ﺌﯩ (synthetic, not in Unicode); fin. ﯺ; iso. ﯹ | ini.′ ﯨ; med.′ ﯩ; fin.′ ﻰ; iso.′ ﻯ |
| h (ھ) and e (ﺋﻪ) | The single Arabic character ه (heh, 0647) has four basic shapes. These correspond to the shapes of two different Uyghur letters: ھ (initial, median) and ە (final, isolated). | |

ﺉ and ﺌ¹⁶: the glottal stop. This phoneme is not listed separately in the ASU alphabet, but ASU spelling rules still cover it. In Uyghur words, the glottal stop is not pronounced as strongly as in Semitic languages or in Uzbek, for example; it has weakened to no more than a hiatus.

In ASU the glottal stop is marked by a hamza on top of a "tooth". It usually appears in words of Arabic origin, where it replaces an original 'ain (ع) or hamza (ء) in median or final position. Examples:
• ﺋﺎﻟﻪﻡ from Arabic عالَم;
• ﺳﺎﺋﻪﺕ from Arabic ساعة;
• ﺧﺎﺋﯩﻦ from Arabic خائِن;
• ﺳﻮﺋﺎﻝ from Arabic سؤال.
In initial position, the same sign is considered part of the initial form of a vowel and has no phonetic value¹⁷.

These signs correspond to the initial and median forms of the Arabic letter ئ (0626). They are not considered shapes of any independent letter of the Uyghur alphabet (cf. annex 2). One glyph of each of the two letters ﺋﯥ and ﺋﻰ (marked "synthetic" in Table 1) is still missing from Unicode. We can therefore use either of these glyphs (ﺉ or ﺌ) followed by the final, isolated, median′ or final′ form of the vowel ﺋﯥ or ﺋﻰ. More generally, the other conjunct forms can likewise be obtained by combining the Arabic letter ئ (0626) with the respective vowel.

These conventions still have a limitation: ﺋﯥ and ﺋﻰ need two glyphs instead of one conjunct glyph. Even so, the conventions have been widely accepted by the Uyghur Computer Science Association (UCSA¹⁸) and, later, by the Xinjiang University branch of the 863 Research Group¹⁹.

Once the specific behavior of these letters is understood, Uyghur fonts are easy to create with existing font-creation software. We also recommend including the following zero-width formatting characters in any Uyghur font:
• ZWNJ (zero width non-joiner; 200C);
• ZWJ (zero width joiner; 200D);
• LRM (left-to-right mark; 200E);
• RLM (right-to-left mark; 200F).

The rest of the time-consuming, repetitive font development work is exactly the same as for an Arabic-script font²⁰. Some Uyghur Unicode fonts are available free of charge from the UCSA website. Our recommended font creation tools are Font Creator²¹ and Fontographer²². Software such as Microsoft VOLT can add glyph substitution and positioning lookups, shaping features and OpenType tables to Arabic fonts.

4. Font embedding and character display
Web pages can be rendered without downloading or installing any specific fonts if:
1) the fonts used in the pages are available on the user's computer, and
2) the browser provides native support for those fonts and languages.
The second condition has already been met, but unfortunately the first has not, because no Uyghur fonts are installed with existing platforms. To ensure that Uyghur text is displayed correctly in web browsers, users must therefore find a way to install the fonts used in the web pages. The same holds true for all the other "forgotten languages" on different platforms. The font installation requirement either causes difficulties for people with little technical experience or discourages them from attempting to read the text.

Embedding fonts in web pages overcomes these difficulties. When a browser downloads a page via the Hypertext Transfer Protocol, any fonts embedded in the page are downloaded too, without any intervention from the user. The Microsoft Web Embedding Fonts Tool (WEFT)²³ creates embedded font objects that can be linked to web pages. Web page developers can create embedded fonts and link them to a web page in three steps:
• Create embedded fonts using Microsoft WEFT;
• Prepare the web page using any fonts installed on the platform; and
• Link the embedded fonts to the web page.

Microsoft WEFT generates 1) an embedded font for each website, with the extension .EOT, and 2) a script that links the embedded font to a web page. The disadvantage of WEFT-generated embedded fonts is that they are compatible only with Internet Explorer. More effort should therefore be invested in providing cross-platform functionality for this kind of software.

¹⁶ Character name for the Unicode Standard: ARABIC LETTER YEH WITH HAMZA ABOVE <initial> and <median>, 0626.
¹⁷ It is often said that the Uyghur linguists' decision to make this sign part of the initial form of letters preserves a link with the old Uyghur writing system, in which all initial vowels were preceded by a tooth. The Arabic alphabet has 3 letters, ﺍ, ﻭ and ﻱ, that can indicate long vowels. Short vowels can be indicated with vowel marks above or below the consonants, but these are dispensed with in normal writing. Given its phonetic characteristics, Uyghur writes all vowels (ﺋﺎ، ﺋﻪ، ﺋﻮ، ﺋﯘ، ﺋﯚ، ﺋﯜ، ﺋﯥ، ﺋﻰ) using derivatives of traditional Arabic letters.
¹⁸ UCSA, the Uyghur Computer Science Association (UKIJ, Uyghur Kompyutér Ilimi Jem'iyiti in Uyghur), is a non-profit association founded by the author in January 2004. Web site: http://www.ukij.org
¹⁹ A National High-Tech Research Group financed by the PRC government. The XJU branch specializes in multilingual software development.
²⁰ See http://www.microsoft.com/typography/OpenType%20Dev/arabic/intro.mspx for more information about developing OpenType fonts for Arabic script.
²¹ http://www.high-logic.com/fontcreator.html
²² http://www.fontlab.com/Font-tools/Fontographer
²³ Free software at http://www.microsoft.com/typography/web/embedding/default.htm
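The fallback described in Section 3 replaces a missing conjunct glyph with the supported hamze followed by the vowel's median′ form. That fallback can be sketched as a small lookup. This is an illustrative sketch, not the actual tables of the UCSA fonts. Several parts of it are our assumptions:
• the dictionary contents are our reading of Table 1;
• the names `CONJUNCT`, `MEDIAL_PRIME` and `conjunct_glyphs` are hypothetical;
• U+FE8C, the standard medial presentation form of U+0626, stands in for the hamze glyph.
Applying NFKC shows that the synthetic pair and the encoded ligatures reduce to the same logical sequence, U+0626 followed by the vowel.

```python
import unicodedata

# Medial presentation form of U+0626 (YEH WITH HAMZA ABOVE), used here as the
# "supported hamze" of the synthetic glyph. This code point choice is an assumption.
HAMZA_MEDIAL = "\uFE8C"

# Our reading of Table 1 for the two problem vowels: conjunct (hamze + vowel)
# glyphs by position. None marks the median ligature that Unicode lacks.
CONJUNCT = {
    "é": {"initial": "\uFBF8", "medial": None, "final": "\uFBF7", "isolated": "\uFBF6"},
    "i": {"initial": "\uFBFB", "medial": None, "final": "\uFBFA", "isolated": "\uFBF9"},
}

# Median' forms (vowel without hamze) used to build the synthetic glyph.
MEDIAL_PRIME = {"é": "\uFBE7", "i": "\uFBE9"}


def conjunct_glyphs(vowel: str, position: str) -> str:
    """Return the glyph(s) for hamze + vowel at a given joining position.

    Uses the encoded ligature when one exists; otherwise returns the
    two-glyph fallback: medial hamze followed by the median' vowel form.
    """
    ligature = CONJUNCT[vowel][position]
    if ligature is not None:
        return ligature
    return HAMZA_MEDIAL + MEDIAL_PRIME[vowel]


if __name__ == "__main__":
    for vowel in CONJUNCT:
        for position in ("initial", "medial", "final", "isolated"):
            glyphs = conjunct_glyphs(vowel, position)
            logical = unicodedata.normalize("NFKC", glyphs)
            print(vowel, position, [unicodedata.name(g) for g in glyphs],
                  " ".join(f"U+{ord(c):04X}" for c in logical))
```

In a real font, OpenType shaping makes this choice, for example through lookups built with a tool such as Microsoft VOLT. The sketch only illustrates the mapping, not how a renderer applies it.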
5. Creation of a browser-level virtual input method
As mentioned in the introduction, existing platforms do not supply any system-level Uyghur input service. Late in 2003, the author developed the first system-level Uyghur Unicode IME for Windows and distributed it free of charge²⁴. Six months later, the Xinjiang University branch of the 863 Research Group and some individuals joined the Uyghur Unicode popularization campaign by distributing their own Unicode-supporting IMEs.

Nevertheless, it still cannot be said that all, or even most, Uyghur internet users have Uyghur input tools. The browser-level input method therefore still fills a great need. It lets people type Uyghur letters into any text field on a web page without installing a system-level Uyghur IME. Figure 1 shows the basic structure of the browser-level Uyghur text input tool.

Figure 1. Workflow of the browser-level input method (flowchart: keyboard and mouse events → Input Uyghur? → capture keyboard and mouse events → code-char. mapping → dispatch events → Switch language? (no: repeat; yes: release keyboard and mouse events)).

The workflow runs as follows once the user selects the Uyghur input option:
• The "capture keyboard and mouse events" module creates a hook to monitor keyboard and mouse activity.
• The "code-char. mapping" module builds a key-code-to-Uyghur-character matrix, which gives the Uyghur character corresponding to each key code (e.g. 109 → ﻡ).
• The "dispatch events" module sends the mapped Uyghur characters to the active text field on the web page.
This cycle repeats until the user switches to another input language. At that point, the "release keyboard and mouse events" module immediately frees the hook.

The method has been implemented in JavaScript and VBScript and tested on different browsers. It is in common use on several Uyghur websites²⁵.

6. Multiscript conversion
Three writing systems coexist for the Uyghur language: Arabic-Script Uyghur, Cyrillic-Script Uyghur and Latin-Script Uyghur. Future information sharing therefore needs a conversion tool that lets people switch between the three scripts. The one-to-one correspondence²⁶ between the letters of the three writing systems is a major help. As an example, here is the Uyghur proverb "working for free is better than doing nothing" in the three scripts:

ﺑﯩﻜﺎﺭ ﻳﯜﺭﮔﯩﭽﻪ ﺑﯩﻜﺎﺭ ﺋﯩﺸﻠﻪ
бикар йүргичə бикар ишлə
bikar yürgiche bikar ishle

Figure 2 shows the basic conversion process.

Figure 2. Script conversion (flowchart: source text in source script → pre-processing → character mapping → character converting → disambiguation → Conversion ended? (no: repeat; yes: result in destination script)).

The functions of the modules may need some clarification.

Pre-processing: this is an important step in conversion. It preserves the elements that should remain unchanged²⁷ after conversion. For example, converting the LSU text "Men Photoshop ni yaxshi körimen" (I love Photoshop) into ASU should yield "ﻣﻪﻥ Photoshop ﻧﻰ ﻳﺎﺧﺸﻰ ﻛﯚﺭﯨﻤﻪﻥ", and vice versa.

²⁴ More than 200,000 downloads counted since December 2003 from www.oyghan.com and www.bizuyghur.com/oyghan.
²⁵ See www.ukij.org, www.biliwal.com, www.oyghan.com, www.uyghurdictionary.org, etc.
²⁶ The only exception is j (as in jurnal) in LSU.
²⁷ This applies to hypertext links, HTML tags and proper names.
Character mapping: creates an "A_is_B" matrix for every script pair, three matrices in total.

Character converting: uses the three matrices to convert between the scripts.

Disambiguation: this module is needed when converting from LSU to ASU and/or CSU. One reason is spelling mistakes. A more important reason is that the LSU diacritical marks are hard to type on many keyboards, so users very commonly replace Ö, Ü, É, ö, ü and é with O, U, E, o, u and e. This can cause fatal errors, for example:
• öltürüsh (to kill) vs. olturush (to sit, party);
• térim yer (cultivable land) vs. terim yer (who eats my sweat);
• yétim (orphan) vs. yetim (a spelling mistake).
Spelling mistakes caused by a poor grasp of LSU rules are also a significant problem. All these problems require intensive language processing. This function of the multiscript conversion tool²⁸ that we have released on the internet is still under development. Images 1 and 2 show our conversion tools, which use the methods described above.

Image 1. Offline plug-in version for Microsoft Word
Image 2. Online demo version

7. Conclusions and future work
Our work to date has focused mainly on the design and implementation of three components: Uyghur Unicode fonts, a browser-level input method and a multiscript conversion application. Based on user feedback, we are fairly satisfied with the results of this first-ever research on Uyghur language processing.

The embeddable web fonts generated by the third-party tool WEFT are compatible only with Internet Explorer. We therefore look forward to more efforts by the software industry to broaden compatibility. We plan to improve the pre-processing module of the conversion tool to make it more user-friendly. Other theoretical issues undoubtedly remain, especially the disambiguation of misspelled LSU words.

Another important problem is spell-checking. Uyghur is agglutinative, and root words undergo spelling changes when suffixes are added; together these form a major impediment to developing spell-checking functionality. This work will be the focus of our next stage of development. Finally, we call on software companies not to omit Uyghur from their supported language lists in the future.

Annex 1: Arabic-Script Uyghur, Cyrillic-Script Uyghur and Latin-Script Uyghur Alphabets

| LSU | CSU | ASU |
|---|---|---|
| a | а | ﺋﺎ |
| e | ә | ﺋﻪ |
| b | б | ﺏ |
| p | п | پ |
| t | т | ﺕ |
| j | җ | ﺝ |
| ch | ч | چ |
| x | х | ﺥ |
| d | д | ﺩ |
| r | р | ﺭ |
| z | з | ﺯ |
| j (zh) | ж | ژ |
| s | с | ﺱ |
| sh | ш | ﺵ |
| gh | ғ | ﻍ |
| f | ф | ﻑ |
| q | қ | ﻕ |
| k | к | ﻙ |
| g | г | گ |
| ng | ң | ڭ |
| l | л | ﻝ |
| m | м | ﻡ |
| n | н | ﻥ |
| h | һ | ھ |
| o | о | ﺋﻮ |
| u | у | ﺋﯘ |
| ö | ө | ﺋﯚ |
| ü | ү | ﺋﯜ |
| w | в | ۋ |
| é | е | ﺋﯥ |
| i | и | ﺋﻰ |
| y | й | ﻱ |

Additional Cyrillic letters: ы ё ц э ю я

²⁸ An online demo version is available at http://www.uyghurdictionary.org/tools.asp. An offline plug-in version for Microsoft Word is available at http://oyghan.com/OTB/index.html
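The conversion pipeline of Section 6 (Figure 2) can be sketched for the LSU-to-CSU direction using the Annex 1 correspondences. This is a simplified sketch under stated assumptions, not the released tool:
• all function names are hypothetical;
• pre-processing protects only the tokens passed in `protected`, plus anything starting with "http" or "<";
• two-letter LSU units (ch, gh, ng, sh, zh) are matched greedily;
• disambiguation only enumerates the diacritic restorations o/ö, u/ü and e/é that occur in a caller-supplied word list.

```python
import unicodedata
from itertools import product

# LSU -> CSU correspondences from Annex 1 (lowercase). Cyrillic values are
# written as escapes to avoid Latin look-alikes. Footnote 26's exception
# (LSU "j" can also stand for ژ/ж, as in "jurnal") is not handled here.
LSU_TO_CSU = {
    "a": "\u0430", "e": "\u04d9", "b": "\u0431", "p": "\u043f", "t": "\u0442",
    "j": "\u0497", "ch": "\u0447", "x": "\u0445", "d": "\u0434", "r": "\u0440",
    "z": "\u0437", "zh": "\u0436", "s": "\u0441", "sh": "\u0448", "gh": "\u0493",
    "f": "\u0444", "q": "\u049b", "k": "\u043a", "g": "\u0433", "ng": "\u04a3",
    "l": "\u043b", "m": "\u043c", "n": "\u043d", "h": "\u04bb", "o": "\u043e",
    "u": "\u0443", "\u00f6": "\u04e9", "\u00fc": "\u04af", "w": "\u0432",
    "\u00e9": "\u0435", "i": "\u0438", "y": "\u0439",
}


def convert_word(word: str) -> str:
    """Character mapping + converting: greedy, two-letter units first."""
    low = unicodedata.normalize("NFC", word).lower()
    out, i = [], 0
    while i < len(low):
        pair = low[i:i + 2]
        if len(pair) == 2 and pair in LSU_TO_CSU:  # ch, gh, ng, sh, zh
            out.append(LSU_TO_CSU[pair])
            i += 2
        else:
            out.append(LSU_TO_CSU.get(low[i], low[i]))  # unknown chars pass through
            i += 1
    result = "".join(out)
    if word[:1].isupper():  # keep an initial capital
        result = result[:1].upper() + result[1:]
    return result


def convert(text: str, protected=frozenset()) -> str:
    """Pre-processing: leave protected tokens, links and tags unchanged."""
    return " ".join(
        tok if tok in protected or tok.startswith(("http", "<")) else convert_word(tok)
        for tok in text.split(" ")
    )


# Disambiguation: letters whose diacritics are often dropped when typing LSU.
RESTORE = {"o": "o\u00f6", "u": "u\u00fc", "e": "e\u00e9"}


def candidates(word: str, lexicon: set) -> list:
    """All diacritic restorations of a lowercase word found in a word list."""
    options = [RESTORE.get(c, c) for c in word]
    return sorted({"".join(p) for p in product(*options)} & lexicon)


if __name__ == "__main__":
    print(convert("bikar yürgiche bikar ishle"))
    print(convert("Men Photoshop ni yaxshi körimen", frozenset({"Photoshop"})))
    lexicon = {"olturush", "\u00f6lt\u00fcr\u00fcsh", "y\u00e9tim"}  # hypothetical word list
    print(candidates("olturush", lexicon))  # two readings remain: context needed
    print(candidates("yetim", lexicon))
```

Converting the proverb from Section 6 with `convert` yields бикар йүргичә бикар ишлә, which matches the CSU line given there. The sketch has known limits:
• It does not produce the additional Cyrillic letters listed in Annex 1; for instance, it writes йа where CSU spelling uses я.
• Greedy matching cannot tell a true two-letter unit from two adjacent letters.
• For olturush, `candidates` returns both olturush and öltürüsh, which shows why the disambiguation module needs more than a word list.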
Annex 2: Arabic-Script Uyghur Alphabet with shapes