Arabic Domain Names: What’s been done so far?

  1. 1. Arabic Domain Names What’s been done so far? Euro-SSIG July 25-31 2008 Dr. Abdulaziz H. Al-Zoman Chairman of Steering Committee Arabic Domain Name Pilot Project
  2. 2. Agenda <ul><li>Introduction </li></ul><ul><li>Arabic Language Characteristics </li></ul><ul><li>Our contribution methodology </li></ul><ul><li>What have we done so far? </li></ul><ul><li>How to reach Arabic domains? </li></ul><ul><li>If we have more time … </li></ul><ul><ul><li>From Language to Script … what are the issues? </li></ul></ul>
  3. 3. Introduction What is the problem? <ul><li>Current ASCII-based DNs are incapable of representing Arabic characters </li></ul><ul><li>Arabic sites are difficult to reach using English DNs (pronunciation and spelling problems) </li></ul><ul><li>Full Arabic DNs will encourage Arab users to use the Internet more widely </li></ul>Arabic newspaper صحيفة الشرق الأوسط www.al-sharqalawsat.com www.asharqalawsat.com www.asharq-alaowsat.com www.elsharkelaosat.com … E-government site يسّر www.yasser.gov.sa www.yaser.gov.sa www.yasir.gov.sa www.yassir.gov.sa …
  4. 4. Introduction Internet in the Arab World <ul><li>Population of Arab world: 340 M (5 % of world population) </li></ul><ul><li>Arab Internet users represent 2.5 % of world users </li></ul><ul><li>Average Internet penetration in Arab world < 10 % </li></ul><ul><li>Fewer than 10 % of people in the Arab world can speak English </li></ul>
  5. 5. Introduction Obstacles facing Internet in the Arab world <ul><li>Low level of telecommunication infrastructure </li></ul><ul><li>Lack of adequate regulations </li></ul><ul><li>High cost </li></ul><ul><li>Computer Illiteracy </li></ul><ul><li>Language barrier </li></ul><ul><ul><li>Contents </li></ul></ul><ul><ul><li>Tools and applications </li></ul></ul><ul><ul><li>Domain names </li></ul></ul>
  6. 6. Arabic Language Characteristics <ul><li>Consists of 28 characters. </li></ul><ul><li>Written from right to left in a cursive style </li></ul><ul><li>Most characters have different shapes </li></ul><ul><ul><li>depending on their position (beginning, middle, or end) within a word </li></ul></ul><ul><ul><li>and on whether they are joined to the preceding and succeeding characters. </li></ul></ul><ul><ul><li>These different shapes of a single character do not count as different code points; they are handled by the font </li></ul></ul><ul><ul><li>Letters that can be joined are always joined in both handwritten and printed Arabic. </li></ul></ul>ج ب ا ن ج جـ ـجـ ـج ب بـ ـبـ ـب ا ـا ن نـ ـنـ ـن جبان
  7. 7. Arabic Language Characteristics <ul><li>Tashkeel (diacritic) </li></ul><ul><ul><li>A small sign (not a letter) that is usually placed above or below a character to indicate the correct pronunciation </li></ul></ul><ul><ul><li>It is not widely used, except where words with the same letters but different pronunciations, and hence different meanings, could be mispronounced. </li></ul></ul><ul><li>Abbreviations are not widely used </li></ul><ul><ul><li>When an abbreviation is written (in a domain name), its characters are joined together, which leads to a different word and pronunciation </li></ul></ul><ul><li>Two sets of numerals are used: </li></ul><ul><ul><li>Arabic : 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 </li></ul></ul><ul><ul><li>Arabic-Indic : ٠ ، ١ ، ٢ ، ٣ ، ٤ ، ٥ ، ٦ ، ٧ ، ٨ ، ٩ </li></ul></ul><ul><ul><li>Numerals are written from left to right </li></ul></ul>ج َ ب َ انٌ جَبَّانٍ
  8. 8. Arabic Language Characteristics <ul><li>Words are separated by spaces </li></ul><ul><ul><li>A space character is needed between words to get correct shaping </li></ul></ul><ul><ul><li>Connecting words without spaces </li></ul></ul><ul><ul><ul><li>not acceptable </li></ul></ul></ul><ul><ul><ul><li>decrease readability </li></ul></ul></ul><ul><ul><ul><li>Confused with other words </li></ul></ul></ul><ul><li>مدارسخيف </li></ul><ul><li>خطأدبي </li></ul><ul><li>مدارس خيف </li></ul><ul><li>خط أدبي </li></ul>space space
  9. 9. Contribution Methodology <ul><li>The work was done based on the following methodology: </li></ul><ul><ul><li>Identifying problems & areas of contributions </li></ul></ul><ul><ul><li>Participating and initiating interest groups & task forces </li></ul></ul><ul><ul><li>Conducting web surveys </li></ul></ul><ul><ul><li>Publishing reports & papers </li></ul></ul><ul><ul><li>Meeting linguists (face to face) </li></ul></ul><ul><ul><li>Disseminating information to public </li></ul></ul><ul><ul><li>Testing and building local experiences </li></ul></ul>
  10. 10. Work Done so far Identifying areas of contributions <ul><li>Levels of an A-IDN Solution </li></ul>1 Linguistic issues: to define the accepted Arabic character set to be used for writing Arabic domain names. 2 Arabic TLDs: to define the top-level domains of the Arabic domain name tree structure (i.e., Arabic gTLDs and ccTLDs). 3 Technical solutions. 4 Arabic root servers. Bodies shown in the diagram: IETF, UNICODE, … ICANN/IANA, …
  11. 11. Work Done so far Identifying areas of contributions <ul><li>Linguistic issues </li></ul><ul><ul><li>ISSUE 1.1 : Tashkeel </li></ul></ul><ul><ul><li>ISSUE 1.2 : Kasheeda </li></ul></ul><ul><ul><li>ISSUE 1.3 : Taa-Marbota+Haa </li></ul></ul><ul><ul><li>ISSUE 1.4 : Hamzah </li></ul></ul><ul><ul><li>ISSUE 1.5 : Alif Maqsura+Ya </li></ul></ul><ul><ul><li>ISSUE 1.6 : Numbers </li></ul></ul><ul><ul><li>ISSUE 1.7 : dot or Arabic Zero </li></ul></ul><ul><ul><li>ISSUE 1.8 : Connecting Multiple Words </li></ul></ul><ul><ul><li>ISSUE 1.9 : Space </li></ul></ul><ul><ul><li>ISSUE 1.10 : Mixing Latin & Arabic Characters </li></ul></ul><ul><ul><li>ISSUE 1.11 : Special Characters </li></ul></ul><ul><ul><li>ISSUE 1.12 : Accepted Character Set </li></ul></ul><ul><li>Arabic TLDs </li></ul><ul><ul><li>ISSUE 2.1 : Criteria for selecting an Arabic gTLD </li></ul></ul><ul><ul><li>ISSUE 2.2 : Suggested list of Arabic gTLDs </li></ul></ul><ul><ul><li>ISSUE 2.3 : Criteria for selecting an Arabic ccTLD </li></ul></ul><ul><ul><li>ISSUE 2.4 : Suggested list of Arabic ccTLDs </li></ul></ul>1
  12. 13. Work Done so far Participation & Initiation of Groups <ul><li>MINC : Multilingual Internet Names Consortium, 2000 </li></ul><ul><ul><li>Arabic Working Group </li></ul></ul><ul><li>AINC : Arab Internet Names Consortium, April 2001 </li></ul><ul><ul><li>Founder and member of the board </li></ul></ul><ul><ul><li>Chairman of the Linguistic Committee </li></ul></ul><ul><li>ADNTF : Arabic Domain Name Task Force, Q2/2003 </li></ul><ul><ul><li>Formed under the auspices of ESCWA (UN) </li></ul></ul><ul><ul><li>Issuing an Internet Draft for supporting the Arabic language in domain names </li></ul></ul><ul><li>GCC ccTLDs Group: </li></ul><ul><ul><li>Formed under the auspices of ITC committee of GCC </li></ul></ul><ul><ul><li>GCC Arabic domain name pilot project </li></ul></ul><ul><li>Arab Team for Arabic Domain Names, 2005 </li></ul><ul><ul><li>Formed under the auspices of Arab League </li></ul></ul><ul><ul><li>Arabic domain name pilot project </li></ul></ul>2
  13. 14. Work Done so far Publishing Reports & Papers <ul><li>5 Scientific research papers published in conference proceedings and journals </li></ul><ul><li>Technical reports </li></ul><ul><li>Internet drafts </li></ul><ul><ul><li>http://www.ietf.org/internet-drafts/draft-farah-adntf-ling-guidelines-00.txt </li></ul></ul>3
  14. 15. Work Done so far Publishing Reports & Papers <ul><li>Scientific Research Papers </li></ul><ul><ul><li>&quot; Arabic Top-Level Domain Names &quot;, International Journal of Computer Processing of Oriental Languages, Volume 17 Number 3 September 2004, To Appear. </li></ul></ul><ul><ul><li>&quot; Linguistic Issues in Arabic Domain Names &quot;, In Proceedings of the 17th NCC, KAAU, Al-Madina Almunawarah, Saudi Arabia, 5-8 April, 2004, pp 235-250 [in Arabic] </li></ul></ul><ul><ul><li>&quot; Arabic Top-Level Domain Names &quot;, In Proceedings of the 17th NCC, KAAU, Al-Madina Almunawarah, Saudi Arabia, 5-8 April, 2004, pp 281-296 [in Arabic] </li></ul></ul><ul><ul><li>&quot; Using Arabic Language in writing domain names &quot;, Arab journal of library and information science, Vol 22, No. 3, July 2002, pp. 21-38 [in Arabic]. </li></ul></ul><ul><ul><li>&quot; Using Arabic Language in writing domain names &quot;, In Proceedings of IACIT 2001, JUST, Irbid, Jordan, 13-15 Nov., 2001, pp 264-272 [in Arabic] </li></ul></ul><ul><li>Technical Reports </li></ul><ul><ul><li>“ Supporting the Arabic Language in Domain Names ”, submitted to ADNTF-ESCWA, October 2003 </li></ul></ul><ul><ul><ul><li>The base for the internet draft </li></ul></ul></ul><ul><ul><li>Status Report of the Arabic Linguistic Committee of AINC-September 2001 </li></ul></ul><ul><ul><li>Status Report of the Arabic Linguistic Committee of AINC-April 2002 </li></ul></ul>3
  15. 16. Work Done so far Conducting Web Surveys <ul><li>3 On-line web surveys </li></ul><ul><ul><li>cover most of the linguistic issues with more than 550 responses </li></ul></ul><ul><li>The collected information has been analyzed and compared with the recommendations of the AINC linguistic committee </li></ul><ul><li>Results have been published and presented at conferences </li></ul>4
  16. 17. Work Done so far Meeting Linguistic Experts <ul><li>SaudiNIC met with 4 Arabic linguists to get their guidance regarding the Arabic linguistic issues in domain names. </li></ul>5
  17. 18. Work Done so far Information Dissemination <ul><li>Web sites (in Arabic and English) </li></ul><ul><ul><li>http://www.arabic-domains.org.sa </li></ul></ul><ul><li>Participating in local/regional/international conferences and meetings </li></ul><ul><li>Publishing scientific research papers </li></ul><ul><li>Publishing articles in newspapers and magazines </li></ul><ul><li>Radio programs </li></ul><ul><li>Seminars for the public and interested groups </li></ul>6
  18. 19. Work Done so far Test Implementations <ul><li>Country level </li></ul><ul><ul><li>Done individually by some Arab countries (ccTLDs) </li></ul></ul><ul><ul><ul><li>Arabic.English, e.g., نطاق .com.sa </li></ul></ul></ul><ul><ul><ul><li>Problem of mixing languages (left-to-right and right-to-left) </li></ul></ul></ul><ul><li>GCC level (2004-2005) </li></ul><ul><ul><li>During the Gulf Cooperation Council (GCC) ccTLDs group meeting on 7 March 2004, </li></ul></ul><ul><ul><ul><li>“A Technical Proposal for Implementing Arabic Domain Names in the GCC Countries” was presented and accepted </li></ul></ul></ul><ul><ul><li>A technical taskforce was formed and assigned the task of implementing the proposal within 6 months </li></ul></ul><ul><li>Arab world (2005 - now) </li></ul><ul><ul><li>The recommendations of the 2nd meeting of the Working Group on Arabic Domain Names, Cairo, May 2005: </li></ul></ul><ul><ul><ul><li>Extend the GCC Pilot Project for Arabic Domain Names to include all members of the Arab League (22 countries). </li></ul></ul></ul><ul><ul><ul><li>Rename it the “Arabic Domain Names Pilot Project”. </li></ul></ul></ul><ul><ul><ul><li>Run it under the auspices of the Arab League. </li></ul></ul></ul><ul><ul><li>Implemented a browser plug-in </li></ul></ul>7 ae, bh, kw, om, qa, sa
  19. 20. Work Done so far Other Testing and tools … <ul><li>Participated in the new ICANN Arabic example.test domains ( مثال . إختبار ) </li></ul><ul><ul><li>Moderated the Arabic site for the IDNwiki gateway </li></ul></ul><ul><ul><li>Published a technical report about the test, “IDN Top Level Domain Evaluations and Testing Report” </li></ul></ul><ul><li>Developed many tools and systems that support Arabic domain names </li></ul><ul><ul><li>Browser plug-in “Arabic.Arabic” </li></ul></ul><ul><ul><li>Simple IDN registry system </li></ul></ul><ul><ul><li>IDN/ADN converter interface and many other tools </li></ul></ul>7
  20. 21. Reaching Arabic Domains <ul><li>Users can reach Arabic domain names through </li></ul><ul><ul><li>Using ADNPP Root servers </li></ul></ul><ul><ul><ul><li>Through participating ISPs </li></ul></ul></ul><ul><ul><ul><li>Using a browser that supports IDN </li></ul></ul></ul><ul><ul><li>Using the plug-in (Arabic.Arabic) </li></ul></ul><ul><ul><ul><li>Works with both MS IE and Mozilla Firefox </li></ul></ul></ul>AR-ROOT.NIC.NET.SA (Arabic Root Server) AR-ROOT.NIC.AE (Arabic Root Server) <ul><li>Slave for all the Arabic ccTLDs. </li></ul><ul><li>(Only NS records + any Glue A records) </li></ul><ul><li>Master for all the Arabic ccTLDs. </li></ul><ul><li>(Only NS records + any Glue A records) </li></ul>AR-CCTLD.NIC.NET.SA (SA Arabic ccTLD Server) <ul><li>Master for “ السعودية ” . </li></ul>NS1.UAENIC.AE (AE Arabic ccTLD Server) <ul><li>Master for “ الإمارات ” . </li></ul>AR-ROOT.QATAR.NET.QA (QA Arabic ccTLD Server) <ul><li>Master for “ قطر ” . </li></ul>Arabic Root servers Arabic ccTLD servers “ السعودية ” “ الإمارات ” “ قطر ” “ . ”
  21. 22. ADNPP Current Solution [diagram; labels: User, ISP, Participating in ADNPP, Proxy, Firewall, DNS Resolver, Query, Response, Internet, Web server, أهلا بكم في موقع . السعودية; ADN Solution: AR-ROOT.NIC.NET.SA, AR-ROOT.NIC.AE, “ . ”, “ السعودية ”, “ الإمارات ”, “ قطر ”; DNS System: . , sa, ae, eg, sy, …]
  22. 23. ADNPP Plugin Solution [diagram; labels: User, ISP, Proxy, Firewall, DNS Resolver, Query, Response, Internet, Web server, أهلا بكم في موقع . السعودية; ADN Solution: AR-ROOT.NIC.NET.SA, AR-ROOT.NIC.AE, “ . ”, “ السعودية ”, “ الإمارات ”, “ قطر ”; DNS System: . , sa, ae, eg, sy, …]
  23. 24. Examples …
  24. 25. Examples …
  25. 26. Examples …
  26. 27. Thanks <ul><li>شكرا </li></ul><ul><li>xn--mgbti4d </li></ul>
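The ASCII string on the closing slide is the Punycode (ACE) form of the Arabic word شكرا. A minimal sketch of that conversion follows, assuming Python's built-in `punycode` codec (RFC 3492) as a stand-in for a full IDNA library; the helper names `to_ace` and `from_ace` are illustrative and are not part of the pilot project's tools. No mapping or validation step is applied here.

```python
# Sketch only: raw Punycode conversion of a single label, with the "xn--"
# ACE prefix added by hand. A real registry would also run the IDNA
# validation steps, which are omitted here.
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Return the ASCII-compatible form of one label (hypothetical helper)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Reverse of to_ace for one label (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


# U+0634 U+0643 U+0631 U+0627, the word shown on the slide
shukran = "\u0634\u0643\u0631\u0627"
print(to_ace(shukran))
print(from_ace(to_ace(shukran)) == shukran)
```

Only the label is converted; dots between labels would be handled by splitting the domain name first.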
  27. 28. Arabic Domain Names What are the issues if we expand and look at the whole script?
  28. 29. About Arabic Script <ul><li>The 2nd most widely used alphabetic writing system in the world </li></ul><ul><li>Used by many languages such as: </li></ul><ul><ul><li>Persian, Urdu, Turkish, Kurdish, Pashto, Jawi, … </li></ul></ul><ul><li>It is widely used in more than 43 countries </li></ul><ul><ul><li>more than one billion potential users could be interested in using Arabic script domain names. </li></ul></ul>
  29. 30. Accepted characters for Arabic, Persian, Urdu, Pashto, Jawi
  30. 31. Arabic Script IDN - Major Issues <ul><li>Acceptable/disallowed characters </li></ul><ul><ul><li>IDNA200x table (Pvalid /Disallowed /ContextO) </li></ul></ul><ul><ul><li>Language tables </li></ul></ul><ul><li>Combining Marks </li></ul><ul><li>Diacritics </li></ul><ul><li>Word/label separators (space, ZWNJ, ZWJ, hyphen) </li></ul><ul><li>Digits </li></ul><ul><li>Confusing similar characters (e.g. variant tables) </li></ul><ul><li>Bidirectional </li></ul>
  31. 32. Issues Need Further Investigations 1. Valid Unicode Codepoints <ul><li>0600..0603 ; CONTEXTO # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA </li></ul><ul><li>0604..060A ; UNASSIGNED # <reserved>..<reserved> </li></ul><ul><li>060B..060F ; DISALLOWED # AFGHANI SIGN..ARABIC SIGN MISRA </li></ul><ul><li>0610..0615 ; PVALID # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..AR </li></ul><ul><li>0616..061A ; UNASSIGNED # <reserved>..<reserved> </li></ul><ul><li>061B ; DISALLOWED # ARABIC SEMICOLON </li></ul><ul><li>061C..061D ; UNASSIGNED # <reserved>..<reserved> </li></ul><ul><li>061E..061F ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC Q </li></ul><ul><li>0620 ; UNASSIGNED # <reserved> </li></ul><ul><li>0621..063A ; PVALID # ARABIC LETTER HAMZA..ARABIC LETTER GHAIN </li></ul><ul><li>063B..063F ; UNASSIGNED # <reserved>..<reserved> </li></ul><ul><li>0640..065E ; PVALID # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS </li></ul><ul><li>065F ; UNASSIGNED # <reserved> </li></ul><ul><li>0660..0669 ; PVALID # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT </li></ul><ul><li>066A..066D ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STA </li></ul><ul><li>066E..0674 ; PVALID # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIG </li></ul><ul><li>0675..0678 ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER </li></ul><ul><li>0679..06D3 ; PVALID # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE </li></ul><ul><li>06D4 ; DISALLOWED # ARABIC FULL STOP </li></ul><ul><li>06D5..06DC ; PVALID # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN </li></ul><ul><li>06DD ; CONTEXTO # ARABIC END OF AYAH </li></ul><ul><li>06DE ; DISALLOWED # ARABIC START OF RUB EL HIZB </li></ul><ul><li>06DF..06E8 ; PVALID # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL </li></ul><ul><li>06E9 ; DISALLOWED # ARABIC PLACE OF SAJDAH </li></ul><ul><li>06EA..06FC ; PVALID # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER </li></ul><ul><li>06FD..06FE ; DISALLOWED # ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN SI </li></ul><ul><li>06FF ; PVALID # ARABIC LETTER HEH WITH INVERTED V </li></ul>Source: draft-faltstrom-idnbis-tables-05.txt
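A table like the one above can be turned into a simple range lookup. The sketch below copies the rows exactly as listed on the slide (which cites an early draft, so the categories should not be read as the final IDNA rules) and looks up a code point with a binary search. The function names `category` and `all_pvalid` are hypothetical.

```python
from bisect import bisect_right

# Rows transcribed from the slide: (first, last, category).
ROWS = [
    (0x0600, 0x0603, "CONTEXTO"), (0x0604, 0x060A, "UNASSIGNED"),
    (0x060B, 0x060F, "DISALLOWED"), (0x0610, 0x0615, "PVALID"),
    (0x0616, 0x061A, "UNASSIGNED"), (0x061B, 0x061B, "DISALLOWED"),
    (0x061C, 0x061D, "UNASSIGNED"), (0x061E, 0x061F, "DISALLOWED"),
    (0x0620, 0x0620, "UNASSIGNED"), (0x0621, 0x063A, "PVALID"),
    (0x063B, 0x063F, "UNASSIGNED"), (0x0640, 0x065E, "PVALID"),
    (0x065F, 0x065F, "UNASSIGNED"), (0x0660, 0x0669, "PVALID"),
    (0x066A, 0x066D, "DISALLOWED"), (0x066E, 0x0674, "PVALID"),
    (0x0675, 0x0678, "DISALLOWED"), (0x0679, 0x06D3, "PVALID"),
    (0x06D4, 0x06D4, "DISALLOWED"), (0x06D5, 0x06DC, "PVALID"),
    (0x06DD, 0x06DD, "CONTEXTO"), (0x06DE, 0x06DE, "DISALLOWED"),
    (0x06DF, 0x06E8, "PVALID"), (0x06E9, 0x06E9, "DISALLOWED"),
    (0x06EA, 0x06FC, "PVALID"), (0x06FD, 0x06FE, "DISALLOWED"),
    (0x06FF, 0x06FF, "PVALID"),
]
STARTS = [first for first, _, _ in ROWS]


def category(cp: int):
    """Category of a code point per the slide's table, or None if outside it."""
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= ROWS[i][1]:
        return ROWS[i][2]
    return None


def all_pvalid(label: str) -> bool:
    """True if every character of the label is PVALID in this table."""
    return all(category(ord(ch)) == "PVALID" for ch in label)


print(category(0x0628))            # ARABIC LETTER BEH
print(category(0x061B))            # ARABIC SEMICOLON
print(all_pvalid("\u062d\u0628\u0644"))
```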
  32. 33. Disallowed characters by IDNA200X
  33. 34. Recommended Disallowed characters
  34. 35. Issues Need Further Investigations 2. Combining Marks <ul><li>Using combining marks with some base characters can produce sequences that are confusable with other characters. </li></ul><ul><ul><li>Combining maddah and hamza 0653-0655 </li></ul></ul><ul><ul><li>Other combining marks 0656-065E </li></ul></ul>ځ (U+0681) is confusable with the sequence U+062D U+0654 (ح followed by combining hamza above), although the two are not equal (≠). أ (U+0623) is confusable with the sequence U+0627 U+0654 (ا followed by combining hamza above).
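The two cases on this slide behave differently under Unicode normalization, which a short check can illustrate. This is a sketch using Python's standard `unicodedata` module; the variable names are illustrative. In the Unicode character database, U+0623 has a canonical decomposition to U+0627 U+0654, whereas U+0681 has none, so NFC unifies the alef pair but leaves the hah pair as two distinct, look-alike strings.

```python
import unicodedata

HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)


def nfc(text: str) -> str:
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


alef_sequence = "\u0627" + HAMZA_ABOVE  # alef + combining hamza above
hah_sequence = "\u062d" + HAMZA_ABOVE   # hah + combining hamza above

# The alef sequence composes to the precomposed letter U+0623.
print(nfc(alef_sequence) == "\u0623")

# The hah sequence does not compose to U+0681, so both spellings survive
# normalization and remain visually confusable.
print(nfc(hah_sequence) == "\u0681")
print([f"U+{ord(ch):04X}" for ch in nfc(hah_sequence)])
```

Normalization alone therefore does not remove the confusion for every base letter, which is consistent with the slide listing combining marks as an open issue.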
  35. 36. Issues Need Further Investigations 3. Diacritics (Tashkeel) <ul><li>Tashkeel Points: about 9 </li></ul><ul><ul><li>064B-0652, 0670 </li></ul></ul><ul><li>From the user's point of view, he/she does not know </li></ul><ul><ul><li>Whether the label is written with or without tashkeel </li></ul></ul><ul><ul><li>Whether tashkeel is used on all characters of the label or only on some </li></ul></ul><ul><ul><li>The correct pronunciation, which is needed to place the correct tashkeel </li></ul></ul><ul><ul><li>There are many different pronunciations depending on the local dialect </li></ul></ul><ul><li>There will be a huge number of combinations when registering a single label </li></ul><ul><ul><li>a good playground for phishing </li></ul></ul><ul><li>جمعية </li></ul><ul><li>جَمْعِيَّة </li></ul><ul><li>جَمعِيَّة </li></ul><ul><li>جَمْعِيّة </li></ul><ul><li>جَمعِيّة </li></ul><ul><li>جمعيةُ </li></ul><ul><li>جمعيةً </li></ul><ul><li>جمعيةٍ </li></ul><ul><li>جِمعية </li></ul><ul><li>جمّعية </li></ul>
  36. 37. Issues Need Further Investigations 4. ZWNJ & ZWJ Control Characters <ul><li>Supporting ZWJ and ZWNJ in domain names could cause confusion </li></ul><ul><li>Their use and the concepts behind them are not known to regular users </li></ul>
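One way to see why the variants listed above are a problem, and how a registry might collapse them, is to strip the tashkeel code points named on the slide (064B-0652 and 0670) before comparing labels. This is a minimal sketch of that idea, not the project's actual policy; `strip_tashkeel` is a hypothetical helper.

```python
# Tashkeel code points as listed on the slide: U+064B..U+0652 plus U+0670.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0652 + 1)} | {"\u0670"}


def strip_tashkeel(label: str) -> str:
    """Remove tashkeel marks so that vocalized variants compare equal."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


base = "\u062c\u0645\u0639\u064a\u0629"  # the unvocalized word from the slide
variants = [
    # fully vocalized: fatha, sukun, kasra, shadda + fatha
    "\u062c\u064e\u0645\u0652\u0639\u0650\u064a\u0651\u064e\u0629",
    # only a final damma
    "\u062c\u0645\u0639\u064a\u0629\u064f",
    # only a kasra on the first letter
    "\u062c\u0650\u0645\u0639\u064a\u0629",
]

print(len(TASHKEEL))
for v in variants:
    print(v != base, strip_tashkeel(v) == base)
```

Each variant is a different string, and therefore a different registrable label, until the marks are removed.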
  37. 38. Issues Need Further Investigations 4. ZWNJ & ZWJ Control Characters <ul><li>ZWNJ: Visually noticed </li></ul>ح‌بل input[0] = U+062d input[1] = U+200c input[2] = U+0628 input[3] = U+0644 حبل input[0] = U+062d input[1] = U+0628 input[2] = U+0644
  38. 39. Issues Need Further Investigations 4. ZWNJ & ZWJ Control Characters <ul><li>ZWNJ: Visually Unnoticed </li></ul>ط‌بل input[0] = U+0637 input[1] = U+200c input[2] = U+0628 input[3] = U+0644 طبل input[0] = U+0637 input[1] = U+0628 input[2] = U+0644 ≠
  39. 40. Issues Need Further Investigations 4. ZWNJ & ZWJ Control Characters مجمع ‍ - الرباط ‍ - الدولي (with ZWJ) input[0] = U+0645 input[1] = U+062c input[2] = U+0645 input[3] = U+0639 input[4] = U+200d input[5] = U+002d input[6] = U+0627 input[7] = U+0644 input[8] = U+0631 input[9] = U+0628 input[10] = U+0627 input[11] = U+0637 input[12] = U+200d input[13] = U+002d input[14] = U+0627 input[15] = U+0644 input[16] = U+062f input[17] = U+0648 input[18] = U+0644 input[19] = U+064a مجمع - الرباط - الدولي (without ZWJ) input[0] = U+0645 input[1] = U+062c input[2] = U+0645 input[3] = U+0639 input[4] = U+002d input[5] = U+0627 input[6] = U+0644 input[7] = U+0631 input[8] = U+0628 input[9] = U+0627 input[10] = U+0637 input[11] = U+002d input[12] = U+0627 input[13] = U+0644 input[14] = U+062f input[15] = U+0648 input[16] = U+0644 input[17] = U+064a ZWJ Visually noticed ZWJ Visually Unnoticed
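The code point listings on the ZWNJ and ZWJ slides can be reproduced with a few lines. The sketch below prints a label in the same `input[i] = U+xxxx` form and shows that two labels which may look identical are different strings until the joiner controls are removed. The helper names are illustrative, and stripping the controls is only one possible policy.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def dump(label: str) -> list:
    """List the code points of a label in the slide's 'input[i] = U+xxxx' form."""
    return [f"input[{i}] = U+{ord(ch):04x}" for i, ch in enumerate(label)]


def has_joiner_controls(label: str) -> bool:
    """True if the label contains ZWNJ or ZWJ."""
    return ZWNJ in label or ZWJ in label


def strip_joiner_controls(label: str) -> str:
    """Remove ZWNJ and ZWJ from the label."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


with_zwnj = "\u0637" + ZWNJ + "\u0628\u0644"  # the 'visually unnoticed' example
without = "\u0637\u0628\u0644"

print("\n".join(dump(with_zwnj)))
print(with_zwnj == without)
print(has_joiner_controls(with_zwnj))
print(strip_joiner_controls(with_zwnj) == without)
```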
  40. 41. Issues Need Further Investigations 5. Digits <ul><li>Arabic-Indic vs. Eastern Arabic-Indic digits </li></ul><ul><ul><li>٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ </li></ul></ul><ul><ul><li>۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ </li></ul></ul>۱۲۳۷۸۹۰ input[0] = U+06f1 input[1] = U+06f2 input[2] = U+06f3 input[3] = U+06f7 input[4] = U+06f8 input[5] = U+06f9 input[6] = U+06f0 ١٢٣٧٨٩٠ input[0] = U+0661 input[1] = U+0662 input[2] = U+0663 input[3] = U+0667 input[4] = U+0668 input[5] = U+0669 input[6] = U+0660 ≠
  41. 42. Issues Need Further Investigations 5. Digits <ul><li>Arabic-Indic vs. European-Arabic digits </li></ul><ul><ul><li>٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ </li></ul></ul><ul><ul><li>0 1 2 3 4 5 6 7 8 9 </li></ul></ul><ul><li>“Windows has supported number substitution by allowing the representation of different cultural shapes for the same digits while keeping the internal storage of these digits unified among different locales, for example numbers are stored in their well known hexadecimal values, 0x40, 0x41 , but displayed according to the selected language.” </li></ul><ul><li>Source: http://msdn2.microsoft.com/en-us/library/aa350685(VS.85).aspx?PHPSESSID=o1fb21liejulfgrptbmi9dec92 </li></ul>م 12 م input[0] = U+0645 input[1] = U+0031 input[2] = U+0032 input[3] = U+0645 م١٢م input[0] = U+0645 input[1] = U+0661 input[2] = U+0662 input[3] = U+0645 ≠
  42. 43. Issues Need Further Investigations 6. Similar Shape Characters <ul><li>There are a number of groups of characters that have the same shapes, </li></ul><ul><ul><li>e.g., Kaf, Heh, Yeh, Alef, … groups </li></ul></ul>
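The three digit sets on these slides (European U+0030-0039, Arabic-Indic U+0660-0669, and the set the slide calls Eastern Arabic-Indic, U+06F0-06F9) encode the same values under different code points. A minimal sketch of folding them to one set before comparison follows, using the standard `unicodedata` module; `fold_digits` is a hypothetical helper, and folding to European digits is an assumption rather than a recommendation from the slides.

```python
import unicodedata


def fold_digits(label: str) -> str:
    """Replace every decimal digit with its European (ASCII) equivalent."""
    out = []
    for ch in label:
        if ch.isdecimal():
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


eastern = "\u06f1\u06f2\u06f3\u06f7\u06f8\u06f9\u06f0"       # first listing, slide 41
arabic_indic = "\u0661\u0662\u0663\u0667\u0668\u0669\u0660"  # second listing, slide 41

print(eastern == arabic_indic)
print(fold_digits(eastern) == fold_digits(arabic_indic))
print(fold_digits(eastern))
```

The same folding makes the two labels on slide 42 (letter, two digits, letter) compare equal as well.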
  43. 45. Issues Need Further Investigations 6. Similar Shape Characters <ul><li>كلمني </li></ul>كلمني input[0] = U+0643 input[1] = U+0644 input[2] = U+0645 input[3] = U+0646 input[4] = U+064a کلمني input[0] = U+06a9 input[1] = U+0644 input[2] = U+0645 input[3] = U+0646 input[4] = U+064a کلمنې input[0] = U+06a9 input[1] = U+0644 input[2] = U+0645 input[3] = U+0646 input[4] = U+06d0
  44. 46. Issues Need Further Investigations 6. Similar Shape Characters <ul><li>كلى </li></ul>کلۍ input[0] = U+06a9 input[1] = U+0644 input[2] = U+06cd کلی input[0] = U+06a9 input[1] = U+0644 input[2] = U+06cc كلى input[0] = U+0643 input[1] = U+0644 input[2] = U+0649
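The variant-table idea mentioned among the major issues can be sketched as a mapping from each look-alike code point to one representative of its group. The table below is purely illustrative: it covers only the Kaf and Yeh look-alikes that appear in the two examples above, and the choice of representative is an assumption, not an agreed language table.

```python
# Hypothetical variant table: look-alike code point -> chosen representative.
VARIANTS = {
    "\u06a9": "\u0643",  # ARABIC LETTER KEHEH       -> ARABIC LETTER KAF
    "\u06cc": "\u064a",  # ARABIC LETTER FARSI YEH    -> ARABIC LETTER YEH
    "\u06cd": "\u064a",  # ARABIC LETTER YEH WITH TAIL -> ARABIC LETTER YEH
    "\u06d0": "\u064a",  # ARABIC LETTER E            -> ARABIC LETTER YEH
    "\u0649": "\u064a",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER YEH
}


def skeleton(label: str) -> str:
    """Map every character to its group representative."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


def same_bundle(a: str, b: str) -> bool:
    """True if two labels collapse to the same skeleton."""
    return skeleton(a) == skeleton(b)


# The three five-letter labels from the first example slide
first = "\u0643\u0644\u0645\u0646\u064a"
second = "\u06a9\u0644\u0645\u0646\u064a"
third = "\u06a9\u0644\u0645\u0646\u06d0"

print(first == second, same_bundle(first, second))
print(second == third, same_bundle(second, third))
```

A registry using such a table could block or bundle a new label whose skeleton matches an existing registration.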
  45. 48. Issues Need Further Investigations 7. Bidirectional Behavior <ul><li>Arabic script domains will include characters that are LTR (e.g., dash, dot, AE digits) </li></ul><ul><li>Existing IDNA2003 (stringprep) requires that, in a label containing right-to-left characters, the first and last characters be RandALCat characters. </li></ul>
  46. 49. Thanks <ul><li>شكرا </li></ul><ul><li>xn--mgbti4d </li></ul>
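The stringprep bidirectional requirement can be approximated with the bidirectional classes exposed by Python's `unicodedata` module. This is a sketch under stated assumptions: stringprep defines RandALCat and LCat through fixed tables based on an older Unicode version, while the code below uses the classes R, AL and L from whatever Unicode version the interpreter ships, and it skips the separate check for prohibited characters. The function names are illustrative.

```python
import unicodedata


def is_randalcat(ch: str) -> bool:
    """Approximation: bidi class R or AL (right-to-left letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_lcat(ch: str) -> bool:
    """Approximation: bidi class L (left-to-right letters)."""
    return unicodedata.bidirectional(ch) == "L"


def passes_stringprep_bidi(label: str) -> bool:
    """Check the two RandALCat conditions of the stringprep bidi rule."""
    if not any(is_randalcat(ch) for ch in label):
        return True   # no right-to-left characters: rule does not apply
    if any(is_lcat(ch) for ch in label):
        return False  # right-to-left and left-to-right letters are mixed
    return is_randalcat(label[0]) and is_randalcat(label[-1])


meem = "\u0645"
print(passes_stringprep_bidi(meem + "\u0661\u0662" + meem))  # digits inside the label
print(passes_stringprep_bidi(meem + "\u0661\u0662"))         # label ends with a digit
print(passes_stringprep_bidi(meem + "-12"))                  # ends with a European digit
```

The second and third labels are rejected only because of their last character, which illustrates why labels ending in digits are a concern for Arabic script domains.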
