IDN Root Zone LGR 
#ICANN51 
15 October 2014 
Sarmad Hussain 
IDN Program Senior Manager
Agenda 
• Introduction – Sarmad Hussain 
• Need, Limitations and Mechanisms for the Root 
Zone LGR – Marc Blanchet 
• Challenges in Addressing Multiple Languages 
using Arabic Script– Meikal Mumin 
• Coordination between Chinese, Japanese and 
Korean Scripts – Wang Wei 
• Coordination between Neo-Brahmi Scripts – 
Nishit Jain 
• Coordination between Cyrillic, Greek and Latin 
Scripts – Cary Karp 
• Q/A 
Types of Coordination
• One script – one GP 
o Arabic 
• One script – many GPs 
o Han – Chinese, Korean, Japanese 
• Many scripts – one GP 
o Neo-Brahmi scripts 
• Many scripts – many GPs 
o Cyrillic, Greek, Latin 
Aspects of Coordination
• Need – what work should be undertaken by the GPs 
o Same code points 
o Visually similar code points 
o Similar rules 
o Other? 
• Mechanism – how will these GPs interact with each other
o After individual GP work 
o During individual GP work 
o Before individual GP work 
Need, Limitations and 
Mechanisms for the Root 
Zone LGR 
Presented by: Marc Blanchet 
Integration Panel 
IDN Root Zone LGR 
The Need for LGRs
• It’s not all about variants! 
• LGRs define what labels are valid 
o They are needed for automated label validation 
• For some scripts, all that is needed is a defined 
repertoire 
o Each application confined to one repertoire 
Root Repertoire
• Collection of single script repertoires 
o Each tagged by script: “und-Cyrl,” “und-Jpan” 
o No cross-repertoire labels 
o No overlap, except “common” code points, Han 
• Each script repertoire limited to: 
o Modern, widespread use 
o Everyday use 
o Stable code points 
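The "no cross-repertoire labels" rule can be sketched as a membership test: a label is acceptable only if every code point falls inside one single-script repertoire. The two repertoires below are tiny, hypothetical stand-ins (the real ones come from the MSR and the GP proposals), and the names `REPERTOIRES`, `scripts_for_label` and `is_valid_label` are illustrative assumptions.

```python
# Toy single-script repertoires keyed by script tag (assumed, heavily reduced).
REPERTOIRES = {
    "und-Latn": set("abcdefghijklmnopqrstuvwxyz"),
    "und-Cyrl": {chr(cp) for cp in range(0x0430, 0x0450)},  # Cyrillic а..я
}


def scripts_for_label(label):
    """Return the tags whose repertoire contains every code point of the label."""
    return [
        tag
        for tag, repertoire in REPERTOIRES.items()
        if all(ch in repertoire for ch in label)
    ]


def is_valid_label(label):
    """A label must be non-empty and fit entirely inside one repertoire."""
    return bool(label) and len(scripts_for_label(label)) > 0
```

A label that mixes Latin "ab" with Cyrillic U+0441 fits neither repertoire, so it is rejected even though each code point is individually permitted somewhere.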
But What About Variants?
• Some scripts require variants 
o Code points that are “the same” to users 
• Two types: 
o Those that lead to “blocked” variants 
o Those that lead to “allocatable” variants 
• Procedure: 
o Maximize number of blocked variants, and minimize the 
number of allocatable variants 
More on Variants
• Variant mappings will be used to automatically 
generate all permutations (variant labels) 
• Type of variant mapping determines whether: 
o To block a variant label 
(either variant or original can be allocated, not both) 
o To allow allocating it to the same applicant as original label 
• As a result of integration, blocked variants can exist 
across GP repertoires 
o GP coordination will ensure consistent outcome 
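As a rough illustration of how variant mappings generate variant labels, the sketch below enumerates every permutation of a label and assigns each one a disposition. The mapping table is a hypothetical excerpt, and the rule "a variant label is blocked if any blocked mapping was used to produce it" is an assumption made for this example, not a statement of the final procedure.

```python
from itertools import product

# Hypothetical variant mappings: code point -> {variant code point: mapping type}.
VARIANTS = {
    "\u7231": {"\u611b": "allocatable"},
    "\u611b": {"\u7231": "allocatable"},
    "\u575d": {"\u58e9": "allocatable", "\u57bb": "blocked"},
}


def variant_labels(label):
    """Map every variant label of `label` to 'allocatable' or 'blocked'."""
    choices = []
    for ch in label:
        options = [(ch, None)]  # keep the original code point
        options += list(VARIANTS.get(ch, {}).items())
        choices.append(options)

    result = {}
    for combo in product(*choices):
        candidate = "".join(ch for ch, _ in combo)
        if candidate == label:
            continue  # the original label is not its own variant
        used_types = {kind for _, kind in combo if kind is not None}
        result[candidate] = "blocked" if "blocked" in used_types else "allocatable"
    return result
```

For a two-character label with one and two variants respectively, this yields 2 × 3 − 1 = 5 variant labels, which shows how quickly the permutations grow with label length.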
What, Why and When of WLEs
• Whole Label Evaluation Rules (WLE) 
• Why they are needed 
o Prevent labels that cannot be processed/rendered 
• When to consider 
o Generally affect “complex scripts” 
o Not intended to enforce “spelling rules” 
• Example: 
o Disallow vowel marks where they can’t be rendered: 
at the start or following other vowel marks, etc. 
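The vowel-mark example can be sketched as a single pass over the label. The set of marks below is a small illustrative subset of Devanagari dependent vowel signs, and the function name is hypothetical; an actual WLE rule would be defined by the relevant Generation Panel.

```python
# Illustrative subset of Devanagari dependent vowel signs (matras).
VOWEL_MARKS = {
    "\u093e", "\u093f", "\u0940", "\u0941", "\u0942",
    "\u0947", "\u0948", "\u094b", "\u094c",
}


def passes_vowel_mark_rule(label):
    """Reject a vowel mark at the start of the label or right after another one."""
    previous_was_mark = True  # the start of the label behaves like "after a mark"
    for ch in label:
        is_mark = ch in VOWEL_MARKS
        if is_mark and previous_was_mark:
            return False
        previous_was_mark = is_mark
    return True
```

The rule checks structure only: it says nothing about whether the label is a correctly spelled word, in line with the point that WLEs are not meant to enforce spelling.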
Limitations
• TLDs are intended for: 
o “Unambiguous labels with good mnemonic value” * 
• Not intended to capture all facets of a writing 
system 
o Should focus on modern, everyday use 
o OK not to support some conventions 
§ e.g., disallowing apostrophe does not support the ‘s ending for 
names of businesses, hyphen disallowed in root 
o Some limits necessary to reduce systemic risks 
*https://tools.ietf.org/html/draft-iab-dns-zone-codepoint-pples-02
What Should Be Coordinated? 
• Repertoire: Consistent treatment of similar repertoires 
§ Examples: Indic scripts 
• Variants: Compatible definition of variants 
§ Examples: Han script, overlapping repertoires 
o Cross-script homoglyphs 
§ Examples: Latin, Greek, Cyrillic 
• WLE: Consistent treatment of structurally similar 
scripts 
§ Examples: Indic scripts, definition of matra 
Resources
• Considerations for Designing a Label Generation Ruleset for the Root Zone 
• https://community.icann.org/download/attachments/43989034/Considerations-for-LGR-2014-09-23.pdf 
• Maximal Starting Repertoire (MSR-1) 
• https://www.icann.org/news/announcement-2-2014-06-20-en 
• https://www.icann.org/en/system/files/files/msr-overview-06jun14-en.pdf 
• Procedure to Develop and Maintain the Label Generation Rules for the Root 
Zone in Respect of IDNA Labels 
• https://www.icann.org/en/system/files/files/draft-lgr-procedure-20mar13-en.pdf 
• Representing Label Generation Rules in XML 
• https://tools.ietf.org/html/draft-davies-idntables 
• Requirements for LGR Proposals 
• https://community.icann.org/download/attachments/43989034/Requirements%20for%20LGR%20Proposals.pdf 
• Variant Rules 
• https://community.icann.org/download/attachments/43989034/Variant%20Rules.pdf
Challenges in Addressing 
Multiple Languages using 
Arabic Script 
Meikal Mumin 
Arabic Generation Panel 
IDN Root Zone LGR 
Representing scripts in a world of languages 
• abc.def is a Roman/Latin script IDN 
• ابت.ثجح is an Arabic script IDN 
• But we do not know which languages are used by the websites of either IDN 
• So Internationalized Domain Names (IDNs) have a script as a property, but not a language. So what 
does this mean? 
o It means that IDNs cannot be based on the orthography of one language, such as Arabic language, but 
that… 
o LGR and related standards must therefore address the entire community of readers and writers of Arabic 
script 
• The problem is that, while we can only represent scripts, we think in terms of language 
o All data is at language level while we have to define LGR at script level 
o There are no institutions representing scripts communities 
o Writing is usually considered as a (reduced) representation of language 
• So what is the actual scope of Arabic script LGR?
Scope of the Arabic script LGR 
• Arabic script is centered around Africa and the Middle East as a writing system, but in the course 
of time it has expanded across nearly all continents, with established past or present use in 
o the Americas, (Western, Central, Southern, and Eastern) Europe, (nearly all areas of) Asia, Africa (North 
and South of the Sahara) 
o Only within Africa, there is attested past or present use of Arabic script for the writing of 80+14 African 
languages apart from Arabic (Mumin 2014) 
o With today’s patterns of migration, continuing proselytization, and population growth, more user 
communities of Arabic script are manifesting in both the Global South and North 
• Accordingly, Arabic script is used not just locally or regionally but globally, albeit to radically 
different degrees and in entirely different manners, since… 
o for numerous languages, Arabic script is in active competition with other scripts, and… 
o for numerous languages, Arabic script is used only by a part of the language community 
o It is not foreseeable how the situation will evolve in the future and what the impact of IDNs would be on the 
community 
o To give a more extreme example – Would a language community possibly care if they can register a domain 
name using the orthography of their language if any reading and writing is only done with pen & paper? 
Representing the underrepresented
• Unfortunately, this linguistic diversity is not well 
represented 
o There is a lack of data on languages and orthographies 
o Particularly languages of low status or socio-economic 
participation lack representation 
o There is little available on non-western orthographies, while 
non-standardized orthographies are generally not considered 
o Often TF-AIDN has to rely heavily on users’ intuitions from an 
entirely different part of the script community 
o E.g. during code-point analysis, we frequently lacked data to 
establish whether a code point is used optionally or obligatorily 
in a given orthography, which is required within the current 
process 
Qualifying and quantifying script use: The EGIDS scale
• Security and stability of DNS and the root zone are highly important, and 
therefore conservatism is a strong principle surrounding IDNs 
“Where the Integration Panel was able to establish to its satisfaction that a given 
code point was assigned a character solely for use in a disused orthography, or for 
a language in serious decline, the code point has been removed from the MSR.” 
Maximal Starting Repertoire – MSR-1 Overview and Rationale, Revision – June 6, 2014, p. 22 
o MSR dictates that the Expanded Graded Intergenerational Disruption Scale 
[EGIDS] (Lewis and Simons 2010) is used to categorize the “effective demand” of 
languages within a given country: 
§ The EGIDS consists of 13 levels, ranking languages from the highest representation and 
role in society, being a National language, to the lowest, extinction 
§ “For the MSR the IP used the cut-off between EGIDS level 4 [Educational] and level 5 
[Developing].” 
• Unfortunately, such representation of language in society is not just 
accidental but usually a result of historical processes
"Scripts divide languages into cultures, make 
dialects into new distinct languages, and create 
new dialects. […] If, as is often said, ‘A language is 
a dialect with an army and navy’, how much more 
is it ‘a dialect with a distinct script’!” 
(Warren-Rothlin 2014: 264) 
People, society, language and the role for IDNs
• Languages and scripts are 
o …evaluated by people (Language attitude) 
o …assigned a status by both societies and scientists (Dialect vs. language) 
o …and regulated by governments (Language policy) 
o and this is reflected also in studies and statistics on languages 
§ There have even been historical reports of orthography suppression of Arabic script, where the 
use of writing systems has been banned and criminalized 
• We must be cautious not to strengthen further trends of linguistic 
discrimination and strive for equal treatment of languages, even where 
they lack socio-economic participation or political representation 
o TF-AIDN identified 32 code points during the analysis that have evidence of use but 
cannot be included in the LGR because they do not have an EGIDS rating 
higher than 5 
Example #1 – Code point analysis and issues with EGIDS data
• Example Seraiki [ISO 639-3: skr]: 
o Seraiki is a language of Pakistan 
o There are numerous publications in Seraiki, including daily newspapers 
o Within Pakistan, Seraiki has an EGIDS rating of 5 (Written) 
o IP recommends excluding any language with an EGIDS rating lower than 4 
• Example Harari [ISO 639-3: har]: 
o Harari is a language of Ethiopia 
o There are significant expatriate communities, which seem to be very active 
§ E.g. The Australian Saay Harari Association, which published an orthography description and a virtual keyboard with the 
assistance of the State Library of Victoria, Australia, in 2009 
o Within Ethiopia, Harari has an EGIDS rating of 6a (Vigorous), while it has no status in Australia 
o Because of the activity of the expatriate community, TF-AIDN assumes an active use of the orthography and 
would suggest inclusion of relevant code points 
o Unfortunately, this is not possible within the current process stipulated by IP 
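The effect of the cut-off on the two examples can be worked through in a few lines. This is only a sketch: the ordered list of level labels reflects the 13-level EGIDS scale as commonly published, the cut-off is read as "level 4 or stronger passes", and `meets_cutoff` and `ratings` are names introduced here for illustration.

```python
# The 13 EGIDS levels, ordered from strongest (0) to weakest (10).
EGIDS_LEVELS = ["0", "1", "2", "3", "4", "5", "6a", "6b", "7", "8a", "8b", "9", "10"]
CUTOFF = "4"  # assumed reading of the cut-off between level 4 and level 5


def meets_cutoff(level, cutoff=CUTOFF):
    """True if `level` is at least as strong as the cut-off level."""
    return EGIDS_LEVELS.index(level) <= EGIDS_LEVELS.index(cutoff)


# In-country ratings quoted on the slide, keyed by ISO 639-3 code.
ratings = {"skr": "5", "har": "6a"}
excluded = sorted(code for code, level in ratings.items() if not meets_cutoff(level))
```

Under this reading both Seraiki (`skr`) and Harari (`har`) fall below the cut-off, regardless of the publishing and community activity described above.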
A-priori principles and a-posteriori analysis
• In the case of Arabic script IDNs, ICANN has tasked two groups to work together to develop the 
Label Generation Rules (LGR) 
o Integration Panel (IP) has developed the “Procedure to Develop and Maintain the Label Generation Rules 
for the Root Zone in Respect of IDNA Labels”, as well as the Maximal Starting Repertoire (MSR-1) 
o On the basis of the procedure and the MSR-1, the Task Force on Arabic Script IDNs (TF-AIDN), should 
formulate the LGR, which is then approved by IP 
o Accordingly, rules have been laid out by IP before observation and analysis of data was conducted by TF-AIDN 
• Therefore the MSR and the LGR development process were designed before an (ideally data 
driven) code point analysis could be conducted by script generation panels 
o TF-AIDN noticed this, being the first script generation panel to take up work 
o Accordingly, TF-AIDN did suggest to IP as public comment to MSR-1 that 
§ MSR-1 should only be frozen one script at a time 
§ after relevant script Generation Panel has been formed and given its feedback on its relevant portion 
o Unfortunately, IP considered this as an effective request for removal of MSR-1 
Example #2 - Variants
• Variants are required to balance the usability of IDNs as well as the 
representation of languages against security and stability of DNS and the root 
zone 
• The Arabic Case Study Team has published an Issues Report identifying 6 
types of variants in Arabic script. Two examples: 
So how can we reasonably argue that this 
difference in letter shape is not confusable by 
all readers and across all representations and 
fonts… 
...while this difference is confusable 
to at least a subset of readers or in a 
subset of representations and fonts… 
• …when there are no empirical scientific tests to support either theory? 
• …when there is a systemic bias in representation even within our group 
(as 15 out of 29 members are first language speakers of Arabic)?
تریما کاسیہ 
شكراً 
شکریہ 
با سپاس 
تشكر 
Thank You
Coordination between 
Chinese, Japanese and 
Korean Scripts 
Wang Wei 
Chinese Generation Panel 
IDN Root Zone LGR 
The Historical Changes 
of Chinese Character in East Asia 
Second century BC to 5th century AD 
In the modern Hangul-based Korean writing 
system, Chinese characters (Hanja) are no 
longer officially used, but are still used 
occasionally in daily life. 
Chinese characters (Kanji) were adopted from 
the 5th century AD. 
All three scripts (kanji, and the hiragana and 
katakana syllabaries) are used as main scripts. 
Hanzi unification in the Qin dynasty (221-207 B.C.) 
Now, two writing systems: Simplified Chinese (SC) and 
Traditional Chinese (TC). SC and TC characters have the same 
meaning and the same pronunciation, and are typical variants. 
TC: Taiwan, Macau, Hong Kong 
SC: Mainland China, Singapore 
TC & SC: Malaysia
Relationship of Chinese Characters in Three 
Scripts 
In ISO 15924, the script for Chinese characters is mainly defined in this specification: 
• ISO 15924 code: Hani 
• ISO 15924 no: 500 
• English Name: Han (Hanzi, Kanji, Hanja)
SLD/TLD Chinese Character IDN Registration 
CDNC Character Table and Registration Rules under RFC 3743/4713 
SLD: .CN, .TW, .HK, .SG, .ASIA 
TLD: .中国, .台湾, .香港 
JPRS IDN Registration 
SLD: .JP 
KISA: 
NO Chinese character 
registration under .KR 
So Far 
19535(CGP) 
19537 (CDNC) 
618 
6
Variant Solutions in Different Scripts 
CDNC: RFC 3743 & 4713 
• Allocate all Applied-for IDL and Variant IDLs to the same registrant 
• Delegate Applied-for IDL, Preferred SC IDL, Preferred TC IDL 
• Reserve all the other variant IDLs 
• Delegate reserved variant IDLs when requested at a later date 
JPRS: No Variant issue 
Among Kanji characters, some are in a simplified form (called the “new character form”), derived from the 
traditional imported form (called the “old character form”). 
It is appropriate to distinguish new and old forms as different and independent characters instead of pure 
variants. This understanding has been reflected in the IANA IDN table developed by the JPRS, in which no 
variants are identified for Kanji. 
KISA: No Variant issue, so far … 
Hanja is no longer widely used in the ROK. A law enacted in 2011 orders all ROK official government 
documents to be written ONLY in Hangul. 
KISA stated that its SLD IDN policy does not allow, nor does it have any intention of allowing, the use 
of Hanja in their domestic market.
Coordination Principle 
Each CJK panel creates an LGR and each LGR includes a repertoire and variants. 
The variant mappings must agree for the same code point for all LGRs. 
The variant types may be different (blocked or allocatable), the variant types do not have to agree across 
LGRs. 
The repertoires may be different. 
Allocatable 
A potential allocation rule says that once the variant label is generated, that variant label may be allocated to the 
applicant for the original label. 
Blocked 
A blocking rule says that a particular label must not be allocated to anyone under any circumstances.
Example to Illustrate: Case Study 0 
Appendix F of draft-lgr-procedure-20mar13-en.pdf. 
For CGP 
Code Point Allocatable Variant Blocked Variant Tag 
爱 (U+7231) 愛 (U+611B) - und-hani 
愛 (U+611B) 爱 (U+7231) - und-hani 
For JGP, probably 
Code Point Allocatable Variant Blocked Variant Tag 
愛 (U+611B) - - und-jpan 
Code Point Allocatable Variant Blocked Variant Tag 
爱 (U+7231) 愛 (U+611B) - und-hani 
愛 (U+611B) 爱 (U+7231) - und-hani 
愛 (U+611B) - 爱 (U+7231) und-jpan 
Applying for U+611B using the und-jpan blocks the use of U+7231 in the same location in any label, no matter 
which tag it is applied under. This is so, even though U+7231 is not a character in Japanese at all and does not 
appear in the tagged repertoire und-jpan. Because it is not part of that repertoire, it cannot be used in any label 
applied for with the und-jpan tag.
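The cross-tag blocking described above can be sketched as a conflict check against labels that are already delegated. The single variant set and the function names are assumptions made for illustration; the sketch also ignores the separate question of allocating a variant to the same applicant.

```python
# Integrated variant sets shared by all tags (one hypothetical entry: 爱 / 愛).
VARIANT_SETS = [{"\u7231", "\u611b"}]


def same_or_variant(a, b):
    """True if two code points are identical or belong to the same variant set."""
    return a == b or any(a in group and b in group for group in VARIANT_SETS)


def labels_conflict(x, y):
    """Two labels conflict if they match position by position up to variants."""
    return len(x) == len(y) and all(same_or_variant(a, b) for a, b in zip(x, y))


def can_apply(label, allocated):
    """`allocated` is an iterable of (label, tag) pairs already delegated."""
    return not any(labels_conflict(label, other) for other, _tag in allocated)
```

The tag of the existing label plays no role in the check, which mirrors the statement that U+7231 is blocked in that position no matter which tag it is applied under.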
Progress of CGP, JGP and KGP 
CGP: 
Formal establishment announcement on 24 September. 
(https://www.icann.org/news/announcement-2014-09-24-en) 
Draw up initial repertoire and variant type definition in XML format. 
Provided some coordination case studies for IP and K/J. 
JGP: Not seated yet 
? 
? 
KGP: Not seated yet 
2014.08.21: KLGP domestic meeting. 
2014.08.26: Joint meeting with Han Chuan LEE and other attendees 
2014.09.03: CJK people discussion
CGP Repertoire and Variant Type 
In 2004, according to RFC 3743 and RFC 4713, CDNC 
submitted to IANA a unified Chinese Character Set 
(19520 characters) for domain name registration, 
building up mapping relationships between any given 
simplified character, its traditional character(s) and its 
variant(s). 
In 2012, CDNC added 17 more Chinese characters as 
requested by the Hong Kong community, increasing the set 
number to 19537. But only 15 of those 17 characters are 
included in MSR-1. 
• Thus CGP takes the intersection of MSR-1 and the 
latest version of CDNC character set, amounting to 
19535 characters, excluding Latin Hyphen, digits and 
letters. 
• Following the CDNC registration rule and RFC 3743 & 
4713, CGP takes the second column (the preferred 
variants) as “allocatable,” and the rest of the variants 
as “blocked.” 
<char cp="575D" tag="sc:Hani"> 
<var cp="575D" type="simp" comment="identity" /> 
<var cp="57BB" type="block" /> 
<var cp="58E9" type="trad" /> 
</char> 
<char cp="57BB" tag="sc:Hani"> 
<var cp="575D" type="simp" /> 
<var cp="57BB" type="block" comment="identity" /> 
<var cp="58E9" type="trad" /> 
</char> 
<char cp="58E9" tag="sc:Hani"> 
<var cp="575D" type="simp" /> 
<var cp="57BB" type="block" /> 
<var cp="58E9" type="trad" comment="identity" /> 
</char> 
Code Point Allocatable Variant Blocked Variant Tag 
坝(575D) 壩(58E9) - und-hani 
坝(575D) - 垻(57BB) und-hani 
垻(57BB) 坝(575D) - und-hani 
垻(57BB) 壩(58E9) - und-hani 
壩(58E9) 坝(575D) - und-hani 
壩(58E9) - 垻(57BB) und-hani
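To show how the XML form relates to the table rows, the sketch below reads one `char` element and classifies each non-identity variant. The `<data>` wrapper element and the choice to treat the "simp" and "trad" types as allocatable are assumptions for this example, following the reading that the preferred variants are allocatable and the rest are blocked.

```python
import xml.etree.ElementTree as ET

# One entry from the slide, wrapped in a hypothetical <data> root element.
FRAGMENT = """
<data>
  <char cp="575D" tag="sc:Hani">
    <var cp="575D" type="simp" comment="identity" />
    <var cp="57BB" type="block" />
    <var cp="58E9" type="trad" />
  </char>
</data>
"""

ALLOCATABLE_TYPES = {"simp", "trad"}  # assumed mapping of types to dispositions


def variant_rows(xml_text):
    """Return (source, variant, disposition) tuples, skipping identity mappings."""
    rows = []
    for char in ET.fromstring(xml_text).iter("char"):
        source = char.get("cp")
        for var in char.iter("var"):
            target = var.get("cp")
            if target == source:
                continue
            is_allocatable = var.get("type") in ALLOCATABLE_TYPES
            rows.append((source, target, "allocatable" if is_allocatable else "blocked"))
    return rows
```

Each returned tuple corresponds to one table row, with the disposition deciding whether the variant lands in the allocatable or the blocked column.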
CGP’s Perspective for Variant Mapping Coordination 
• CGP is aware that coordination cannot be achieved by one party. 
• CGP is very open to working with JGP and KGP on a unified variant mapping table. 
• CGP is ready to modify the initial repertoire and variant type annotation according 
to the coordination result, and if necessary, to delete some code points to avoid 
complicated conflicts. 
• Those UNIQUE Chinese character codes in JGP and KGP are NOT to be added 
into CGP repertoire.
Case Study 1 
All code points are included in CGP initial repertoire and regarded as variants of each other. 
The mapping relationship in RFC 3743 format is as follows: 
• 一4E00 (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0); 
• 壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0); 
• 壹58F9 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0); 
• 弌5F0C (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0); 
Meanwhile, all code points are included in JPRS IDN table as well. 
(http://www.iana.org/domains/idn-tables/tables/jp_ja-jp_1.2.html) 
There is no mapping relationship among them. 
• 一 4E00(2,3);4E00(2,3); # 16-76, CJK UNIFIED IDEOGRAPH-4E00 
• 壱 58F1(2,3);58F1(2,3); # 16-77, CJK UNIFIED IDEOGRAPH-58F1 
• 壹 58F9(2,3);58F9(2,3); # 52-69, CJK UNIFIED IDEOGRAPH-58F9 
• 弌 5F0C(2,3);5F0C(2,3); # 48-01, CJK UNIFIED IDEOGRAPH-5F0C
Case Study 1 
Code Point Allocatable Variant Blocked Variant Tag 
一 (U+4E00) - 壱 (U+58F1) und-hani 
一 (U+4E00) - 壹 (U+58F9) und-hani 
一 (U+4E00) - 弌 (U+5F0C) und-hani 
壹 (U+58F9) - 一 (U+4E00) und-hani 
壹 (U+58F9) - 壱 (U+58F1) und-hani 
壹 (U+58F9) - 弌 (U+5F0C) und-hani 
弌 (U+5F0C) 一 (U+4E00) - und-hani 
弌 (U+5F0C) - 壹 (U+58F9) und-hani 
弌 (U+5F0C) - 壱 (U+58F1) und-hani 
壱 (U+58F1) 壹 (U+58F9) - und-hani 
壱 (U+58F1) - 一 (U+4E00) und-hani 
壱 (U+58F1) - 弌 (U+5F0C) und-hani 
一 (U+4E00) - - und-jpan 
壹 (U+58F9) - - und-jpan 
弌 (U+5F0C) - - und-jpan 
壱 (U+58F1) - - und-jpan
Case Study 2 
The code point and its variant(s) exist separately in CGP and JGP 
• 刊 (U+520A) # in CGP and JGP 
• 刋 (U+520B) # in CGP and JGP 
• 栞 (U+681E) # only in JGP 
In CGP repertoire, the mapping is: 
• 刊520A (0);刊520A(86),刊520A(886);刊(0),刋(0); 
• 刋520B (0);刊520A(86),刊520A(886);刊(0),刋(0); 
In JPRS table,code points are: 
• 刊 520A(2,3);520A(2,3); 
• 刋 520B(2,3);520B(2,3); 
• 栞 681E(2,3);681E(2,3);
Case Study 2 
Although 栞(U+681E) is not included in the CGP repertoire, it is regarded as a variant of 刊(U+520A) 
and 刋(U+520B) in ancient Chinese literature and in some local areas. 
CGP would like to extend the CGP repertoire by adding 栞(U+681E) and build up the variant 
relationship. 
Code Point Allocatable Variant Blocked Variant Tag 
刊(U+520A) - 刋(U+520B) und-hani 
刊(U+520A) - 栞(U+681E) und-hani 
刋(U+520B) 刊(U+520A) - und-hani 
刋(U+520B) - 栞(U+681E) und-hani 
栞(U+681E) 刊(U+520A) - und-hani 
栞(U+681E) - 刋(U+520B) und-hani 
刊(U+520A) - - und-jpan 
刋(U+520B) - - und-jpan 
栞(U+681E) - - und-jpan
Case Study 3 
The code point ONLY exists in JPRS table: 
• 辻(U+8FBB) 
‘辻’ does NOT exist in CGP now and traditionally, it is regarded as a Japanese 
UNIQUE character code. 
If CGP linguistic experts keep the viewpoint that ‘辻’ is not associated with any code point in the CGP 
repertoire, CGP will not add this code point into the CGP repertoire: 
Code Point Allocatable Variant Blocked Variant Tag 
辻(U+8FBB) - - und-jpan
Expectation for JGP and KGP 
• Generate the repertoire and variant type annotation ASAP 
• JGP: Kanji repertoire and variant type annotation 
• KGP: Allow Hanja? >> Hanja repertoire and variant type annotation 
• Work together on the unified variant mapping table for the overlapped code points 
• Case 0: jpan or kore tagged code point block hani variant 
• Case 1: NO change to any variant type annotation 
• Case 2: jpan or kore tagged code point added into hani variant 
• Case 3: jpan or kore UNIQUE code points 
• Case 4: … 
• Revise each panel’s repertoire and variant type annotation 
and cross-check the consistency and potential conflicts. 
• Generate each panel’s Whole Label Generation Rule 
and cross-check the consistency and potential conflicts. 
KGP: ???? 
JGP: 6356 
6186 
CGP: 19535 
???
Challenges… 
• Postponed work plan 
- Synchronization between C, J and K 
- Extension from 31 Dec 2014 to 2015 
• Repertoire Modification 
- Negotiation among three panels’ linguistic experts 
- Code points extension or reduction 
- Variant type annotation changes 
• Whole Label Generation Rule Set 
• Each panel SHOULD be aware of PROs and CONs of the language tag based solution 
• Focus on the techniques and best-practice
Thanks Q&A
Coordination between Neo- 
Brahmi Scripts 
Nishit Jain 
Neo-Brahmi Generation Panel 
IDN Root Zone LGR 
Neo-Brahmi 
Generation Panel 
What is Brahmi? 
• An ancient script 
• Most of the modern scripts in 
Indian subcontinent have been 
derived from Brahmi. 
• Geographically, these scripts are used in 
Central Asia, South Asia and 
South-East Asia 
• These scripts are used by 
multiple language families: 
Largely by Indo-Aryan and 
Dravidian 
Brahmi script engraved on Ashoka Pillar in 3rd century BCE 
Source: http://en.wikipedia.org/wiki/Brahmi_script
Why Brahmi? 
• Despite their variations in the visual forms, the basic 
philosophy in their usage is common 
• They all are “akshar” driven, and follow a specific syntax 
o Analogical reference can be made to Indian National standard, IS 
13194:1991 Section 8 
• This syntax being the implicit foundation in representation of 
these scripts in the digital medium, adherence to the structure 
acts as an obligatory security consideration even in the case of 
Internationalized Domain Names.
Why Neo-Brahmi? 
• Of all the scripts derived from 
“Brahmi,” not all are in modern 
usage 
• Approach is in consonance with 
the "Conservatism Principle" of 
the LGR procedure. 
Previous Similar Work 
• For the IDN version of the “.in” ccTLD (.bharat), with equivalents in 22 
official Indian languages, a similar exercise had been 
carried out 
• Following things were finalized for each language 
– Permissible set of code points 
– Visually similar variant strings 
– Complex whole label evaluation rules 
• Recently .भारत ccTLD has been launched in Devanagari 
script covering Hindi, Marathi, Konkani, Boro, Dogri, 
Maithili, Nepali and Sindhi.
Revisiting the Rules in Context of 
LGR Framework 
• LGR work is different in following contexts 
– Wider stakeholder group 
– Overarching principles in the LGR procedure 
• Especially Simplicity and Predictability principles 
• This revision, however, would not change 
– The need for the well-formedness of the label in 
terms of Akshar formalism
Neo Brahmi GP - Current Status 
• Currently the group has 10 members 
• Mixed bag of expertise, such as linguistics and Unicode 
Udaya Narayana Singh Raiomond Doctor 
Mahesh D. Kulkarni Anupam Agrawal 
Akshat S. Joshi Abhijit Dutta 
N. Deiva Sundaram Neha Gupta 
Nishit Jain Prabhakar Pandey 
• We are in process of getting more members 
on-board
Neo Brahmi GP – Outreach Efforts 
• Conducted a workshop at APrIGF 2014 to raise awareness and call for 
participation in the LGR procedure. 
o Topic: “Bringing diverse linguistic communities together for a unified IDN ruleset” 
o The panel discussion touched upon the various aspects of 
creation of the LGR for the Neo-Brahmi scripts 
o http://2014.rigf.asia/agenda/workshop-proposals/workshop-proposal-13/ 
• Participation and presentation in ICANN 49 public meeting at 
Singapore 
• Participation and presentation in ICANN 50 public meeting at 
London 
Reaching out to the community for wider participation
Cross-Script Similarities 
• Code point similarity across scripts 
• Cases where Devanagari-Gujarati and Devanagari-Gurumukhi strings look similar.
Neo Brahmi GP – Approach
• There are cases of: 
– One script, one language 
– One script, multiple languages 
• Multiple sub-groups may exist to ensure proper 
representation of each language 
• Each sub-group would ideally comprise 
– Language expert(s) 
– Community representative(s)
Neo-Brahmi GP Internal Composition 
• Integration Panel 
• Neo-Brahmi GP 
o Devanagari SG 
§ Hindi, Marathi, Konkani, Nepali, Bodo, Dogri, Maithili, Santhali, … 
o Tamil Sub-Group 
o Telugu SG 
o Gujarati SG 
o Gurmukhi SG 
o … 
o Bengali SG 
§ Bangla, Assamese, Manipuri, …
Coordination between 
Cyrillic, Greek and Latin 
Scripts 
Cary Karp 
Latin Generation Panel 
IDN Root Zone LGR 
• Apsyeoxic — two words that appear to be spelled 
identically but are actually sequences of characters 
from different scripts are said to be apsyeoxic 
/æpsiˈaːksɪk/. This term is derived from the graphic similarity 
between the string of Roman letters apsyeoxic and the 
visually confusable string of Cyrillic letters арѕуеохіс 
o http://dictionary.sensagent.com/ 
• Culling Cyrillic and Latin code points from the MSR 
which are commonly represented with congruent 
glyphs: 
o Latin 
§ aäæcçdeëəәhiïoöpsxyÿʒ 
o Cyrillic 
§ аӓӕсҫԁеёəәһіїоӧрѕхуӱӡ 
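One way to work with such congruent pairs is to fold one script onto the other and compare the results. The table below is a hand-picked subset covering only the letters of "apsyeoxic", written with explicit escapes so each code point is unambiguous; the names `skeleton` and `confusable` are introduced here for illustration and do not come from any panel document.

```python
# Latin letters and the Cyrillic code points usually drawn with the same glyph.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "i": "\u0456", "o": "\u043e",
    "p": "\u0440", "s": "\u0455", "x": "\u0445", "y": "\u0443",
}
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}


def skeleton(label):
    """Fold Cyrillic look-alikes onto Latin so confusable labels compare equal."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label)


def confusable(a, b):
    """Two distinct labels are confusable if their skeletons are identical."""
    return a != b and skeleton(a) == skeleton(b)


# The all-Cyrillic counterpart of the Latin string "apsyeoxic".
cyrillic_twin = "".join(LATIN_TO_CYRILLIC[ch] for ch in "apsyeoxic")
```

Here `confusable("apsyeoxic", cyrillic_twin)` holds even though the two strings share no code point, which is the situation a cross-script blocked variant is meant to catch.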
• Adding Greek and admitting closely similar, but not 
identical glyphs: 
o Cyrillic 
§ а ҫїко р 
o Greek 
§ αβγςϊκοόρν 
o Latin 
§ ɑßɣçï oópv 
• The extent of the problem crossing all three scripts 
does not appear particularly great 
• Stepping away from both IDNA and the MSR and 
considering uppercase: 
o Cyrillic 
§ АГВЕНІКМ ОПРТФХ 
o Greek 
§ ΑΓΒΕΗΙΚΜΝΟΠΡΤΦΧΥΖ 
o Latin 
§ A BEHIKMNO PTɸXYZ 
• IDNA expects issues relating to case to be resolved 
before the protocol is invoked 
• This does not mean that such issues are irrelevant 
• This does mean that if the LGR panels are to 
address cross-script issues, they may also need to 
deal with collateral details that lie outside the 
current scope of the initiative 
Thank You
Engage with ICANN on Web & Social Media 
twitter.com/icann 
facebook.com/icannorg 
linkedin.com/company/icann 
gplus.to/icann 
weibo.com/icannorg 
flickr.com/photos/icann 
youtube.com/user/ICANNnews 
icann.org

More Related Content

What's hot (13)

Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
 
Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan ČernockýDolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-final
 
How To "Speak Developer"
How To "Speak Developer"How To "Speak Developer"
How To "Speak Developer"
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
CoLing 2016
CoLing 2016CoLing 2016
CoLing 2016
 
Language and Contact SOCIOLINGUISTICS
Language and Contact SOCIOLINGUISTICSLanguage and Contact SOCIOLINGUISTICS
Language and Contact SOCIOLINGUISTICS
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Translation non commodity
Translation non commodityTranslation non commodity
Translation non commodity
 
80 81
80 8180 81
80 81
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Carla um-nov12
Carla um-nov12Carla um-nov12
Carla um-nov12
 
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
 

ICANN 51: IDN Root Zone LGR (workshop)

  • 1. IDN Root Zone LGR #ICANN51 15 October 2014 Sarmad Hussain IDN Program Senior Manager
  • 2. Agenda • Introduction – Sarmad Hussain • Need, Limitations and Mechanisms for the Root Zone LGR – Marc Blanchet • Challenges in Addressing Multiple Languages using Arabic Script – Meikal Mumin • Coordination between Chinese, Japanese and Korean Scripts – Wang Wei • Coordination between Neo-Brahmi Scripts – Nishit Jain • Coordination between Cyrillic, Greek and Latin Scripts – Cary Karp • Q/A
  • 3. Types of Coordination • One script – one GP o Arabic • One script – many GPs o Han – Chinese, Korean, Japanese • Many scripts – one GP o Neo-Brahmi scripts • Many scripts – many GPs o Cyrillic, Greek, Latin
  • 4. Aspects of Coordination • Need – what work should be undertaken by the GPs o Same code points o Visually similar code points o Similar rules o Other? • Mechanism – how will these GPs interact with each other o After individual GP work o During individual GP work o Before individual GP work
  • 5. Need, Limitations and Mechanisms for the Root Zone LGR Presented by: Marc Blanchet Integration Panel IDN Root Zone LGR
  • 6. The Need for LGRs • It’s not all about variants! • LGRs define what labels are valid o They are needed for automated label validation • For some scripts, all that is needed is a defined repertoire o Each application confined to one repertoire
  • 7. Root Repertoire • Collection of single script repertoires o Each tagged by script: “und-Cyrl,” “und-Jpan” o No cross-repertoire labels o No overlap, except “common” code points, Han • Each script repertoire limited to: o Modern, widespread use o Everyday use o Stable code points
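The "no cross-repertoire labels" rule on slide 7 can be illustrated with a minimal Python sketch. The repertoires below are tiny hypothetical stand-ins (the real root zone repertoires are far larger), the `und-Latn` tag is assumed by analogy with the tags on the slide, and the function name `matching_repertoires` is invented for illustration.

```python
# Hypothetical, heavily truncated repertoires keyed by script tag.
REPERTOIRES = {
    "und-Latn": set("abcdefghijklmnopqrstuvwxyz"),
    "und-Cyrl": set("абвгдежзиклмнопрстуф"),
}


def matching_repertoires(label):
    """Return the tags of all repertoires that contain every code point of `label`."""
    return [
        tag
        for tag, repertoire in REPERTOIRES.items()
        if all(ch in repertoire for ch in label)
    ]


print(matching_repertoires("example"))        # Latin only
print(matching_repertoires("пример"))         # Cyrillic only
print(matching_repertoires("ex\u043cple"))    # mixed scripts: no repertoire matches
```

A label that matches no single repertoire would be rejected under this sketch, which is one way to read "each application confined to one repertoire".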
  • 8. But What About Variants? • Some scripts require variants o Code points that are “the same” to users • Two types: o Those that lead to “blocked” variants o Those that lead to “allocatable” variants • Procedure: o Maximize number of blocked variants, and minimize the number of allocatable variants
  • 9. More on Variants • Variant mappings will be used to automatically generate all permutations (variant labels) • Type of variant mapping determines whether: o To block a variant label (either variant or original can be allocated, not both) o To allow allocating it to the same applicant as original label • As a result of integration, blocked variants can exist across GP repertoires o GP coordination will ensure consistent outcome
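The permutation step described on slide 9 can be sketched in Python as follows. This is a simplified illustration, not the procedure's actual algorithm: the variant table is hypothetical, and the rule that a single blocked mapping blocks the whole variant label is an assumption made for this example.

```python
from itertools import product

# Hypothetical variant table: code point -> {variant code point: mapping type}.
VARIANTS = {
    "a": {"α": "blocked"},
    "e": {"é": "allocatable"},
}


def variant_labels(label):
    """Return {variant_label: disposition} for every permutation of `label`."""
    # For each position, list (character, mapping type) choices; the original
    # character is always an option and carries no mapping type.
    options = []
    for ch in label:
        choices = [(ch, None)]
        choices += list(VARIANTS.get(ch, {}).items())
        options.append(choices)

    result = {}
    for combo in product(*options):
        candidate = "".join(ch for ch, _ in combo)
        if candidate == label:
            continue  # skip the original label itself
        types = {t for _, t in combo if t is not None}
        # Assumed rule: one blocked mapping blocks the whole variant label.
        result[candidate] = "blocked" if "blocked" in types else "allocatable"
    return result


print(variant_labels("ae"))
```

For the label "ae" this yields three variant labels, of which only the one built purely from allocatable mappings could go to the original applicant. This is in line with the stated goal of maximizing blocked variants.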
  • 10. What, Why and When of WLEs • Whole Label Evaluation Rules (WLE) • Why they are needed o Prevent labels that cannot be processed/rendered • When to consider o Generally affect “complex scripts” o Not intended to enforce “spelling rules” • Example: o Disallow vowel marks where they can’t be rendered: at the start or following other vowel marks, etc.
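The example rule on slide 10 (no vowel mark at the start of a label or directly after another vowel mark) can be sketched in Python as below. The set of vowel marks is a small hypothetical sample of Devanagari vowel signs, and `passes_wle` is an illustrative name; a real WLE rule set would be defined by the relevant Generation Panel.

```python
# Hypothetical sample of vowel marks: Devanagari vowel signs AA, I, II.
VOWEL_MARKS = {"\u093E", "\u093F", "\u0940"}


def passes_wle(label):
    """Return False if a vowel mark starts the label or follows another vowel mark."""
    previous = None
    for ch in label:
        if ch in VOWEL_MARKS and (previous is None or previous in VOWEL_MARKS):
            return False
        previous = ch
    return True


print(passes_wle("\u0915\u093E"))        # consonant KA + vowel sign AA
print(passes_wle("\u093E\u0915"))        # vowel sign at the start
print(passes_wle("\u0915\u093E\u093F"))  # two vowel signs in a row
```

The check looks only at rendering-related context, not at whether the label is a correctly spelled word, which matches the slide's point that WLEs are not meant to enforce spelling rules.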
  • 11. Limitations • TLDs are intended for: o “Unambiguous labels with good mnemonic value” * • Not intended to capture all facets of a writing system o Should focus on modern, everyday use o OK not to support some conventions § e.g., disallowing apostrophe does not support the ’s ending for names of businesses, hyphen disallowed in root o Some limits necessary to reduce systemic risks * https://tools.ietf.org/html/draft-iab-dns-zone-codepoint-pples-02
  • 12. What Should Be Coordinated? • Repertoire: Consistent treatment of similar repertoires § Examples: Indic scripts • Variants: Compatible definition of variants § Examples: Han script, overlapping repertoires o Cross-script homoglyphs § Examples: Latin, Greek, Cyrillic • WLE: Consistent treatment of structurally similar scripts § Examples: Indic scripts, definition of matra
  • 13. Resources • Considerations for Designing a Label Generation Ruleset for the Root Zone • https://community.icann.org/download/attachments/43989034/Considerations-for-LGR-2014-09-23.pdf • Maximal Starting Repertoire (MSR-1) • https://www.icann.org/news/announcement-2-2014-06-20-en • https://www.icann.org/en/system/files/files/msr-overview-06jun14-en.pdf • Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels • https://www.icann.org/en/system/files/files/draft-lgr-procedure-20mar13-en.pdf • Representing Label Generation Rules in XML • https://tools.ietf.org/html/draft-davies-idntables • Requirements for LGR Proposals • https://community.icann.org/download/attachments/43989034/Requirements%20for%20LGR%20Proposals.pdf • Variant Rules • https://community.icann.org/download/attachments/43989034/Variant%20Rules.pdf
  • 14. Challenges in Addressing Multiple Languages using Arabic Script Meikal Mumin Arabic Generation Panel IDN Root Zone LGR
  • 15. Representing scripts in a world of languages • abc.def is a Roman/Latin script IDN • ابت.ثجح is an Arabic script IDN • But we do not know which languages are used by the website behind either IDN • So Internationalized Domain Names (IDNs) have a script as a property, but not a language. So what does this mean? o It means that IDNs cannot be based on the orthography of one language, such as the Arabic language, but that… o LGR and related standards must therefore address the entire community of readers and writers of Arabic script • The problem is that, while we can only represent scripts, we think in terms of language o All data is at language level while we have to define LGR at script level o There are no institutions representing script communities o Writing is usually considered as a (reduced) representation of language • So what is the actual scope of Arabic script LGR?
  • 16. Scope of the Arabic script LGR • Arabic script is centered around Africa and the Middle East as a writing system, but in the course of time it has expanded across nearly all continents, with established past or present use in o the Americas, (Western, Central, Southern, and Eastern) Europe, (nearly all areas of) Asia, Africa (North and South of the Sahara) o Within Africa alone, there is attested past or present use of Arabic script for the writing of 80+14 African languages apart from Arabic (Mumin 2014) o With today's patterns of migration, continuing proselytization, and population growth, more user communities of Arabic script are manifesting in both the Global South and North • Accordingly, Arabic script is used not just locally or regionally but globally, albeit to radically different degrees and in entirely different manners, since… o for numerous languages, Arabic script is in active competition with other scripts, and… o for numerous languages, Arabic script is used only by a part of the language community o It is not foreseeable how the situation will evolve in the future and what the impact of IDNs would be on the community o To give a more extreme example – Would a language community possibly care if they can register a domain name using the orthography of their language if any reading and writing is only done with pen & paper?
  • 17. Representing the underrepresented • Unfortunately, this linguistic diversity is not well represented o There is a lack of data on languages and orthographies o Languages of low status or socio-economic participation particularly lack representation o Little is available on non-Western orthographies, while non-standardized orthographies are generally not considered o Often TF-AIDN has to rely heavily on the intuitions of users from an entirely different part of the script community o E.g. during code point analysis, we frequently lacked data to establish whether a code point is used optionally or obligatorily in a given orthography, which is required within the current process
  • 18. Qualifying and quantifying script use: The EGIDS scale • Security and stability of the DNS and the root zone are highly important, and therefore conservatism is a strong principle surrounding IDNs "Where the Integration Panel was able to establish to its satisfaction that a given code point was assigned a character solely for use in a disused orthography, or for a language in serious decline, the code point has been removed from the MSR." (Maximal Starting Repertoire - MSR-1 Overview and Rationale, Revision – June 6, 2014, p. 22) o The MSR dictates that the Expanded Graded Intergenerational Disruption Scale [EGIDS] (Lewis and Simons 2010) is used to categorize the "effective demand" of languages within a given country: § The EGIDS consists of 13 levels, ranking languages from the highest representation and role in society, being a National language, to the lowest, extinction § "For the MSR the IP used the cut-off between EGIDS level 4 [Educational] and level 5 [Developing]." • Unfortunately, such representation of language in society is not just accidental but usually a result of historical processes
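The cut-off quoted on this slide can be illustrated with a small sketch. The level codes follow Lewis and Simons (2010); the names `EGIDS_LEVELS`, `CUTOFF_LEVEL` and `passes_cutoff` are hypothetical, and reading the cut-off as "level 4 is the weakest level still considered" is an assumption based on the quotation, not a statement of how the Integration Panel implemented it.

```python
# EGIDS level codes ordered from strongest (0) to weakest (10),
# following Lewis and Simons (2010). The scale has 13 entries.
EGIDS_LEVELS = ["0", "1", "2", "3", "4", "5", "6a", "6b", "7", "8a", "8b", "9", "10"]

# Assumption: a cut-off "between level 4 and level 5" means that
# level 4 is the weakest level still considered for the MSR.
CUTOFF_LEVEL = "4"


def passes_cutoff(level: str) -> bool:
    """Return True if an EGIDS level is at or above the assumed cut-off."""
    return EGIDS_LEVELS.index(level) <= EGIDS_LEVELS.index(CUTOFF_LEVEL)


if __name__ == "__main__":
    # Levels 5 and 6a are the ratings discussed for Seraiki and Harari.
    for level in ("1", "4", "5", "6a"):
        print(level, passes_cutoff(level))
```

Under this reading, a language rated 5 or 6a falls below the cut-off regardless of any other evidence of written use, which is the situation the following slides describe.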
  • 19. "Scripts divide languages into cultures, make dialects into new distinct languages, and create new dialects. […] If, as is often said, 'A language is a dialect with an army and navy', how much more is it 'a dialect with a distinct script'!" (Warren-Rothlin 2014: 264)
  • 20. People, society, language and the role for IDNs • Languages and scripts are o …evaluated by people (language attitude) o …assigned a status by both societies and scientists (dialect vs. language) o …and regulated by governments (language policy) o and this is also reflected in studies and statistics on languages § There have even been historical reports of the suppression of Arabic script orthographies, where the use of writing systems was banned and criminalized • We must be cautious not to further strengthen trends of linguistic discrimination, and must strive for equal treatment of languages, even where they lack socio-economic participation or political representation o TF-AIDN identified 32 code points during the analysis with evidence of use which nevertheless cannot be included in the LGR because they do not have an EGIDS rating higher than 5
  • 21. Example #1 – Code point analysis and issues with EGIDS data • Example Seraiki [ISO 639-3: skr]: o Seraiki is a language of Pakistan o There are numerous publications in Seraiki, including daily newspapers o Within Pakistan, Seraiki has an EGIDS rating of 5 (Written) o IP recommends excluding any language with an EGIDS rating lower than 4 • Example Harari [ISO 639-3: har]: o Harari is a language of Ethiopia o There are significant expatriate communities, which seem to be very active § E.g. the Australian Saay Harari Association, which published an orthography description and a virtual keyboard with the assistance of the State Library of Victoria, Australia, in 2009 o Within Ethiopia, Harari has an EGIDS rating of 6a (Vigorous), while it has no status in Australia o Because of the activity of the expatriate community, TF-AIDN assumes active use of the orthography and would suggest inclusion of the relevant code points o Unfortunately, this is not possible within the current process stipulated by IP
  • 22. A-priori principles and a-posteriori analysis • In the case of Arabic script IDNs, ICANN has tasked two groups to work together to develop the Label Generation Rules (LGR) o The Integration Panel (IP) has developed the "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels", as well as the Maximal Starting Repertoire (MSR-1) o On the basis of the procedure and the MSR-1, the Task Force on Arabic Script IDNs (TF-AIDN) should formulate the LGR, which is then approved by IP o Accordingly, rules were laid out by IP before TF-AIDN conducted any observation and analysis of data • Therefore the MSR and the LGR development process were designed before an (ideally data-driven) code point analysis could be conducted by the script generation panels o TF-AIDN noticed this, being the first script generation panel to take up work o Accordingly, TF-AIDN suggested to IP, as a public comment on MSR-1, that § MSR-1 should only be frozen one script at a time § after the relevant script generation panel has been formed and has given its feedback on its portion o Unfortunately, IP considered this to be effectively a request for removal of MSR-1
  • 23. Example #2 - Variants • Variants are required to balance the usability of IDNs as well as the representation of languages against security and stability of the DNS and the root zone • The Arabic Case Study Team has published an issues report identifying 6 types of variants in Arabic script. Two examples: So how can we reasonably argue that this difference in letter shape is not confusable by all readers and across all representations and fonts… …while this difference is confusable to at least a subset of readers or in a subset of representations and fonts… • …when there are no empirical scientific tests to support either theory? • …when there is a systemic bias in representation even within our group (as 15 out of 29 members are first-language speakers of Arabic)?
  • 24. تریما کاسیہ شكراً شکریہ با سپاس تشكر Thank You
  • 25. Coordination between Chinese, Japanese and Korean Scripts Wang Wei Chinese Generation Panel IDN Root Zone LGR
  • 26. The Historical Changes of Chinese Characters in East Asia • Korea: second century BC to 5th century AD. In the modern Hangul-based Korean writing system, Chinese characters (Hanja) are no longer officially used, but are still used occasionally in daily life. • Japan: Chinese characters (Kanji) were adopted from the 5th century AD. All three scripts (kanji, and the hiragana and katakana syllabaries) are used as main scripts. • China: Hanzi unification in the Qin dynasty (221-207 B.C.). Now there are two writing systems: Simplified Chinese (SC) and Traditional Chinese (TC). The SC and TC forms of a character have the same meaning and the same pronunciation, and are typical variants. TC: Taiwan, Macau, Hong Kong. SC: Mainland China, Singapore. TC & SC: Malaysia
  • 27. Relationship of Chinese Characters in Three Scripts In ISO 15924, the script for Chinese characters is mainly defined in this specification: • ISO 15924 code: Hani • ISO 15924 no: 500 • English Name: Han (Hanzi, Kanji, Hanja)
  • 28. SLD/TLD Chinese Character IDN Registration • CDNC Character Table and Registration Rules under RFC 3743/4713. SLD: .CN, .TW, .HK, .SG, .ASIA. TLD: .中国, .台湾, .香港 • JPRS IDN Registration. SLD: .JP • KISA: no Chinese character registration under .KR so far • Figures shown: 19535 (CGP), 19537 (CDNC), 6186
  • 29. Variant Solutions in Different Scripts • CDNC: RFC 3743 & 4713 o Allocate all applied-for IDLs and variant IDLs to the same registrant o Delegate the applied-for IDL, the preferred SC IDL and the preferred TC IDL o Reserve all the other variant IDLs o Delegate reserved variant IDLs when requested at a later date • JPRS: No variant issue. Among Kanji characters, some are in a simplified form (called the "new character form"), derived from the traditional imported form (called the "old character form"). It is appropriate to distinguish new and old forms as different and independent characters instead of pure variants. This understanding has been reflected in the IANA IDN table developed by JPRS, in which no variants are identified for Kanji. • KISA: No variant issue, so far… Hanja is no longer widely used in the ROK. A law enacted in 2011 orders all official ROK government documents to be written ONLY in Hangul. KISA stated that its SLD IDN policy does not allow the use of Hanja in its domestic market, nor does it have any intention of allowing it.
  • 30. Coordination Principle • Each CJK panel creates an LGR, and each LGR includes a repertoire and variants. • The variant mappings must agree for the same code point across all LGRs. • The variant types (blocked or allocatable) may be different; they do not have to agree across LGRs. • The repertoires may be different. • Allocatable: a potential allocation rule says that once the variant label is generated, that variant label may be allocated to the applicant for the original label. • Blocked: a blocking rule says that a particular label must not be allocated to anyone under any circumstances.
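The agreement requirement on this slide can be sketched as a simple check. The data layout (`{code_point: {variant: type}}`) and the function name `mapping_conflicts` are hypothetical choices for illustration only; the sample data uses the U+7231 / U+611B pair from Case Study 0.

```python
# Each LGR maps a code point to its variants: {code_point: {variant: type}}.
# Illustrative data only, using the pair from Case Study 0.
HANI = {
    0x7231: {0x611B: "allocatable"},
    0x611B: {0x7231: "allocatable"},
}
# A Japanese table that lists U+611B without any variant ...
JPAN_UNCOORDINATED = {0x611B: {}}
# ... and the same table after the mapping has been aligned.
JPAN_COORDINATED = {0x611B: {0x7231: "blocked"}}


def mapping_conflicts(lgr_a, lgr_b):
    """Return shared code points whose variant targets differ.

    Variant types are deliberately ignored: they may differ between
    LGRs, while the mapped-to code points must agree.
    """
    shared = lgr_a.keys() & lgr_b.keys()
    return sorted(cp for cp in shared if set(lgr_a[cp]) != set(lgr_b[cp]))


if __name__ == "__main__":
    print([hex(cp) for cp in mapping_conflicts(HANI, JPAN_UNCOORDINATED)])
    print([hex(cp) for cp in mapping_conflicts(HANI, JPAN_COORDINATED)])
```

Code points that exist in only one repertoire are never reported, which reflects the rule that repertoires may differ.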
  • 31. Example to Illustrate: Case Study 0 (Appendix F of draft-lgr-procedure-20mar13-en.pdf)
    For CGP:
    Code Point | Allocatable Variant | Blocked Variant | Tag
    爱 U+7231 | 愛 U+611B | - | und-hani
    愛 U+611B | 爱 U+7231 | - | und-hani
    For JGP, probably:
    Code Point | Allocatable Variant | Blocked Variant | Tag
    愛 U+611B | - | - | und-jpan
    Code Point | Allocatable Variant | Blocked Variant | Tag
    爱 U+7231 | 愛 U+611B | - | und-hani
    愛 U+611B | 爱 U+7231 | - | und-hani
    愛 U+611B | - | 爱 U+7231 | und-jpan
    Applying for U+611B using the und-jpan tag blocks the use of U+7231 in the same location in any label, no matter which tag it is applied under. This is so even though U+7231 is not a character in Japanese at all and does not appear in the tagged repertoire und-jpan. Because it is not part of that repertoire, it cannot be used in any label applied for with the und-jpan tag.
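The cross-tag blocking described in Case Study 0 can be sketched as a label comparison. This is a minimal illustration, assuming that variant relations are symmetric and position-by-position; `VARIANT_SETS`, `variant_class` and `collides` are hypothetical names and do not come from the LGR procedure.

```python
# Variant sets merged across all tags. Illustrative data only:
# the single pair from Case Study 0.
VARIANT_SETS = [frozenset({0x7231, 0x611B})]


def variant_class(cp):
    """Return the variant set a code point belongs to (itself if none)."""
    for members in VARIANT_SETS:
        if cp in members:
            return members
    return frozenset({cp})


def collides(label_a, label_b):
    """True if two labels are identical or variants of each other.

    Two labels collide when they have the same length and, at every
    position, their code points fall into the same variant set.
    """
    if len(label_a) != len(label_b):
        return False
    return all(
        variant_class(ord(a)) == variant_class(ord(b))
        for a, b in zip(label_a, label_b)
    )


if __name__ == "__main__":
    applied_jpan = "\u611b"   # applied for under und-jpan
    later_hani = "\u7231"     # later application under und-hani
    print(collides(applied_jpan, later_hani))
```

The comparison does not look at tags at all, which is the point of the slide: once a label with U+611B is delegated, a label with U+7231 in the same position collides with it under any tag.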
  • 32. Progress of CGP, JGP and KGP • CGP: Formal establishment announced on 24 September (https://www.icann.org/news/announcement-2014-09-24-en). Drew up the initial repertoire and variant type definition in XML format. Provided some coordination case studies for IP and the K/J panels. • JGP: Not seated yet ? ? • KGP: Not seated yet. 2014.08.21: KLGP domestic meeting. 2014.08.26: Joint meeting with Han Chuan LEE and other attendees. 2014.09.03: CJK people discussion
  • 33. CGP Repertoire and Variant Type In 2004, according to RFC 3743 and RFC 4713, CDNC submitted to IANA a unified Chinese Character Set (19520 characters) for domain name registration, building up mapping relationships between any given simplified character, its traditional character(s) and its variant(s). In 2012, CDNC added 17 more Chinese characters as requested by the Hong Kong community, increasing the size of the set to 19537. But only 15 of those 17 characters are included in MSR-1. • Thus CGP takes the intersection of MSR-1 and the latest version of the CDNC character set, amounting to 19535 characters, excluding the Latin hyphen, digits and letters. • Following the CDNC registration rule and RFC 3743 & 4713, CGP takes the second column (the preferred variants) as "allocatable" and the rest of the variants as "blocked."
    <char cp="575D" tag="sc:Hani"> <var cp="575D" type="simp" comment="identity" /> <var cp="57BB" type="block" /> <var cp="58E9" type="trad" /> </char>
    <char cp="57BB" tag="sc:Hani"> <var cp="575D" type="simp" /> <var cp="57BB" type="block" comment="identity" /> <var cp="58E9" type="trad" /> </char>
    <char cp="58E9" tag="sc:Hani"> <var cp="575D" type="simp" /> <var cp="57BB" type="block" /> <var cp="58E9" type="trad" comment="identity" /> </char>
    Code Point | Allocatable Variant | Blocked Variant | Tag
    坝(575D) | 壩(58E9) | - | und-hani
    坝(575D) | - | 垻(57BB) | und-hani
    垻(57BB) | 坝(575D) | - | und-hani
    垻(57BB) | 壩(58E9) | - | und-hani
    壩(58E9) | 坝(575D) | - | und-hani
    壩(58E9) | - | 垻(57BB) | und-hani
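How the XML fragment on this slide relates to the allocatable/blocked table can be sketched with the Python standard library. The `<data>` wrapper element, the function name `classify_variants`, and the reading that `simp` and `trad` variants are allocatable while `block` variants are blocked are assumptions made for this illustration; the slide itself only shows the `<char>` and `<var>` elements.

```python
import xml.etree.ElementTree as ET

# The three <char> entries from the slide, wrapped in a hypothetical
# <data> root element so that the fragment parses as one document.
SAMPLE = """
<data>
  <char cp="575D" tag="sc:Hani">
    <var cp="575D" type="simp" comment="identity" />
    <var cp="57BB" type="block" />
    <var cp="58E9" type="trad" />
  </char>
  <char cp="57BB" tag="sc:Hani">
    <var cp="575D" type="simp" />
    <var cp="57BB" type="block" comment="identity" />
    <var cp="58E9" type="trad" />
  </char>
  <char cp="58E9" tag="sc:Hani">
    <var cp="575D" type="simp" />
    <var cp="57BB" type="block" />
    <var cp="58E9" type="trad" comment="identity" />
  </char>
</data>
"""

# Assumption: preferred simplified/traditional forms are allocatable.
ALLOCATABLE_TYPES = {"simp", "trad"}


def classify_variants(xml_text):
    """Map each code point to {variant: 'allocatable' | 'blocked'}."""
    result = {}
    for char in ET.fromstring(xml_text.strip()).iter("char"):
        source = char.get("cp")
        for var in char.iter("var"):
            target = var.get("cp")
            if target == source:
                continue  # identity mapping, not a separate variant
            allocatable = var.get("type") in ALLOCATABLE_TYPES
            result.setdefault(source, {})[target] = (
                "allocatable" if allocatable else "blocked"
            )
    return result


if __name__ == "__main__":
    for cp, variants in classify_variants(SAMPLE).items():
        print(cp, variants)
```

With this reading, the output has the same six non-identity relations as the table on the slide.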
  • 34. CGP's Perspective for Variant Mapping Coordination • CGP is aware that the coordination cannot be achieved by one party. • CGP is very open to building a unified variant mapping table together with JGP and KGP. • CGP is ready to modify the initial repertoire and variant type annotation according to the coordination result and, if necessary, to delete some code points to avoid complicated conflicts. • Chinese character codes that are UNIQUE to JGP and KGP are NOT to be added to the CGP repertoire.
  • 35. Case Study 1 All code points are included in the CGP initial repertoire and regarded as variants of each other. The mapping relationship in RFC 3743 format is as follows:
    一4E00 (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
    壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
    壹58F9 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);
    弌5F0C (0); 一4E00(86),一4E00(886); 一(0),壱(0),壹(0),弌(0);
    Meanwhile, all code points are included in the JPRS IDN table as well (http://www.iana.org/domains/idn-tables/tables/jp_ja-jp_1.2.html). There is no mapping relationship among them.
    一 4E00(2,3);4E00(2,3); # 16-76, CJK UNIFIED IDEOGRAPH-4E00
    壱 58F1(2,3);58F1(2,3); # 16-77, CJK UNIFIED IDEOGRAPH-58F1
    壹 58F9(2,3);58F9(2,3); # 52-69, CJK UNIFIED IDEOGRAPH-58F9
    弌 5F0C(2,3);5F0C(2,3); # 48-01, CJK UNIFIED IDEOGRAPH-5F0C
  • 36. Case Study 1
    Code Point | Allocatable Variant | Blocked Variant | Tag
    一 (U+4E00) | - | 壱 (U+58F1) | und-hani
    一 (U+4E00) | - | 壹 (U+58F9) | und-hani
    一 (U+4E00) | - | 弌 (U+5F0C) | und-hani
    壹 (U+58F9) | - | 一 (U+4E00) | und-hani
    壹 (U+58F9) | - | 壱 (U+58F1) | und-hani
    壹 (U+58F9) | - | 弌 (U+5F0C) | und-hani
    弌 (U+5F0C) | 一 (U+4E00) | - | und-hani
    弌 (U+5F0C) | - | 壹 (U+58F9) | und-hani
    弌 (U+5F0C) | - | 壱 (U+58F1) | und-hani
    壱 (U+58F1) | 壹 (U+58F9) | - | und-hani
    壱 (U+58F1) | - | 一 (U+4E00) | und-hani
    壱 (U+58F1) | - | 弌 (U+5F0C) | und-hani
    一 (U+4E00) | - | - | und-jpan
    壹 (U+58F9) | - | - | und-jpan
    弌 (U+5F0C) | - | - | und-jpan
    壱 (U+58F1) | - | - | und-jpan
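The step from the RFC 3743 style rows of Case Study 1 to the und-hani part of this table can be sketched as follows. The parsing is tailored to the row layout as printed on the slides (code point; preferred variants; character variants) and is an assumption for illustration, not a general RFC 3743 parser; `classify_row` is a hypothetical name.

```python
import re

# Hex code points as printed in the rows, e.g. "58F1". The short
# numbers in parentheses, such as (0) or (886), are too short to match.
HEX = re.compile(r"[0-9A-F]{4,6}")


def classify_row(row):
    """Return {variant_code_point: 'allocatable' | 'blocked'} for one row.

    Field 1 holds the code point, field 2 the preferred variants and
    field 3 all members of the variant set. Preferred variants become
    allocatable, all other members blocked; the code point itself is
    skipped.
    """
    fields = [field.strip() for field in row.split(";")]
    source = int(HEX.search(fields[0]).group(), 16)
    preferred = {int(h, 16) for h in HEX.findall(fields[1])}
    members = {
        ord(entry.strip()[0]) for entry in fields[2].split(",") if entry.strip()
    }
    return {
        cp: ("allocatable" if cp in preferred else "blocked")
        for cp in sorted(members - {source})
    }


ROW = "壱58F1 (0); 壹58F9(86),壹58F9(886); 一(0),壱(0),壹(0),弌(0);"

if __name__ == "__main__":
    for cp, kind in classify_row(ROW).items():
        print(f"U+{cp:04X}", kind)
```

For the row of 壱 (U+58F1) this yields one allocatable variant, U+58F9, and two blocked variants, U+4E00 and U+5F0C, in line with the table.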
  • 37. Case Study 2 The code point and its variant(s) exist separately in CGP and JGP
    刊 (U+520A) # in CGP and JGP
    刋 (U+520B) # in CGP and JGP
    栞 (U+681E) # only in JGP
    In the CGP repertoire, the mapping is:
    刊520A (0);刊520A(86),刊520A(886);刊(0),刋(0);
    刋520B (0);刊520A(86),刊520A(886);刊(0),刋(0);
    In the JPRS table, the code points are:
    刊 520A(2,3);520A(2,3);
    刋 520B(2,3);520B(2,3);
    栞 681E(2,3);681E(2,3);
  • 38. Case Study 2 Although 栞(U+681E) is not included in the CGP repertoire, it is regarded as a variant of 刊(U+520A) and 刋(U+520B) in ancient Chinese literature and in some local areas. CGP would like to extend the CGP repertoire by adding 栞(U+681E) and build up the variant relationship.
    Code Point | Allocatable Variant | Blocked Variant | Tag
    刊(U+520A) | - | 刋(U+520B) | und-hani
    刊(U+520A) | - | 栞(U+681E) | und-hani
    刋(U+520B) | 刊(U+520A) | - | und-hani
    刋(U+520B) | - | 栞(U+681E) | und-hani
    栞(U+681E) | 刊(U+520A) | - | und-hani
    栞(U+681E) | - | 刋(U+520B) | und-hani
    刊(U+520A) | - | - | und-jpan
    刋(U+520B) | - | - | und-jpan
    栞(U+681E) | - | - | und-jpan
  • 39. Case Study 3 The code point ONLY exists in the JPRS table: • 辻(U+8FBB) '辻' does NOT exist in CGP now and, traditionally, it is regarded as a UNIQUE Japanese character code. If CGP linguistic experts keep the viewpoint that '辻' is not associated with any code point in the CGP repertoire, CGP will not add this code point to the CGP repertoire:
    Code Point | Allocatable Variant | Blocked Variant | Tag
    辻(U+8FBB) | - | - | und-jpan
  • 40. Expectation for JGP and KGP • Generate the repertoire and variant type annotation ASAP • JGP: Kanji repertoire and variant type annotation • KGP: Allow Hanja? >> Hanja repertoire and variant type annotation • Work together on the unified variant mapping table for the overlapping code points • Case 0: jpan or kore tagged code point blocks hani variant • Case 1: NO change to any variant type annotation • Case 2: jpan or kore tagged code point added into hani variant • Case 3: jpan or kore UNIQUE code points • Case 4: … • Revise each panel's repertoire and variant type annotation and cross-check for consistency and potential conflicts. • Generate each panel's Whole Label Generation Rule and cross-check for consistency and potential conflicts. KGP: ???? JGP: 6356 6186 CGP: 19535 ???
  • 41. Challenges… • Postponed work plan - Synchronization between C, J and K - Extension from 31 Dec 2014 to 2015 • Repertoire Modification - Negotiation among the three panels' linguistic experts - Code point extension or reduction - Variant type annotation changes • Whole Label Generation Rule Set • Each panel SHOULD be aware of the PROs and CONs of the language tag based solution • Focus on the techniques and best practice
  • 43. Coordination between Neo-Brahmi Scripts Nishit Jain Neo-Brahmi Generation Panel IDN Root Zone LGR
  • 45. What is Brahmi? • An ancient script • Most of the modern scripts of the Indian subcontinent have been derived from Brahmi. • Geographically, these scripts are used in Central Asia, South Asia and South-East Asia • These scripts are used by multiple language families, largely Indo-Aryan and Dravidian Figure: Brahmi script engraved on an Ashoka Pillar, 3rd century BCE. Source: http://en.wikipedia.org/wiki/Brahmi_script
  • 46. Why Brahmi? • Despite the variations in their visual forms, the basic philosophy of their usage is common • They are all "akshar" driven, and follow a specific syntax o An analogical reference can be made to the Indian National Standard, IS 13194:1991 Section 8 • Since this syntax is the implicit foundation for the representation of these scripts in the digital medium, adherence to the structure acts as an obligatory security consideration even in the case of Internationalized Domain Names.
  • 47. Why Neo-Brahmi? • Of all the scripts derived from "Brahmi," not all are in modern usage • Approach is in consonance with the "Conservatism Principle" of the LGR procedure.
  • 48. Previous Similar Work • For the IDN version of the ".in" ccTLD, the (.bharat) equivalent in 22 Official Indian Languages, a similar exercise had been carried out • The following were finalized for each language – Permissible set of code points – Visually similar variant strings – Complex whole label evaluation rules • Recently the .भारत ccTLD was launched in Devanagari script, covering Hindi, Marathi, Konkani, Boro, Dogri, Maithili, Nepali and Sindhi.
  • 49. Revisiting the Rules in Context of LGR Framework • LGR work is different in the following contexts – Wider stakeholder group – Overarching principles in the LGR procedure • Especially the Simplicity and Predictability principles • This revision, however, would not change – The need for the well-formedness of the label in terms of Akshar formalism
  • 50. Neo Brahmi GP - Current Status • Currently the group has 10 members • Mixed bag of expertise, such as linguistics and Unicode: Udaya Narayana Singh, Raiomond Doctor, Mahesh D. Kulkarni, Anupam Agrawal, Akshat S. Joshi, Abhijit Dutta, N. Deiva Sundaram, Neha Gupta, Nishit Jain, Prabhakar Pandey • We are in the process of getting more members on board
  • 51. Neo Brahmi GP – Outreach Efforts • Conducted a workshop at APrIGF-2014 to raise awareness and call for participation in the LGR procedure. o Topic: "Bringing diverse linguistic communities together for a unified IDN ruleset" o The panel discussion touched upon the various aspects of creating the LGR for the Neo-Brahmi scripts o http://2014.rigf.asia/agenda/workshop-proposals/workshop-proposal-13/ • Participation and presentation at the ICANN 49 public meeting in Singapore • Participation and presentation at the ICANN 50 public meeting in London • Reaching out to the community for wider participation
  • 52. Cross-Script Similarities • Code point similarity across scripts • Cases where Devanagari-Gujarati and Devanagari-Gurumukhi strings look similar.
  • 53. Neo Brahmi GP – Approach
  • 54. Neo Brahmi GP – Approach • There are cases of: – One script, one language – One script, multiple languages • Multiple sub-groups may exist to ensure proper representation of each language • Each sub-group would ideally comprise – Language expert(s) – Community representative(s)
  • 55. Neo-Brahmi GP Internal Composition • Integration Panel • Neo-Brahmi GP o Devanagari SG: Hindi, Marathi, Konkani, Nepali, Bodo, Dogri, Maithili, Santhali, … o Tamil Sub-Group o Telugu SG o Gujarati SG o Gurmukhi SG o … o Bengali SG: Bangla, Assamese, Manipuri, …
  • 57. Coordination between Cyrillic, Greek and Latin Scripts Cary Karp Latin Generation Panel IDN Root Zone LGR
  • 58. • Apsyeoxic: two words that appear to be spelled identically but are actually sequences of characters from different scripts are said to be apsyeoxic /æpsi ˈaːksɪk/. This term is derived from the graphic similarity between the string of Roman letters apsyeoxic and the visually confusable string of Cyrillic letters арѕуеохіс o http://dictionary.sensagent.com/
  • 59. • Culling Cyrillic and Latin code points from the MSR which are commonly represented with congruent glyphs: o Latin § aäæcçdeëəәhiïoöpsxyÿʒ o Cyrillic § аӓӕсҫԁеёəәһіїоӧрѕхуӱӡ
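The congruent-glyph problem on this slide can be illustrated with the word from the previous one. The mapping below is a hand-picked subset of the Latin/Cyrillic pairs listed above, written with explicit escapes so that the two scripts cannot be mixed up in the source; `cyrillic_lookalike` and `script_of` are hypothetical helper names, and taking the first word of the Unicode character name as the script is a simplification used only for this sketch.

```python
import unicodedata

# Subset of the congruent pairs from the slide: Latin letter -> Cyrillic
# letter with the same glyph shape.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "i": "\u0456",
    "o": "\u043e", "p": "\u0440", "s": "\u0455", "x": "\u0445",
    "y": "\u0443",
}
TABLE = str.maketrans(LATIN_TO_CYRILLIC)


def cyrillic_lookalike(label):
    """Return the all-Cyrillic lookalike of a Latin label.

    Returns None if any letter has no congruent Cyrillic glyph in the
    (deliberately small) table above.
    """
    if any(ch not in LATIN_TO_CYRILLIC for ch in label):
        return None
    return label.translate(TABLE)


def script_of(ch):
    """Rough script guess: first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    latin = "apsyeoxic"
    twin = cyrillic_lookalike(latin)
    print(latin == twin)                      # the strings differ ...
    print({script_of(ch) for ch in latin})    # ... and so do the scripts
    print({script_of(ch) for ch in twin})
```

A label such as "web" has no full Cyrillic lookalike in this table, so the function returns None for it; only labels built entirely from congruent letters are affected.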
  • 60. • Adding Greek and admitting closely similar, but not identical glyphs: o Cyrillic § а ҫїко р o Greek § αβγςϊκοόρν o Latin § ɑßɣçï oópv • The extent of the problem crossing all three scripts does not appear particularly great
  • 61. • Stepping away from both IDNA and the MSR and considering uppercase: o Cyrillic § АГВЕНІКМ ОПРТФХ o Greek § ΑΓΒΕΗΙΚΜΝΟΠΡΤΦΧΥΖ o Latin § A BEHIKMNO PTɸXYZ • IDNA expects issues relating to case to be resolved before the protocol is invoked • This does not mean that such issues are irrelevant
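Why case still matters can be shown with one letter shape that appears in all three uppercase lists above. This is only an illustration using Python's built-in `str.lower()`; the dictionary and function names are hypothetical, and the snippet makes no claim about how IDNA implementations handle case.

```python
# One capital letter shape shared by the three lists on the slide:
# Latin H, Cyrillic EN (U+041D) and Greek ETA (U+0397).
LOOKALIKE_CAPITALS = {
    "LATIN": "H",
    "CYRILLIC": "\u041d",
    "GREEK": "\u0397",
}


def lowercase_forms(capitals):
    """Return the lowercase form of each capital, keyed by script."""
    return {script: ch.lower() for script, ch in capitals.items()}


if __name__ == "__main__":
    for script, lower in lowercase_forms(LOOKALIKE_CAPITALS).items():
        print(script, f"U+{ord(lower):04X}", lower)
```

The three capitals look alike, while their lowercase forms (h, н, η) do not. A comparison that only considers lowercase code points would therefore not flag this trio, even though users who see or type the uppercase forms may confuse them.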
  • 62. • This does mean that if the LGR panels are to address cross-script issues, they may also need to deal with collateral details that lie outside the current scope of the initiative