Localizing your apps for multibyte languages
Ken ISHIMOTO (K’s Room Japan)

  1. Localizing your apps for multibyte languages, Ken ISHIMOTO (K’s Room Japan)
  2. Localizing your apps
      • Part I - WebObjects
      • Part II - What is a multibyte language
      • Part III - Combine multibyte language with WebObjects
      • Part IV - multibyte & WOdka
  4. Part I - WebObjects
      • Eclipse
      • Ant build
      • Properties (to make WebObjects ready)
      • Database
  5. Eclipse
      • Set your workspace to UTF-8. If you do not, you can get all kinds of problems; non-English text in the source code can even break compilation.
  6. Ant build
      • Set the encoding of your Ant compile task to UTF-8.
  7. Properties in your app
      • These are the properties that we use:
      • file.encoding=UTF-8
      • er.extensions.ERXApplication.DefaultEncoding=UTF-8
      • er.extensions.ERXApplication.DefaultMessageEncoding=UTF-8
      • er.extensions.ERXLocalizationEditor.encoding=UTF-8
      • wodka.Application.LanguageEncoding={Japanese = UTF-8; }
  8. CSS
      @charset "UTF-8";
  9. JavaScript
      <script type="text/javascript" charset="UTF-8">
  10. Database - MySQL
      • Append &useUnicode=true&characterEncoding=UTF-8 to the connection URL.
      • Don’t forget to create the database as a ‘utf8’ database.
  11. Database - FrontBase
      • Nothing to do, it just works.
  13. Part II - What is a multibyte language (Japanese)
      • Basics
      • Alphabet (how Japanese works)
      • Encoding (which encoding to use)
  14. Basics
      • This is a sample page from a book.
      • A Japanese book is read starting from the right, so you open it where you would usually close it.
      • You read from right to left and from top to bottom.
      • This can be very complex for word-processing software, so XX Word isn’t a good choice for writing books or magazines. That’s also one reason why there are Japanese text editors that can do it.
  15. Spaces between words
      • This is a pen.
      • これはペンです。
      • Today we have good weather in Tokyo.
      • 今日、東京はとてもいい天気です。
      Another big problem can be that there are no spaces between words.
  16. Yen symbol vs backslash
      • If you’re familiar with the Japanese keyboard, the backslash key (\) is replaced by the symbol for the Yen (¥). Way back when, we did a Japanese version of BRIEF, so I was familiar with this phenomenon: paths would be separated by Yen symbols, but everything worked as expected.
      • set the URL_A_chars to “$+!’,?;&@=#%><{}[]\"~`^\\|*()”
      • completely failed to compile, because it looked like this:
      • set the URL_A_chars to “$+!’,?;&@=#%><{}[]¥"~`^¥¥|*()”
      • and ¥ didn’t escape as you’d expect.
      • If I create a new file, either on my system or the English-only system, I can use any font and type the \ key and I get the \ glyph. Side by side in this file I can use exactly the same font, but when I type the \ symbol I get the ¥ glyph.
  17. Japanese Alphabet
      • 漢字 Kanji (Chinese characters)
      • ひらがな Hiragana (Japanese alphabet)
      • カタカナ Katakana (alphabet for foreign words)
      • ローマ字 Romaji (English characters)
      • 記号 Kigo (signs)
      • 絵文字 Emoji (smilies)
      • 外字 Gaiji (self-made characters)
      • 振り仮名 Furigana
  19. 漢字 Kanji
      • The complexity of these characters
      • The vast majority of these are not in common use in either Japan or China; approximately 2,000 to 3,000 characters are in common use in Japan, a few thousand more find occasional use, and a total of about 13,000 characters can be encoded in various Japanese Industrial Standards for kanji.
      • Kyōiku kanji: The Kyōiku kanji (教育漢字, "education kanji") are 1,006 characters that Japanese children learn in elementary school.
      • Jōyō kanji: The Jōyō kanji (常用漢字, "regular-use kanji") are 2,136 characters consisting of all the Kyōiku kanji, plus 1,130 additional kanji taught in junior high and high school. In publishing, characters outside this category are often given furigana.
      • Jinmeiyō kanji: Since September 27, 2004, the Jinmeiyō kanji (人名用漢字, "kanji for use in personal
  20. Encoding of 生
      • Unicode: U+751F
      • UTF-8: E7 94 9F
      • Shift-JIS: 90 B6
      A character does not always fit in 16 bits; in UTF-8 a single character can take up to 4 bytes (32 bits). That makes it hard to size a database column: a name field declared as varchar(20) is enough for some languages, but where the limit counts bytes, UTF-8 text can hit it after only a few characters.
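The byte values on this slide can be checked from plain Java. The following is a minimal sketch using only the standard library; the class name `EncodingSizes` and the helper `hex` are made up for illustration. The kanji 生 is written as a Unicode escape so that the example does not depend on the encoding of the source file, and Shift_JIS is only used if the runtime provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSizes {
    /** Returns the bytes of s in the given charset as upper-case hex, separated by spaces. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sei = "\u751F"; // the kanji from the slide (U+751F)
        System.out.println("UTF-16 chars: " + sei.length());
        System.out.println("UTF-8 bytes:  " + hex(sei, StandardCharsets.UTF_8));
        // Shift_JIS is an optional charset, so check before asking for it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("SJIS bytes:   " + hex(sei, Charset.forName("Shift_JIS")));
        }
    }
}
```

One `char` in Java becomes three bytes in UTF-8 here, which is the reason a byte-based column limit fills up faster than the character count suggests.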
  21. Pronunciation: 生
      • ON: Chinese-style reading for kanji.
        ショウ, ショウ_ジル, ショウ_ズル, ジョウ, セイ, ゼイ
        Shou, Shou_jiru, Shou_zuru, Jou, Sei, Zei
      • KUN: Japanese-style reading for kanji.
        イ_カス, イ_キ, イ_キル, イ_ケル, ウ_マレ, ウ_マレル, ウ_ム, ウブ, ウマ_レ, ウマ_レル, オ_イ, オ_ウ, キ, ナ_ス, ナ_ル, ナマ, ハ_エ, ハ_エル, ハ_ヤス, バ_エ
        i_kasu, i_ki, i_kiru, i_keru, u_mare, u_mareru, u_mu ....
      • Special readings.
        アイ, イク, イケ, エ, オ, サ, ナリ, ニュウ, ヌク, フ, ブ, ム_ス, ヨイ
        ai, iku, ike, e, o, sa, nari, nyuu, nuku, fu, bu, mu_su, yoi
      • In Chinese this is read: Shēng
  22. Difference between countries
      • 手紙 means "letter" in Japanese but "toilet paper" in Chinese.
      Japanese and Chinese are very different, even if some kanji look the same. It is like English and French: they share some letters, but can you read and understand both?
  23. Character: 生
      • 生きる Ikiru ..... live, living, alive
      • 生クリーム Nama kuri-mu ..... fresh cream
      • 生涯 Shougai ..... lifetime
      • 生命 Seimei ..... life
      • 生む Umu ..... give birth
      We can see that one kanji can have a lot of different meanings and pronunciations. So it makes absolutely no sense to sort a database by kanji. People wouldn’t find the data where they expected it, and the sort would be only a Unicode sort that has no meaning.
      Every character is very easy to use and access; no special treatment is necessary.
  25. ひらがな Hiragana
      • Hiragana is a Japanese syllabary, one basic component of the Japanese writing system.
      • Hiragana is used to write native words for which there are no kanji, including grammatical particles, and suffixes such as さん -san "Mr., Mrs., Miss, Ms.".
      Every character is very easy to use and access; no special treatment is necessary.
  27. カタカナ Katakana
      • Katakana is a Japanese syllabary, one component of the Japanese writing system.
      • In contrast to the hiragana syllabary, which is used for those Japanese language words and grammatical inflections which kanji does not cover, the katakana syllabary is primarily used for transcription of foreign language words into Japanese.
      Every character is very easy to use and access; no special treatment is necessary.
  28. Half-width kana 半角カナ
      • Half-width kana (半角カナ Hankaku kana) are katakana characters displayed at half their normal width (a 2:1 aspect ratio), instead of the usual square (1:1) aspect ratio.
      • Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be displayed on the same grid as monospaced fonts of Latin characters.
      • Half-width hiragana or kanji were not used.
      • Half-width kana characters are not generally used today, but find some use in specific settings, such as cash register displays, on shop receipts, and Japanese digital television and DVD subtitles.
      Caution! These kinds of characters can be a pain, so a good program converts half-width katakana to full-width katakana.
  29. Half-width kana 半角カナ
      String s1 = "ｱﾅﾀ";   // half-width
      String s2 = "アナタ"; // full-width
      ERXStringUtilitiesEXTENDED.changeHanKatakanaToZenkakuKatakana(s1);
      // RESULT = "アナタ"
      s1.equalsIgnoreCase(s2)
      // RESULT = false
      s1.length()
      // RESULT = 3
      s2.length()
      // RESULT = 3
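`ERXStringUtilitiesEXTENDED` is the speaker's own utility class and its source is not shown in the deck. As a rough stand-in using only the JDK, Unicode compatibility normalization (NFKC) via `java.text.Normalizer` folds half-width katakana to full-width. This is a sketch, not the method from the slide; the class name `KanaWidth` is invented here, and the kana are written as escapes to stay independent of the source file encoding.

```java
import java.text.Normalizer;

final class KanaWidth {
    /** Folds compatibility forms, including half-width katakana, to their standard forms. */
    static String toStandardWidth(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String half = "\uFF71\uFF85\uFF80"; // half-width a-na-ta
        String full = "\u30A2\u30CA\u30BF"; // full-width a-na-ta
        System.out.println(half.equals(full));                  // compares the raw strings
        System.out.println(toStandardWidth(half).equals(full)); // compares after folding
    }
}
```

NFKC is broader than a pure kana conversion: it also folds full-width Latin letters and digits to ASCII and merges a half-width kana with a following voiced sound mark into one character, so the string length can change. Whether that is acceptable depends on the data.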
  31. NUMBER 数字
  32. NUMBER 数字
      • As with spaces, numbers also have variations:
      • single byte (Hankaku)
      • double byte (Zenkaku)
      • Chinese character version (Kanji)
  33. NUMBER 数字
      • Hankaku (single) - 0123456789
      • Zenkaku - ０１２３４５６７８９
      • Kanji - 0 is 零 or 〇 / 1 is 一 or 壱 / 2 is 二 or 弐 / 3 is 三 or 参 / 四 五 六 七 八 九
      Converting every number to single size before storing it in the database is the easy way to go.
  34. isDigitsOnly
      String s1 = "0123456789";
      String s2 = "０１２３４５６７８９";
      ERXStringUtilities.isDigitsOnly(s1);
      // RESULT = true
      ERXStringUtilities.isDigitsOnly(s2);
      // RESULT = true
      s1.equalsIgnoreCase(s2);
      // RESULT = false
  35. Replace double with single
      String s = "０１２３４５６７８９";
      ERXStringUtilitiesEXTENDED.changeZenkakuNumberToHanNumber(s);
      // RESULT = "0123456789"
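The digit conversion can be approximated without the speaker's utility class. The sketch below relies on `Character.digit`, which returns the numeric value of any Unicode decimal digit, so full-width digits map to 0 through 9. The class and method names are hypothetical, and this is not the implementation of `changeZenkakuNumberToHanNumber`.

```java
final class DigitFolding {
    /** Replaces every Unicode decimal digit (for example U+FF10..U+FF19) with its ASCII digit. */
    static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int value = Character.digit(cp, 10); // -1 if cp is not a decimal digit
            if (Character.isDigit(cp) && value >= 0) {
                sb.append((char) ('0' + value));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Kanji numerals such as 一 or 壱 are not decimal digits in the Unicode sense, so they pass through unchanged and would need their own mapping. Digits from other scripts are folded as well, which may or may not be wanted.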
  36. LETTER 英字
  37. LETTER 英字
      • Everybody loves the simple 26 characters, which in most schools take 2 years to learn.
      • In some countries there are variations, like German with ÜÖÄ.
  38. LETTER 英字
      • For each letter there is a double-byte letter.
      • ‘U’ has the counterpart ‘Ｕ’
      Converting every letter to single size before storing it in the database is the easy way to go.
  39. LETTER 英字
      String s1 = "BC";
      String s2 = "ＢＣ";
      s1.equalsIgnoreCase(s2);
      // RESULT = false
      s1 = ERXStringUtilitiesEXTENDED.changeZenkakuEijiToHanEiji(s2);
      // RESULT = "BC"
  41. Sign 記号
  42. Sign 記号
      • For each sign there is a double-byte counterpart.
      • ‘!’ has the counterpart ‘！’
      Converting every sign to single size before storing it in the database is the easy way to go.
  43. Sign 記号
      String s1 = "!@#$%^&*()";
      String s2 = "！＠＃＄％＾＆＊（）";
      s1 = ERXStringUtilitiesEXTENDED.changeZenkakuKigouToHanKigou(s2);
      // RESULT = "!@#$%^&*()"
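The full-width letters and signs shown on these slides sit in one contiguous Unicode range, U+FF01 to U+FF5E, which mirrors ASCII U+0021 to U+007E at a fixed offset of 0xFEE0. A minimal sketch of a conversion built on that fact follows; `WidthFolding` is an invented name and this is not the code behind `changeZenkakuEijiToHanEiji` or `changeZenkakuKigouToHanKigou`.

```java
final class WidthFolding {
    private static final int FULLWIDTH_START = 0xFF01; // full-width exclamation mark
    private static final int FULLWIDTH_END = 0xFF5E;   // full-width tilde
    private static final int OFFSET = 0xFEE0;          // distance to the ASCII counterpart

    /** Maps full-width ASCII variants to plain ASCII and leaves everything else alone. */
    static String toHalfWidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= FULLWIDTH_START && c <= FULLWIDTH_END) {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

After this folding, `equalsIgnoreCase` on a full-width and a half-width spelling of the same word gives the expected answer. Japanese punctuation outside that range, such as 。 or 、, is deliberately not touched.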
  44. SPACE スペース
  45. SPACE スペース
      • String a = " ";
      • String b = "　";
      a == space char
      b == double-size space char
      Converting every space to single size before storing it in the database is the easy way to go.
  46. trim
      // head and tail are 3 space chars
      String s = "   A B C   ";
      s.trim();
      // RESULT = "A B C"
      ERXStringUtilities.trimString(s);
      // RESULT = "A B C"
      ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s);
      // RESULT = "A B C"
  47. better trim
      // head and tail are 3 Japanese ZENKAKU (double byte) space chars
      String s = "　　　A B C　　　";
      s.trim();
      // RESULT = "　　　A B C　　　"
      ERXStringUtilities.trimString(s);
      // RESULT = "　　　A B C　　　"
      ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s);
      // RESULT = "A B C"
  48. remove Space between chars
      // between A and B are 2 single spaces + 2 double spaces + 2 single spaces
      String s = "A  　　  B";
      s.replace(" ", "");
      // RESULT = "A　　B"
      ERXStringUtilities.removeCharacters(s, " ");
      // RESULT = "A　　B"
      ERXStringUtilitiesEXTENDED.changeZenkakuToHanKakaku(s).replace(" ", "");
      // RESULT = "AB"
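`String.trim()` only strips characters up to U+0020, which is why the full-width space U+3000 survives it in the examples above. A small plain-Java sketch of a trim and a remove that also handle U+3000 is shown below. The names are hypothetical and this is not the code of `trimStringWithZenkaku`.

```java
final class ZenkakuSpace {
    private static final char IDEOGRAPHIC_SPACE = 0x3000; // full-width (zenkaku) space

    private static boolean isSpace(char c) {
        return c == IDEOGRAPHIC_SPACE || Character.isWhitespace(c);
    }

    /** Like String.trim(), but also strips full-width spaces at both ends. */
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isSpace(s.charAt(start))) {
            start++;
        }
        while (end > start && isSpace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    /** Removes every whitespace character, including full-width spaces. */
    static String removeSpaces(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isSpace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

On Java 11 and later, `String.strip()` covers the trimming case as well, since it is defined in terms of `Character.isWhitespace`.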
  50. 絵文字 Emoji (Smilies)
  51. 絵文字 Emoji (Smilies)
      • Emoji (絵文字; Japanese pronunciation: [emodʑi]) is the Japanese term for the ideograms or smileys used in Japanese electronic messages and web pages.
      • Emoji pictograms by au are specified using the IMG tag. SoftBank Mobile emoji are wrapped between SI/SO escape sequences, and support colors and animation. DoCoMo’s emoji are the most compact to transmit, while au’s version is more flexible and based on open standards.
      If you are creating a CMS or a data-entry application like a blog, a forum or whatever else, you will have to deal with emoji. Japanese people love to use them.
  52. WOEmoji
      Last year at WOWODC 2012 I spoke about the SnoWOman CMS, and there is a framework named WOEmoji. Using this framework it is easy to convert emoji for saving to the database, and they will automatically work on Windows or Android devices too.
      Version 2 of this framework (in progress) can also convert to the new open-standard emoji that is being developed right now in Japan.
      I am a paying supporter of this project and am waiting for delivery, so that WOEmoji can be updated.
  54. 外字 Gaiji (Self-made characters)
      • Gaiji (外字), literally meaning "external characters", are kanji that are not represented in existing Japanese encoding systems. These include variant forms of common kanji that need to be represented alongside the more conventional glyph in reference works, and can include non-kanji symbols as well.
      • Win XP: it had only a few thousand kanji, and it wasn’t easy to use kanji that were not available, so people started creating their own. The look was also sometimes different.
      • Win Vista: you can see the font is a little different. But you have to buy this 1,500-character gaiji package for about USD 500.
      • OS X: works out of the box and it is free.
  55. Gaiji 外字 Editor
      • This is an old gaiji editor, with which users could make their own characters, and that was nice. It started with the first version of Windows. But now with the Internet there is a problem, because many people do not realize that such a character can be seen only on that one machine; after it is sent by mail or entered into a database, it looks different on every other machine. So you need to strip out these characters and give feedback not to use them.
  56. Gaiji 外字
      ERXStringUtilitiesEXTENDED.delete_ModelDependenceCharacters(true, s, 200, false, false);
      I don’t have a Windows machine here, so I wasn’t able to create a sample string, but there is a command for deleting that kind of character area.
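The parameters of `delete_ModelDependenceCharacters` are not explained in the deck, so the following does not try to reproduce it. It is a plain-Java sketch under one stated assumption: that user-defined gaiji reach the application as code points in a Unicode Private Use Area, which `Character.getType` reports as `PRIVATE_USE`. The class name `PrivateUseFilter` is invented for illustration.

```java
final class PrivateUseFilter {
    /** True if the string contains at least one private-use code point. */
    static boolean containsPrivateUse(String s) {
        return s.codePoints().anyMatch(cp -> Character.getType(cp) == Character.PRIVATE_USE);
    }

    /** Returns the string without private-use code points (assumed here to be gaiji). */
    static String stripPrivateUse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
            .filter(cp -> Character.getType(cp) != Character.PRIVATE_USE)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

`containsPrivateUse` can drive the feedback to the user that the slide asks for, while `stripPrivateUse` is the silent fallback. Vendor-specific characters that do have regular code points are not caught by this check.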
  58. Furigana 振り仮名
      • Furigana (振り仮名) is a Japanese reading aid, consisting of smaller kana, or syllabic characters, printed next to a kanji (ideographic character) or other character to indicate its pronunciation. It is typically used to clarify rare, nonstandard or ambiguous readings, or in children’s or learners’ materials.
  59. Encoding
  60. Encoding
      • UTF-8
      • EUC-JP
      • Shift JIS
      • ISO/IEC 2022
      • and some more ...
  61. UTF-8
      • UTF-8 (UCS Transformation Format, 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32.
      We now use UTF-8 for every project. With it you are mostly safe and do not have to care about other encodings, but...
  62. EUC-JP
      • EUC-JP (Extended Unix Code): Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.
      • The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8,836 (94²) characters, or 830,584 (94³) characters, as sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with the EUC scheme. G0 is almost always an ISO-646 compliant coded character set (e.g. US-ASCII/KS X 1003/ISO 646:KR in EUC-KR and US-ASCII/the lower half of JIS X 0201 in EUC-JP) that is invoked on GL (i.e. with the most significant bit cleared).
      If you have to work with some Windows machines, it can happen that you have to import data that is encoded with this encoding. In my experience, I have never used it.
  63. Shift JIS
      • Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1.
      This is the most used encoding in Japan, and you can be sure that if you get data from an existing database or have to connect to a database, you will have to deal with it. We did a lot of SJIS to UTF-8 conversion in the past.
  64. ISO/IEC 2022
      • ISO/IEC 2022, Information technology - Character code structure and extension techniques, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying
      • a technique for including multiple character sets in a single character encoding system, and
      • a technique for representing these character sets in both 7 and 8 bit systems using the same encoding.
      You only have to deal with this if you build mailing solutions, but I really don’t care about it anymore; JavaMail works just fine.
  65. Part III - Combine multibyte language with WebObjects
  66. Localization ローカライズ
      • Localization of your app
      • Localization of data
      • Sorting
  67. Localization of your App
  68. ERXLocalizer
      // Writing components and code with ERXLocalizer makes your life very easy.
      // There are so many things you can do with it, so get comfortable with it.
      // Localized string from code
      ERXLocalizer.defaultLocalizer().localizedStringForKey("Nav.Main");
      // Localized string in HTML
      <wo:str value = "$localizer.Nav.Main" />
      <wo:localized value="Nav.Main" />
      * This is a bad example because I am using the power of the ‘dark force’, inline bindings. You shouldn’t do that, but I always use it. Sorry, I am a bad guy.
  69. .strings
      In your app’s ‘Resources’ folder, create a folder named with the language name + ‘.lproj’. Make it a plist file with key-value pairs and save the file as UTF-16 or UTF-8. With UTF-8 it is easier to read, and git commits can also be viewed.
  70. Localization Data
  71. Localization of Data
      1. Attributes in the entity
      2. Set the data in the edit page
      3. Display the attribute depending on the localizer
      [[eo]].name_en()
      or
      [[eo]].name_ja()
      or
      [[eo]].valueForKey("name")
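The slide does not show how `valueForKey("name")` picks between `name_en` and `name_ja`. One way to picture the idea, without any WebObjects or Wonder classes, is a lookup that appends the current language code to the key and falls back to a default language. Everything in this sketch is hypothetical: `LocalizedRecord` is a plain-Java stand-in for an enterprise object, not an API of the frameworks.

```java
import java.util.HashMap;
import java.util.Map;

final class LocalizedRecord {
    private final Map<String, String> values = new HashMap<>();
    private final String fallbackLanguage;

    LocalizedRecord(String fallbackLanguage) {
        this.fallbackLanguage = fallbackLanguage;
    }

    /** Stores one per-language attribute, for example "name_ja". */
    LocalizedRecord set(String attribute, String value) {
        values.put(attribute, value);
        return this;
    }

    /** Looks up key_languageCode first, then key_fallbackLanguage. */
    String localizedValueForKey(String key, String languageCode) {
        String value = values.get(key + "_" + languageCode);
        if (value == null) {
            value = values.get(key + "_" + fallbackLanguage);
        }
        return value;
    }
}
```

With `name_en` and `name_ja` filled in, a Japanese session would read the `name_ja` value, and a session in a language without its own column would fall back to `name_en`.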
  72. Sorting
  73. Sorting 1
      • name (how it is written)
      • furigana (how it is pronounced)
  74. Sorting 2
      • 漢字 Kanji (Chinese characters): how the name is written
      • ひらがな Hiragana or カタカナ Katakana (Japanese alphabet): how the name is read
      • Person 1: 森, read もり (Mr. Mori)
      • Person 2: 林, read はやし (Mr. Hayashi)
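The point of keeping a separate furigana attribute is that sorting on it gives an order people can predict, while sorting on the kanji only follows code point values. A minimal sketch with two made-up sample persons, Aoki and Watanabe, follows; they are not taken from the slides and are written as escapes so the example does not depend on the source file encoding. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FuriganaSort {
    static final class Person {
        final String name;     // how it is written (kanji)
        final String furigana; // how it is pronounced (kana)

        Person(String name, String furigana) {
            this.name = name;
            this.furigana = furigana;
        }
    }

    /** Code point order of the kanji: carries no meaning for a reader. */
    static final Comparator<Person> BY_WRITING = Comparator.comparing((Person p) -> p.name);

    /** Order of the kana reading: what a Japanese reader expects. */
    static final Comparator<Person> BY_READING = Comparator.comparing((Person p) -> p.furigana);

    static List<Person> sorted(List<Person> people, Comparator<Person> order) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("\u9752\u6728", "\u3042\u304A\u304D"));       // Aoki
        people.add(new Person("\u6E21\u8FBA", "\u308F\u305F\u306A\u3079")); // Watanabe
        System.out.println(sorted(people, BY_WRITING).get(0).furigana);
        System.out.println(sorted(people, BY_READING).get(0).furigana);
    }
}
```

Plain string comparison of kana is only an approximation of dictionary order, and it assumes all readings are stored in the same script (all hiragana or all katakana). A `java.text.Collator` for `Locale.JAPANESE` is one option if stricter ordering is needed.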
  75. Part IV - multibyte & WOdka
  76. WOdka improvements
      • Language switching
  77. WOdkaLanguageEnums
      • Language name
      • Locale code
      • Date format + 24 hours setting
      • Data for flag information
  78. WOdkaCountryEnums
      • Country name
      • code2: ISO code for the country
      • code3: ISO code for the country
      • money: ERXMoneyEnums
      • language: WOdkaLanguageEnums
      • telephone code
      • tax: tax info
      • zip: zip format
      • company mailing format
      • family mailing format
      • Localized words: male, female, sexMale, sexFemale
      • flag: path to flag data
      • continent: ERXContinentEnums
      • EU: ERXEuropeanUnionsEnums
      Family mailing formats: "[S][CR][T][_][F][_][L]" and "[L] [F]様"
      s = sex, t = title, f = first name, l = last name, cr = next line
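How WOdka expands these mailing formats is not shown in the deck. The sketch below is one possible reading of the two format strings, using the token legend from the slide. The token `[_]` is not in that legend and is assumed here to mean a single space. `MailingFormat` and `apply` are hypothetical names, not WOdka API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MailingFormat {
    private static final Pattern TOKEN = Pattern.compile("\\[(S|CR|T|F|L|_)\\]");

    /** Expands the bracket tokens of a format string in a single pass. */
    static String apply(String format, String sex, String title, String first, String last) {
        Map<String, String> values = new HashMap<>();
        values.put("S", sex);
        values.put("T", title);
        values.put("F", first);
        values.put("L", last);
        values.put("CR", "\n"); // next line
        values.put("_", " ");   // assumption: [_] stands for one space

        Matcher m = TOKEN.matcher(format);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps characters such as $ in names from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(values.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Text outside the brackets, such as the literal space and the honorific 様 in the Japanese format, is copied through unchanged, so the same function serves both the Western and the Japanese name order.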
  79. Thanks to
      • Masahiko TANI - A10 Objects Inc. (Japan)
      • Hiroyuki FUKUI - Astonish Create (Japan)
      Special thanks to
      • Paul YU - Green orchid llc (USA)
  80. Thank You, WOWODC 2013
