ぐだ生 Introduction to Java, Part 3 (Character Encodings) (Keynote version)

Published in: Technology

  1. Java (Unicode) • 2011-04-16 • twitter: @zaki50
  2. Who am I • YAMAZAKI Makoto (twitter: @zaki50) • Android • StickyShortcut • Java
  3. • Character set (CharacterSet) and encoding (Encoding) • UTF-16 • UTF-8
  4. • Unicode 6.0 11
  5. • US ASCII • Shift JIS • JIS X 0208 • UCS-2 • UCS-4
  6. • Unicode • UTF-8 • UTF-16 • UTF-32 • ... (US ASCII, Shift JIS)
  7. Unicode
  8. Unicode • Xerox, Microsoft, Apple, Sun Microsystems, HP, JUST System: The Unicode Consortium • ISO/IEC 10646
  9. Unicode, ISO/IEC 10646 • Unicode
  10. Unicode • The Unicode Consortium (http://www.unicode.org/)
  11. • Unicode
  12. Combining characters (Ligature) • ふ (U+3075) + combining semi-voiced sound mark (U+309A) = ぷ • (precomposed: ぷ (U+3077))
  13. Unicode • Unicode Unicode
  14. • NFC (Normalization Form C) • NFD (Normalization Form D) • NFKC (Normalization Form KC) • NFKD (Normalization Form KD) • C (Composition) / D (Decomposition), K (Compatibility)
  15. • NFC • Mac OS X (HFS+): NFD • NFKC, NFKD
  16. Unicode • C (Composition) • D (Decomposition)
  17. Unicode • K (Compatibility) • 1: full-width space (U+3000) → space (U+0020) • (5)
  18. Unicode K KCD
  19. Eclipse
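
The normalization forms on slides 14 to 18 can be tried out directly in Java. What follows is a minimal sketch, assuming Java 6 or later, where `java.text.Normalizer` is available. The class name `NormalizationSketch` and the helper `codePoints` are made up for illustration and do not come from the slides.

```java
import java.text.Normalizer;

public class NormalizationSketch {

    // Hypothetical helper: renders each code point of s as U+XXXX, separated by spaces.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // HIRAGANA LETTER HU followed by the combining semi-voiced sound mark.
        String decomposed = "\u3075\u309A";
        // HIRAGANA LETTER PU as a single precomposed code point.
        String precomposed = "\u3077";

        // The two strings render the same but are not equal as Java strings.
        System.out.println(decomposed.equals(precomposed)); // false

        // C (Composition): the two-code-point sequence is composed.
        System.out.println(codePoints(Normalizer.normalize(decomposed, Normalizer.Form.NFC)));  // U+3077
        // D (Decomposition): the precomposed character is split.
        System.out.println(codePoints(Normalizer.normalize(precomposed, Normalizer.Form.NFD))); // U+3075 U+309A
        // K (Compatibility): the full-width space is replaced under NFKC...
        System.out.println(codePoints(Normalizer.normalize("\u3000", Normalizer.Form.NFKC)));   // U+0020
        // ...but is left unchanged by NFC.
        System.out.println(codePoints(Normalizer.normalize("\u3000", Normalizer.Form.NFC)));    // U+3000
    }
}
```

Normalizing both sides to the same form before comparing is what makes `decomposed` and `precomposed` compare as equal.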
  20. UTF-16
  21. UTF-16 • Java String • Windows
  22. CodePoint to UTF-16
      CodePoint             CodePoint bits                              UTF-16 bits
      U+0000-U+FFFF         xxxx xxxx xxxx xxxx                         xxxx xxxx xxxx xxxx
      U+010000-U+10FFFF     0000 0000 000u uuuu xxxx xxxx xxxx xxxx     1101 10ww wwxx xxxx 1101 11xx xxxx xxxx
      x, u, w ∈ {0,1}; wwww = uuuuu - 1
  23. UTF-16 • 2 • 2
  24. • 1 16 16bit * 2 • 1 16bit 16 20bit • U+D800-U+DFFF (11bit) • 0xD800-0xDBFF, 0xDC00-0xDFFF
  25. • UTF-16 2 1 2 • U+3000 = 0x3000 → 0x30, 0x00 (UTF-16BE) • U+3000 = 0x3000 → 0x00, 0x30 (UTF-16LE)
  26. BOM (byte order mark) • U+FEFF • U+FFFE • U+FEFF ZERO WIDTH NO-BREAK SPACE
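
The surrogate-pair bit layout on slide 22 and the byte orders on slides 25 and 26 can be checked with a few lines of Java. This is a sketch rather than a reference implementation: `Utf16Sketch`, `toUtf16` and `hex` are hypothetical names, the sample code point U+2000B is chosen only for illustration, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Sketch {

    // Splits a code point into UTF-16 code units following the slide's bit layout.
    // Surrogate code points (U+D800-U+DFFF) are not rejected in this sketch.
    public static char[] toUtf16(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0xFFFF) {
            return new char[] { (char) codePoint };  // one 16-bit unit, value unchanged
        }
        int v = codePoint - 0x10000;                 // 20 bits left; this is "wwww = uuuuu - 1"
        char high = (char) (0xD800 | (v >>> 10));    // 1101 10ww wwxx xxxx
        char low = (char) (0xDC00 | (v & 0x3FF));    // 1101 11xx xxxx xxxx
        return new char[] { high, low };
    }

    // Hypothetical helper: formats bytes as two-digit hex values separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] units = toUtf16(0x2000B);
        System.out.println(String.format("%04X %04X", (int) units[0], (int) units[1])); // D840 DC0B
        // The JDK's own conversion gives the same code units.
        System.out.println(Arrays.equals(units, Character.toChars(0x2000B)));           // true

        // Byte order for U+3000, as on slide 25.
        System.out.println(hex("\u3000".getBytes(StandardCharsets.UTF_16BE))); // 30 00
        System.out.println(hex("\u3000".getBytes(StandardCharsets.UTF_16LE))); // 00 30
        // Java's plain "UTF-16" charset writes a big-endian BOM (U+FEFF) first.
        System.out.println(hex("\u3000".getBytes(StandardCharsets.UTF_16)));   // FE FF 30 00
    }
}
```

Reading the BOM bytes in the wrong order yields U+FFFE, which is how a decoder can tell that the byte order is swapped.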
  27. UTF-8
  28. CodePoint                 Bit pattern                               Bits
      U+0000-U+007F             0xxx xxxx                                 7 bits
      U+0080-U+07FF             110y yyyx 10xx xxxx                       11 bits
      U+0800-U+FFFF             1110 yyyy 10yx xxxx (10xx xxxx) * 1       16 bits
      U+010000-U+1FFFFF         1111 0yyy 10yy xxxx (10xx xxxx) * 2       21 bits
      U+00200000-U+03FFFFFF     1111 10yy 10yy yxxx (10xx xxxx) * 3       26 bits
      U+04000000-U+7FFFFFFF     1111 110y 10yy yyxx (10xx xxxx) * 4       31 bits
      x, y ∈ {0,1}; at least one y is 1
  29. UTF-8 • The US ASCII range is encoded the same as in US ASCII • 0x80-0xBF: second and later bytes • 2
  30.
  31. • UTF-8 1 • BOM
  32. BOM (byte order mark) • 0xEF, 0xBB, 0xBF • byte order mark • UTF-8
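
The bit patterns on slide 28 translate almost mechanically into shifts and masks. The sketch below hand-encodes a code point and compares the result with the JDK's own encoder. It covers only the 1 to 4 byte forms (up to U+10FFFF), not the older 5 and 6 byte forms also listed on the slide. `Utf8Sketch`, `encode` and `hex` are hypothetical names, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {

    // Encodes one code point into UTF-8 bytes (1 to 4 byte forms only).
    public static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("cannot encode: " + cp);
        }
        if (cp <= 0x7F) {
            return new byte[] { (byte) cp };                 // 0xxx xxxx
        }
        if (cp <= 0x7FF) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),                   // 110y yyyx
                (byte) (0x80 | (cp & 0x3F))                  // 10xx xxxx
            };
        }
        if (cp <= 0xFFFF) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),                  // 1110 yyyy
                (byte) (0x80 | ((cp >> 6) & 0x3F)),          // 10yx xxxx
                (byte) (0x80 | (cp & 0x3F))                  // 10xx xxxx
            };
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),                      // 1111 0yyy
            (byte) (0x80 | ((cp >> 12) & 0x3F)),             // 10yy xxxx
            (byte) (0x80 | ((cp >> 6) & 0x3F)),              // 10xx xxxx
            (byte) (0x80 | (cp & 0x3F))                      // 10xx xxxx
        };
    }

    // Hypothetical helper: formats bytes as two-digit hex values separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x41)));     // 41 (same byte as US ASCII)
        System.out.println(hex(encode(0x3075)));   // E3 81 B5
        System.out.println(hex(encode(0x2000B)));  // F0 A0 80 8B
        // Encoding U+FEFF gives the three BOM bytes from slide 32.
        System.out.println(hex(encode(0xFEFF)));   // EF BB BF

        // Cross-check against the JDK encoder.
        byte[] jdk = new String(Character.toChars(0x2000B)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(0x2000B), jdk)); // true
    }
}
```

Every byte after the first falls in 0x80-0xBF, so a decoder can find character boundaries from any position in the stream.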
  33. Encoding    Bytes per character    Representable code points
      UTF-8       1-6 (1-4)              2^31 (2^21)
      UTF-16      2-4                    2^16 - 2*2^10 + 2^20
      UTF-32      4                      2^16 + 2^20
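
The counts on the slide above are easier to compare once they are evaluated. The following sketch computes them and relates them to constants that `java.lang.Character` exposes. The class and method names are hypothetical and chosen only for this example.

```java
public class CodePointCounts {

    // 2^16 - 2*2^10 + 2^20: the BMP minus both surrogate halves, plus the surrogate-pair range.
    public static long utf16Count() {
        return (1L << 16) - 2 * (1L << 10) + (1L << 20);
    }

    // 2^16 + 2^20: every value from U+0000 through U+10FFFF.
    public static long utf32Count() {
        return (1L << 16) + (1L << 20);
    }

    public static void main(String[] args) {
        System.out.println(utf16Count());                 // 1112064
        System.out.println(utf32Count());                 // 1114112
        // The same total, derived from the JDK's highest code point (U+10FFFF).
        System.out.println(Character.MAX_CODE_POINT + 1); // 1114112
        // The difference is the 2*2^10 surrogate values U+D800-U+DFFF.
        System.out.println(Character.MAX_SURROGATE - Character.MIN_SURROGATE + 1); // 2048
        // The bit-width limits quoted for UTF-8.
        System.out.println(1L << 21);                     // 2097152
        System.out.println(1L << 31);                     // 2147483648
    }
}
```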
  34. • Character set (CharacterSet) and encoding (Encoding) • 1 • 1 (UTF-16) 2
  35. • The Unicode Consortium (http://www.unicode.org/) • Unicode 6.0.0 (http://www.unicode.org/versions/Unicode6.0.0/) • Wikipedia • Unicode (http://ja.wikipedia.org/wiki/Unicode) • UTF-8 (http://ja.wikipedia.org/wiki/UTF-8) • UTF-16 (http://ja.wikipedia.org/wiki/UTF-16) • Unicode (http://homepage1.nifty.com/nomenclator/unicode/normalization.htm)
