ぐだ生 Introduction to Java, Part 3 (Character Encodings) (Keynote version)
  • Transcript of "ぐだ生 Introduction to Java, Part 3 (Character Encodings) (Keynote version)"

    1. Java (Unicode), April 16, 2011, twitter: @zaki50
    2. Who am I
        • YAMAZAKI Makoto (twitter: @zaki50)
        • Android
        • StickyShortcut
        • Java
    3. • Character set (CharacterSet) and encoding (Encoding)
        • UTF-16
        • UTF-8
    4. • Unicode 6.0 [...] 11 [...]
    5. • US ASCII
        • Shift JIS
        • JIS X 0208
        • UCS-2
        • UCS-4
    6. • Unicode
            • UTF-8
            • UTF-16
            • UTF-32
            • ...
        ([...]: US ASCII, Shift JIS)
    7. Unicode
    8. Unicode
        • Xerox [...] Microsoft, Apple, Sun Microsystems, HP, JUST System [...] The Unicode Consortium
        • ISO/IEC 10646
    9. Unicode and ISO/IEC 10646
        • Unicode
    10. Unicode
        • The Unicode Consortium ( http://www.unicode.org/ )
    11. • Unicode
    12. ([...], Ligature)
        • ふ (U+3075) + ◌゚ (U+309A) = ぷ
        • ([...] ぷ (U+3077))
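The combining sequence on slide 12 can be checked directly in Java. This is a minimal sketch, not taken from the slides; the class name `CombiningDemo` and its constants are made up for illustration. It shows that the two-code-point sequence and the single precomposed character are different strings as far as `String` is concerned, even though they are meant to display the same way.

```java
final class CombiningDemo {
    // U+3075 followed by the combining semi-voiced sound mark U+309A
    static final String COMBINED = "\u3075\u309A";
    // The single precomposed character U+3077
    static final String PRECOMPOSED = "\u3077";

    public static void main(String[] args) {
        System.out.println(COMBINED.length());            // 2
        System.out.println(PRECOMPOSED.length());         // 1
        System.out.println(COMBINED.equals(PRECOMPOSED)); // false
    }
}
```

Because `equals` compares the UTF-16 code units one by one, a search or a map lookup would treat the two spellings as unrelated keys.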
    13. Unicode
        • Unicode [...] Unicode
    14. • NFC (Normalization Form C)
        • NFD (Normalization Form D)
        • NFKC (Normalization Form KC)
        • NFKD (Normalization Form KD)
        C (Composition) / D (Decomposition), K (Compatibility)
    15. • NFC
        • Mac OS X (HFS+): NFD
        • NFKC, NFKD
    16. Unicode
        • C (Composition)
        • D (Decomposition)
    17. Unicode
        • K (Compatibility)
          1: ideographic space (U+3000), space (U+0020) [...]
    18. Unicode K KCD
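The normalization forms listed on slides 14 to 18 are available in the JDK through `java.text.Normalizer`. The sketch below is an illustration rather than code from the talk; the class `NormalizationDemo` and its helper methods are hypothetical wrappers. It reuses the code points that appear on the slides: U+3075 + U+309A versus U+3077 for composition and decomposition, and U+3000 versus U+0020 for compatibility.

```java
import java.text.Normalizer;

final class NormalizationDemo {
    static String nfc(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFC); }
    static String nfd(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFD); }
    static String nfkc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFKC); }

    public static void main(String[] args) {
        String combined = "\u3075\u309A"; // base letter + combining mark
        String precomposed = "\u3077";    // single precomposed letter

        // C (Composition): the pair is composed into one code point
        System.out.println(nfc(combined).equals(precomposed)); // true
        // D (Decomposition): the precomposed letter is split again
        System.out.println(nfd(precomposed).equals(combined)); // true

        // K (Compatibility): U+3000 is only folded to U+0020 by the K forms
        System.out.println(nfc("\u3000").equals("\u3000"));    // true
        System.out.println(nfkc("\u3000").equals(" "));        // true
    }
}
```

Normalizing both sides to the same form before comparing is one way to make the two spellings from slide 12 compare equal.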
    19. Eclipse
    20. UTF-16
    21. UTF-16
        • Java String
        • Windows
    22. CodePoint             (binary)                                  UTF-16
        U+0000-U+FFFF         xxxx xxxx xxxx xxxx                       xxxx xxxx xxxx xxxx
        U+010000-U+0010FFFF   0000 0000 000u uuuu xxxx xxxx xxxx xxxx   1101 10ww wwxx xxxx 1101 11xx xxxx xxxx
        x,u,w ∈ {0,1}, wwww = uuuuu - 1
    23. UTF-16
        • [...] 2 [...]
        • [...] 2 [...]
    24. • [...] 16bit * 2
        • [...] 16bit [...] 20bit
        • U+D800-U+DFFF (11bit [...])
        • 0xD800-0xDBFF, 0xDC00-0xDFFF
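The bit layout from slide 22 and the surrogate ranges from slide 24 can be turned into a few lines of Java. This is a sketch under stated assumptions: `SurrogateDemo` and `toSurrogates` are illustrative names, and U+20B9F is just a sample supplementary code point chosen here, not one taken from the slides. The result is compared against the JDK's own `Character.toChars`.

```java
import java.util.Arrays;

final class SurrogateDemo {
    /** Splits a supplementary code point (U+10000..U+10FFFF) into a surrogate pair. */
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;               // 20 bits remain (the "uuuuu - 1" step)
        char high = (char) (0xD800 | (v >>> 10));  // 1101 10ww wwxx xxxx
        char low  = (char) (0xDC00 | (v & 0x3FF)); // 1101 11xx xxxx xxxx
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x20B9F);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);        // D842 DF9F
        System.out.println(Arrays.equals(pair, Character.toChars(0x20B9F)));   // true
    }
}
```

Subtracting 0x10000 is what makes 20 payload bits enough: the remaining value always fits in 10 bits for the high surrogate plus 10 bits for the low surrogate.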
    25. • UTF-16 [...]
        U+3000 = 0x3000 → 0x30, 0x00 (UTF-16BE)
        U+3000 = 0x3000 → 0x00, 0x30 (UTF-16LE)
    26. BOM (byte order mark)
        • U+FEFF
        • U+FFFE
        • U+FEFF ZERO WIDTH NO-BREAK SPACE
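The byte orders on slide 25 and the BOM on slide 26 can be observed with the standard charsets in `java.nio.charset.StandardCharsets`. The following is a small sketch; `ByteOrderDemo` and its `hex` helper are hypothetical names introduced for the example. It encodes U+3000 three ways and prints the resulting bytes.

```java
import java.nio.charset.StandardCharsets;

final class ByteOrderDemo {
    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u3000";
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 30 00
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 00 30
        // Java's plain "UTF-16" charset encodes big-endian and prepends a BOM
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 30 00
    }
}
```

A reader that sees `FE FF` at the start knows the data is big-endian; seeing `FF FE` instead means the same U+FEFF was written little-endian.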
    27. UTF-8
    28. CodePoint               bit pattern                           bits
        U+0000-U+007F           0xxx xxxx                             7 bits
        U+0080-U+07FF           110y yyyx 10xx xxxx                   11 bits
        U+0800-U+FFFF           1110 yyyy 10yx xxxx (10xx xxxx) * 1   16 bits
        U+010000-U+1FFFFF       1111 0yyy 10yy xxxx (10xx xxxx) * 2   21 bits
        U+00200000-U+03FFFFFF   1111 10yy 10yy yxxx (10xx xxxx) * 3   26 bits
        U+04000000-U+7FFFFFFF   1111 110y 10yy yyxx (10xx xxxx) * 4   31 bits
        x,y ∈ {0,1}; at least one y is 1
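The first four rows of the table on slide 28 map directly onto shifts and masks. The sketch below is a hand-written encoder for illustration only (`Utf8Demo.encode` is a made-up name, and real code would normally just call `String.getBytes`). It assumes the current 1 to 4 byte range up to U+10FFFF and rejects surrogates, then checks itself against the JDK encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Demo {
    /** Encodes one code point as UTF-8 (1 to 4 bytes). */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not an encodable code point");
        }
        if (cp <= 0x7F) {   // 0xxx xxxx
            return new byte[] { (byte) cp };
        }
        if (cp <= 0x7FF) {  // 110y yyyx 10xx xxxx
            return new byte[] { (byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp <= 0xFFFF) { // 1110 yyyy 10yx xxxx 10xx xxxx
            return new byte[] { (byte) (0xE0 | (cp >> 12)),
                                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                                (byte) (0x80 | (cp & 0x3F)) };
        }
        // 1111 0yyy 10yy xxxx 10xx xxxx 10xx xxxx
        return new byte[] { (byte) (0xF0 | (cp >> 18)),
                            (byte) (0x80 | ((cp >> 12) & 0x3F)),
                            (byte) (0x80 | ((cp >> 6) & 0x3F)),
                            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0x41, 0xE9, 0x3042, 0x20B9F }) {
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(Arrays.equals(encode(cp), expected)); // true for each
        }
    }
}
```

Choosing the branch by the size of the code point is what guarantees the shortest form, which is the same condition as "at least one y is 1" in the table.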
    29. UTF-8
        • US ASCII [...] US ASCII
        • 0x80-0xBF [...] 2 [...]
        • [...] 2 [...]
    30.
    31. • UTF-8 [...] 1 [...]
        • BOM
    32. BOM (byte order mark)
        • 0xEF, 0xBB, 0xBF
        • byte order mark
        • UTF-8
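The three bytes on slide 32 are simply U+FEFF encoded as UTF-8, and Java's built-in UTF-8 decoder does not remove them. The sketch below illustrates one way to deal with that; `Utf8BomDemo` and `decodeSkippingBom` are hypothetical names, and stripping the BOM by hand is only one possible approach.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8BomDemo {
    static final byte[] BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    /** Decodes UTF-8 bytes and drops a leading U+FEFF if one is present. */
    static String decodeSkippingBom(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a' };

        // U+FEFF encoded as UTF-8 is exactly the three BOM bytes
        System.out.println(Arrays.equals("\uFEFF".getBytes(StandardCharsets.UTF_8), BOM)); // true

        // The plain decoder keeps the BOM as an invisible first character
        System.out.println(new String(data, StandardCharsets.UTF_8).length()); // 2
        System.out.println(decodeSkippingBom(data));                           // a
    }
}
```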
    33. Encoding   Bytes per character   Code points
        UTF-8      1-6 (1-4)             2^31 (2^21)
        UTF-16     2 or 4                2^16 - 2*2^10 + 2^20
        UTF-32     4                     2^16 + 2^20
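The two power-of-two expressions on slide 33 can be evaluated and cross-checked against constants in `java.lang.Character`. This is a small arithmetic sketch; the class and method names are invented for the example. The UTF-16 figure is the 2^16 BMP values minus the 2*2^10 surrogates plus the 2^20 values reachable through surrogate pairs, and the UTF-32 figure is the full range U+0000 to U+10FFFF.

```java
final class CodePointCountDemo {
    /** 2^16 - 2*2^10 + 2^20: code points representable in UTF-16. */
    static int utf16Count() {
        return (1 << 16) - 2 * (1 << 10) + (1 << 20);
    }

    /** 2^16 + 2^20: all values from U+0000 to U+10FFFF. */
    static int utf32Count() {
        return (1 << 16) + (1 << 20);
    }

    public static void main(String[] args) {
        int surrogates = Character.MAX_SURROGATE - Character.MIN_SURROGATE + 1;
        System.out.println(surrogates);   // 2048
        System.out.println(utf16Count()); // 1112064
        System.out.println(utf32Count()); // 1114112
        System.out.println(utf32Count() == Character.MAX_CODE_POINT + 1); // true
        System.out.println(utf16Count() == utf32Count() - surrogates);    // true
    }
}
```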
    34. • Character set (CharacterSet) and encoding (Encoding)
        • [...] 1 [...]
        • [...] 1 [...] (UTF-16 [...]) 2 [...]
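Assuming the last point of the summary refers to UTF-16 needing two code units for some characters, the practical consequence in Java is that `String.length()` counts `char` values, not characters. The sketch below illustrates this; `CharCountDemo` is a made-up name and U+20B9F is a sample supplementary code point chosen for the example.

```java
final class CharCountDemo {
    /** Counts Unicode code points rather than UTF-16 code units. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by one supplementary character (a surrogate pair)
        String s = "a" + new String(Character.toChars(0x20B9F));
        System.out.println(s.length());    // 3
        System.out.println(codePoints(s)); // 2
    }
}
```

Code that indexes or truncates strings by `char` position can split a surrogate pair in half, so `codePointCount`, `codePointAt`, and `offsetByCodePoints` are the safer tools when supplementary characters may appear.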
    35. • The Unicode Consortium (http://www.unicode.org/)
            • Unicode 6.0.0 (http://www.unicode.org/versions/Unicode6.0.0/)
        • Wikipedia
            • Unicode (http://ja.wikipedia.org/wiki/Unicode)
            • UTF-8 (http://ja.wikipedia.org/wiki/UTF-8)
            • UTF-16 (http://ja.wikipedia.org/wiki/UTF-16)
        • Unicode normalization (http://homepage1.nifty.com/nomenclator/unicode/normalization.htm)