Unicode

Unicode. Philippe Marschall. ESUG 2008, Amsterdam

  1. 1. Ã¼bercool
  2. 2. Outline • character sets (what) • encodings (how) • a bit intertwined • generalizations • Squeak-specific • European examples
  3. 3. Character Sets sets of characters
  4. 4. ASCII • North America, United States • 128 characters
  5. 5. ISO-8859-1 (latin1) • Western Europe • Does not include € (that needs ISO-8859-15 / latin-9) • ASCII + 128 additional characters
  6. 6. Unicode one character set to rule them all
  7. 7. Unicode Latin-1 ASCII
  8. 8. Code point “atom” of text
  9. 9. Part of Unicode ♕ ☠ ☃ ☙
  10. 10. NOT Part of Unicode
  11. 11. Integer (abstract) SmallInteger (-1073741824 to 1073741823) LargeInteger (∞ - SmallInteger) ranges, no endianness!
  12. 12. String (abstract) ByteString (ISO-8859-1) WideString (Unicode - ISO-8859-1) character set, not encoding!
  13. 13. #leadingChar • mixes abstraction layers • language • presentation
  14. 14. Algorithms • know Unicode • know all the rules • know all the code points • know all the locales
  15. 15. Transformations Fußball FUSSBALL locale dependent!
  16. 16. Collation (ordering) • ABC...RSTUVWXYZ • ÄB...NOÖ...SßTUÜV...YZ • ABC...RSTUVWXYZÅÄÖ locale dependent!
  17. 17. Normalization (what does #= really mean?) • ¨ + a = ä • there are different ones to choose from
  18. 18. PHP 6 will do all of this
  19. 19. Encodings mappings from one space to another (isomorphisms)
  20. 20. 1:1 • ASCII • ISO-8859-1
  21. 21. ASCII • 7 bit • 8 bit
  22. 22. ISO-8859-1 16rFC
  23. 23. UTF-32 16rFC 16r00 16r00 16r00 (LE) 16r00 16r00 16r00 16rFC (BE)
  24. 24. UTF-16 16rFC 16r00 (LE) 16r00 16rFC (BE)
  25. 25. UTF-8 16rC3 16rBC
  26. 26. WAKom 1:1 direct mapping from bytes to Characters
  27. 27. WAKomEncoded* use it with utf-8!
  28. 28. Content-Type: text/html;charset=utf-8 <meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
  29. 29. 2.8: WASession >> #charSet 2.9: /seaside/config
  30. 30. Links • UTF-8 Sampler • Favourite Unicode Codepoints • On the Goodness of Unicode • Characters vs. Bytes
  31. 31. übercool
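
The byte sequences on slides 22 to 25 can be reproduced with a short sketch. It is written in Python purely for illustration (the talk itself targets Squeak), and the helper name `hex_bytes` is made up for this example. It encodes the single code point U+00FC ("ü") with each encoding named on the slides, and then shows what happens when UTF-8 bytes are mapped 1:1 to characters as if they were ISO-8859-1, which is presumably the effect the opening slide is joking about.

```python
# One code point, several encodings: U+00FC is the "ü" used on the slides.
CHAR = "\u00fc"

def hex_bytes(data: bytes) -> str:
    """Render bytes in Smalltalk radix notation, e.g. '16rC3 16rBC' (hypothetical helper)."""
    return " ".join(f"16r{b:02X}" for b in data)

# Codec names are Python's; the slides call these ISO-8859-1, UTF-32, UTF-16 and UTF-8.
ENCODINGS = ["iso-8859-1", "utf-32-le", "utf-32-be", "utf-16-le", "utf-16-be", "utf-8"]

encoded = {name: CHAR.encode(name) for name in ENCODINGS}

for name, data in encoded.items():
    print(f"{name:10} {hex_bytes(data)}")

# ASCII has only 128 characters, so U+00FC cannot be encoded at all.
try:
    CHAR.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding UTF-8 bytes with a 1:1 byte-to-character mapping (ISO-8859-1)
# turns the two bytes 16rC3 16rBC into two separate characters.
mojibake = "übercool".encode("utf-8").decode("iso-8859-1")
print(mojibake)
```

The same code point yields one, two, or four bytes depending on the encoding, and only the multi-byte fixed-width forms have a byte order to worry about.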

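
The points on slides 15 to 17 (transformations, collation, normalization) can also be illustrated briefly. This is a minimal sketch in Python, not Squeak code, and it relies only on Python's built-in Unicode support: `str.upper`, plain `sorted`, and `unicodedata.normalize`. None of these is locale-aware, which is exactly the limitation the slides warn about.

```python
import unicodedata

# Normalization: the same visible "ä" can be one code point or two.
composed = "\u00e4"       # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"    # "a" followed by COMBINING DIAERESIS

naive_equal = composed == decomposed                               # compares code points
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed   # compose first
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed   # decompose first

# Transformation: upper-casing can change the length of a string.
# Python applies the default (locale-independent) Unicode case mapping.
upper = "Fußball".upper()

# Collation: sorting by code point puts "Ä" after "Z",
# which matches none of the locale orderings shown on slide 16.
by_code_point = sorted(["Zebra", "Äpfel", "Apfel"])

print(naive_equal, nfc_equal, nfd_equal)
print(upper)
print(by_code_point)
```

Equality only becomes meaningful once both strings are brought into the same normalization form, and a locale-correct ordering needs a real collation library rather than code point comparison.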