
A test of character; ASCII silly question get a silly ANSI


  1. A test of character; ASCII silly question get a silly ANSI
  2. ASCII: What’s wrong with it? • Actually called US-ASCII • [a-zA-Z0-9] + … • It uses 7 bits, not 8 • Defines 128 ‘characters’ • Does not define the other 128 byte values • Just about covers English script • Minimum level of interoperability
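
A minimal sketch of the 7-bit limit from slide 2, using only Python built-ins. The sample strings and the helper name `is_ascii` are illustrative assumptions, not part of the talk.

```python
# Every ASCII character has a code below 2**7 = 128.
text = "Hello, World! 0-9 a-z A-Z"
codes = [ord(ch) for ch in text]
all_seven_bit = all(code < 2**7 for code in codes)


def is_ascii(s: str) -> bool:
    """Return True if every character of s can be encoded as US-ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(all_seven_bit)              # the English sample fits in 7 bits
print(is_ascii("plain English"))  # encodable
print(is_ascii("géant"))          # the é falls outside the 128 defined values
```
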
  3. CJK and font encoding for other languages • Upper 128 • Completely different, often 2-byte encoding • Standards? Often not
  4. Wingdings, Tengwar, etc.: encodings for symbols, fictional languages and silly things.
  5. J and L
  6. Email with ‘J’ and ‘L’
  7. Quotation Marks and Apostrophes • Dumb quotation marks "…" 0x22 (decimal 34) • Dumb single quotation marks '…' 0x27 (decimal 39) • Smart quotation marks “…” U+201C…U+201D • Guillemets français «…» • Grave accent…apostrophe `…' • Grave accent…acute accent `…´ U+0060…U+00B4 • Heavy double turned comma quotation mark ornaments ❝…❞ U+275D…U+275E
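
The code points on slide 7 can be checked with the standard `unicodedata` module. This is a small sketch; the list `quotes` and the helper `describe` are names made up for the illustration.

```python
import unicodedata

# The quotation characters from the slide, written as escapes so the
# source file itself stays plain ASCII.
quotes = ['"', "'", "\u201c", "\u201d", "\u00ab", "\u00bb",
          "`", "\u00b4", "\u275d", "\u275e"]


def describe(ch: str) -> str:
    """Return the code point in U+XXXX form followed by its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in quotes:
    print(ch, describe(ch))
```
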
  8. Unicode • 1,114,112 possible code points (U+0000 to U+10FFFF) • Evolving standard
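
The 1,114,112 figure on slide 8 is 17 planes of 65,536 code points each. A quick check, as a sketch (the constant names are illustrative):

```python
import sys

PLANES = 17                    # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 2**16  # 65,536 code points per plane

total = PLANES * CODE_POINTS_PER_PLANE
print(total)                   # 1114112

# Python reports the highest valid code point as sys.maxunicode.
print(hex(sys.maxunicode))     # 0x10ffff
```
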
  9. “Face with a look of triumph” 😤 • Wait, what?
  10. Encoding • Multibyte • 1 byte: space for 256 things • 2 bytes: space for 65,536 things • 3 bytes: 16,777,216 • 4 bytes: 4,294,967,296 • UTF encodings • UTF-16 – two-byte units (two or four bytes per character) • UTF-32 – four bytes • Variable size • UTF-8 can encode lots of things (one to four bytes per character)
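
To make the sizes on slide 10 concrete, here is a sketch that encodes a few sample characters and counts the bytes. The sample characters and the helper `byte_lengths` are assumptions chosen for the illustration; the little-endian codec variants are used so that no byte order mark is added to the count.

```python
def byte_lengths(ch: str) -> dict:
    """Bytes needed for one character in each UTF encoding (no BOM)."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


samples = ["A", "\u00e9", "\u20ac", "\U0001F624"]  # A, é, €, 😤
for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# Capacity of n bytes, as listed on the slide.
capacities = {n: 256**n for n in (1, 2, 3, 4)}
print(capacities)
```

Note that the emoji needs four bytes in UTF-16 as well, since it lies above U+FFFF and is stored as a surrogate pair.
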
  11. UTF-8 • 00000000 – 01111111 (just like ASCII) • 11000010,10000000 – 11011111,10111111 • U+0080 – U+07FF • 11100000,10100000,10000000 – 11101111,10111111,10111111 • U+0800 – U+FFFF • 11110000,10010000,10000000,10000000 – … • Bigger code points • Now we can encode things and not break old stuff (much)
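
The byte layouts on slide 11 follow from packing the code point's bits into a lead byte and 10xxxxxx continuation bytes. Below is a hand-rolled sketch for illustration only; in practice `str.encode("utf-8")` does this. The function names `utf8_encode` and `bits` are invented for the example.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def bits(data: bytes) -> str:
    """Show bytes as comma-separated 8-bit binary strings."""
    return ",".join(f"{b:08b}" for b in data)


for cp in (0x41, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x1F624):
    print(f"U+{cp:04X}", bits(utf8_encode(cp)))
```
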
  12. The good stuff • Room for everything • CJK fits • Hieroglyphs fit • Emojis fit
  13. http://www.itchyfeetcomic.com/2013/12/creative-guesswork.html (With permission, unblurred in original)
  14. The Bad stuff 1 • Lots of ways to represent the same thing • Strange control characters
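
One example of "lots of ways to represent the same thing" is an accented letter, which can be a single precomposed code point or a base letter plus a combining accent. The sketch below uses the standard `unicodedata.normalize`; the choice of é as the sample and the helper `same_text` are assumptions for the illustration.

```python
import unicodedata

composed = "\u00e9"     # é as one code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# They render the same but compare unequal as raw strings.
print(composed == decomposed)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(composed, decomposed))
```
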
  15. Hiding code
  16. Names names names • Homoglyphs • You’ve seen www.micros0ft.com • Punycode • www.géant.org -> www.xn--gant-bpa.org • Inserting into existing software …
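
The Punycode mapping on slide 16 can be reproduced with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. This is a sketch; the helper names are invented for the example, and software following the newer IDNA 2008 rules may treat some other names differently.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (xn--) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Convert an ASCII (xn--) hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii_hostname("www.g\u00e9ant.org"))      # www.xn--gant-bpa.org
print(to_unicode_hostname("www.xn--gant-bpa.org"))
```
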
  17. The Bad stuff 2 • Emojis fit • 💩 “pile of poo” • 😤 “face with look of triumph” • face-with-open-mouth-vomiting, really?
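
The names quoted on slide 17 are the official Unicode character names, and they can be used to look a character up. A short sketch with the standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

# Go from the official name to the character and its code point.
poo = unicodedata.lookup("PILE OF POO")
triumph = unicodedata.lookup("FACE WITH LOOK OF TRIUMPH")

for ch in (poo, triumph):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")), "bytes in UTF-8")
```
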
  18. http://www.asciimation.co.nz/index.php
