Demystifying Unicode - Longhorn PHP 2021

Lead Web Developer at Unleashed Technologies
Oct. 16, 2021


Editor's Notes

  1. Questions as we go? Raise hand
  2. Converts characters into electrical signals
  3. Standardized in 1865
  4. Simple device: type a key, it sends some numbers, and the same letter comes out the other side
  5. But there needs to be a standard
  6. Developed in the 1960s for teleprinters (“Teletype”) and early computers. 7-bit: each letter you type gets converted into 7 bits
  7. Support for: upper and lowercase letters; numbers; basic, common symbols; more control codes (CR, LF, BS, HT, BEL) (next for examples)
  8. (how to encode/decode)
  9. Something really clever going on here: group by the first two bits. 4 “pages” or sections, 32 chars each
  10. Letters in alphabetical order, starting at 1 (not random)
  11. Even more clever - converting between upper and lowercase by changing one bit
  12. “Extended ASCII” sounds like a standard, but it’s not
  13. AKA Latin 1, for the Americas, Western Europe, Oceania, and much of Africa
  14. Superset/extension of ISO 8859-1. Adds curly quotation marks. De facto standard for Windows
  15. AKA Latin 2, for Central and Eastern European languages
  16. UI graphics, science, and math. Standard EGA/VGA encoding on graphics cards
  17. That’s a lot! However,
  18. In practice, most users only used one standard locally. Which was fine...
  19. Standards proliferation
  20. (Problem) You could add more bits, but that wasted computing resources (which were scarce at the time) for users who only needed Latin or ASCII-like characters
  21. ATTN: 4 vs 5 char convention
  22. Support for 1,114,112 code points (0x000000 - 0x10FFFF). Code planes: contiguous groups of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in six-position hexadecimal format (U+hhhhhh). Codespace: the entire range of numerical values available for encoding characters
  23. Code planes: contiguous groups of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in six-position hexadecimal format (U+hhhhhh). Codespace: the entire range of numerical values available for encoding characters. Support for 1,114,112 code points (0x000000 - 0x10FFFF)
  24. Unicode does not specify how the character / code point should be displayed (or encoded)!
  25. Unicode does not specify how the character / code point should be displayed (or encoded)!
  26. Combining Diacritical Marks
  27. In this example: 5 code points but 4 graphemes. GRAPHEME = smallest unit of a writing system. Think about putting the cursor in this text and selecting something or pressing backspace
  28. “Zalgo text” or “glitch text”
  29. Combining Diacritical Marks
  30. Combining Diacritical Marks
  31. Combining Diacritical Marks
  32. Combining Diacritical Marks
  33. Combining Diacritical Marks
  34. Windows supports 52,000 family combinations
  35. Windows supports 52,000 family combinations
  36. If the system lacks a dedicated image, the individual emojis are shown
  37. Combining Diacritical Marks
  38. Pros: code points always use the same number of bytes; very straightforward. Cons: not very memory efficient, can contain null bytes, not self-synchronizing
  39. BMP = basically everything except emojis and historical scripts
  40. “Surrogate pairs”; values are reserved, no code points with those values
  41. Pros: more memory efficient (most of the time), works well for the BMP; is self-synchronizing. Cons: 4-byte encoding logic is somewhat messy; can contain null bytes
  42. This symbol can be encoded 4 different ways
  43. Intl normalizer class
  44. In UTF-8: 3 bytes for the snowman, 1 for the space, 1 for each of the letters c, a, f, e, and 2 for the combining acute accent mark (10 bytes in total)
  45. Now for some fun tricks
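
Notes 9 to 11 describe the layout of ASCII: the first two bits select one of four groups of 32 characters, letters sit in alphabetical order starting at 1, and upper and lowercase differ by a single bit. A minimal sketch of that idea follows, written in Python for illustration. The function names `toggle_ascii_case` and `alphabet_position` are hypothetical helpers, not something taken from the slides.

```python
def toggle_ascii_case(ch: str) -> str:
    """Switch the case of an ASCII letter by flipping one bit (0x20)."""
    code = ord(ch)
    is_upper = 0x41 <= code <= 0x5A  # 'A'..'Z'
    is_lower = 0x61 <= code <= 0x7A  # 'a'..'z'
    if not (is_upper or is_lower):
        return ch  # not an ASCII letter: leave it unchanged
    return chr(code ^ 0x20)


def alphabet_position(ch: str) -> int:
    """Return the low five bits of an ASCII letter: A/a = 1, B/b = 2, ..."""
    return ord(ch) & 0x1F


if __name__ == "__main__":
    # Same low five bits, different "page" bits at the front.
    print(format(ord("A"), "07b"))  # 1000001
    print(format(ord("a"), "07b"))  # 1100001
    print(toggle_ascii_case("A"), toggle_ascii_case("z"))
    print(alphabet_position("C"))
```

The range check matters: flipping that bit on a non-letter such as `@` or `1` would produce an unrelated character.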
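
Notes 38 to 44 compare how the same code points take different numbers of bytes in UTF-32, UTF-16, and UTF-8, and note 40 mentions surrogate pairs. The sketch below, in Python for illustration, counts bytes for the snowman example from note 44 and computes a surrogate pair by hand. The sample string and the helper name `surrogate_pair` are assumptions made for this example.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above the BMP into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = code_point - 0x10000       # 20 bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


# Snowman, space, "cafe", then a combining acute accent (U+0301).
text = "\u2603 cafe\u0301"

if __name__ == "__main__":
    print(len(text))                      # number of code points
    print(len(text.encode("utf-8")))      # 3 + 1 + 4 + 2 bytes
    print(len(text.encode("utf-16-le")))  # 2 bytes per BMP code point
    print(len(text.encode("utf-32-le")))  # always 4 bytes per code point
    high, low = surrogate_pair(0x1F600)   # an emoji outside the BMP
    print(hex(high), hex(low))
```

The `-le` codec variants are used so that Python does not prepend a byte order mark, which would otherwise add to the byte counts.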
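
Notes 42 and 43 point out that one visible symbol can be encoded in several ways and that a normalizer resolves this. Note 43 refers to the Normalizer class in PHP's Intl extension. The sketch below uses Python's standard `unicodedata` module as a comparable tool, purely for illustration. The choice of "é" as the sample character is an assumption, since the notes do not say which symbol the slide showed.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent

# The two strings render the same but are not equal code point sequences.
same_before = precomposed == decomposed

# NFC composes where possible; NFD decomposes into base + combining marks.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(same_before)         # False
    print(nfc == precomposed)  # True
    print(nfd == decomposed)   # True
    print(len(nfc), len(nfd))  # 1 2
```

Normalizing both sides to the same form before comparing or storing text is the usual way to keep visually identical strings from being treated as different.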