SlideShare a Scribd company logo
1 of 16
Unicode
Character Set vs Encoding
● Character Set
A, B, C, D …
● Encoding
D = 0100100
ASCII
● Simple Encoding
A = 0100 0001
B = 0100 0010
a = 0110 0001
b = 0110 0010
Small Character Set
Code Pages
Unicode Character Set
A, B, C …
Ã, ß, Ĉ …
А, Б, В …
‫,ך‬ ऋ, ఋ ...
☺, ☻
Unicode Goals
● Universal – should be the only character set
ever needed
● Semantics – all characters must have well
defined semantics
● Originally intended to be 16 bits
Unicode Encoding
● Originally designed to be 16 bits per character
● UTF-8
● UTF-16
● UTF-32
UTF-32
Everything is 4 bytes:
0001 0110 0011 1001 0001 0110 0011 1001
0000 0000 0000 0000 0000 0000 0100 0001
UTF-8
Everything is as small as we can make it:
A = 0100 0001
à = 1100 0100 1001 1010
● 0xxx xxxx 7 bits
● 110x xxxx 10xx xxxx 11 bits
● 1110 xxxx 10xx xxxx 10xx xxxx 16 bits
● 1111 110x (10xx xxxx) x 5 31 bits
UTF-16
Everything is 2 bytes, except when it is 4 bytes.
0000 0000 0010 0100
1101 1000 0000 0001 1101 1100 0011 0111
32 > 16 > 8
The letter A:
● UTF-8: 0100 0001
● UTF-16: 0000 0000 0100 0001
● UTF-32: 0000 0000 0000 0000
0000 0000 0100 0001
32 > 8 > 16 ???
The letter ☻:
● UTF-8: 0111 0010 1001 1000
1011 1011
● UTF-16: 0010 0110 0011 1011
● UTF-32: 0000 0000 0000 0000
0010 0110 0011 1011
Sorting Unicode Characters
● 0100 0010 > 0100 0001
B > A
● 0x263B > 0x262D
☻> ☭
● Unicode Collation
● Custom Collation
Internal Encoding
● C – byte arrays
● C++ - byte
● Java – UTF-16
● .NET – UTF-16
● JavaScript – UTF-16
● Perl – byte
POM

More Related Content

Viewers also liked

Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character setsrenchenyu
 
Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals SamiHsDU
 
Digital Image Processing and Edge Detection
Digital Image Processing and Edge DetectionDigital Image Processing and Edge Detection
Digital Image Processing and Edge DetectionSeda Yalçın
 
Ascii and Unicode (Character Codes)
Ascii and Unicode (Character Codes)Ascii and Unicode (Character Codes)
Ascii and Unicode (Character Codes)Project Student
 
Building Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageBuilding Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageJohn Collins
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMike Long
 
Linguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsLinguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsHeatherRivers
 
Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)John Collins
 
Automatic Language Identification
Automatic Language IdentificationAutomatic Language Identification
Automatic Language Identificationbigshum
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profilecharlyheus
 
Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)John Collins
 
My Valentine Gift - YOU Decide
My Valentine Gift - YOU DecideMy Valentine Gift - YOU Decide
My Valentine Gift - YOU DecideSizzlynRose
 
My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3David Sommer
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profilecharlyheus
 
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)John Collins
 

Viewers also liked (20)

Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character sets
 
Character Sets
Character SetsCharacter Sets
Character Sets
 
Unicode 101
Unicode 101Unicode 101
Unicode 101
 
Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals
 
Digital Image Processing and Edge Detection
Digital Image Processing and Edge DetectionDigital Image Processing and Edge Detection
Digital Image Processing and Edge Detection
 
Ascii and Unicode (Character Codes)
Ascii and Unicode (Character Codes)Ascii and Unicode (Character Codes)
Ascii and Unicode (Character Codes)
 
Building Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageBuilding Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any Language
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Linguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsLinguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with Rails
 
Shrunken Head
 Shrunken Head  Shrunken Head
Shrunken Head
 
Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)
 
Glossary
GlossaryGlossary
Glossary
 
Automatic Language Identification
Automatic Language IdentificationAutomatic Language Identification
Automatic Language Identification
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profile
 
Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)
 
My Valentine Gift - YOU Decide
My Valentine Gift - YOU DecideMy Valentine Gift - YOU Decide
My Valentine Gift - YOU Decide
 
My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3
 
Verb to be
Verb to beVerb to be
Verb to be
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profile
 
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
 

More from ivoelenchev

More from ivoelenchev (6)

Random
RandomRandom
Random
 
Cdns2
Cdns2Cdns2
Cdns2
 
Cdns
CdnsCdns
Cdns
 
Xml vs json
Xml vs jsonXml vs json
Xml vs json
 
Searching forsounds
Searching forsoundsSearching forsounds
Searching forsounds
 
Windows
WindowsWindows
Windows
 

Recently uploaded

Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 

Recently uploaded (20)

Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 

Unicode

  • 2. Character Set vs Encoding ● Character Set A, B, C, D … ● Encoding D = 0100100
  • 3. ASCII ● Simple Encoding A = 0100 0001 B = 0100 0010 a = 0110 0001 b = 0110 0010
  • 6. Unicode Character Set A, B, C … Ã, ß, Ĉ … А, Б, В … ‫,ך‬ ऋ, ఋ ... ☺, ☻
  • 7. Unicode Goals ● Universal – should be the only character set ever needed ● Semantics – all characters must have well defined semantics ● Originally intended to be 16 bits
  • 8. Unicode Encoding ● Originally designed to be 16 bits per character ● UTF-8 ● UTF-16 ● UTF-32
  • 9. UTF-32 Everything is 4 bytes: 0001 0110 0011 1001 0001 0110 0011 1001 0000 0000 0000 0000 0000 0000 0100 0001
  • 10. UTF-8 Everything is as small as we can make it: A = 0100 0001 Ã = 1100 0100 1001 1010 ● 0xxx xxxx 7 bits ● 110x xxxx 10xx xxxx 11 bits ● 1110 xxxx 10xx xxxx 10xx xxxx 16 bits ● 1111 110x (10xx xxxx) x 5 31 bits
  • 11. UTF-16 Everything is 2 bytes, except when it is 4 bytes. 0000 0000 0010 0100 1101 1000 0000 0001 1101 1100 0011 0111
  • 12. 32 > 16 > 8 The letter A: ● UTF-8: 0100 0001 ● UTF-16: 0000 0000 0100 0001 ● UTF-32: 0000 0000 0000 0000 0000 0000 0100 0001
  • 13. 32 > 8 > 16 ??? The letter ☻: ● UTF-8: 0111 0010 1001 1000 1011 1011 ● UTF-16: 0010 0110 0011 1011 ● UTF-32: 0000 0000 0000 0000 0010 0110 0011 1011
  • 14. Sorting Unicode Characters ● 0100 0010 > 0100 0001 B > A ● 0x263B > 0x262D ☻> ☭ ● Unicode Collation ● Custom Collation
  • 15. Internal Encoding ● C – byte arrays ● C++ - byte ● Java – UTF-16 ● .NET – UTF-16 ● JavaScript – UTF-16 ● Perl – byte
  • 16. POM