Overview of character encoding

Duy Lam – Dec 2010

Agenda
- Character Encoding
- Unicode
- Encoding problem

Definition
A character encoding (also called a character set or charset, character map, or code page) is a system that specifies:
- A set of codes (natural numbers, or signals such as electrical pulses) that represent characters
- How to persist characters (such as “hello”) to disk as a sequence of bytes
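The two parts of the definition can be sketched in Python. This is only an illustration, not something from the slides: the file name `hello.txt`, the temporary directory, and the choice of ASCII as the encoding are assumptions.

```python
import os
import tempfile

text = "hello"

# Part 1: each character is represented by a code (a natural number).
code_points = [ord(ch) for ch in text]

# Part 2: the encoding decides which bytes are written to disk.
encoded = text.encode("ascii")

# Persist the bytes to a throwaway file (hypothetical name) and read them back.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")
with open(path, "wb") as f:
    f.write(encoded)
with open(path, "rb") as f:
    stored = f.read()

# Decoding with the same encoding recovers the original characters.
decoded = stored.decode("ascii")
```

Reading the file back with the same encoding that wrote it is what makes the round trip lossless.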

Life was perfect
a = 01100001
ấ = ????????
ä = ????????
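The bit pattern for "a" is its ASCII code, and the question marks can be reproduced by trying to encode the other two characters as ASCII. A minimal sketch, assuming the slide is contrasting ASCII with non-ASCII letters; the helper names `to_bits` and `ascii_encodable` are made up for this example.

```python
def to_bits(ch: str) -> str:
    """Return the code of a single character as an 8-bit binary string."""
    return format(ord(ch), "08b")


def ascii_encodable(ch: str) -> bool:
    """Report whether the character has a code in 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


a_bits = to_bits("a")

# Escapes are used so the example does not depend on the source file's encoding.
a_circumflex_acute = "\u1ea5"  # ấ
a_diaeresis = "\u00e4"         # ä

# With errors="replace", Python substitutes "?" for anything ASCII cannot represent.
replaced = a_diaeresis.encode("ascii", errors="replace")
```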

Unicode
Unicode is a computing industry standard that maps every known character to a number (code point).
Unicode is one character set that can be encoded in several different ways.
Common Unicode encodings (Unicode Transformation Format, UTF, and Universal Character Set, UCS):
- UTF-8: variable-width encoding (one to four bytes per character); maximizes compatibility with ASCII
- UTF-16: variable-width encoding (one or two 16-bit code units per character); UCS-2 is its fixed-width predecessor and covers only the Basic Multilingual Plane
- UTF-32 (UCS-4): fixed-width encoding (four bytes per character)
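The width differences are easy to check by encoding the same code points three ways. A small sketch; the sample characters are chosen for illustration, and the little-endian codec names (`utf-16-le`, `utf-32-le`) are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character occupies in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


samples = {
    "a": "\u0061",               # ASCII letter
    "a-diaeresis": "\u00e4",     # ä
    "a-circumflex-acute": "\u1ea5",  # ấ
    "emoji": "\U0001f600",       # outside the Basic Multilingual Plane
}

sizes = {name: encoded_sizes(ch) for name, ch in samples.items()}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {sizes[name]}")
```

Only the character outside the Basic Multilingual Plane needs two 16-bit code units in UTF-16, while UTF-32 always uses four bytes.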

Unicode mapping table: Unicode charts

Application: missing understanding
(Diagram labels: UTF-16 encoding, UTF-8 encoding, UTF-8 encoding)
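One plausible reading of this slide is that parts of an application disagree about the encoding. The sketch below assumes that scenario (it is not taken from the slides): one side writes UTF-8 bytes and the other side decodes them as UTF-16.

```python
original = "hi"

# The writer stores the text as UTF-8: two bytes, 0x68 and 0x69.
stored = original.encode("utf-8")

# A reader that assumes UTF-16 (little-endian) treats those two bytes
# as a single 16-bit code unit, 0x6968, which is a different character.
misread = stored.decode("utf-16-le")

# A reader that uses the same encoding as the writer recovers the text.
correct = stored.decode("utf-8")

print(repr(original), "->", repr(misread), "vs", repr(correct))
```

No error is raised in the mismatched case; the bytes are valid in both encodings, so the mistake surfaces only as wrong characters.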
