SlideShare a Scribd company logo
1 of 13
Overview of Character Encoding Duy Lam – Dec 2010
Agenda Character Encoding Unicode Encoding problem 2
Character encoding
Definition Character Encoding (character set - charset, character map or code page) is a system to specify: Set of codes (natural numbers or electrical pulses) that represents for characters How to persist characters (such as “hello”) onto disk as a sequence of bytes 4
Common Encodings 5
Unicode
Life was perfect 7 a = 01100001  ấ = ???????? ä = ???????? a ?ä
Unicode Unicode is a computing industry standard to map every known character to a number (code point) Unicode is one character set that can be encoded several different ways. Common Unicode encoding methods (Unicode Transformation Format and Universal Character Set): UTF-8 (one to four bytes): maximized compatibility with ASCII UTF-16 (UCS-2): variable-width encoding (one or two 16-bit code unit) UTF-32 (UCS-4): fixed-width encoding 8
Unicode mapping table Unicode charts 9
Encoding problem
Application Missing understanding 11 UTF-16 encoding UTF-8 encoding UTF-8 encoding
Demo
End

More Related Content

What's hot

Chapter 1 1
Chapter 1 1Chapter 1 1
Chapter 1 1bolovv
 
ASCII data formats
ASCII data formatsASCII data formats
ASCII data formatsikram niaz
 
4 character encoding-ascii
4 character encoding-ascii4 character encoding-ascii
4 character encoding-asciiirdginfo
 
Lecture6 text
Lecture6   textLecture6   text
Lecture6 textJay Patel
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol tablenadarmispapaulraj
 
Regular expression
Regular expressionRegular expression
Regular expressionRajon
 
Introduction to Internet
Introduction to InternetIntroduction to Internet
Introduction to InternetMiguel Rebollo
 
Input devices in computer graphics
Input devices in computer graphicsInput devices in computer graphics
Input devices in computer graphicsAnu Garg
 
Code optimization in compiler design
Code optimization in compiler designCode optimization in compiler design
Code optimization in compiler designKuppusamy P
 
Installation of Windows & Linux operating system
Installation of Windows & Linux operating systemInstallation of Windows & Linux operating system
Installation of Windows & Linux operating systemSathishnkl561998
 
Multimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesMultimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesHarshita Ved
 

What's hot (20)

Chapter 1 1
Chapter 1 1Chapter 1 1
Chapter 1 1
 
ASCII data formats
ASCII data formatsASCII data formats
ASCII data formats
 
4 character encoding-ascii
4 character encoding-ascii4 character encoding-ascii
4 character encoding-ascii
 
Finite Automata
Finite AutomataFinite Automata
Finite Automata
 
Lecture6 text
Lecture6   textLecture6   text
Lecture6 text
 
Bootstrap
BootstrapBootstrap
Bootstrap
 
HTML: Tables and Forms
HTML: Tables and FormsHTML: Tables and Forms
HTML: Tables and Forms
 
File system structure
File system structureFile system structure
File system structure
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
5. Frames & Forms.pdf
5. Frames & Forms.pdf5. Frames & Forms.pdf
5. Frames & Forms.pdf
 
Shadows Effects in CSS
Shadows Effects in CSSShadows Effects in CSS
Shadows Effects in CSS
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol table
 
Regular expression
Regular expressionRegular expression
Regular expression
 
Ascii
AsciiAscii
Ascii
 
Introduction to Internet
Introduction to InternetIntroduction to Internet
Introduction to Internet
 
Input devices in computer graphics
Input devices in computer graphicsInput devices in computer graphics
Input devices in computer graphics
 
Video display devices
Video display devicesVideo display devices
Video display devices
 
Code optimization in compiler design
Code optimization in compiler designCode optimization in compiler design
Code optimization in compiler design
 
Installation of Windows & Linux operating system
Installation of Windows & Linux operating systemInstallation of Windows & Linux operating system
Installation of Windows & Linux operating system
 
Multimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesMultimedia system, Architecture & Databases
Multimedia system, Architecture & Databases
 

Viewers also liked

Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Duy Lâm
 
KMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptKMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptDuy Lâm
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web ServicesDuy Lâm
 
Refactoring group 1 - chapter 3,4,6
Refactoring   group 1 - chapter 3,4,6Refactoring   group 1 - chapter 3,4,6
Refactoring group 1 - chapter 3,4,6Duy Lâm
 
Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureDuy Lâm
 

Viewers also liked (6)

Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
Building Single-page Web Applications with AngularJS @ TechCamp Sai Gon 2014
 
Mocha
Mocha Mocha
Mocha
 
KMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScriptKMS TechCon 2014 - Interesting in JavaScript
KMS TechCon 2014 - Interesting in JavaScript
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web Services
 
Refactoring group 1 - chapter 3,4,6
Refactoring   group 1 - chapter 3,4,6Refactoring   group 1 - chapter 3,4,6
Refactoring group 1 - chapter 3,4,6
 
Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architecture
 

Similar to Overview of character encoding

Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - ITguest6ddfb98
 
UTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character EncodingUTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character EncodingBert Pattyn
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeUlf Mattsson
 
Camomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlCamomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlYamagata Yoriyuki
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesMilind Patil
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode formatAdityaSharma1452
 
Data Representation in Computers
Data Representation in ComputersData Representation in Computers
Data Representation in ComputersCBAKhan
 
Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Dimelo R&D Team
 
Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Ulf Mattsson
 
Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character setsrenchenyu
 
Computer Systems Organization
Computer Systems OrganizationComputer Systems Organization
Computer Systems OrganizationLiEdo
 
Unicode - Hacking The International Character System
Unicode - Hacking The International Character SystemUnicode - Hacking The International Character System
Unicode - Hacking The International Character SystemWebsecurify
 
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...InSync2011
 

Similar to Overview of character encoding (20)

Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - IT
 
UTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character EncodingUTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character Encoding
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicode
 
Camomile : A Unicode library for OCaml
Camomile : A Unicode library for OCamlCamomile : A Unicode library for OCaml
Camomile : A Unicode library for OCaml
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfiles
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode format
 
Data Representation in Computers
Data Representation in ComputersData Representation in Computers
Data Representation in Computers
 
Notes on a Standard: Unicode
Notes on a Standard: UnicodeNotes on a Standard: Unicode
Notes on a Standard: Unicode
 
Character Sets
Character SetsCharacter Sets
Character Sets
 
Uncdtalk
UncdtalkUncdtalk
Uncdtalk
 
Journey of Bsdconv
Journey of BsdconvJourney of Bsdconv
Journey of Bsdconv
 
Unicode basics in python
Unicode basics in pythonUnicode basics in python
Unicode basics in python
 
Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9Encodings - Ruby 1.8 and Ruby 1.9
Encodings - Ruby 1.8 and Ruby 1.9
 
Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...
 
Unicode and character sets
Unicode and character setsUnicode and character sets
Unicode and character sets
 
Io
IoIo
Io
 
Computer Systems Organization
Computer Systems OrganizationComputer Systems Organization
Computer Systems Organization
 
Unicode - Hacking The International Character System
Unicode - Hacking The International Character SystemUnicode - Hacking The International Character System
Unicode - Hacking The International Character System
 
Unit iv
Unit ivUnit iv
Unit iv
 
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
E-Business Suite 1 _ Jim Pang _ The anatomy of multiple language support (MLS...
 

Recently uploaded

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

Overview of character encoding

Editor's Notes

  1. ASCII encoding, which specifies how to store English characters in a single byte each (taking up the space in 0-127, leaving 128-255 empty)Microsoft code page is used in pre-Windows NT systems (Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. NT was the first fully 32-bit version of Windows, (Windows 3.1x and Windows 9x, were 16-bit/32-bit hybrids). Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Home Server, Windows Server 2008 and Windows 7 are based on Windows NT, although they are not branded as Windows NT)Reference:http://www.nadcomm.com/fiveunit/fiveunits.htm
  2. In the beginning of computer age, ASCII covers everything you would find on an English keyboard: letters in upper and lower case, numbers, and some common symbols. There was even some room left in the 128 character ASCII mapping for some control character sequences. But the entire world can't quite get by on just these characters. Need a encoding system to help encode characters in languages There are many characters in different existing Chinese, Japanese and Korean (CJK) character sets actually represent the same character. Need an effort to identify them
  3. A character has to be stored in computer as some number. Unicode tries to unify characters from different encodings that represent the same character. For instance, the A in ASCII, the A in ISO-8859-1, and the A in the Japanese encoding SHIFT-JIS all map to the same Unicode character.A character set and a character encoding aren't necessarily the same thing. Unicode is one character set, and has multiple character encodingsThe UTF-8 is most efficient for Strings containing mostly ASCII characters (inWestern countries). UTF-8 and UTF-16 are approximately equivalent for Strings containing mostly characters outside ASCII but inside the the BMP (characters for almost all modern languages, and a large number of special characters). For Strings containing mostly characters outside the BMP, UTF-8, UTF-16, and UTF-32 are approximately equivalent.
  4. Go to Start Menu > Accessories > System Tools > Character MapsUse this tool show characters and their number in Unicode charts