Unicode

  • ANKIT & SUSHEEL
  • Unicode

    1. UNICODE TRANSFORMATION FORMAT
       By ANKIT SHARMA
    2. INTRODUCTION
       • Computers, at their most basic level, deal only with numbers. They store letters, numerals and other characters by assigning a number to each one.
       • In the pre-Unicode environment, we had single-byte (8-bit) character sets, which limited us to 256 characters at most. No single encoding could contain enough characters to cover all languages.
       • So hundreds of different encoding systems were developed for assigning numbers to characters.
    3. INTRODUCTION (continued)
       • As a result, these encoding systems conflict with each other: two encodings can use the same number for two different characters, or different numbers for the same character.
       • Any given computer needs to support many different encodings.
       • Yet whenever data is passed between different encodings or platforms, it runs the risk of corruption.
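
The conflict between encodings can be observed directly. A minimal sketch in Python, assuming the standard codecs `latin-1`, `cp1251` and `cp437` as stand-ins for the legacy encodings (the slides do not name specific ones):

```python
# One byte value, two legacy encodings, two different characters.
raw = b"\xe9"
as_western = raw.decode("latin-1")    # ISO 8859-1 (Western European)
as_cyrillic = raw.decode("cp1251")    # Windows-1251 (Cyrillic)
print(as_western, as_cyrillic)        # é й

# One character, two legacy encodings, two different numbers.
e_acute = "\u00e9"                    # é, written as an escape to avoid ambiguity
in_latin1 = e_acute.encode("latin-1")
in_cp437 = e_acute.encode("cp437")    # original IBM PC code page
print(in_latin1.hex(), in_cp437.hex())  # e9 82
```

Text written under one of these encodings and read under another turns into the wrong characters, which is the corruption risk mentioned on the slide.
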
    4. Examples of character encoding systems
       • Morse code
       • Baudot code
       • The American Standard Code for Information Interchange (ASCII)
       • Unicode
    5. WHAT IS UNICODE?
       • Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
       • The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the world's diverse languages.
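
The "unique number" is called a code point. As an illustration (the sample characters are chosen here and are not from the slides), Python's built-in `ord` and `chr` and the standard `unicodedata` module expose it:

```python
import unicodedata

# Each character has exactly one code point, whatever the platform or program.
samples = ["A", "\u00e9", "\u20ac", "\u0905", "\u4e2d"]  # A, é, €, अ, 中
for ch in samples:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

# The mapping works in both directions.
euro = chr(0x20AC)
latin_a_name = unicodedata.name("A")
```
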
    6. From ASCII to Unicode
       • Most character sets and encodings of the 1970s and 1980s were modifications or extensions of ASCII.
       • Many of these legacy encodings use a single byte per character (SBCS).
       • They are all limited to 256 characters.
       • Because of that, no single one of them can cover all the letters of the European languages, let alone the rest of the world's scripts.
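
The 256-character ceiling is easy to hit in practice. A small sketch, assuming Python's `latin-1` and `cp1252` codecs as representative single-byte encodings; `fits` is a helper name made up for this example:

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

french = "c\u0153ur"                          # "cœur"; œ is U+0153
greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"  # "Ελλάδα"

print(fits(french, "latin-1"))  # False: ISO 8859-1 has no œ
print(fits(french, "cp1252"))   # True: Windows-1252 adds œ
print(fits(greek, "cp1252"))    # False: no Greek letters in Windows-1252
print(fits(french + " " + greek, "utf-8"))  # True: UTF-8 covers both
```
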
    7. Where is Unicode Used?
       • The Unicode Standard has been adopted by many software and hardware vendors.
       • Most operating systems support Unicode.
       • Unicode is required for international document and data interchange, the Internet and the WWW, and is therefore built into modern languages and standards such as:
         • Java, C#, Perl, Python
         • Markup languages such as XML, HTML, XHTML
         • JavaScript, LDAP, CORBA, etc.
    8. UTF-8
       • UTF-8 is the 8-bit encoding form of Unicode: it is built from 8-bit code units.
       • It is a variable-width encoding and also a strict superset of ASCII.
       • "Strict superset" means that every ASCII character is available in UTF-8 with the same code point value and the same single-byte representation.
       • 1 character = 1 to 4 bytes in the encoding
       • Characters from most European scripts: 1 or 2 bytes
       • Characters from East and South Asian scripts: mostly 3 bytes, with supplementary characters taking 4
    9. UTF-8 (continued)
       • UTF-8 is used on UNIX platforms, in HTML and by most Internet browsers.
       • Main benefits of UTF-8:
         • Compact storage for European scripts: in general, European-script text occupies less space on disk and in memory than in UTF-16 or UTF-32.
         • Ease of migration: since 7-bit ASCII data remains the same in UTF-8, the data conversion effort between ASCII-based character sets and UTF-8 is reduced significantly.
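
The variable width and the ASCII compatibility of UTF-8 can both be checked with Python's built-in `str.encode`. The sample characters below are chosen for illustration and are not from the slides:

```python
# Sample characters, written as escapes so the code points are unambiguous.
samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter with accent (é)",
    "\u042f": "Cyrillic letter (Я)",
    "\u4e2d": "CJK ideograph (中)",
    "\U0001F600": "supplementary character (emoji)",
}

# Number of bytes each character needs in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, description in samples.items():
    print(f"{utf8_lengths[ch]} byte(s): {description}")

# Strict superset of ASCII: pure ASCII text has identical bytes in both.
ascii_text = "Hello, world"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```
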
    10. UTF-16
        • UTF-16 is the 16-bit encoding form of Unicode.
        • It is basically an extension of UCS-2.
        • One Unicode character takes 2 or 4 bytes in the encoding.
        • Characters from European and most Asian scripts are represented in 2 bytes.
        • Supplementary characters are represented in 4 bytes (a surrogate pair).
        • UTF-16 has been the main Unicode encoding on Windows since Windows 2000.
    11. UTF-16 (continued)
        • Main benefits of UTF-16:
          • More compact storage for Asian scripts (2 bytes for commonly used characters)
          • Ideal if European and Asian scripts are used together
          • Asian text occupies less storage on disk and in memory than in UTF-8, which needs 3 bytes per character for the Asian part
          • A balance of efficient access to characters and economical use of storage
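
How the 4-byte form of UTF-16 is built, and the storage comparison with UTF-8, can be sketched as follows. `to_surrogates` is a helper written for this example (not a library function), and the sample strings are assumptions:

```python
from typing import Tuple

def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into a UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# The hand-built pair matches what Python's codec produces (big-endian, no BOM).
pair_bytes = high.to_bytes(2, "big") + low.to_bytes(2, "big")
codec_bytes = "\U0001F600".encode("utf-16-be")

# Storage for a short Japanese string: 3 bytes per character in UTF-8, 2 in UTF-16.
text = "\u65e5\u672c\u8a9e"  # 日本語
utf8_size = len(text.encode("utf-8"))
utf16_size = len(text.encode("utf-16-le"))
print(utf8_size, utf16_size)  # 9 6
```
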
    12. UTF-32
        • 32-bit encoding
        • Popular when memory space is no concern
        • Fixed width (4 bytes per character)
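
Fixed width means the position of the n-th character can be computed instead of searched for. A minimal sketch using Python's `utf-32-be` codec; `code_point_at` is a hypothetical helper written for this example:

```python
text = "A\u00e9\u4e2d\U0001F600"  # A, é, 中 and an emoji
encoded = text.encode("utf-32-be")  # big-endian, no byte order mark

# Every character costs exactly 4 bytes, regardless of script.
print(len(text), len(encoded))  # 4 16

def code_point_at(data: bytes, index: int) -> str:
    """Return the character at position index by jumping straight to byte 4 * index."""
    start = 4 * index
    return data[start:start + 4].decode("utf-32-be")

third = code_point_at(encoded, 2)
print(third == "\u4e2d")  # True
```

The price is space: the same four characters take 10 bytes in UTF-8.
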
    13. Unicode @ the Library
        • Display all scripts and characters
        • Record data in all languages
        • Exchange bibliographic data
        • Search in all languages …
    14. THANK YOU
