XML FOR DUMMIES
Book authors: Lucinda Dykes and Ed Tittel
Slides prepared by Cong Tan
Part 2: XML and the Web
Chapter 6: Adding Character(s) to XML
Contents
1. About Character Encodings
2. Introducing Unicode
3. Character Sets, Fonts, Scripts, and Glyphs
4. For Each Character, a Code
5. Key Character Sets
6. Using Unicode Characters
7. Finding Character Entity Information
1. About Character Encodings
The trend is toward longer bit strings for encoding character data, so size does matter when representing characters. Here’s why:
- A 7-bit string can represent a maximum of 2^7, or 128, different characters…
- An 8-bit string can represent a maximum of 2^8, or 256, different characters. That includes everything a 7-bit encoding can handle and leaves room for what some experts call higher-order characters.
- A 16-bit string can represent a maximum of 2^16, or 65,536, different characters.
Some modern computers still use 8-bit encodings to represent most character data. Windows NT, Windows 2000, and Windows XP, however, use 16-bit encoding for internal representations of text, and most global solutions use 16-bit encoding to support a far wider range of languages and characters.
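The arithmetic behind these limits can be checked with a short Python sketch. The helper name `max_characters` is made up for this illustration and is not from the book.

```python
def max_characters(bits: int) -> int:
    """Number of distinct values a fixed-width code of `bits` bits can hold."""
    return 2 ** bits


# Print the capacity of the three code widths discussed on the slide.
for bits in (7, 8, 16):
    print(f"{bits}-bit code: {max_characters(bits):,} different characters")
```

Each extra bit doubles the number of available codes, which is why moving from 8 to 16 bits raises the limit from 256 to 65,536.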
2. Introducing Unicode
As of version 4.0, Unicode defines just over 96,000 different character codes. It is the default character set used to encode HTML documents on the Web. Many people, including numerous XML experts, refer to the XML character set as “Unicode”.
Note that XML 1.0, 2nd Edition references Unicode 2.0 and 3.0, and XML 1.1 references Unicode 4.0, whereas the 1st Edition of XML 1.0 references only Unicode 2.0…
For more information about Unicode characters, symbols, history, and the current standard, see the Unicode Consortium’s Web site at www.unicode.org.
3. Character Sets, Fonts, Scripts, and Glyphs
To view scripts in XML documents that 7- or 8-bit character encodings can’t cover (special symbols or non-Roman alphabets), you’ll need a few extra local ingredients:
- A character set that matches the script you’re trying to read and display.
- Software that understands the character set for the script.
- An electronic font that allows the character set to be displayed on screen.
All these ingredients are necessary to work with alternate character sets. Character sets represent a mapping from a script to a set of corresponding numeric character codes. Fonts represent a collection of glyphs for the numeric character codes in a character set.
Finally, to create text that matches the alphabets used in a script, you need an input tool, such as a text or XML editor, that can work with the character set and its corresponding font.
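To make the idea of a character set as a mapping from characters to numeric codes more concrete, here is a minimal Python sketch that uses the standard `unicodedata` module. The helper name `describe` and the sample characters are illustrative choices, not part of the original slides. Whether the characters display correctly on your screen still depends on having a suitable font installed.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (character, numeric code, Unicode name) for each character."""
    return [(ch, ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


# One Latin letter, one accented letter, one Hebrew letter, one Thai letter.
for ch, code, name in describe("A\u00e9\u05d0\u0e01"):
    print(f"{ch!r}: code {code} (U+{code:04X}) {name}")
```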
4. For Each Character, a Code
In the Unicode/ISO 10646 character set, individual characters correspond to specific numeric codes (16-bit numbers for the most commonly used characters).
Numeric entities take one of two forms, decimal or hexadecimal. For example:
    &#4096; <!-- &# indicates a decimal number -->
    &#x0F00; <!-- &#x indicates a hexadecimal number -->
A numeric entity always refers to a Unicode character code, whatever encoding the document uses. If a document does not declare a specific encoding, the default is UTF-8, which stands for Unicode Transformation Format, 8-bit form.
UTF and UCS are mechanisms for implementing Unicode:
- UTF versions include UTF-32, UTF-16, UTF-8, UTF-EBCDIC, and UTF-7.
- UCS versions include UCS-4 and UCS-2.
UTF-16 is used mainly for internal processing.
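The decimal and hexadecimal forms can be tried out with a short Python sketch that uses the standard library's `xml.etree.ElementTree` parser. The helper `char_refs` and the `<sample>` element are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def char_refs(ch: str) -> tuple:
    """Return the decimal and hexadecimal numeric entity for one character."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"


# A decimal and a hexadecimal numeric entity inside a tiny document.
doc = "<sample>&#4096; &#x0F00;</sample>"
text = ET.fromstring(doc).text

# The parser resolves each entity to the character with that code.
print([f"U+{ord(c):04X}" for c in text.split()])

# The same character occupies a different number of bytes in each encoding.
for encoding in ("utf-8", "utf-16-be"):
    print(encoding, len("\u1000".encode(encoding)), "bytes")
```

Both entity forms name a character by its code, so `&#4096;` and `&#x1000;` are interchangeable. The byte counts differ because UTF-8 and UTF-16 are different ways of storing the same code.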
5. Key Character Sets
Most computers today use some variant of ASCII, a 7-bit character set (usually stored in 8-bit bytes) that handles the basic Roman alphabet used for English, along with punctuation, numbers, and simple symbols.
Character sets for most European languages match standard ASCII values from 0 to 127 and go on from there to define alternate mappings between character codes and local script characters for values from 128 to 255.
Non-Roman alphabets, such as Hebrew, Japanese, and Thai, depend on special character sets that include basic ASCII (0-127, or 0-255).
A listing of character sets built around the ASCII framework appears in Table 6-1.
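The split between the shared 0 to 127 range and the variant-specific 128 to 255 range can be seen with Python's built-in codecs. This is only a sketch: the helper `decode_byte` and the choice of byte value 0xE9 are made up for the illustration.

```python
def decode_byte(value: int, charset: str) -> str:
    """Interpret a single byte value according to the named character set."""
    return bytes([value]).decode(charset)


CHARSETS = ("iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-8")

# Code 65 lies in the ASCII range, so every variant agrees on it.
print([decode_byte(0x41, name) for name in CHARSETS])

# Code 233 lies above 127, so each variant maps it to a different character.
for name in CHARSETS:
    ch = decode_byte(0xE9, name)
    print(f"{name}: byte 0xE9 is U+{ord(ch):04X}")
```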
Table 6-1 shows that most character sets can render English and German, plus a collection of other languages. When choosing a variant of ISO-8859, remember that all the languages you want to include must be covered by that variant; otherwise, you must use Unicode. XML goes beyond such idiosyncratic or customized character sets and uses Unicode.
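One way to check whether a single ISO-8859 variant covers all the text you need is to try encoding the text with each candidate. The sketch below does this with Python's built-in codecs. The function name `covering_variants`, the list of variants, and the sample strings are assumptions made for this example and do not reproduce Table 6-1.

```python
# A handful of ISO-8859 variants to test against (not an exhaustive list).
VARIANTS = [f"iso-8859-{n}" for n in (1, 2, 5, 7, 8, 9, 15)]


def covering_variants(text: str) -> list:
    """Return the variants in VARIANTS that can encode every character."""
    found = []
    for name in VARIANTS:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this variant
        found.append(name)
    return found


print(covering_variants("Gr\u00fc\u00dfe"))  # German only
print(covering_variants("Gr\u00fc\u00dfe, \u0393\u03b5\u03b9\u03ac"))  # German plus Greek
```

When the second call returns an empty list, no single variant in the list covers both languages, which is the situation where a Unicode encoding such as UTF-8 is needed.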
6. Using Unicode Characters
Many modern word processors support Unicode. For instance, Word 97 and later versions support a format called encoded text that uses Unicode encoding.
If you don’t already have access to such tools and want to save an XML file in Unicode format, you must use a conversion tool. Several different tools, both freeware and commercial products, are available, depending on your OS.
Widely used tools such as Netscape Navigator (version 4.1 or newer) and IE (version 5 or newer) can handle most ISO-8859 variants.
If you want to use an alternate character encoding, you must identify that encoding in your XML document’s prolog as follows:
    <?xml version="1.0" encoding="ISO-8859-9"?>
Note that XML parsers are required to support only UTF-8 and UTF-16 encodings, so the encoding attribute in an XML document prolog might not work with all such tools.
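As a rough illustration of how the encoding named in the prolog and the bytes of the file must agree, the following Python sketch builds a small ISO-8859-9 document and parses it with `xml.etree.ElementTree`. The `<word>` element and the sample Turkish word are invented for the example. Python's bundled parser is assumed here to accept ISO-8859-9, which goes beyond what the XML specification requires of a parser.

```python
import xml.etree.ElementTree as ET

text = "\u0131\u015f\u0131k"  # a Turkish word using letters found in ISO-8859-9

# The prolog names the encoding; the bytes must really be in that encoding.
prolog = '<?xml version="1.0" encoding="ISO-8859-9"?>'
raw = (prolog + "<word>" + text + "</word>").encode("iso-8859-9")

print(raw)                    # non-ASCII letters are stored as single bytes
root = ET.fromstring(raw)     # the parser reads the prolog to pick the decoder
print(root.text == text)
```

If the declared encoding and the actual bytes disagree, a parser will either report an error or silently produce the wrong characters, so the declaration should always match how the file was saved.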
7. Finding Character Entity Information
Resource: The Unicode Standard, version 4.0. You can also find plenty of encoding information online, for example at www.unicode.org/ucd/.
You’ll also find the XHTML entity lists useful in this context:
- Latin-1: www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
- Special: www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
- Symbols: www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
THE END
